In a previous blog, I highlighted the growing challenge of managing unstructured data and the difficulty organisations face in achieving a single, actionable view for decision-making. I suggested that technical metadata could be the place to start building that picture.
Recently, I’ve been having more frequent conversations with organisations that are struggling to drive valuable outcomes for their frameworks and initiatives when dealing with unstructured data.
As we all know, not all data is equal. Therefore, it’s essential to focus attention and resources on data that is either critical, strategic, or shared (i.e. valuable to an organisation’s purpose) – but how can we accurately determine which data falls into these categories, and how can tooling make this information more accessible for unstructured data?
Background: Turning Vision into Action
I was recently part of a conversation with an ambitious organisation about gaining a broader ‘grip’ on their data, and they identified unstructured data as their biggest challenge. They had initiatives aimed at improving data hygiene, addressing data disposition, and advancing their Gen AI ambitions, and they were struggling to get clear and actionable insights from their unstructured data pools to help drive these forward. Because their previous data management efforts had been driven by a specific regulatory compliance outcome, the information within their existing tooling was neither easily consumable nor suited to their unstructured data needs.
However, these efforts did succeed in providing a clear picture of their platforms, systems, ownership, and overall sense of purpose – a great place to start. The real nub of the issue was: how could they determine whether the unstructured data they hold is valuable, and how could they get their teams to start validating it efficiently?
The Solution
While technology is just one piece of the puzzle – alongside people, processes, and a clear sense of purpose – I’ve found that it is essential for building a picture of value for unstructured data, because it brings the speed, scale, and efficiency needed to reach that judgement. Most importantly, the output from technology should be easy for humans to understand, with minimal effort required to interpret it. There are various tools, techniques, and approaches for understanding, consuming, and generating value from your data. However, much of the tooling I see in the Data Management space handles structured data sets more effectively than unstructured ones when it comes to discovering hidden value.
Starting with technical data discovery is an effective way to compound your understanding of what data you have, but also a way to identify potential value. For unstructured data, I recommend beginning in a simpler manner with best-of-breed capability focused on rapid discovery, using data points from existing technical metadata that users can easily interpret. This approach helps accelerate time to value and reduces the amount of unstructured data that remains ‘dark’ within an organisation.
Unstructured data inherently comes with technical metadata such as timestamps, access details, type, size, and owner, and every file has a name associated with it. All of this information is understandable by humans if presented correctly, so why not use it?
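To make this concrete, here is a minimal Python sketch of gathering that technical metadata from a file share using only the standard library. The function name `collect_metadata` and the record field names are my own illustrative choices, not taken from any particular product, and dedicated discovery tooling will do this far faster and at far greater scale.

```python
from datetime import datetime, timezone
from pathlib import Path


def collect_metadata(root):
    """Walk `root` and return one dict of technical metadata per file.

    Only file system metadata is read; file contents are never opened.
    """
    records = []
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        st = path.stat()
        records.append({
            "name": path.name,
            "path": str(path),
            # File extension as a rough stand-in for "type".
            "type": path.suffix.lower() or "(none)",
            "size_bytes": st.st_size,
            "modified": datetime.fromtimestamp(st.st_mtime, tz=timezone.utc),
            "accessed": datetime.fromtimestamp(st.st_atime, tz=timezone.utc),
            # Numeric owner id; mapping it to a person is platform specific.
            "owner_uid": st.st_uid,
        })
    return records
```

Each record is a handful of plain values that a data owner can read without any specialist knowledge, which is the point: the information is already there.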
By using tooling that can rapidly create a single index of these vast pools of data, incorporates search and tagging, and allows for historic views, data owners can gain insights more efficiently and cost-effectively. This consolidated view can then feed into other techniques, such as introspection (content inspection of files, which is slower and typically more costly for unstructured data). In turn, this can drive more valuable outcomes for data use cases like cataloguing, warehousing, analytics, AI, ESG, and data disposition – or it can reveal that the data is indeed Redundant, Obsolete, or Trivial (ROT), aiding IT in day-to-day management tasks.
The index and the queries run against it can then be adjusted to meet changing needs and re-run to demonstrate progress, becoming a useful mechanism to bring light and value to your unstructured data.
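As a rough illustration of the index, tag, and re-run idea, the sketch below loads metadata records into an in-memory SQLite table and tags anything not accessed within a chosen window as a ROT candidate. Everything here is an assumption for illustration: the table layout, the `rot-candidate` tag, the three-year threshold, and the sample paths are invented, and real tooling will have its own index format and policies.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def build_index(records):
    """Load metadata records into a single searchable in-memory index."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE files (path TEXT PRIMARY KEY, type TEXT, "
        "size_bytes INTEGER, accessed REAL, tag TEXT)"
    )
    conn.executemany(
        "INSERT INTO files (path, type, size_bytes, accessed) VALUES (?, ?, ?, ?)",
        # Store access time as epoch seconds so comparisons are numeric.
        [(r["path"], r["type"], r["size_bytes"], r["accessed"].timestamp())
         for r in records],
    )
    return conn


def tag_stale(conn, now, max_age_days=3 * 365):
    """Tag files not accessed within `max_age_days` and return their paths."""
    cutoff = (now - timedelta(days=max_age_days)).timestamp()
    conn.execute(
        "UPDATE files SET tag = 'rot-candidate' WHERE accessed < ?", (cutoff,)
    )
    rows = conn.execute(
        "SELECT path FROM files WHERE tag = 'rot-candidate' ORDER BY path"
    )
    return [row[0] for row in rows]


# Invented sample records, purely for demonstration.
now = datetime(2025, 1, 1, tzinfo=timezone.utc)
sample = [
    {"path": "/share/finance/budget_2019.xlsx", "type": ".xlsx",
     "size_bytes": 48_000,
     "accessed": datetime(2019, 6, 1, tzinfo=timezone.utc)},
    {"path": "/share/hr/handbook.pdf", "type": ".pdf",
     "size_bytes": 910_000,
     "accessed": datetime(2024, 11, 20, tzinfo=timezone.utc)},
]
conn = build_index(sample)
stale = tag_stale(conn, now)
print(stale)
```

Changing `max_age_days` and calling `tag_stale` again is the "adjust and re-run" step in miniature: the same index answers a different question without rescanning the data.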
Real Life Context
One of our customers successfully advanced their data disposition initiatives by using this approach, quickly profiling their unstructured data pools at a domain level. This provided data owners with crucial information on age, access, and profile. The ability to create custom queries and dashboards enabled the organisation to focus on key areas and identify data relevant to their disposition efforts, namely completing the removal and purging of certain data types, which was carried out automatically by tooling already in place. An additional benefit was the ability to report on file names, such as those containing terms like “pay” or “salary,” which helped identify and rectify the storage of sensitive information in violation of company policy.
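The file name report described above can be approximated in a few lines of Python. This is a hedged sketch rather than the customer's actual query: the function name `flag_sensitive_names` and the sample paths are hypothetical, and the match is a deliberately simple case-insensitive substring check, so it will also catch names such as "payment" and should be treated as a list for human review rather than a verdict.

```python
from pathlib import PurePosixPath

# Terms taken from the example above; extend to suit your own policy.
SENSITIVE_TERMS = ("pay", "salary")


def flag_sensitive_names(paths, terms=SENSITIVE_TERMS):
    """Return (path, matched_terms) for file names containing any term."""
    flagged = []
    for p in paths:
        # Only the file name is checked, not the folders above it.
        name = PurePosixPath(p).name.lower()
        hits = [t for t in terms if t in name]
        if hits:
            flagged.append((p, hits))
    return flagged


# Hypothetical paths, purely for demonstration.
paths = [
    "/share/team/Salary_Review_2023.xlsx",
    "/share/team/holiday_rota.docx",
    "/home/a/payslip-march.pdf",
]
flagged = flag_sensitive_names(paths)
for path, hits in flagged:
    print(path, hits)
```

Because only names are inspected, this kind of check is cheap enough to run across an entire index before deciding where slower content inspection is worth the cost.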
In summary, I believe that this capability can help peel back the covers on unstructured data, bringing it into the light and helping organisations get a grip on their data initiatives and understand whether there is any value hidden in their vast unstructured data stores.
At Nephos, we combine technical expertise and the strategic business value of traditional professional service providers to deliver innovative data solutions. Whether you’re looking to improve data hygiene, advance your AI ambitions, or ensure regulatory compliance, our data discovery and classification service helps you uncover hidden value, streamline data management processes, and drive valuable outcomes for your initiatives.