Removing unnecessary data is becoming increasingly worthwhile, whether to mitigate the risks of sensitive information falling into the wrong hands or to cut storage costs. This isn’t a new challenge. Many organisations have attempted to trim their data, but I am yet to encounter one that has truly mastered it.
With the growing emphasis on and maturity of data governance, we’re now better positioned to tackle the core issue that has long made this so difficult: viewing it solely as an IT problem.
Understanding the challenge
In my experience, it’s entirely possible to analyse your data to understand what’s there, assess its sensitivity, and even identify what may no longer be needed. However, what often seems to be missing is a clear mechanism to make the ultimate decision: “Yes, delete this.”
This is largely because data deletion has historically been treated as an IT project. IT teams often lack the context to fully appreciate what the data represents, the impact of deleting it, and whether it’s still in use – it’s not their data, after all.
Even when decision-making authority is placed with the right people (those who understand the data’s value and can confidently determine it’s no longer needed), technical complexities can still create a barrier. If the details are presented in a way that’s hard to grasp, uncertainty can lead to the default choice of “Not sure, so leave it.”
Governance is good
Without diving into a lengthy debate about what data governance is or isn’t, the principles of effective governance apply here as they do everywhere. With a solid understanding of your environment and a good dose of common sense, you can ensure your efforts are well-directed and inefficiencies are minimised.
When it comes to data disposition, this means:
- Decision-making rests with the right people – those who create, understand, and use the data are best placed to decide what should be kept or removed.
- A clear mechanism exists for identifying candidates for removal, making it easier to search for and act on unnecessary data (a minimal sketch of such a mechanism follows below).
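To make that mechanism concrete, a disposition policy can be captured as a simple structured record that pairs an accountable owner with machine-readable removal criteria. The sketch below is a minimal illustration in Python; the field names and example values are assumptions for this article, not a reference to any particular tool.

```python
from dataclasses import dataclass

@dataclass
class DispositionPolicy:
    """Hypothetical record pairing an accountable owner with removal criteria."""
    name: str                    # human-readable policy name
    owner: str                   # business owner who approves deletion, not IT
    max_age_years: int           # files older than this become candidates
    file_extensions: tuple = ()  # optional: restrict to specific file types

# Example: the data owner, not IT, signs off on removing stale log exports.
stale_logs = DispositionPolicy(
    name="Stale log exports",
    owner="finance-data-owner@example.com",
    max_age_years=7,
    file_extensions=(".log", ".csv"),
)
```

Keeping the owner on the policy itself makes the ultimate “Yes, delete this” decision traceable to someone who actually understands the data.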
Where to start
As always, it comes down to return on investment – where you can achieve the biggest impact quickly with minimal effort.
Impact
The impact depends on your objective. If your goal is to reduce your data footprint and cut IT costs, focus on high-volume data stored on the most expensive infrastructure. If the priority is mitigating the risk of sensitive data leaks, target areas where the most sensitive data is most likely to be exposed.
Effectiveness
Start small and build gradually – whether you prefer ‘low-hanging fruit’ or ‘don’t boil the ocean,’ the idea remains the same. When identifying categories of data for removal, consider these factors:
- Ease of execution: Simple criteria, like categorising files by type or age, are straightforward to implement. In contrast, processing files to detect specific words, people, or content (e.g., in text, images, or audio) can be computationally intensive.
- Return on investment: A policy that removes just a handful of files may not justify the effort, while one that clears terabytes of sensitive data clearly delivers greater value. However, overly broad policies can introduce uncertainty and caveats, making it harder to gain approval for deletion.
To avoid these pitfalls, focus on subsets of data you can confidently act on. For example, instead of tackling all data over 7 years old, where exceptions are likely, start with narrower subsets, such as data over 15 years old or files of a specific type older than 7 years.
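As a rough illustration of how an easy-to-execute criterion translates into practice, the sketch below walks a directory tree and reports files matching a hypothetical “specific type older than 7 years” policy. The scan root, extension, and threshold are all assumptions for the example, and the script only reports candidates – the deletion decision stays with the data owner.

```python
import time
from pathlib import Path

# Hypothetical policy: flag .bak files not modified in over 7 years.
# The scan root, extension, and age threshold are assumptions for the example.
ROOT = Path("/data/shared")
EXTENSIONS = {".bak"}
MAX_AGE_SECONDS = 7 * 365 * 24 * 3600

cutoff = time.time() - MAX_AGE_SECONDS
candidates = [
    path
    for path in ROOT.rglob("*")
    if path.is_file() and path.suffix in EXTENSIONS and path.stat().st_mtime < cutoff
]

# Report only: the decision to delete remains with the data owner.
for path in candidates:
    print(path)
```

Note how simple metadata checks (type and age) keep the scan cheap, in contrast to content-level detection, which would require processing every file.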
Chicken or egg?
The logical approach to a project like this is to define a policy for a specific category of data to remove, identify where that data exists, and then delete it.
While this makes sense in theory, technological constraints often get in the way. Can you accurately classify which data fits the criteria? Some data is far easier to identify than others, and accuracy can vary. Factoring these limitations into the policy-making process ensures that the policies you define are actually actionable.
Starting from scratch can also be challenging. Instead, gaining an initial understanding of your data can spark conversations about good removal candidates and help test hypotheses about which policies are likely to deliver meaningful results.
Safety blankets
Data deletion often feels less risky because of the safety net of backups – most data you consider deleting is likely backed up, perhaps multiple times. We tend to hold onto backups “just in case,” often for years. If second thoughts arise after deleting data, chances are it’s still in last night’s backup.
For added assurance, consider quarantining candidate data instead of deleting it immediately. This provides a buffer period where you can monitor for any requests or concerns, allowing for a quick recovery if needed.
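One lightweight way to build that buffer, sketched below under assumed paths and timings, is to move candidates into dated quarantine folders and only purge a folder once its review window has passed. The quarantine location and the 30-day window are placeholders for illustration, not recommendations.

```python
import datetime
import shutil
from pathlib import Path

QUARANTINE = Path("/data/quarantine")  # assumed holding area
REVIEW_WINDOW_DAYS = 30                # assumed review buffer

def quarantine(path: Path) -> None:
    """Move a candidate into a dated quarantine folder instead of deleting it."""
    batch = QUARANTINE / datetime.date.today().isoformat()
    batch.mkdir(parents=True, exist_ok=True)
    shutil.move(str(path), str(batch / path.name))

def purge_expired() -> None:
    """Permanently delete quarantine batches once their review window has passed."""
    if not QUARANTINE.exists():
        return
    cutoff = datetime.date.today() - datetime.timedelta(days=REVIEW_WINDOW_DAYS)
    for batch in QUARANTINE.iterdir():
        # Assumes only dated batch folders live under QUARANTINE.
        if batch.is_dir() and datetime.date.fromisoformat(batch.name) < cutoff:
            shutil.rmtree(batch)
```

Grouping files by quarantine date means the purge step never has to guess when a file entered quarantine, and a whole batch can be restored if concerns surface during the review window.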
Keep the momentum
Data deletion should be a continuous, iterative process – not a one-off initiative. Start with quick wins: focus on policies targeting data that’s easy to identify and simple to justify removing. Over time, you can move from using an axe to a scalpel, carefully addressing more nuanced categories of unnecessary data.
This shouldn’t be treated as a finite project with a clear start and end. New redundant or risky data will inevitably accumulate, making it essential to introduce ongoing controls. By addressing policy violations as they occur, you can prevent the need for another large-scale clean-up down the line.
At Nephos, we combine technical expertise with the strategic business value of traditional professional service providers to deliver innovative data solutions. With our automated data retention service, we make data deletion simple, secure, and stress-free – helping you stay in control while unlocking real value.