I think it is fair to say that Data Lifecycle Management (DLM) across different storage tiers is something almost every business needs to consider in some shape or form. Or is it? New storage vendors in the market claim that tiered storage is a thing of the past. Is this the answer to everyone’s prayers? For certain use cases, yes, it can be, provided the cost of such systems stacks up; but for most clients’ use cases, a single-tiered storage platform is neither cost-effective nor viable.
One example is companies with multiple locations, architecture practices for example, whose teams need to collaborate on files across sites and need distributed access to their data. Previously, such companies had to keep a copy of their data at each site, replicated between the other sites, typically on fast, expensive storage systems that then had to be backed up somehow. Today, with many companies on board with cloud infrastructure, they can take advantage of simplified tiered storage solutions: a resilient, centralised, slower, and cheaper cloud object storage tier that all data syncs to, with smaller physical or virtual cache filers at each office location providing fast file access to users. File data at each site is automatically backed up and protected on the centralised object storage tier, where users at other office locations can access it and collaborate. In this instance, a tiered storage solution works out more cost-efficient than traditional local, siloed storage systems: the amount of on-premise infrastructure is massively reduced, and removing the site-to-site replication and duplication of data reduces the overall data volume.
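To make that concrete, the sketch below shows the sort of lifecycle rule the central object tier can carry so that it doubles as the backup and version-retention layer for the synced file data. It is a minimal illustration only, assuming an S3-compatible, versioned bucket managed with Python and boto3; the bucket name, storage class, and retention periods are assumptions, not a description of any specific vendor’s product.

```python
import boto3

# Hypothetical bucket holding the centrally synced file data (illustrative name).
BUCKET = "central-file-sync-tier"

s3 = boto3.client("s3")

# Keep current file versions readily accessible, move superseded versions to a
# cheaper archive class after 30 days, and expire them after roughly 7 years.
s3.put_bucket_lifecycle_configuration(
    Bucket=BUCKET,
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "protect-and-archive-old-versions",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # apply to all synced file data
                "NoncurrentVersionTransitions": [
                    {"NoncurrentDays": 30, "StorageClass": "GLACIER"}
                ],
                "NoncurrentVersionExpiration": {"NoncurrentDays": 2555},
            }
        ]
    },
)
```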
Another example is one of our genomics clients, with over 85 petabytes of genome data. Their research requires frequent access to the entire data set and must allow researchers to query the data in a highly randomised fashion. Data must therefore sit in a single global-namespace storage system that is highly durable and resilient, delivers highly performant access for the users’ compute requirements, and is, of course, as cost-effective as possible. We implemented a two-tier architecture: around 5% of the data set is delivered on high-performance NVMe-based flash storage to support the working data sets, and this automatically tiers down to a much larger (85-petabyte) secondary tier built on object storage. The object storage platform distributes data across three geographical sites to provide the highest level of data protection while minimising the storage overhead of keeping the data durable.
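The tiering in that deployment is handled by the storage platform itself, but the policy behind it is straightforward to picture. Below is a minimal sketch of an age-based demotion pass from a fast flash tier to an object tier; the mount point, bucket name, and 30-day threshold are assumptions for illustration, and a production system tracks access heat far more accurately than a file’s atime.

```python
import time
from pathlib import Path

import boto3

# Illustrative settings -- the real platform's tiering is policy-driven and
# built into the filesystem, not a standalone script like this one.
FAST_TIER = Path("/mnt/nvme/working-sets")   # hypothetical NVMe mount point
BUCKET = "genome-archive-tier"               # hypothetical object storage bucket
COLD_AFTER_DAYS = 30                         # demote data untouched for 30 days

s3 = boto3.client("s3")
cutoff = time.time() - COLD_AFTER_DAYS * 86400

for path in FAST_TIER.rglob("*"):
    if not path.is_file():
        continue
    # st_atime approximates "last accessed"; real tiering engines track access
    # heat per file or per block rather than relying on atime.
    if path.stat().st_atime < cutoff:
        key = str(path.relative_to(FAST_TIER))
        s3.upload_file(str(path), BUCKET, key)   # copy down to the object tier
        path.unlink()                            # free the flash capacity
```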
In both scenarios, the clients also need a long-term backup and archive repository where data can be held for many years as cost-effectively as possible.
These are just two examples where a single storage tier simply isn’t a viable approach. In the genomics example, entertaining a single-tier solution would have meant doubling up the 85 petabytes of genome storage to keep a full copy at a second site. And if that were the case, does site-to-site replication still count as a single tier? The second-site copy would likely be less performant, or even on tape.
In an ideal world, I’m sure we’d all rather have one single storage tier to cover every eventuality. Some may be lucky enough to accomplish that but, in many cases, once you understand the data you’re holding and add in the data protection aspect, the economics of a single infrastructure may still not stack up.
Wondering what data storage strategy is right for your organisation? We can help: