The Deduplication Swindle

Go back to 2009, when NetApp announced it was about to buy Data Domain, the rising star of the deduplication market. The announcement proved premature: EMC pulled the rug from under NetApp's feet, won a hard-fought battle and folded Data Domain into its data protection offering.
No doubt it has been a smart move for EMC, and the battle was hard fought for a reason: deduplication endorsed EMC's “no tape” mantra, and published information suggests that Data Domain is a successful and growing revenue stream.
So if there is a swindle, where is it? Based on nothing more than anecdotal conversations with users of various deduplication technologies, we believe the problem lies in the dark art of deduplication sizing.
Look at the price paid per terabyte of raw disk capacity in a deduplication appliance and it is huge. Of course, anyone who understands deduplication knows it is not about buying raw capacity; it is about the reduction ratios achieved on backups.
The promise of deduplication is to store months' or years' worth of backup jobs on disk, with ratios as high as 70:1.
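To see why the ratio matters more than the price per raw terabyte, here is a minimal Python sketch of the arithmetic. The appliance size ($200,000 for 32 TB usable) and the ratios are hypothetical figures chosen for illustration, not quotes from any vendor, and the ratio is treated as a single average across all backup data.

```python
# Hypothetical figures for illustration only; not taken from any vendor price list.
RAW_CAPACITY_TB = 32.0          # usable disk inside the appliance
APPLIANCE_PRICE_USD = 200_000.0


def effective_capacity_tb(raw_tb: float, dedupe_ratio: float) -> float:
    """Logical (pre-deduplication) backup data that fits at an average ratio of N:1."""
    return raw_tb * dedupe_ratio


def cost_per_logical_tb(price: float, raw_tb: float, dedupe_ratio: float) -> float:
    """Price divided by the logical capacity the ratio is assumed to deliver."""
    return price / effective_capacity_tb(raw_tb, dedupe_ratio)


if __name__ == "__main__":
    print(f"Per raw TB: ${APPLIANCE_PRICE_USD / RAW_CAPACITY_TB:,.2f}")
    for ratio in (70, 20, 10):
        logical = effective_capacity_tb(RAW_CAPACITY_TB, ratio)
        cost = cost_per_logical_tb(APPLIANCE_PRICE_USD, RAW_CAPACITY_TB, ratio)
        print(f"{ratio:>3}:1 -> {logical:,.0f} TB logical, ${cost:,.2f} per logical TB")
```

With these assumed inputs the raw price is $6,250 per TB, which falls to about $89 per logical TB at 70:1 but only to $625 at 10:1. The business case rests entirely on which ratio you actually achieve.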
However, the truth seems to be that no one can say with any accuracy how the ratios will actually pan out, and THAT is the issue. The high cost of a deduplication appliance is usually justified by showing savings over long periods such as five years. Experts “size” your requirements using methods that do not seem to be shared or open, and come back with the capacity you will need to house your backup data for the next three or five years.
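How sensitive such a projection is to the assumed ratio can be shown with a deliberately simple model. This is not any vendor's sizing method: it assumes a hypothetical retention policy equivalent to a fixed number of full backups, steady compound data growth and a constant average deduplication ratio. Every parameter name and number below is an assumption for illustration.

```python
from typing import Optional


def months_until_full(
    raw_tb: float,          # usable disk in the appliance (hypothetical)
    full_backup_tb: float,  # size of one full backup today (hypothetical)
    retained_fulls: float,  # retention expressed as full-backup equivalents (assumption)
    dedupe_ratio: float,    # average N:1 ratio, assumed constant over time
    annual_growth: float,   # compound growth of protected data, e.g. 0.30 for 30%
    horizon_months: int = 120,
) -> Optional[int]:
    """Return the first month in which physical usage exceeds raw capacity,
    or None if the appliance does not fill within the horizon."""
    for month in range(horizon_months + 1):
        full_size = full_backup_tb * (1 + annual_growth) ** (month / 12)
        logical_retained = full_size * retained_fulls
        physical_used = logical_retained / dedupe_ratio
        if physical_used > raw_tb:
            return month
    return None


if __name__ == "__main__":
    for ratio in (20, 10):
        months = months_until_full(100, 20, 30, ratio, 0.30)
        print(f"{ratio}:1 -> full after {months} months")
```

With these hypothetical inputs (100 TB usable, a 20 TB full backup, 30 retained full-equivalents, 30% annual growth), an appliance sized at 20:1 lasts 56 months, close to the five-year promise, but if only 10:1 is achieved it fills after 24 months. Halving the ratio does more than halve the runway, because the starting point is already closer to the limit and growth compounds from there.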
However, at Data&Storage Asean we have come across numerous IT managers who made a deduplication investment based on this somewhat “mystical” projection, only to find that the five years of capacity they thought they had bought ran out after a much shorter period.
It seems that deduplication vendors are selling a future that cannot be substantiated, and once you invest in deduplication it is difficult to break out. If capacity runs out earlier than you had been led to believe, your only options are to buy more deduplicated capacity or to let more tape creep back into your backup processes.
Note that we have not done any objective study of this claim, and we cannot prove that any specific deduplication vendor is “over-promising” when selling the technology. The suggestion that it may be happening is based only on anecdotal conversations. So if any deduplication vendor can give us a different perspective, we will be very happy to review and potentially publish it. Likewise, if you are an IT manager who has implemented deduplication and would like to share that experience (good or bad), please do contact us.
Martin Lee
Contributor – Data&Storage Asean

1 Comment

Hi Martin,
Sounds like you aren’t hearing enough from dedupe vendors. Actually, I’ve spoken to customers who would agree with you: planning five years out for deduplication capacity is hard. That’s why at Quantum we have adopted a “capacity on demand” approach to scalability for our DXi-Series offerings. By using a licensing key instead of adding more hardware to grow dedupe capacity, customers don’t need a crystal ball to predict their growth; they can simply scale as growth demands. I know not every vendor takes this approach. I can think of one nameless dedupe sales leader that would regularly undersize implementations as a sales tactic. It’s also important to consider how proactive management tools can help with capacity planning, giving customers greater predictability for their data protection requirements. These tools are increasingly becoming a “must have” for customers implementing deduplication.
While deduplication addresses a wide range of issues, it isn’t right for every kind of data. In many data centers, unstructured content is growing in both size and value, and much of it (video, for example) simply isn’t suited to dedupe. IT needs to start thinking in terms of different storage workflows: pulling unstructured content out of the backup workflow and creating a new workflow that keeps content protected and accessible to business line owners, an active archive approach. New technologies and new strategies now make this approach more viable. We’re seeing our customers dramatically reduce storage costs AND get more value out of their data, because content remains accessible for analysis, repurposing and remonetizing.
For a more detailed exploration of some of these topics, I’d encourage readers to check out Eric Bassier’s recent article Where to Deduplicate, and How to Keep it Simple (