Data deduplication efforts have traditionally focused on backups and archives, but new unstructured data types and the continuing surge in data volumes are straining hardware and other resources. By applying deduplication when data is created, you can improve storage efficiency, though you risk slowing application performance.
The more storage options your company uses, the more likely it is that you're paying to store duplicate copies of text files, media, and other resources. This makes data deduplication more important than ever for efficient storage management. At the same time, finding and removing duplicate data has never been more challenging.
Deduplication has long been the province of data backups and archives. But storage executive Arun Taneja recommends deduping data when it is created. Taneja is quoted in an October 2014 TechTarget article written by Alex Barrett.
Organizations may hesitate to run deduplication on their primary data, but the space savings and processor efficiency make the practice worthwhile. For example, text files and virtual desktop environments can realize 40:1 deduplication ratios, although storage vendors consider 6:1 a good average.
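To put those ratios in perspective, here is a quick bit of arithmetic in Python (my own illustration, not a figure from the article): a deduplication ratio expresses logical data written versus physical capacity consumed, so the space savings follow directly.

def space_savings(dedupe_ratio):
    # Fraction of physical capacity saved at a given logical:physical ratio.
    return 1 - 1 / dedupe_ratio

print(f"40:1 ratio -> {space_savings(40):.1%} less physical storage")  # 97.5%
print(f" 6:1 ratio -> {space_savings(6):.1%} less physical storage")   # 83.3%

In other words, the jump from 6:1 to 40:1 is dramatic in ratio terms but adds roughly 14 percentage points of savings, because most of the benefit arrives with the first few multiples.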
On the other end of the spectrum, encrypted backups and video files aren't amenable to hash-based in-line deduplication. And compressed files in general dedupe poorly or not at all.
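For readers who want to see the mechanics, the sketch below shows hash-based in-line deduplication in Python, reduced to fixed-size chunks purely for illustration (the class and method names are my own, not a vendor's API): every incoming chunk is hashed before it is written, and only chunks with unseen hashes are stored. Encrypted and compressed data looks essentially random, so almost no chunk hash repeats, which is why those workloads dedupe so poorly.

import hashlib

class InlineDedupeStore:
    # Toy in-line deduplicating store: fixed-size chunks, SHA-256 fingerprints.
    def __init__(self, chunk_size=4096):
        self.chunk_size = chunk_size
        self.chunks = {}          # fingerprint -> chunk bytes, stored once
        self.logical_bytes = 0    # total bytes written by applications

    def write(self, data):
        # Hash each chunk before it hits disk; store only unseen chunks and
        # return a list of fingerprints ("recipe") for reassembling the data.
        recipe = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            fingerprint = hashlib.sha256(chunk).hexdigest()
            if fingerprint not in self.chunks:
                self.chunks[fingerprint] = chunk
            recipe.append(fingerprint)
            self.logical_bytes += len(chunk)
        return recipe

    def dedupe_ratio(self):
        # Logical bytes written divided by physical bytes actually stored.
        physical = sum(len(c) for c in self.chunks.values())
        return self.logical_bytes / physical if physical else 0.0

Writing the same virtual desktop image to this store many times drives the ratio up, because nearly every chunk is a repeat; writing encrypted backups drives it toward 1:1, because nearly every chunk is unique.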
In-line deduplication puts efficient storage front and center
Running deduplication on primary data introduces the possibility that applications will have to wait as data is written or read. If application performance is paramount, deduplication is shifted to post-processing. That frees processor cycles for applications while data is being written, but it requires more primary storage, which negates one of deduplication's principal advantages: storage savings.
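The trade-off is easy to see in a minimal sketch (again my own illustration, not a vendor implementation): a post-process pass scans data that applications have already written at full speed and flags duplicate chunks for reclamation afterward, so applications never wait on hash lookups, but every byte must first consume primary storage.

import hashlib
import os

def post_process_pass(volumes, chunk_size=4096):
    # Walk data that applications already wrote at full speed and tally how
    # much space a later deduplication pass could reclaim from duplicates.
    seen = set()        # fingerprints of chunks already kept
    reclaimable = 0
    for data in volumes:
        for offset in range(0, len(data), chunk_size):
            chunk = data[offset:offset + chunk_size]
            fingerprint = hashlib.sha256(chunk).hexdigest()
            if fingerprint in seen:
                reclaimable += len(chunk)   # duplicate: space can be freed later
            else:
                seen.add(fingerprint)
    return reclaimable

# Two identical 1 MB writes: the second copy is entirely reclaimable,
# but both copies occupied primary storage until the pass ran.
block = os.urandom(1024 * 1024)
print(post_process_pass([block, block]))    # 1048576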
In a September 2014 TechTarget article, Brien Posey offers two reasons why in-line, software-based deduplication is worth the possible performance hit: first, your existing hardware can likely handle the added processing load without noticeable degradation; and second, any performance reduction may be outweighed by gains in other areas, particularly improved data transmission rates.
Today's deduplication challenges will only grow thornier as new types of data -- and much more of it -- find their way into your business's day-to-day operations. In an April 24, 2014, post on Wired, Sepaton executive Amy Lipton points out that efficient data management now requires scalability beyond simply buying more hardware to expand existing data silos.