With Windows Server 2012, Microsoft entered the world of deduplication. Now that Server 2012 is fully supported by Veeam Backup & Replication, it's time to run some tests to see whether the built-in dedup function really is a good thing, or whether it's better to leave it aside and rely on the dedup and compression features of the backup software itself.
I have seen some nice small tests with Veeam and Server 2012 dedup, showing astonishing dedup ratios that reduced a 250GB Veeam backup file to only 7GB! This is great, but I want to see it for myself in an environment I built up on my own, where I can run the kind of real-world test scenarios I will probably encounter at customer sites.
So here is my test setup:
To have comparable client systems to back up, I created a new VM (johnnybravo) with two vDisks (20GB for the OS, 100GB for data, running on a DataCore SSY-V system over 1GBit iSCSI) and running some file services on it. After I put some test data on the system, I cloned this VM (johnnybravo2) and started the clone without a network connection. Additional data was transferred during the tests by mounting ISO images to both VMs and copying the same data to the VMs' disks.
My Veeam server is a VM running Windows Server 2012 with the latest patches. The VM has 4 vCPUs and 12GB of reserved vRAM, and the backup disks come from an exclusively used 3PAR StoreServ 7200 system with 12x300GB 15k SAS disks in a RAID 1 config. So storage shouldn't be the limiting factor.
One of the two backup disks is configured to use MS dedup with a manual optimization schedule and a setting to dedup all files older than 0 days, which includes every new file stored on the backup disk.
The second backup disk is a non-deduplicated standard disk that stores the backups compressed/deduplicated by Veeam. Both disks are 300GB in size.
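For reference, this is roughly how the dedup store can be configured via the Deduplication PowerShell module. A sketch only: the drive letter matches my setup, and the default schedule name may differ on your system.

```powershell
# Enable deduplication on the backup volume
Import-Module Deduplication
Enable-DedupVolume -Volume "E:"

# Process every new file immediately, regardless of its age
Set-DedupVolume -Volume "E:" -MinimumFileAgeDays 0

# Disable the built-in background schedule so optimization
# only runs when I start it manually
Set-DedupSchedule -Name "BackgroundOptimization" -Enabled $false
```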
All VMs are running on two 8-core vSphere 5.1 servers with 32GB of RAM each. During the tests I shut down most of the other test VMs to avoid resource contention.
So first of all, I had to create two new Veeam backup jobs. Job1 is called "deduped"; the VM to be backed up is "johnnybravo", compression is set to dedup-friendly, and Veeam's dedup is turned off. The backup type is incremental and application-aware processing is enabled. Incremental backups run every hour.
Job2 is called "nondeduped"; the VM to be backed up is "johnnybravo2", compression is set to optimal, and Veeam's dedup is turned on. Backup type, application-aware processing, and schedule are the same as for Job1.
Let's start with the first full backups.
Even with compression set to dedup-friendly and Veeam's dedup turned off, we still see a high CPU load on the backup proxy.
Job1 finished after ~11 minutes with a disk footprint of ~19GB.
This is nearly the same size as all the data on the VM combined, so either almost no compression was done or the data is not suitable for compression. We will see which it is a bit later.
Let's start the second job.....
Quite the same picture: roughly 100% CPU load, but this time Veeam's compression is turned on and deduplication is also done by Veeam.
Nearly the same time spent on the backup job, the same amount of data, and the disk footprint:
Wow, it seems the data is quite suitable for compression after all, because it was reduced by 7GB.
I copied some more test data (~6GB) onto both VMs and waited another hour for the first incremental backup, resulting in two backup files on each backup disk. The incremental backup file on the deduped store has a footprint of 6200MB; the Veeam-compressed file on the non-deduped backup store is ~5800MB in size. This time the test data is not that compressible.
Now it's time to send Windows dedup into the ring. I manually started the optimization on the dedup store with "Start-DedupJob -Volume E: -Type Optimization". The process responsible for deduplication is "fsdmhost.exe"; I let it run with normal priority so it could use as many resources as it needs.
As you can see, the dedup process took only ~30% of CPU and nearly no additional RAM. It took ~1h to dedup the two data files totaling ~25GB. There is surely room for improvement here, because MS says the dedup process is capable of ~800GB/h.
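If you want to watch the run while it is in flight, the dedup cmdlets can report on it; a sketch, assuming the same E: volume as above:

```powershell
# Kick off a manual optimization run on the dedup store
Start-DedupJob -Volume "E:" -Type Optimization

# Show the running job and its progress
Get-DedupJob -Volume "E:"

# Overall state of the volume once the job has finished
Get-DedupStatus -Volume "E:"
```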
Nevertheless, after the dedup finished, I saw a very good dedup ratio.
So after the first two backup jobs, we had better space savings using MS dedup (~17.1GB footprint) compared to Veeam's compression and deduplication (18.1GB). But we have to keep in mind that MS dedup is a separate post-process that needs CPU cycles and time, whereas Veeam's compression is done inline.
Let's give the backup jobs some more time to create additional incrementals. Two days later....
We now have 46 restore points on each backup disk, consuming 18.5GB on the non-deduped store and 30.8GB on the deduped store.
Well, it seems MS dedup isn't as efficient as it appeared during the first two backups.... but WAIT, this is only the view in Windows Explorer. The file sizes shown there are the logical, un-deduplicated sizes.
To be sure all data was deduplicated (remember, I set the dedup schedule to manual), I restarted the optimization job. Here is the result:
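The real on-disk consumption is easier to see from PowerShell than from Explorer. A sketch, again assuming my E: dedup volume:

```powershell
# SavedSpace and SavingsRate report the real effect of deduplication,
# which Explorer's logical file sizes do not show
Get-DedupVolume -Volume "E:" |
    Select-Object Volume, Capacity, FreeSpace, UsedSpace, SavedSpace, SavingsRate
```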
A bit better to read.....
This is astonishing: MS dedup outperforms Veeam's compression and deduplication in terms of used space on the backup storage.
Since I only have a single VM with just a few changes, this result will probably be even better with more VMs. And of course, the more the VMs resemble each other, the more space you can save through deduplication.
To prove this, I created a new backup job containing 8 VMs with different OSes (Linux, Windows) and different file structures. The job transferred 77.3GB, which resulted in a ~78GB backup file on my dedup store. I ran another optimization job and checked the status with "Get-DedupVolume"....
A great result! SavedSpace jumped from the former 15GB to 64GB with a single 78GB backup file.
So my conclusion for Part I is that as long as you only look at space savings, MS dedup is a great combination with Veeam. In Part II we will look at the problems deduplication can cause for your backups/restores with Veeam.