I've done several Veeam projects so far, but I had never faced requirements like the ones in my last project.
The starting point was a backup environment where Veeam was used in combination with HP Data Protector. That's not unusual; I have configured such combinations myself over the last few years to get the best of both products. Here, however, the customer asked to remove DP completely, as the maintenance contract was running out and a renewal would have been quite expensive. DP was used to back up physical systems and databases, most of them SQL Server systems but also some Oracle systems, already virtualized under vSphere.
DP was also used to copy its own backup-to-disk data as well as the Veeam repository data to tape, providing some kind of long-term backup. As legal requirements force the customer to keep backups for a long period of time (up to 5 years), tape was the only way to keep up with the large amount of data.
I thought about the problem and concluded that I would not be able to remove DP from the scene completely, but in my opinion that isn't necessary at all. Veeam won't be able to back up every kind of system the customer has, but if we get as close as possible to 100% virtualization, the importance of DP drops dramatically. DP can then be used for the applications and systems for which Veeam doesn't offer a good solution, and the number of those systems will probably be quite low. So perhaps we can't remove DP completely, but we can reduce its footprint, and thus its costs, to an absolute minimum.
No sooner said than done: the first step was to virtualize most of the systems. Fortunately the customer had enough spare capacity in its virtualization stack, so resources weren't a problem. Licensing was also okay, and in the end only a few systems (4 or 5) remained physical; the rest were virtual and could be covered by Veeam.
The existing Veeam installation ran on an old ML350 G6 system with 2 quad-core CPUs and 16GB of RAM. Direct SAN access was done via Fibre Channel, and the backup storage was built from two HP D2000 enclosures with 24x600GB SAS disks running in RAID50. It was the only backup storage so far and could hold regular backups of most systems for approx. 7 days. Meanwhile DP moved the data to tape, so every backup older than 7 days was overwritten and had to be restored from tape by DP and imported into Veeam before a restore was possible.
So upgrading the hardware was a MUST, but how? Moving the DP backups to Veeam raised the requirements for disk storage, and even if we used Veeam's tape support to push data to tape, disk capacity would still be severely limited and restores would be painfully slow whenever the data had to come from tape. To make things worse, long-term archiving was still an essential requirement.
Using the old backup hardware, we estimated the needed storage capacity by running full backups of all VMs and extrapolating from that base. We ended up with an estimate of ~400TB for short- and long-term backups. The 400TB was a realistic minimum, as we only counted the VMs that really needed long-term protection and ignored future growth at this point. 500-600TB would have been a better estimate.
Even with falling storage prices, 400TB is a HUGE capacity and can easily break the €100,000 barrier. Another thing to keep in mind is that this capacity calls for either a midrange or high-end storage system or several entry-level systems, and the data sitting on that storage still has to be secure. A low-cost white-box storage solution couldn't be the way to go. Simply using the highest-capacity drives isn't the way to go either: long rebuild times and the chance of several drives in a RAID group failing at the same time made my stomach churn, and I decided that this wasn't a solution I could really offer the customer.
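To put rough numbers on that feeling, here is a minimal Python sketch of two common back-of-the-envelope estimates: how long rewriting one replacement drive takes, and the chance of hitting at least one unrecoverable read error (URE) while reading the surviving drives of a RAID group. It assumes independent errors at a constant vendor-specified URE rate and a constant rebuild speed; the 50 MB/s rebuild rate and the 1-per-10^14 and 1-per-10^15 bit URE figures are illustrative assumptions, not measurements from this project, and the model ignores a second complete drive failure entirely.

```python
import math


def rebuild_hours(drive_tb: float, rebuild_mb_s: float) -> float:
    """Hours needed to rewrite one replacement drive at a constant rebuild rate."""
    drive_mb = drive_tb * 1_000_000  # decimal TB -> MB, as drive vendors count
    return drive_mb / rebuild_mb_s / 3600


def ure_probability(bytes_read: float, bits_per_ure: float) -> float:
    """Probability of at least one URE while reading `bytes_read` bytes.

    Assumes independent errors at a rate of one per `bits_per_ure` bits,
    i.e. P = 1 - (1 - 1/bits_per_ure) ** bits, computed in a numerically stable way.
    """
    bits = bytes_read * 8
    return -math.expm1(bits * math.log1p(-1.0 / bits_per_ure))


if __name__ == "__main__":
    # Hypothetical: a 12-drive group of 4TB disks loses one drive,
    # so the rebuild has to read the 11 survivors completely.
    surviving_bytes = 11 * 4e12
    print(f"rebuild: {rebuild_hours(4, 50):.1f} h at an assumed 50 MB/s")
    for spec in (1e14, 1e15):
        print(f"URE spec 1 per {spec:.0e} bits: "
              f"{ure_probability(surviving_bytes, spec):.0%} chance of a read error")
```

Under these assumptions the rebuild takes about 22 hours, and the chance of a read error during it is roughly 97% at the 10^14 spec versus about 30% at 10^15. With single-parity RAID5, such an error during a rebuild can mean lost data, which is one reason double-parity RAID6 is usually preferred for large drives.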
But as always, budget was one of the key constraints. New hardware for the Veeam server, additional primary storage, new archive storage... what the customer needed was a new concept. Here is what I did.
The Veeam server hardware was the easiest part. A brand-new DL380 Gen9 with two 10-core CPUs running at 2.3GHz and 64GB of RAM was a good starting point. The backup storage, where backups land during the actual backup run, was doubled by using two D3700 enclosures fully populated with 50 600GB SAS disks in total. SAS disks, because we want to use Instant Recovery, and IR is painfully slow on SATA or even MDL SAS disks. The storage is attached to a P441 controller with 4GB of cache. As the P441 doesn't support putting 50 disks into a single RAID group (and even if it did, I would never do something that crazy), I configured 4 RAID5 arrays with 12 disks each, leaving two disks as hot spares.
I use Windows Server 2012 R2 on the new Veeam server because it has a few tiny little features we can put to perfect use in our new concept. The OS version isn't really important for Veeam if you back up vSphere VMs, but it is essential if you run Hyper-V: then you have to use the same OS version on the Veeam backup server as on your Hyper-V parent partition. Just as a side note.
Windows Server has a capability called Storage Spaces. Storage Spaces are a bit like dynamic disks in the days before Server 2012, but in contrast to dynamic disks I really like Storage Spaces. As I didn't want a separate basic disk for each RAID set as Veeam repositories, I created a pool across the 4 RAID sets and carved one big volume out of it.
This 24TB volume is the home for all my backups; I will refer to it as the "primary storage" throughout this article. The volume is thick provisioned, but because it sits on Storage Spaces I can easily add more physical disks and extend the volume. Flexibility is one of my key requirements in this project.
Okay: new server hardware, new primary storage, Storage Spaces for flexibility. Time to bring Veeam into play. I use the latest version of Veeam B&R, which is v8 Update 2b. The six vSphere hosts are added and the backup jobs are configured. I use reverse incremental even though Veeam says it's slower than forward incremental; in this environment the primary backup storage is so fast that I don't care whether a backup takes 5 minutes more or less. Additionally, I like the fact that with reverse incremental the latest restore point is always a full, so I can start recovering a VM from the latest point with IR really instantly.
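The "latest point is always a full" property can be illustrated with a toy model of a reverse incremental chain. This is a hypothetical Python sketch, not Veeam's implementation: a "disk" is a list of blocks, each backup run injects the changed blocks into the full in place and saves the blocks it overwrote as a rollback (the role the .vrb files play next to the .vbk full in Veeam). Restoring the latest point is just reading the full; going further back applies rollbacks newest-first.

```python
from dataclasses import dataclass, field


@dataclass
class ReverseIncrementalChain:
    full: list[str]  # always equals the most recent restore point
    rollbacks: list[dict[int, str]] = field(default_factory=list)  # oldest first

    def backup(self, disk: list[str]) -> None:
        """Inject changed blocks into the full and keep the blocks they replaced."""
        changed = {i: block for i, block in enumerate(disk) if self.full[i] != block}
        self.rollbacks.append({i: self.full[i] for i in changed})
        for i, block in changed.items():
            self.full[i] = block

    def restore(self, points_back: int = 0) -> list[str]:
        """Rebuild the state `points_back` backups before the latest one."""
        if not 0 <= points_back <= len(self.rollbacks):
            raise ValueError("no such restore point")
        state = list(self.full)  # latest point: no merging needed at all
        newest_first = reversed(self.rollbacks[len(self.rollbacks) - points_back:])
        for rollback in newest_first:
            for i, block in rollback.items():
                state[i] = block
        return state


chain = ReverseIncrementalChain(full=["a", "b", "c"])  # initial full
chain.backup(["a", "B", "c"])
chain.backup(["A", "B", "c"])
print(chain.restore(0))  # ['A', 'B', 'c']
print(chain.restore(2))  # ['a', 'b', 'c']
```

The trade-off mentioned above is visible too: every run rewrites blocks inside the full and writes the old ones out again, which is why reverse incremental puts more I/O load on the repository than forward incremental and benefits from fast primary storage.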
After all initial backups completed successfully, I saw that our capacity estimate was a good one: nearly 18TB of data was stored in compressed and inline-deduplicated form. That doesn't sound like much. How the hell does he get to 400TB of needed space when he only has 18TB of backup data and Veeam only does incrementals? Well, the long-term backup requirements are to keep a complete FULL of each backup on a weekly basis for 8 weeks, on a monthly basis for 12 months and on a yearly basis for 5 years. And when I say FULL, I mean FULL, not a single full plus some incrementals from which Veeam can build its synthetic full. The reason behind this requirement is data integrity. If I always create fulls, a single full can fail and I can still restore data from all the other versions. With incrementals, a single failed incremental can break the complete backup chain, rendering ALL backups after the rotten increment unusable. That's nothing you want in a long-term backup solution. Even if we only include the most important data in this GFS backup scheme, we end up at 400TB (or more).
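The arithmetic behind the estimate, plus the integrity argument, fits into a few lines of Python. This is a simplified sketch: it assumes every archived full is roughly as large as the ~18TB measured above and ignores growth. A full that qualifies for several GFS levels may be stored only once, so treat the count as an upper bound. The `usable_points` helper is a hypothetical illustration of how one damaged restore point affects independent fulls versus a forward incremental chain.

```python
def gfs_full_count(weekly: int, monthly: int, quarterly: int, yearly: int) -> int:
    """Number of fulls kept by a GFS scheme, assuming no level overlaps."""
    return weekly + monthly + quarterly + yearly


def archive_capacity_tb(full_tb: float, fulls: int) -> float:
    """Raw capacity needed if every retained full is stored in its entirety."""
    return full_tb * fulls


def usable_points(layout: str, points: int, damaged: int) -> int:
    """Restore points still usable after point `damaged` (0 = oldest) is corrupt.

    'fulls': every point is self-contained, so only the damaged one is lost.
    'forward': point 0 is the full and every later point depends on all
    earlier ones, so the damaged point and everything after it are lost.
    """
    if layout == "fulls":
        return points - 1
    if layout == "forward":
        return damaged
    raise ValueError(f"unknown layout: {layout}")


fulls = gfs_full_count(weekly=8, monthly=12, quarterly=0, yearly=5)
print(fulls)                             # 25
print(archive_capacity_tb(18, fulls))    # 450
print(usable_points("fulls", 25, 3))     # 24
print(usable_points("forward", 25, 3))   # 3
```

With all VMs included this comes to about 450TB; restricting GFS to the most important data is what brings it down to roughly 400TB, which at 25 fulls corresponds to about 16TB per full.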
A short step back to the primary storage: its capacity of 24TB is enough to hold one full of each backup plus all incrementals for 14 days. That's what we defined as the default "back in time" range for which users normally request restores. So within 2 weeks we can restore at daily granularity; older backups may have a coarser granularity (a larger RPO) and a longer RTO.
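Whether 24TB really covers 14 daily restore points depends on the daily change rate, which isn't stated here. A quick Python check, assuming one full of ~18TB (the figure measured above) plus 13 daily increments of equal size, shows how much daily change the volume can absorb; the equal-increment assumption is a simplification.

```python
def max_daily_change_tb(volume_tb: float, full_tb: float, restore_points: int) -> float:
    """Largest equal-sized daily increment that still fits next to one full.

    `restore_points` counts the full itself, so 14 points = 1 full + 13 increments.
    """
    increments = restore_points - 1
    if increments < 1:
        raise ValueError("need at least two restore points")
    return (volume_tb - full_tb) / increments


budget = max_daily_change_tb(volume_tb=24, full_tb=18, restore_points=14)
print(f"{budget:.2f} TB/day, {budget / 18:.1%} of the full")  # 0.46 TB/day, 2.6% of the full
```

If the real change rate turns out to be higher, either the retention window shrinks or the volume has to grow, which is exactly where the extensible Storage Spaces volume helps.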
The old Veeam server can still be used in this setup. We installed Windows Server 2012 R2 on it, configured SAN access to the VMFS datastores and added the system as a backup proxy to the primary Veeam server. The old system doesn't have any storage of its own, but it has a dedicated GbE NIC to send its compressed and deduplicated data to the repository on the primary Veeam server.
Now to the hardest part: the long-term storage. As already said, 400TB of disk storage was far beyond the budget, so some creativity was needed here. We looked at hardware deduplication appliances like Data Domain or FTS ETERNUS systems, but they all have one thing in common: they are really expensive and a waste of money as an archive storage solution for Veeam.
To cut a long story short, we decided to use an HP P2000 entry-level storage system attached via FC to the Veeam backup server. A P2000 could theoretically handle 400TB, but I wouldn't even think about it. We took a different approach and configured the P2000 with "only" 100TB of disk space. The capacity comes from 36x4TB hard disks configured in 3 RAID6 sets: the first two use 12 drives each, the third uses only 10 drives, and the remaining two disks serve as spares. That way we could serve 100TB, split into 3 volumes, to the Veeam server.
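The usable capacities follow directly from the RAID layouts. The small Python sketch below counts parity overhead only (no filesystem overhead), and the conversion to binary units is an assumption about how the figures were read off in Windows. It shows that this layout yields 112TB in drive-vendor units, about 102TiB, so roughly the "100TB" quoted. The same arithmetic applied to the primary storage's 4 x 12 x 600GB RAID5 sets gives the ~24TB volume mentioned earlier.

```python
PARITY_DRIVES = {"raid5": 1, "raid6": 2}


def raid_usable_tb(drives: int, drive_tb: float, level: str) -> float:
    """Usable capacity of one RAID set in decimal TB (parity overhead only)."""
    return (drives - PARITY_DRIVES[level]) * drive_tb


def tb_to_tib(tb: float) -> float:
    """Decimal terabytes (10**12 bytes) to binary tebibytes (2**40 bytes)."""
    return tb * 10**12 / 2**40


p2000 = 2 * raid_usable_tb(12, 4, "raid6") + raid_usable_tb(10, 4, "raid6")
primary = 4 * raid_usable_tb(12, 0.6, "raid5")
print(f"P2000:   {p2000:.0f} TB = {tb_to_tib(p2000):.1f} TiB")      # 112 TB = 101.9 TiB
print(f"primary: {primary:.1f} TB = {tb_to_tib(primary):.1f} TiB")  # 26.4 TB = 24.0 TiB
```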
The easiest thing to do now would be to import the three volumes and create three repositories from them, but we didn't do that. I don't want three fixed volumes as repositories, as this approach provides no flexibility. Additionally, 100TB isn't enough to store all our data; remember the 400TB estimate above? So at this point we have too little capacity and too little flexibility. Here is the next part of the solution.
I already talked about Storage Spaces, one of the new features in Windows Server 2012. The next one is software deduplication. Deduplication would be perfectly suited to solving our capacity problem, as our archiving scheme requires us to keep multiple complete fulls, where deduplication can be extremely effective. Unfortunately, Windows Server deduplication doesn't scale that well. It's not the dedup engine itself, it's rather the very limited scalability of the engine. Dedup runs per volume and is single-threaded (I have already written about these limitations several times on this blog), so if we simply used one BIG volume for all our archive storage, dedup would never finish. The next thing to remember is the officially supported maximum size of files to be deduplicated: with Server 2012 R2 this limit is 2TB per file. So we have to configure our archive jobs so that the backup files never grow beyond the 2TB limit. And we still need flexibility, as we don't know what the environment will look like in 2, 6 or 12 months, and we definitely don't want to change the complete setup in the next few years.
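Because each volume gets its own single-threaded dedup job, spreading the archive jobs over several volumes is what lets dedup scale. One way to plan that distribution is a greedy "largest job to the least-loaded volume" assignment that also rejects any job whose full would exceed the 2TB per-file limit cited above. The Python sketch below uses clearly hypothetical inputs (job names and full sizes are made up) and balances logical data only, not the actual daily dedup work.

```python
import heapq

MAX_FILE_TB = 2.0  # per-file dedup limit for Server 2012 R2 as cited in the text


def plan_volumes(full_sizes_tb: dict[str, float], volumes: int) -> list[list[str]]:
    """Assign archive jobs to dedup volumes, largest job to the least-loaded volume."""
    if volumes < 1:
        raise ValueError("need at least one volume")
    oversized = sorted(job for job, size in full_sizes_tb.items() if size > MAX_FILE_TB)
    if oversized:
        raise ValueError(f"fulls larger than {MAX_FILE_TB} TB, split these jobs: {oversized}")
    heap = [(0.0, v) for v in range(volumes)]  # (assigned TB, volume index); already a valid heap
    plan: list[list[str]] = [[] for _ in range(volumes)]
    for job, size in sorted(full_sizes_tb.items(), key=lambda kv: kv[1], reverse=True):
        load, v = heapq.heappop(heap)  # least-loaded volume so far
        plan[v].append(job)
        heapq.heappush(heap, (load + size, v))
    return plan


# Hypothetical jobs and per-full sizes in TB
jobs = {"sql": 1.9, "fileserver": 1.8, "exchange": 1.5, "misc": 0.9}
print(plan_volumes(jobs, volumes=2))  # [['sql', 'misc'], ['fileserver', 'exchange']]
```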
Bringing it all together, I added the three volumes from the P2000 to a Storage Spaces pool. From this pool I carved out several 10TB volumes, but this time they are thin-provisioned instead of thick (in the screenshot below, taken on a German OS, thin is called "dünn"). On every volume, deduplication is enabled for files older than 3 days.
With this setup I can easily add storage capacity, increase the size of each volume, scale deduplication by adding more volumes, and keep the backup archive files just under the 2TB limit so Windows dedup works smoothly.
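The "older than 3 days" rule keeps freshly written or still-changing backup files out of dedup's way. A minimal Python sketch of such an age filter follows. It judges age by last-modified time, which is an assumption made for illustration rather than a description of exactly how the Windows dedup engine decides; on the Windows side the setting itself corresponds to the `-MinimumFileAgeDays` parameter of `Set-DedupVolume`. The file names are hypothetical.

```python
import os
import tempfile
import time
from pathlib import Path
from typing import Optional

SECONDS_PER_DAY = 86_400


def dedup_candidates(folder: Path, min_age_days: int = 3,
                     now: Optional[float] = None) -> list[Path]:
    """Files below `folder` whose last modification is at least `min_age_days` old."""
    now = time.time() if now is None else now
    cutoff = now - min_age_days * SECONDS_PER_DAY
    return sorted(p for p in folder.rglob("*") if p.is_file() and p.stat().st_mtime <= cutoff)


def demo() -> list[str]:
    """Create one 5-day-old and one 1-day-old file and return the eligible names."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        now = time.time()
        for name, age_days in (("job_old.vbk", 5), ("job_new.vbk", 1)):
            path = root / name
            path.write_bytes(b"backup data")
            stamp = now - age_days * SECONDS_PER_DAY
            os.utime(path, (stamp, stamp))  # set access and modification time
        return [p.name for p in dedup_candidates(root, min_age_days=3, now=now)]


print(demo())  # ['job_old.vbk']
```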
This setup has some caveats, as the dedup processes need CPU power and RAM. In my environment that's not a problem: 20 cores plus Hyper-Threading give me an enormous 40 logical cores for Veeam and dedup. 64GB of RAM isn't that much, but it's okay for my current setup and can easily be upgraded if needed.
Back to Veeam: I configured all of my thin-provisioned, dedup-enabled 10TB volumes as repositories and set up backup copy jobs.
Veeam backup copy jobs are not really designed to be used as archive jobs, because you can't simply tell the job to copy the latest full to the archive storage on a given schedule (that would be possible in my case, as with reverse incrementals I always have the right format to simply copy). You always have to copy a full first, then copy at least one incremental and let Veeam create a synthetic full from the latest full and the incremental. That's really silly in my environment: I already have a reverse incremental chain, so the copy job "extracts" an incremental from the reverse incremental full, copies that data to the archive storage and then merges the new incremental into the latest full to create a new full. Not really efficient, but there is no other way to get the data to the archive storage and still have Veeam keep track of the data residing on both storages. Fortunately my copy jobs only run twice a week, so I have reduced the number of synthetic full operations to the minimum.
So my copy jobs look like this: two restore points in simple mode, and in GFS mode 8 weekly, 12 monthly, 0 quarterly and 5 yearly fulls.
The scheduler is set to run the copy job every 3 days. This way I make sure the two restore points are really the only data besides the synthetic fulls on the archive storage.
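Which fulls survive an 8/12/0/5 GFS policy can be approximated with a simple "newest full per period" rule. The Python sketch below is only an approximation under that assumption: Veeam lets you choose the day on which weekly, monthly and yearly fulls are created, which this model ignores, and 0 quarterly fulls simply means that level is omitted. A full that is the newest of its week and of its month is counted only once. The dates are hypothetical.

```python
from datetime import date, timedelta


def gfs_keep(fulls: list[date], weekly: int = 8, monthly: int = 12,
             yearly: int = 5) -> set[date]:
    """Approximate GFS retention: keep the newest full of each of the last N periods."""

    def newest_per(period, count: int) -> set[date]:
        picked: dict = {}
        for day in sorted(fulls, reverse=True):
            key = period(day)
            if key not in picked:
                if len(picked) == count:
                    break  # periods arrive newest-first, so all older ones are out of range
                picked[key] = day
        return set(picked.values())

    return (newest_per(lambda d: tuple(d.isocalendar())[:2], weekly)  # ISO year + week
            | newest_per(lambda d: (d.year, d.month), monthly)
            | newest_per(lambda d: d.year, yearly))


# A synthetic full every 3 days over roughly six years (hypothetical dates)
fulls = [date(2010, 1, 1) + timedelta(days=3 * k) for k in range(730)]
kept = sorted(gfs_keep(fulls))
print(len(kept), kept[0], kept[-1])
```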
The Windows dedup engine runs in high-priority mode for nearly 15 hours a day to finish in a reasonable time. Across my currently 9 archive repositories I get an average dedup savings rate of 70%, which currently saves nearly 68TB. And all that without spending a single dollar or euro on high-priced deduplication hardware.
That is unbelievable.
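A savings rate and an absolute saving together pin down the rest of the picture. Assuming the 70% is the share of logical data that no longer needs physical space, a quick Python calculation gives the logical and physical sizes and the equivalent "x:1" ratio that dedup appliances usually advertise.

```python
def dedup_breakdown(saved_tb: float, savings_rate: float) -> tuple[float, float, float]:
    """Return (logical TB, physical TB, ratio) from saved space and savings rate.

    Assumes savings_rate = saved / logical, e.g. 0.70 for 70 %.
    """
    if not 0 < savings_rate < 1:
        raise ValueError("savings rate must be between 0 and 1")
    logical = saved_tb / savings_rate
    physical = logical - saved_tb
    return logical, physical, logical / physical


logical, physical, ratio = dedup_breakdown(saved_tb=68, savings_rate=0.70)
print(f"{logical:.0f} TB logical in {physical:.0f} TB physical, about {ratio:.1f}:1")
# 97 TB logical in 29 TB physical, about 3.3:1
```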
The Windows dedup jobs have no negative performance impact on the copy jobs; even synthetic fulls get built quite fast. Restores from the archive repositories are smooth, too.
The next step will be to implement copy to tape for the archive repositories, as Windows dedup is still a fairly new product, and especially for long-term backups data integrity is the most important thing. Having a backup of the archive every month will ease my mind.
To sum up: using large backup repositories combined with archives on dedup-enabled Windows Storage Spaces volumes is possible and, if configured correctly, a highly scalable, stable and economical solution.