Virtual Tape Libraries

Choosing the right VTL solution isn't as easy as choosing a new tape library. There are several questions to answer before you can pick the right solution for your environment:

  • Vendor. There are several vendors on the market that offer VTL solutions. Big vendors like HP, FTS, IBM etc. or smaller ones like Overland, Quantum and so on
  • Features. These can be deduplication, number of virtual libraries and drives etc.
  • Commercial vs. open-source. There are several open-source projects that deal with turning standard hardware into VTLs like MHVTL or QuadStor VTL
  • Physical vs. virtual. Since VTL software is more or less a Linux system that emulates tape in software, you can also run it as a virtual machine
  • Connection. FC or iSCSI-based
  • Upgrade path. In which steps can your VTL solution be upgraded? Vendors differ in the granularity with which you can add storage
  • Performance. Different models have different performance characteristics, and you shouldn't pay a lot of money for a VTL that is slower than your former tape drives
  • Initial size. This is the most important thing to define, as it directly influences the model and therefore the price you have to pay

The initial size isn't easy to determine, especially if you plan to use deduplication. Deduplication not only directly influences the space needed to store all your data, it will probably also influence how you set up your backup jobs. With tape backup you probably rely heavily on multiplexing, which joins multiple backup streams into a single stream to deliver the data fast enough to your physical drives. Multiplexing, on the other hand, is a "killer" for deduplication as it lowers the efficiency by 30-50% (if you believe the whitepapers of most of the vendors).

So how do you get an idea of how big your VTL has to be initially? HP, and probably other vendors too, offer trial virtual appliances that you can back up data to. The HP StoreOnce VSA can be used for 60 days at no additional cost and has all the features enabled that you need to determine how well your backup data can be deduped. There are only three limitations:

  1. The VSA is limited to 10TB of usable disk space, so if your backup data needs more than 10TB of deduped space you have to roll out more VSAs
  2. If you have more than one VSA or more than one virtual library in a VSA, the dedup is limited to the virtual library. There is no "global" deduplication inside a VSA or across several VSAs
  3. The VSA can (by design) only present tape via iSCSI. If you rely on FC you have to create an iSCSI network first

To get an idea of how much initial capacity you need, simply store two or more full backups of your environment on the VSA and check how well they can be deduped. Even if you use multiplexing, as a first step leave the multiplexed jobs as they are and store them on the VTL. Even multiplexed jobs can be deduped; the question is only how well.
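As a rough sketch of that check: the dedup ratio is just the amount of data your backup software wrote divided by the space the appliance actually consumed. The function name and the numbers below are made up for illustration; the real values come from your backup software's job reports and from the capacity figures the VSA shows you.

```python
def dedup_ratio(written_tb: float, stored_tb: float) -> float:
    """Ratio of data sent by the backup software to space used on disk."""
    if written_tb < 0 or stored_tb <= 0:
        raise ValueError("written_tb must be >= 0 and stored_tb must be > 0")
    return written_tb / stored_tb


# Hypothetical trial: two full backups of 8 TB each were written,
# and the appliance reports 5 TB used on disk afterwards.
ratio = dedup_ratio(written_tb=2 * 8.0, stored_tb=5.0)
print(f"dedup ratio: {ratio:.1f}:1")
```

Run the same calculation again after each additional full backup. The ratio after the first full tells you very little; it is the trend over the following fulls that matters for sizing.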

If your backup jobs are bigger than the capacity of a single VSA then you have a problem. Adding a second VSA gives you more capacity but deduplication can't work across VSAs so on the second VSA the dedup process starts from scratch. That way you can't estimate how much capacity you really need. You can only give a guess.
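Why the guess gets worse can be shown with a toy model. This is not how StoreOnce works internally: the fixed 4 KiB chunk size, the SHA-256 fingerprints and the two made-up servers are all assumptions for illustration. The only point is that two independent dedup stores each keep their own copy of data that both of them see.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size; real engines usually chunk variably


def block(tag: str) -> bytes:
    """Build one deterministic 4096-byte block from a label (32 bytes * 128)."""
    return hashlib.sha256(tag.encode()).digest() * 128


def stored_chunks(backups: list[bytes]) -> int:
    """Count the unique chunks a single dedup store keeps for these backups."""
    seen = set()
    for data in backups:
        for i in range(0, len(data), CHUNK_SIZE):
            seen.add(hashlib.sha256(data[i:i + CHUNK_SIZE]).digest())
    return len(seen)


# Two hypothetical servers sharing 50 blocks of identical OS data,
# plus 50 blocks each of their own data.
common = [block(f"os{i}") for i in range(50)]
server_a = b"".join(common + [block(f"a{i}") for i in range(50)])
server_b = b"".join(common + [block(f"b{i}") for i in range(50)])

one_store = stored_chunks([server_a, server_b])                    # global dedup
two_stores = stored_chunks([server_a]) + stored_chunks([server_b])  # one per VSA

print(one_store, two_stores)
```

In this toy case the single store keeps 150 chunks while the two separate stores keep 200 between them, because the shared 50 chunks are stored twice. How large that overlap is in a real environment is exactly what you cannot measure once the data is split.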

So what to do now? In one project HP gave us a physical StoreOnce to test at the customer's site, but it was quite hard to get and I don't think HP will throw hundreds of VTLs on the market as test systems. So HP decided to release a special version of the VSA that works in "no data mode". That means all data written to the VSA is discarded as soon as it has been analyzed by the dedup engine. That way, the appliance doesn't need much space (1-2TB is enough to store metadata) and you can throw billions of bytes at the VSA without running out of space.

This is quite cool because now you can test your environment as long as you wish and get exact data on the dedup ratio. With this knowledge you can now size your VTL exactly to your needs.
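A minimal sizing sketch, assuming you only keep full backups and that the ratio you measured holds over the whole retention period (neither is guaranteed). The function name, the headroom factor and the upgrade step are hypothetical; the step stands for the vendor's upgrade granularity mentioned above.

```python
import math


def required_capacity_tb(full_tb, retained_fulls, dedup_ratio,
                         headroom=0.2, step_tb=None):
    """Estimate usable VTL capacity in TB.

    full_tb        -- size of one full backup before dedup
    retained_fulls -- number of fulls kept at the same time
    dedup_ratio    -- measured ratio, e.g. 8 for 8:1
    headroom       -- assumed safety margin for growth (0.2 = 20%)
    step_tb        -- assumed upgrade granularity; result is rounded up to it
    """
    if dedup_ratio <= 0:
        raise ValueError("dedup_ratio must be > 0")
    physical = full_tb * retained_fulls / dedup_ratio * (1 + headroom)
    if step_tb:
        physical = math.ceil(physical / step_tb) * step_tb
    return physical


# Hypothetical example: 8 TB fulls, 12 kept, measured 8:1,
# and a model that can only be extended in 4 TB steps.
print(required_capacity_tb(8, 12, 8, headroom=0.2, step_tb=4))
```

With these made-up numbers the raw estimate is about 14.4 TB, which ends up as 16 TB once it is rounded up to the next 4 TB step.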

Since no one can do without the production backup just to free up enough resources for the dedup tests, we always recommend configuring the StoreOnce VSA as a secondary backup target. That way, your regular backup runs to tape as before, and additionally to the VSA. Now you only have to keep two more things in mind:

  1. You need enough licences for your backup software to double the number of target drives
  2. You need a larger backup window, as the VSA, even in no data mode, is much slower than a physical LTO3/4 drive
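To get a feeling for the second point, here is a back-of-the-envelope calculation. LTO-3 and LTO-4 are specified at 80 and 120 MB/s native; the VSA throughput below is a placeholder you have to replace with what you measure on your own iSCSI network. The sketch also assumes that each job writes to both targets at the same time, so the slower target dictates the window; your backup software may handle secondary targets differently.

```python
def window_hours(data_tb: float, mb_per_s: float, drives: int = 1) -> float:
    """Hours needed to write data_tb (decimal TB) at mb_per_s per drive."""
    if mb_per_s <= 0 or drives < 1:
        raise ValueError("mb_per_s must be > 0 and drives >= 1")
    seconds = (data_tb * 1e12) / (mb_per_s * 1e6 * drives)
    return seconds / 3600


def dual_target_window(data_tb, tape_mb_s, vsa_mb_s, drives=1):
    """Window when every job has to finish on tape and on the VSA."""
    return max(window_hours(data_tb, tape_mb_s, drives),
               window_hours(data_tb, vsa_mb_s, drives))


LTO4_MB_S = 120.0  # native LTO-4 speed
VSA_MB_S = 50.0    # hypothetical, measure this in your environment

tape_only = window_hours(4.0, LTO4_MB_S, drives=2)
with_vsa = dual_target_window(4.0, LTO4_MB_S, VSA_MB_S, drives=2)
print(f"tape only: {tape_only:.1f} h, tape plus VSA: {with_vsa:.1f} h")
```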

The bottom line: VTLs are perfect for replacing tape libraries as the first-line backup target. With deduplication enabled you can store huge amounts of data and keep them in fast, direct access. But as VTLs aren't really cheap, you have to figure out exactly what to buy.

