The discussion on EMC's VTL has got StorageZilla and other EMC'ers a little excited. It stems from Barry's post on the DL6000, a Godzilla of a VTL, as I've discussed before.
I think it is worth reviewing the exact point of having a VTL in the first place.
Tape has issues and we all know it. Tapes break, wear out, develop errors and, most importantly, get lost, compromising their contents and potentially leading to embarrassment and large fines for the companies involved. On the other hand, tape is cheap, portable and, other than needing to be kept at the right temperature, costs very little to store.
Second, why do we do backups at all? They're done to recover from inadvertent data loss: logical corruption, user error, hardware failure and so on. Smaller companies may use tape as part of their DR process, but large companies will probably either run a second site or use a third-party DR provider. It is likely that these companies will *not* rely on restoring from tape in the event of a complete site outage or disaster; their BC/DR processes will be considerably more complex than calling back the tapes from Iron Mountain.
Backups have also become a way of archiving data in the absence of a proper archiving tool. Retrieving archived data from backups depends on staff knowing exactly what to restore. Backup formats also make it hard to search and index the content, let alone mine it for competitive purposes. It might be *possible*, but is it *practical*, and would the rewards gained from the data justify the effort? Unlikely.
VTLs were developed to address some of tape's issues, namely backup failure rates and the time it takes to get data back during a restore. VTLs do not address the issue of tape loss; let's face it, if you don't send the tapes offsite, you can't lose them. But VTLs *have* to be located offsite from the main data, otherwise you are still at risk, so why not just put the tape library offsite instead?
There are plenty of other techniques that mean you don't have to go running to backups for data recovery. Point-in-time copies, CDP, remote replication, snapshots and VSS all provide the ability to get data back, and to get it back quickly. Tape never did that, so why should we expect a VTL to?
Enterprise-scale customers will have multiple sites and already write data to a remote location, perhaps over IP or over DWDM/dark fibre with Fibre Channel connections. They will have automated tape libraries in place and won't need to ship tapes offsite. Their issues with tape will revolve around getting backup success rates to 100%, eliminating failures caused by faulty media, and ensuring that enough tape drives are available at restore time to run as many restores in parallel as possible. VTLs enable that, but I don't believe they should do so at any price.
80-90% or more of the data on tape is at rest. Tape data expires and the tapes are reused in cycles. Tape is effective because the inactive 80-90% of the media can be put on a shelf and left alone. If you are realistic, you'd say that even an automated tape library is not an effective home for all of it, and that it should hold perhaps only the last six months or so of backup data. As tapes are cheap, multiple copies can be kept, and if one tape is damaged or fails, it doesn't affect the content on the remaining tapes.
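To illustrate the "only the last six months in the library" idea, here is a minimal Python sketch that splits a backup catalogue into tapes that stay in the library and tapes that go on the shelf. The 183-day window, the monthly schedule and the function name `partition_tapes` are all hypothetical choices for illustration, not a recommendation from any backup product.

```python
from datetime import date, timedelta

# Hypothetical retention window: keep roughly the last six months in the library.
LIBRARY_WINDOW = timedelta(days=183)


def partition_tapes(backup_dates, today):
    """Split backup dates into (in_library, on_shelf) by age relative to `today`."""
    in_library, on_shelf = [], []
    for d in backup_dates:
        # Recent backups stay mounted/automated; older ones can sit on a shelf.
        (in_library if today - d <= LIBRARY_WINDOW else on_shelf).append(d)
    return in_library, on_shelf


# Example: two years of (hypothetical) monthly backups, Jan 2005 to Dec 2006.
monthly = [date(2005 + i // 12, i % 12 + 1, 1) for i in range(24)]
lib, shelf = partition_tapes(monthly, today=date(2007, 1, 1))
print(len(lib), len(shelf))  # 5 19
```

With a monthly schedule, only about a fifth of the tapes need to live in the library; the rest are at rest, which is the point the paragraph makes about inactive media.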
VTLs need to offer the same level of functionality:
- You want power consumption to be as low as possible.
- You want TCO to be as low as possible.
- You don't want component failure to affect the whole archive.
- You want granular access to your data to enable you to restore what you want when you want.
My point in the previous post was that the Copan product has been designed to address these requirements. It only ever powers up 25% of its disks at any one time, because you never read from and write to all your tapes at once; you couldn't, because you never had enough drives to mount them all! Also, tape data is usually multiple copies of the same thing going back over months or years, so in a DR situation you would only restore the *last* copy, which is likely to be much less than 25% of the data on tape (or on a VTL). A point worth noting here: the Copan system doesn't block access to data on drives that are powered down. It simply powers down another drive and powers up the one holding the requested data, so all the user sees is a delay in access.
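To make the power-cycling behaviour concrete, here is a minimal Python sketch of a drive power budget. It assumes a fixed cap of 25% of drives spinning and a least-recently-used policy for choosing which drive to spin down; the class `PowerBudget` and the LRU policy are hypothetical illustrations, not Copan's actual algorithm.

```python
from collections import OrderedDict


class PowerBudget:
    """Hypothetical MAID-style power manager: at most `cap` drives spin at once."""

    def __init__(self, total_drives: int, max_fraction: float = 0.25):
        # Assumed cap: 25% of drives powered, per the description above.
        self.cap = max(1, int(total_drives * max_fraction))
        self.total = total_drives
        # Powered-up drives, ordered from least to most recently used.
        self.powered: "OrderedDict[int, None]" = OrderedDict()

    def access(self, drive: int) -> bool:
        """Serve a request for `drive`; access is never refused.

        Returns True if the drive had to be spun up (the user sees a delay),
        False if it was already powered.
        """
        if not 0 <= drive < self.total:
            raise ValueError(f"no such drive: {drive}")
        if drive in self.powered:
            self.powered.move_to_end(drive)  # mark as most recently used
            return False
        if len(self.powered) >= self.cap:
            # Power down the least recently used drive to stay within budget.
            self.powered.popitem(last=False)
        self.powered[drive] = None
        return True


budget = PowerBudget(total_drives=8)  # cap = 2 drives spinning
print(budget.access(0))  # True: drive 0 spun up
print(budget.access(0))  # False: already spinning
print(budget.access(1))  # True
print(budget.access(2))  # True: drive 0 powered down to make room
print(sorted(budget.powered))  # [1, 2]
```

The LRU choice is just one plausible policy; the property that matters is the one described above: a request for a sleeping drive is delayed, never rejected, and the powered set never exceeds the budget.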
The Copan system writes data to shelves, each of which is treated as an individual virtual library. Any shelf can be powered down on its own to replace failed drives, so the whole system doesn't have to be taken down. This also meets the requirement that a hardware failure shouldn't affect all the content: data is not spread across the whole system, so it is not all at risk if a shelf does fail. The system also performs periodic data validation (drive scrubbing) so that drives which are about to fail can be identified early.
I'd like to end on one thought. If I were implementing ILM, I would adopt a strategy that puts the most valuable data on enterprise arrays. "Valuable" would cover both the risk of losing the data and the cost of not being able to get at it quickly; I'd want to access that data 24/7 and never lose it. As the value of data falls, it moves onto less expensive technology where the trade-off between cost and availability is acceptable, for instance development data on modular storage. Finally, backup data would sit on the least expensive hardware platform. Yet EMC are suggesting I keep that data on their most *expensive* platform!
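The tiering logic above can be sketched in a few lines of Python. This is a minimal illustration, assuming each data set is scored on two hypothetical 0-3 scales (impact of loss, urgency of access); the scales, the summed score and the thresholds are invented for illustration, and only the tier names come from the paragraph.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    ENTERPRISE = "enterprise array"
    MODULAR = "modular storage"
    CHEAPEST = "least expensive platform"


@dataclass
class DataSet:
    name: str
    loss_impact: int     # hypothetical scale: 0 (trivial) .. 3 (critical)
    access_urgency: int  # hypothetical scale: 0 (days is fine) .. 3 (24/7)
    is_backup: bool = False


def place(ds: DataSet) -> Tier:
    """Map a data set to a storage tier by its 'value' (illustrative thresholds)."""
    if ds.is_backup:
        # Backup copies go to the cheapest platform, whatever their source.
        return Tier.CHEAPEST
    value = ds.loss_impact + ds.access_urgency  # 0..6
    if value >= 5:
        return Tier.ENTERPRISE
    if value >= 2:
        return Tier.MODULAR
    return Tier.CHEAPEST


for ds in [
    DataSet("billing DB", loss_impact=3, access_urgency=3),
    DataSet("dev data", loss_impact=1, access_urgency=1),
    DataSet("nightly backup", loss_impact=3, access_urgency=0, is_backup=True),
]:
    print(ds.name, "->", place(ds).value)
```

Note that backup data is routed to the cheapest tier before any scoring happens, regardless of how valuable the source data is.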
Think of it this way: if you already have a DMX3, why bother buying a DMX3 VTL solution? Just shove some 500GB drives into your existing DMX3 and back up straight to disk. It will be exactly the same in terms of availability and reliability, but without the VTL licence cost!