Saturday 26 May 2007

Using VTLs

The discussion on EMC's VTL has got StorageZilla and other EMC'ers a little excited. It stems from Barry's post on the DL6000, a Godzilla of a VTL, as I've discussed before.

I think it is worthwhile reviewing the exact point of having a VTL in the first place.

Tape has issues and we all know it. Tapes break, wear out, develop errors and, most importantly, get lost, compromising their contents and potentially leading to embarrassment and large fines for the companies involved. On the other hand, tape is cheap, portable and (other than keeping it at the right temperature) costs very little to store.

Point 2: why do we do backups? They're done to recover from inadvertent data loss: logical corruption, user error, hardware failure and so on. Depending on company size, tape may be used in the DR process, but large companies will probably either run a second site or use a third-party DR company. These companies will almost certainly *not* rely on restoring from tape in the event of a complete site outage or disaster; their BC/DR processes will ultimately be far more complex than calling back the tapes from Iron Mountain.

Backups have also become a method of archiving data in the absence of a proper archiving tool. Restoring archive data from backups depends on staff who know exactly what to restore. The backup format also makes it hard to search and index the content, let alone mine it for competitive purposes. It might be *possible*, but is it *practical*, and can the effort be justified by the value gained from the data? Unlikely.

VTLs were developed to answer some of the issues relating to tape, namely the failure rate of backups and the time it takes to get data back during restores. VTLs do not answer the issue of tape loss - let's face it, if you don't send the tapes offsite, you can't lose them. But VTLs *have* to be located offsite from the main data, otherwise you are at risk, so why not just put the tape library offsite instead?

There are plenty of other techniques available to ensure you don't have to go running to backups for data recovery. Point-in-time copies, CDP, remote replication, snapshots and VSS all provide the ability to get data back, and to get it back quickly. Tape never did that - why should we expect a VTL to do so?

Enterprise-scale customers will have multiple sites and already write data to a remote location, perhaps across IP or DWDM/dark fibre with Fibre Channel connections. They will have tape libraries with automation in place and will not need to ship tapes offsite. Their issues with tape will revolve around pushing backup success rates to 100%, eliminating the failures caused by faulty media, and ensuring that enough tape drives are available to run as many restores in parallel as possible. VTLs enable that. But I don't believe VTLs should enable that at any price.

80-90% or more of data on tape is at rest. Tape data expires and the tapes are reused in cycles. Tape is effective because the 90% of inactive media can be put on a shelf and left alone. Realistically, even an automated tape library is not an effective home for all of it; the library should perhaps contain only the last 6 months or so of backup data. As tapes are cheap, multiple copies can be kept, and if one tape is damaged or fails, it doesn't affect the content on the remaining tapes.
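To make the "last 6 months in the library, the rest on the shelf" idea concrete, here is a minimal sketch that splits a tape inventory by age. The `Tape` record, the `place_tapes` function and the 183-day window are hypothetical illustrations of the policy described above, not features of any particular backup product.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical window approximating "the last 6 months or so" of backup data.
LIBRARY_WINDOW = timedelta(days=183)


@dataclass
class Tape:
    label: str          # hypothetical barcode label
    last_written: date  # date of the most recent backup written to the tape


def place_tapes(tapes, today, window=LIBRARY_WINDOW):
    """Split tapes into those kept in the automated library and those shelved."""
    in_library, on_shelf = [], []
    for tape in tapes:
        # Recent tapes stay in the library for fast restores; older,
        # inactive tapes go on a shelf and are left alone.
        if today - tape.last_written <= window:
            in_library.append(tape.label)
        else:
            on_shelf.append(tape.label)
    return in_library, on_shelf


tapes = [
    Tape("A00001", date(2007, 5, 1)),
    Tape("A00002", date(2006, 10, 1)),
    Tape("A00003", date(2006, 1, 15)),
    Tape("A00004", date(2007, 1, 1)),
]
library, shelf = place_tapes(tapes, today=date(2007, 5, 26))
print(library)  # ['A00001', 'A00004']
print(shelf)    # ['A00002', 'A00003']
```

Only the tapes written within the window occupy library slots; everything else costs nothing but shelf space.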

VTLs need to offer the same level of functionality:

  • You want power consumption to be as low as possible.
  • You want TCO to be as low as possible.
  • You don't want component failure to affect the whole archive.
  • You want granular access to your data to enable you to restore what you want when you want.

My point in the previous post was that the Copan product has been designed to address these requirements. It only powers up 25% of the disks, because you never read from and write to all your tapes at the same time; you couldn't, because you never had enough drives available to mount them all! Also, tape data is usually multiple copies of the same thing going back over months or years, so in a DR situation you would only restore the *last* copy, which is likely to be much less than 25% of the data on tape (or even on a VTL). A point worth noting here: the Copan system doesn't block access to data on drives that are powered down. It simply powers down another drive and powers up the one needed to provide access to the data, so all the user sees is a delay in access.
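The spin-up/spin-down behaviour described above can be sketched as a simple power budget. The 25% figure comes from the post; the choice of which drive to power down (least recently used) is an assumption for illustration, since Copan's actual selection policy isn't described here, and `PowerBudget` is a hypothetical name.

```python
from collections import OrderedDict


class PowerBudget:
    """Keep at most a fixed fraction of drives spinning; spin others up on demand.

    Assumption: when the budget is full, the least recently used spinning
    drive is powered down. This is illustrative, not Copan's documented algorithm.
    """

    def __init__(self, total_drives, active_fraction=0.25):
        self.total_drives = total_drives
        self.max_active = max(1, int(total_drives * active_fraction))
        # Spinning drives, ordered from least to most recently used.
        self.active = OrderedDict()

    def access(self, drive):
        """Access a drive; return True if the caller had to wait for a spin-up."""
        if not 0 <= drive < self.total_drives:
            raise ValueError(f"no such drive: {drive}")
        if drive in self.active:
            self.active.move_to_end(drive)  # already spinning: no delay
            return False
        if len(self.active) >= self.max_active:
            self.active.popitem(last=False)  # power down the least recently used drive
        self.active[drive] = None  # power up the requested drive
        return True


budget = PowerBudget(total_drives=8)  # 25% of 8 -> at most 2 drives spinning
print(budget.access(0), budget.access(1))  # True True   (both spin up)
print(budget.access(0))                    # False       (already spinning)
print(budget.access(5))                    # True        (drive 1 powered down)
print(list(budget.active))                 # [0, 5]
```

Access is never refused; a request for a powered-down drive just costs a spin-up delay, which matches the behaviour described in the post.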

The Copan system writes data to shelves, which are treated as individual virtual libraries. Each shelf can be powered down individually to replace failed drives, so the whole system doesn't have to be taken down. That also addresses the requirement that a hardware failure should not affect all the content: data is not spread across the whole system, so it is not all at risk if a shelf fails. The system also performs periodic data validation (drive scrubbing) so that drives which are likely to fail can be identified early.
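As a rough illustration of per-shelf scrubbing, the sketch below stores a CRC32 alongside each block and later re-reads the blocks to find drives whose data no longer matches. Using CRC32 per block, and the `write_block`, `scrub` and `scrub_system` helpers, are hypothetical choices for illustration; the post does not say how Copan actually validates data. Each shelf is scrubbed independently, mirroring the idea that shelves are isolated from one another.

```python
import zlib


def write_block(shelf, drive, data):
    """Store a block on a drive together with its CRC32 for later verification."""
    shelf.setdefault(drive, []).append((data, zlib.crc32(data)))


def scrub(shelf):
    """Return {drive: bad_block_count} for drives whose blocks fail verification."""
    suspects = {}
    for drive, blocks in shelf.items():
        bad = sum(1 for data, crc in blocks if zlib.crc32(data) != crc)
        if bad:
            suspects[drive] = bad
    return suspects


def scrub_system(system):
    """Scrub each shelf independently; only shelves with suspect drives are reported."""
    report = {}
    for shelf_id, shelf in system.items():
        suspects = scrub(shelf)
        if suspects:
            report[shelf_id] = suspects
    return report


system = {"shelf-1": {}, "shelf-2": {}}
write_block(system["shelf-1"], "d0", b"backup-0001")
write_block(system["shelf-2"], "d1", b"backup-0002")

# Simulate silent corruption on shelf-2/d1: the data changes, the stored CRC does not.
data, crc = system["shelf-2"]["d1"][0]
system["shelf-2"]["d1"][0] = (b"backup-0O02", crc)

print(scrub_system(system))  # {'shelf-2': {'d1': 1}}
```

A drive that keeps turning up in these reports is a candidate for replacement, and only its own shelf needs to be powered down to swap it.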

I'd like to end on one thought. If I were implementing ILM, I would implement a strategy which puts the most valuable data on Enterprise arrays. "Valuable" would cover both the risk of losing the data and the cost of slow access to it; I'd want to access the data 24/7 and never lose it. As the value of data reduces, it moves on to less expensive technology where the trade-off between cost and availability is acceptable; for instance, development data on modular storage. Finally, backup data would sit on the least expensive hardware platform. EMC are suggesting I keep that data on their most *expensive* platform!
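The tiering argument above boils down to a simple placement rule. This minimal sketch encodes it; the `DataSet` fields, the `choose_tier` function and the tier names are hypothetical simplifications of the reasoning in this post, not any vendor's ILM policy engine.

```python
from dataclasses import dataclass


@dataclass
class DataSet:
    name: str
    needs_24x7_access: bool  # high cost if access is slow or unavailable
    loss_unacceptable: bool  # high risk if the data is lost
    is_backup: bool = False


def choose_tier(ds):
    """Place a dataset on a tier following the ILM reasoning in the post (simplified)."""
    if ds.is_backup:
        # Backup data belongs on the least expensive platform.
        return "lowest-cost platform"
    if ds.needs_24x7_access and ds.loss_unacceptable:
        # The most valuable data: must be available 24/7 and never lost.
        return "enterprise array"
    # Lower-value data where cost versus availability is an acceptable trade-off.
    return "modular storage"


for ds in [
    DataSet("orders-db", needs_24x7_access=True, loss_unacceptable=True),
    DataSet("dev-sandbox", needs_24x7_access=False, loss_unacceptable=False),
    DataSet("nightly-backup", needs_24x7_access=False, loss_unacceptable=True, is_backup=True),
]:
    print(ds.name, "->", choose_tier(ds))
# orders-db -> enterprise array
# dev-sandbox -> modular storage
# nightly-backup -> lowest-cost platform
```

Under this rule, backup data never lands on the enterprise tier, which is exactly the point of contention with a DMX3-based VTL.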

Think of it this way - if you have a DMX3 already, why bother buying a DMX3 VTL solution? Just shove some 500GB drives into your existing DMX3 and back up straight to disk. It will be exactly the same in terms of availability and reliability, but without the VTL licence cost!

1 comment:

Unknown said...

Storagezilla says:
EMC isn't suggesting that customers put their backup data on EMC's largest and most expensive platform; *customers* told EMC that's what they were looking to do.

When EMC introduced the DL 300 & 700 based on the CX2 in April 2004, the first big request was a two-controller option. That's what spurred the development of the DL 740. After that, the next major development request was for a DMX 3 model, and I'm not even going to get into what some of them asked for last week, but you can guess that one of the shortlist items was even greater capacity.

We can argue the power thing until the Sun (or Sun Microsystems) explodes. Personally, I don't like Copan's approach. I would prefer to be able to run everything if required, but spin down what isn't being used until it's needed, as that lets me maximise stream performance across as many LUNs as I require (without risking complete erasure through some form of LUN-spraying approach; -cough- NetApp). But the value add of a VTL over backing up to a file system is that you can offload cloning and replication operations from the backup server to an embedded storage node or similar running directly on the VTL itself. It's an intelligent appliance, after all.

All in all, however, there's nothing wrong with backing up to lower-cost drives in another array or the same one, though people did think I was crazy when I used to mention EDM using Symmetrix storage as a target ages ago.

Times change.