Saturday, 26 May 2007

Using VTLs

The discussion on EMC's VTL has got StorageZilla and other EMC'ers a little excited. It stems from Barry's post on the DL6000, a Godzilla of a VTL, as I've discussed previously.

I think it is worthwhile reviewing the exact point of having a VTL in the first place.

Tape has issues and we all know it: tapes break, wear out, develop errors and, most importantly, get lost, compromising their content and potentially leading to embarrassment and large fines for the companies involved. Tape has the advantage that it is cheap, portable and (other than keeping it at the right temperature) costs a minimal amount to store.

Point 2: why do we do backups? They're done to recover from inadvertent data loss - logical corruption, user error, hardware failure and so on. Depending on company size, tape may be used in the DR process, but large companies will probably either run a second site or use a third-party DR company. It is likely that these companies will *not* rely on restoring from tape in the event of a complete site outage or disaster. BC/DR processes for these companies will ultimately be more complex than calling the tapes back from Iron Mountain.

Backups have also become a method of archiving data in the absence of a proper archiving tool. Restoring data from backups for archive purposes depends on staff with specific knowledge of what to restore. The format of backup data precludes the ability to easily search and index content and mine it for competitive advantage. It might be *possible*, but is it *practical*, and can the effort justify the rewards gained from the data? Unlikely.

VTLs have developed to answer some of the issues relating to tape, namely backup failure rates and the time it takes to get data back on restores. VTLs do not answer the issue of tape loss - let's face it, if you don't send the tapes offsite, you can't lose them. VTLs *have* to be located offsite from the main data, otherwise you are at risk, so why not just put the tape library offsite instead?

There are plenty of other techniques available to ensure you don't have to go running to backups for data recovery. Point-in-time copies, CDP, remote replication, snapshots and VSS all provide the ability to get data back, and get it back quickly. Tape never did that - why should we expect a VTL to do so?

Enterprise-scale customers will have multiple sites and already write data to a remote location, perhaps across IP or DWDM/dark fibre with fibre channel connections. They will have tape libraries with automation in place and no need to ship tapes offsite. Their issues with tape will revolve around getting backup success rates to 100%, eliminating the failures caused by faulty media, and ensuring that when restores take place, tape drives are available so that as many restores as possible can run. VTLs enable that. But I don't believe VTLs should enable that at any price.

80-90% or more of data on tape is at rest. Tape data expires and the tapes are reused in cycles. Tape is effective because the 90% of inactive media can be put on a shelf and left alone. If you are realistic, you'd say that even automated tape libraries are not cost-effective for everything and should only contain perhaps the last 6 months or so of backup data. As tapes are cheap, multiple copies can be kept; if one tape is damaged or fails, it doesn't affect the content on the remaining tapes.

VTLs need to offer the same level of functionality:

  • You want power consumption to be as low as possible.
  • You want TCO to be as low as possible.
  • You don't want component failure to affect the whole archive.
  • You want granular access to your data to enable you to restore what you want when you want.

My point in the previous post was that the Copan product has been designed to address these requirements. It only powers up 25% of the disks because you never read and write from all your tapes at any one time - you couldn't, because you never had the drives available to mount all the tapes! Also, tape data is usually multiple copies of the same thing going back over months or years, so in a DR situation you would only restore the *last* copy, which is likely to be much less than 25% of the data on tape (or even on a VTL). A point worth noting here: the Copan system doesn't block access to data on drives that are powered down. It simply powers down another drive and powers up the one needed, so all the user sees is a delay in access - the sketch below illustrates the idea.
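
Here's a minimal Python sketch of that idea as I understand it - a simple LRU swap under a fixed power budget. It's illustrative only, not Copan's actual firmware logic, and the class and method names are mine.

```python
# Hedged sketch of a MAID-style power budget: at most 25% of drives spin at
# once; reading a powered-down drive spins down the least-recently-used
# active drive and spins up the target, so the caller only ever sees a
# delay, never a refusal. Not Copan's real algorithm - just the concept.

from collections import OrderedDict

class PowerBudgetedShelf:
    def __init__(self, total_drives, power_ratio=0.25):
        self.max_active = max(1, int(total_drives * power_ratio))
        self.active = OrderedDict()              # drive_id -> True, LRU first

    def read(self, drive_id):
        if drive_id in self.active:              # already spinning: no delay
            self.active.move_to_end(drive_id)
            return "immediate"
        if len(self.active) >= self.max_active:  # at the power budget...
            self.active.popitem(last=False)      # ...spin down the LRU drive
        self.active[drive_id] = True             # spin up the drive we need
        return "delayed (spin-up)"

shelf = PowerBudgetedShelf(total_drives=896)     # at most 224 drives active
print(shelf.read(42))                            # "delayed (spin-up)"
print(shelf.read(42))                            # "immediate" second time round
```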

The Copan system writes data on shelves which are treated as individual virtual libraries. Any one of them can be powered down individually to replace failed drives, so the whole system doesn't have to be taken down. That also helps ensure that a hardware failure doesn't affect all the content: data is not spread across the whole system, so it is not all at risk if a shelf does fail. The system also performs periodic data validation/drive scrubbing to ensure drives which are about to fail can be identified early.

I'd like to end on one thought. If I were implementing ILM, I would implement a strategy which puts the most valuable data on Enterprise arrays. "Valuable" covers both the risk of losing the data and the cost of slow access; I'd want to access the data 24/7 and not lose it. As the value of data reduces, it moves onto less expensive technology where the tradeoff between cost and availability is acceptable - development data on modular storage, for instance. Finally, backup data would sit on the least expensive hardware platform. EMC are suggesting I keep that data on their most *expensive* platform!

Think of it this way - if you have a DMX3 already, why bother buying a DMX3 VTL solution? Just shove some 500GB drives into your existing DMX3 and back up straight to disk - it will be exactly the same in terms of availability and reliability, but without the VTL licence cost!

Friday, 25 May 2007

Mine's Bigger than yours - Part Deux

Old Barry's at it again with the DMX pitch, this time for VTL.

Here's a quick comparison between Copan's Revolution 220TX system and the specification of the DL6100 from EMC's website.

Copan Revolution 220TX

  • Single Cabinet
  • 896 drives (500GB SATA)
  • Power (max): 6368 watts
  • Throughput: 5.2TB/hour
  • Capacity: 448TB max
  • Emulation: 56 libraries, 56 drives, 8192 virtual cartridges

EMC DL6000

  • 7 cabinets (max configuration)
  • 1440 drives (500GB LCFC)
  • Power (max): 49100 watts
  • Throughput: 6.4TB/hour
  • Capacity: 615TB max
  • Emulation: 256 libraries, 2048 drives, 128000 virtual cartridges

So, you'd need roughly 1.5 Copan devices to match the EMC kit and yes, it doesn't scale as well in terms of virtual components, but there are some big issues here. For instance, EMC's device isn't green - the power demands are huge, which is not surprising as the drives are all spinning all the time (Copan have at most 25% of theirs in use at any time); the quick sums below put some numbers on it. Floor density is not good in the DMX either - why? Because the drives are in use all the time and it is an Enterprise array, so timely replacement of failed disks is important - but that matters much less for a virtual tape system which can tolerate some downtime.
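
As a sanity check on the power argument, here's the arithmetic on the figures quoted above (vendor-published maxima, so treat the results as rough):

```python
# Watts per TB and TB/hour per kW, using the published maxima listed above.

systems = {
    "Copan Revolution 220TX": {"watts": 6368,  "capacity_tb": 448, "tb_per_hour": 5.2},
    "EMC DL6000":             {"watts": 49100, "capacity_tb": 615, "tb_per_hour": 6.4},
}

for name, s in systems.items():
    watts_per_tb = s["watts"] / s["capacity_tb"]
    throughput_per_kw = s["tb_per_hour"] / (s["watts"] / 1000)
    print(f"{name}: {watts_per_tb:.0f} W/TB, {throughput_per_kw:.2f} TB/hour per kW")

# Roughly 14 W/TB vs 80 W/TB, and ~0.82 vs ~0.13 TB/hour per kW drawn.
```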

So what would you go for? My money would be on a DMX for Enterprise work, which it is great at - and not as a "one size fits all" system, which it quite plainly is not.

Tuesday, 22 May 2007

Trusting TrueCopy

For those of you who use TrueCopy on a daily basis, you'll know that the assignment of a TrueCopy pair is based on a source and target storage port, host storage domain and LUN. This means a LUN has to be assigned out before it can be replicated.

The part that has always worried me is the fact that the target LUN does not need to be in a read-only status and can be read-write.

Please, HDS, to save my sweaty palms, change the requirement so that the target volume must be read-only before it can become a TrueCopy target....

Thin Provisioning - Cisco Style

There have been so many discussions on thin provisioning since the Hitachi USP-V announcement. When a major player takes on a specific technology, all of a sudden we realise that everyone else has already been doing it. Tony Asaro's post probably provides the best summary of those who do it today.

One vendor not appearing on the list is Cisco. Not surprising, as they don't produce storage systems; however, they do produce fibre channel switches which also implement thin provisioning.

I talked about the issue not that long ago here. Now I'm laying out ports for real, and it's not as clear as it seems. The lowdown is this: if you put ports in "dedicated" mode, the full port speed is reserved (or 4Gbps if the port is set to auto, regardless of the speed negotiated), while in "shared" mode there is a minimum reservation - 4Gbps ports need 0.8Gbps reserved, for instance, and the figure reduces in proportion for 2 and 1Gbps ports. More details can be found here. This means not all combinations are possible, and you get "Bandwidth Not Available" messages when you don't expect them. As this was confusing me, I've put together a port speed calculator; you can pick it up at Cisco Rate Calculator. The logic behind it boils down to something like the sketch below.
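
In case it helps anyone else laying out ports, here's roughly the admission check the calculator performs. It's a sketch of the rules as I've described them above - the 12Gb/s port-group budget and the reservation figures come from my notes, not from Cisco documentation, so check them against your own SAN-OS release.

```python
# Sketch of the admission check: a Gen 2 port group has a 12 Gb/s budget;
# "dedicated" ports reserve their full rate (4 Gb/s if set to auto), while
# "shared" ports reserve a minimum that scales with port speed.

GROUP_BUDGET_GBPS = 12.0
SHARED_MINIMUM = {4: 0.8, 2: 0.4, 1: 0.2}        # Gb/s reserved per shared port

def reserved_bandwidth(ports):
    """ports: list of (speed_gbps or 'auto', 'dedicated' or 'shared')."""
    total = 0.0
    for speed, mode in ports:
        rate = 4 if speed == "auto" else speed
        total += float(rate) if mode == "dedicated" else SHARED_MINIMUM[rate]
    return total

def fits_in_port_group(ports):
    return reserved_bandwidth(ports) <= GROUP_BUDGET_GBPS

# Three dedicated 4 Gb/s ports use the whole 12 Gb/s group...
group = [(4, "dedicated")] * 3
print(fits_in_port_group(group))                    # True - exactly on budget
# ...leaving nothing for even one shared port's 0.8 Gb/s minimum.
print(fits_in_port_group(group + [(4, "shared")]))  # False: "Bandwidth Not Available"
```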

Thursday, 17 May 2007

New Product Announcement...

There's another new product out this week from HP - the XP24000... It seems to have some similarities to another product launch, 224 ports, thin provisioning, partitioning... :-)

Wednesday, 16 May 2007

Poll Results

Thanks to everyone who voted in the poll on Optimiser/Cruise Control; seems like most people like the recommendations but don't trust it to make automatic changes.

I've got a new poll running for the next month - tell me what you think about USP-V. Now I'm *sure* there will be lots of opinion on this....

Port Indexes on Cisco Switches

I spent today building some Cisco MDS9513 switches. It's good to get into the nuts and bolts of technology once in a while. They are big beasts - 2nd generation technology with 11 usable slots and, using the biggest available line cards (48 ports), 528 ports in a single chassis. However, the build had one fly in the ointment. As part of the (inherited) design, the chassis included a "generation 1" eight-port ethernet IP line card for implementing FCIP.

The issue revolves around a feature called port indexes, which are used to track the ports installed in an MDS chassis. Generation 1 chassis have a maximum of 252 port indexes; generation 2 technology supports up to 1020. However, a generation 1 line card inserted into a generation 2 chassis dumbs the whole switch down to generation 1 port indexes. So, with 252 indexes and 32 taken up by the FCIP card, only 220 port indexes are left, which directly translates to 220 usable ports - less than half the switch capacity! If line cards are installed that take the index count above the limit, the additional cards won't come online - they initially power up and then power down. The sketch below shows the bookkeeping.
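
Here's a rough sketch of that bookkeeping - the 32-index figure for the IP card is the one quoted above, the card names are made up, and this is my illustration of the behaviour rather than Cisco's implementation:

```python
# Port-index budget: any generation 1 card drops the whole chassis to the
# 252-index ceiling, and cards that would push the total past it simply
# don't come online.

GEN1_LIMIT, GEN2_LIMIT = 252, 1020

def online_cards(cards):
    """cards: list of (name, indexes_consumed, generation), in slot order."""
    limit = GEN1_LIMIT if any(gen == 1 for _, _, gen in cards) else GEN2_LIMIT
    online, used = [], 0
    for name, indexes, _ in cards:
        if used + indexes <= limit:
            online.append(name)
            used += indexes
        # otherwise the card powers up, then powers straight back down
    return online

# An 11-slot chassis: one gen 1 IP card plus ten 48-port gen 2 line cards.
chassis = [("IPS-8 (gen 1)", 32, 1)] + \
          [(f"48-port FC card {i}", 48, 2) for i in range(1, 11)]
print(online_cards(chassis))   # only four of the ten FC cards fit under 252
```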

In this instance the solution will be to move IP services to another (smaller) switch, and hopefully Cisco will bring out a generation 2 version of the IP blade soon. There is a bigger problem though, and that is for any customers looking to take 9513s and use them with the SSM module, which supports products such as EMC Invista and Kashya (now EMC RecoverPoint). As far as I am aware, there are no plans for a generation 2 SSM module any time soon, so using the SSM module in a 9513 chassis will create the same port restriction issues.

I don't see why Cisco couldn't simply produce a gen 2 version of the old gen 1 blades which did nothing more than re-jig the port indexing. Come to think of it, surely they could patch the firmware on the gen 1 line cards to fix the problem. Obviously it is not that simple, or perhaps not enough people are using Invista and RecoverPoint to make it worthwhile.

USP-V...and another thing

I hate it when I write a post and then think of other things afterwards. One more USP-V thought (I'm sure there will be more). One of the drawbacks of the current hardware with virtualisation is the effort involved in removing/upgrading the USP. Was there anything in Monday's announcements to cater for this? I was hoping HDS would announce USP "clustering". Although they don't think it is necessary from a resiliency perspective, it certainly is if you want to upgrade and haven't done a 1-for-1 passthrough on LUNs (i.e. presented "big" LUNs to the array and then carved them up in the USP).

So HDS, did you do it?

Oh, and another one...for customers who've just purchased a standard USP, will there be a field upgrade to USP-V?

Tuesday, 15 May 2007

USP-V - bit of a let down?

It seems from the posts on the blogosphere so far that the USP-V release is causing a bit of a stir (10 points for stating the obvious, I think). So, here's my take on the announcements so far.

First of all, it’s called USP-V – presumably because of the “Massive 500-Percent Increase in Virtualized Storage Port Performance for External Storage”. I'm not sure what that means - possibly more on that later.

As previously pointed out, the USP-V doesn't increase the number of disks it supports. It stays at 1152, and disappointingly the largest drive size is still 300GB, and only 146GB for 15K drives. I assume HDS intends to suggest that customers should be using virtualisation to connect to lower-cost, higher-capacity storage. That's a laudable suggestion, but it only works if the Universal Volume Manager licence is priced attractively enough to make it worthwhile. In my experience this is an expensive feature and, unless you're virtualising a shed-load of storage, it probably isn't cost effective.

There have been some capacity increases; the number of logical LUNs increases 4-fold. I think this has been needed for some time, especially if using virtualisation. 332TB with 16384 virtual LUNs meant an average of around 20GB per LUN; with four times the LUNs it is now only around 5GB. Incidentally, the HDS website originally showed the wrong internal capacity here: http://www.hds.com/products/storage-systems/capacity.html, showing the USP-V figures the same as the base USP100. It's now been corrected!
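
For the record, the average-LUN arithmetic (a trivial sketch, assuming 1TB = 1024GB and that the internal capacity figure stays the same):

```python
internal_capacity_gb = 332 * 1024
print(internal_capacity_gb / 16384)   # ~20.8 GB average per LUN on the USP
print(internal_capacity_gb / 65536)   # ~5.2 GB with four times the LUN count
```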

Front-end ports and back-end directors have been increased. For fibre channel the increase is from 192 to 224 ports (presumably 12 to 14 boards) and back-end directors increase from a maximum of 4 to 8. I'm not sure why this is, if the number of supportable drives hasn't been increased (do HDS think 4 was insufficient, or will we see a USP-V MkII with support for more drives?). Although these are theoretical maxima, the figures need to be taken with a pinch of salt. For example, the front-end ports come on 16-port cards in the USP and there are 6 slots. This provides 96 ports; the next 96 are provided by stealing back-end director slots (this is similar to the DMX-3 - 64 ports maximum, which can be increased to 80 by removing disk director cards). Surprisingly, throughput hasn't been increased. Control bandwidth has, but not cache bandwidth. Does the control bandwidth increase provide the 500% increase in virtualisation throughput for external storage?

What about the "good stuff"? So far, all I can see is Dynamic (thin) Provisioning and some enhancements to virtual partitions. The thin provisioning claims to create virtual LUNs and spread data across a wide number of array groups. I suspect this is simply an extension of the existing copy-on-write technology which, if true, makes it hardly revolutionary.

I'd say the USP-V is an incremental change to the existing USP and not quite worthy of the fanfare and secrecy of the last 6 months. I'd like to see some more technical detail (HDS, feel free to forward me whatever you like to contradict my opinion).

One other thought. I don’t think the DMX-3 was any less or more radical when it was released....

Monday, 14 May 2007

100 Up


I've reached 100! No, not 100 years old, but my 100th post. I've surprised myself that I've managed to keep going this long.


So a bit of humour - I'll save the USP-V discussions until later - who remembers Storage Navigator for the 9900 series?


Well, the left-hand toolbar has various icons on it. One of them, which I've reproduced here, looks like a fried egg and a spanner. But what is it meant to represent? (btw, it's the icon for LUSE/VLL)

Answers on a postcard please....

Friday, 11 May 2007

Mine's bigger than yours - do we care?

Our resident storage anarchist has been vigorously defending DMX - here. It's all in response to discussions with Hu regarding whether USP is better than DMX. Or, should I say DMX-3 and Broadway (whoops, I mentioned the unmentionable, you'll have to shoot me now).

I have to say I enjoyed the technical detail of the exchanges and I hope there will be a lot more to come. Any insight into how to make what are very expensive storage subsystems work more effectively has to be a good thing.

But here's the rub. Do we care how much faster the DMX-3 is than the USP? I doubt the differences are more than incremental and, having installed, configured and provisioned storage on 9980V/USP/8730/8830/DMX/DMX2/DMX3, I think I've enough practical experience to make that call. (By the way, I loved StorArch's comment about how flexible BIN file changes are now. Well, they may be, but in reality I've found EMC's process for releasing configuration changes cumbersome.)

Finally, I'll get round to the point of this post: most large enterprise subsystems perform within the same order of magnitude. However, I've yet to see any deployment where performance management is executed to such a degree that, hand on heart, the storage admins can claim they squeeze 100% efficiency out of their throughput. My estimate is that things probably run at about 80% efficiency, with the major bottlenecks being port layout, back-end array layout and host configuration.

So the theoretical bantering over who is more performant than whom is moot; now, EMC, HDS or IBM, come up with a *self-tuning* array and then you've got a winner...

Wednesday, 9 May 2007

Simulator Update

I managed to get a copy of the Celerra Simulator last week and I've just managed to get it installed. Although installation is simple, the simulator is quite specific about requirements - it runs under VMware ACE as a Linux guest, needs an Intel processor and can't run on a machine which already has VMware installed. Fortunately my test server fits the bill (once I'd uninstalled VMware Server). Once it's up, you administer it through a browser.

At this stage that's as far as I've got - however it looks good. More soon.

Port Oversubscription

Following on from snig's post, I promised a blog on FC switch oversubscription. It's been on my list for some time and I have discussed it before; however, it is also a subject I've discussed with clients from a financial perspective, and here's why: most people look at the cost of a fibre channel switch on a per-port basis, regardless of the underlying features/functionality of that port.

Not that long ago, switches from companies such as McDATA (remember them? :-) ) provided a fully non-blocking architecture. That is, they allowed the full 2Gb/s for any and all ports, point to point. As we moved to 4Gb/s, it was clear that Cisco, McDATA and Brocade couldn't manage (or didn't want) to deliver full port speed as the port density of blades increased. I suspect there were issues with ASIC cost and with fitting the hardware onto blades and cooling it (although Brocade have just about managed it).

For example, on Generation 2 Cisco 9513 switches, the bandwidth per port module is an aggregate 48Gb/s. This is regardless of the port count (12, 24 or 48), so although a 48-port blade can (theoretically) have all its ports set to 4Gb/s, the ports on average only have 1Gb/s of bandwidth each.

However, the configuration is more complex: ports are grouped into port groups - four groups per blade, of 12Gb/s each - putting even more restriction on the ability to use the available bandwidth across all ports. Ports can either have dedicated bandwidth or share bandwidth within a port group. In a port group of 12 ports, set three to a dedicated bandwidth of 4Gb/s and the rest are (literally) unusable, as the arithmetic below shows. While I was at a recent client we challenged this with Cisco; as a consequence, SAN-OS 3.1(1) allows the restrictions to be disabled, so you can take the risk, set all ports to 4Gb/s, and then you're on your own.
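
The arithmetic behind that "literally unusable" claim, using the figures above:

```python
group_budget = 12.0                    # Gb/s per Gen 2 port group
dedicated = 3 * 4.0                    # three ports reserved at the full 4 Gb/s
remaining = group_budget - dedicated
print(remaining)                       # 0.0 - the other nine ports can't even
                                       # claim the 0.8 Gb/s shared-mode minimum
```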

How much should you pay for these ports? What are they actually worth? Should a 48-port line card port cost the same as a 24-port line card port? Or should they be rated on bandwidth? Some customers choose to use 24-port line cards for storage connections, or even the 4-port 10Gb/s cards for ISLs. I think that's pointless. Cisco 9513s are big beasts; they eat a lot of power and need a lot of cooling. Why wouldn't you want to cram as many ports into a chassis as possible?

The answer is to look at a new model for port allocation. Move away from the concepts of core-edge and edge-core-edge and mix storage ports and host ports on the same switch and, where possible, within the same port group. This would minimise the traffic that has to move off-blade, off-switch or even out of the port group.

How much should you pay for these ports? I'd prefer to work out a price per Gb/s, along the lines of the sketch below. From the prices I've seen, that makes Brocade way cheaper than Cisco.
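
The metric itself is trivial; the prices below are deliberately made up, purely to show the calculation - plug in your own quotes.

```python
def price_per_gbps(total_card_price, usable_bandwidth_gbps):
    """Cost per Gb/s of bandwidth you can actually use, not per port."""
    return total_card_price / usable_bandwidth_gbps

# Hypothetical oversubscribed 48-port card: 48 Gb/s usable across the blade.
print(price_per_gbps(24000, 48))       # 500.0 per usable Gb/s
# Hypothetical non-blocking 16-port card at a full 4 Gb/s per port.
print(price_per_gbps(16000, 64))       # 250.0 per usable Gb/s
```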

Tuesday, 1 May 2007

Tuning Manager CLI

I've been working with the HiCommand Tuning Manager CLI over the last few days in order to get more performance information out of 9900 arrays. Tuning Manager (5.1 in my case) just doesn't let me present data in a format I find useful, and I suppose that's not really surprising: unless the vendor adds a complete reporting engine to the product, you'll want to get the data out of the HTnM database and build your own bespoke reports.

So I had high hopes for the HTnM CLI, but I was unfortunately disappointed. Yes, I can drag out port, LDEV, subsystem (cache etc.) and array group details; however, I can only extract one time period of records at a time. I can display all the LDEVs for a specific hour, or for a day (if I've been aggregating the data), but I can't specify a date or time range. This means I've had to script the extraction and merge the data myself, and the result is sloooow. Really slow. One other really annoying feature: fields that report byte throughput sometimes come out as "4.2 KB" and sometimes as "5 MB" - which programmer thought comma-delimited output should carry a unit suffix? The snippet below shows the sort of clean-up that forces on me.
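
For what it's worth, this is the kind of normalisation I've had to bolt on - a rough sketch of my own (not part of any HDS tooling) that turns those unit-suffixed fields back into plain bytes so the merged output can actually be sorted and charted:

```python
import re

# Map the unit suffixes the CLI emits onto byte multipliers.
UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3}

def to_bytes(field):
    """Turn '4.2 KB', '5 MB' or a bare number into an integer byte count."""
    match = re.fullmatch(r"\s*([\d.]+)\s*([KMG]?B)?\s*", field)
    if not match:
        raise ValueError(f"unrecognised throughput field: {field!r}")
    value, unit = match.groups()
    return int(float(value) * UNITS[unit or "B"])

print(to_bytes("4.2 KB"))   # 4300
print(to_bytes("5 MB"))     # 5242880
```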

I'm expecting delivery of HTnM 5.5 (I think 5.5.3 to be specific) this week, and here's what I'm hoping to find: (a) the ability to report over a date/time range, and (b) the database schema exposed so I can extract data directly. I'm not asking much - nothing more than other products already offer. Oh, and hopefully something considerably faster than now.