Sunday 21 May 2006

Hybrid disk drives

Samsung have announced a hybrid hard disk for use with Windows Vista. The announcement is here http://www.samsung.com/PressCenter/PressRelease/PressRelease.asp?seq=20060517_0000255266

Basically, the disk has a larger cache, 128MB or 256MB, but crucially it can use that cache as a staging area, effectively an extension of the hard disk itself. This allows (depending on the traffic type) the hard disk to be left spun down for longer periods. Samsung are pitching the device as offering benefits to laptops and faster boot times for Windows systems.
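To illustrate the staging idea, here's a toy sketch: writes accumulate in the non-volatile cache and the platters only need to spin up when the cache approaches full. The cache size and flush threshold are my own illustrative numbers, not Samsung's.

```python
# Toy model of a hybrid drive's write staging: writes accumulate in the
# non-volatile cache and the platters only spin up when the cache is
# nearly full, then spin back down once it has been destaged.
# Cache size and flush threshold are illustrative, not Samsung's figures.

class HybridDrive:
    def __init__(self, cache_mb=128, flush_threshold=0.9):
        self.cache_limit_mb = cache_mb * flush_threshold
        self.cached_mb = 0.0
        self.spin_ups = 0

    def write(self, mb):
        self.cached_mb += mb
        if self.cached_mb >= self.cache_limit_mb:
            self.spin_ups += 1        # platters spin up to destage the cache
            self.cached_mb = 0.0      # cache emptied, disk can spin down again

drive = HybridDrive()
for _ in range(1000):                 # 1,000 x 1MB writes
    drive.write(1)
print(f"platter spin-ups for 1000MB of writes: {drive.spin_ups}")
print("a conventional drive would have stayed spinning throughout")
```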

That's all great for Windows, but what about wider usage? It seems to me that the benefit of a larger, more intelligent cache is greatest for a PC-based operating system, where the working set of data on a large disk may only be a few hundred megabytes. Larger systems will probably have access profiles that are far more random in their reads and writes, or even heavily sequential, which a cache of this size can't usefully absorb.

It may be that with intelligent microcode, midrange and enterprise arrays can benefit from the ability to leave devices spun down, potentially saving power and cooling. That would be a great benefit in today's datacentres.

Thursday 18 May 2006

The Case for 10Gb/s

Fibre Channel storage speeds are now up to 10Gb/s, as I'm sure we're all aware, and Brocade, McDATA and Cisco all have 4Gb/s products. The question is, are the faster speeds really necessary?

Pushing a full 1 or 2Gb/s of data across a Fibre Channel connection at a sustained rate requires some decent processing power, so why move things on to 4 and 10? Well, 4Gb/s and 10Gb/s certainly prove useful for ISL connections: they reduce the number of ports required and consequently the cabling. But faster connections come at a price; the supported cabling distance over multimode fibre drops significantly. Check it out here: http://storagewiki.brookend.dyndns.org/ow.asp?Fibre%5FChannel
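As a back-of-the-envelope illustration of the port and cable savings (the 60Gb/s aggregate figure is invented purely for the example):

```python
import math

# Rough ISL count needed to carry a given aggregate of inter-switch
# traffic at each link speed. The 60Gb/s aggregate is purely illustrative.
aggregate_gbps = 60

for link_gbps in (1, 2, 4, 10):
    isls = math.ceil(aggregate_gbps / link_gbps)
    # each ISL consumes one port on each of the two switches it joins
    print(f"{link_gbps:>2}Gb/s links: {isls} ISLs "
          f"({isls * 2} switch ports, {isls} cables)")
```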

So faster speeds yes, but shorter distances. Wasn't one of the benefits of fibre channel to remove us from the 25m SCSI cable restriction?

One other thing to bear in mind: if we start consolidating traffic onto fewer but faster ISLs, I'd want to be very sure that the aggregation of traffic still gives adequate quality of service for each of the traffic types sharing the link. Cisco can do that; McDATA are talking about it; Brocade, I'm not sure.

So what do I think, are the faster speeds necessary? Well, I can see the benefit of 4Gb/s for storage ports, less so for host ports. I can also see limited benefit in using 10Gb/s for ISLs between locally placed switches, but I think that's where it ends for 4 and 10Gb/s. Hosts and storage systems can't push that volume of data around, and switch backplanes can't move it around either. So for me, 1/2Gb/s will be the norm and 4/10Gb/s will be for special occasions.

Tuesday 9 May 2006

Managing Hard Disk Capacities

It has been 50 years since the first hard disk drive was invented by IBM, with a capacity of a mere 5MB. Contrast that with today's latest 500GB drive from Hitachi. That's a 100,000-fold increase in capacity over 50 years, or roughly 26% compound growth every single year!
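For anyone who wants to check the arithmetic, here's the compound growth implied by going from 5MB to 500GB over 50 years:

```python
# Compound annual growth implied by going from 5MB to 500GB in 50 years.
start_mb = 5
end_mb = 500 * 1000                      # 500GB expressed in MB (decimal)
years = 50

total_factor = end_mb / start_mb         # 100,000x overall
annual = total_factor ** (1 / years)     # ~1.26, i.e. ~26% per year
print(f"total growth: {total_factor:,.0f}x, "
      f"compound annual growth: {(annual - 1):.0%} per year")
```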

There's no doubt that disk drives have also become faster and more reliable as time has passed; however, the recent increases in capacity have brought new challenges with them.

First of all, two streams of hard drive technology have developed based on the interface they support: the SCSI/Fibre Channel/Serial Attached SCSI family and the Parallel ATA/Serial ATA family. SCSI-based drives are generally more reliable than SATA drives due to the components used and the build quality; however, they carry a price premium. In addition, SCSI devices usually have faster rotational speeds and lower capacities than SATA. Compare the Hitachi Ultrastar 15K147, which spins at 15,000rpm with a capacity of 147GB, to the Deskstar 7K500, which spins at 7,200rpm with a capacity of 500GB. That's roughly half the rotational speed with more than three times the capacity. Clearly these drive types have very different uses; the Ultrastar is more suited to high-performance random workloads, while the Deskstar is more of a low-cost, low-activity archive device.

The increase in capacity also brings into question the subject of reliability. The Ultrastar drive quotes an unrecoverable error rate of 1 in 10^15 bits read/written. That equates to only a few hundred complete reads of a single drive, something a heavily used device could achieve in a short time. Deskstar drives are quoted an order of magnitude worse, which, combined with their larger capacity, works out at only a few dozen complete reads. Obviously this is not acceptable, which is why we've had RAID since 1978 and, since 1988, more advanced versions (including RAID-5) which use multiple disks to protect data in case of a single disk failure.
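A rough calculation makes the point, assuming the quoted rates are unrecoverable errors per bit read (1 in 10^15 for the Ultrastar and, per the order-of-magnitude-worse figure above, 1 in 10^14 for the Deskstar); exact figures depend on how each vendor specifies the rate:

```python
# Rough expected number of complete drive reads before hitting an
# unrecoverable read error, assuming the quoted rates are per bit read.
# Error-rate values are assumptions based on the discussion above.

def reads_before_error(capacity_gb, error_rate_bits):
    bits_per_full_read = capacity_gb * 1e9 * 8
    return error_rate_bits / bits_per_full_read

print(f"Ultrastar 15K147 (147GB, 1 in 1e15): "
      f"~{reads_before_error(147, 1e15):.0f} complete reads")
print(f"Deskstar 7K500   (500GB, 1 in 1e14): "
      f"~{reads_before_error(500, 1e14):.0f} complete reads")
```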

RAID works well, but as disk sizes have increased, so has the risk of a second disk failure during the rebuild of a failed disk in a RAID group. Have a look at Dave Hitz's article on the NetApp website. The analysis is a little simplistic, but it makes for sobering reading, and it's clear why double parity (RAID-6, or 6+2) is a necessary evolution of standard RAID-5 technology: it tolerates a second disk failure in a RAID group without data loss, statistically reducing the risk of losing a RAID group to almost infinitesimal levels. That said, I don't think Dave's calculation is entirely fair, as most enterprise arrays will predictively "spare out" a potentially failing disk before it actually fails, rebuilding the new disk from the suspect disk itself while still leaving the other disks in the RAID group available should it fail completely.
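To see why the rebuild window matters, here's a crude probability sketch of a RAID-5 rebuild: every surviving disk has to be read end to end, so the chance of tripping over an unrecoverable error during the rebuild grows quickly with drive size. I've assumed an illustrative 7+1 group, a 1-in-10^14-bit error rate (the SATA-class figure used above) and independent errors, and ignored the chance of a second whole-drive failure, so treat the numbers as indicative only.

```python
# Crude estimate of the chance of an unrecoverable read error while
# rebuilding a failed disk in a RAID-5 group: every surviving disk must
# be read end to end. Assumes 1 error per 10^14 bits and independence.

def rebuild_failure_probability(disks, capacity_gb, error_rate_bits=1e14):
    bits_to_read = (disks - 1) * capacity_gb * 1e9 * 8   # surviving disks only
    p_per_bit = 1 / error_rate_bits
    return 1 - (1 - p_per_bit) ** bits_to_read

for capacity in (147, 300, 500):
    p = rebuild_failure_probability(disks=8, capacity_gb=capacity)
    print(f"{capacity}GB disks, 7+1 RAID-5: "
          f"~{p:.0%} chance of hitting an error during rebuild")
```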

Personally, if I had the option I would choose 6+2 protection, subject to one proviso: each 6+2 RAID group built from 300GB disks means adding 2.4TB of raw capacity (1.8TB usable) at a time, which is not a small amount!

Hard disk drives will continue to grow relentlessly. Seagate have already announced a 750GB drive using perpendicular recording. A single disk subsystem can now hold a huge amount of a company's intellectual capital. I'll discuss this another time.

The Green Datacentre

I've been thinking about green issues this week after we received our "bottle box" from the local council to encourage us to recycle glass bottles and jars. In particular, my thoughts were drawn to the environmental friendliness of datacentres, or not as the case may be.

Datacentres are getting more demanding in their power and cooling requirements. It's now possible for more space to be set aside for plant, covering the provision of electricity and air conditioning, than for actual datacentre floor, and the use of space has to be carefully planned to ensure high-demand equipment can be catered for. Take for example fabric switches, or more specifically directors, as they are the larger scale and more environmentally hungry devices. Let's start with the new Cisco beast, the 9513. In a full configuration this has 528 ports in a 14U frame but draws 6,000 watts of power and outputs the same amount of heat requiring cooling. That's 11.36W per port, or 11.36W per 1Gb/s of bandwidth (in a full configuration the Cisco provides 48Gb/s per 48-port card, i.e. 1Gb/s per port).

For McDATA, the comparable product is the i10K. This offers up to 256 ports, again in 14U of space, and requires 2,500W of power. That equates to 9.77W per port, but as the i10K offers full 2Gb/s bandwidth on those ports, it works out at 4.88W per 1Gb/s of bandwidth, roughly twice as good as the Cisco.

Finally, Brocade. The Silkworm 48000 offers up to 256 ports in a 14U chassis, all at up to 4Gb/s bandwidth. Power demand is quoted in VA (volt-amps); assuming 1VA = 1W (OK, that's a big assumption, effectively a power factor of 100%), the maximum power requirement is 750W for a full chassis, or a remarkable 2.93W per port and 0.73W per 1Gb/s of bandwidth.
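For completeness, here's the arithmetic behind those per-port and per-Gb/s figures, using the port counts, power draws and per-port bandwidths quoted above (and the same 1VA≈1W assumption for the Brocade), plus what three 14U chassis would mean for a 42U rack:

```python
# Power per port and per Gb/s for the three directors, using the figures
# quoted above (and treating the Brocade's 750VA as roughly 750W).
directors = {
    "Cisco 9513":     {"ports": 528, "watts": 6000, "gbps_per_port": 1},
    "McDATA i10K":    {"ports": 256, "watts": 2500, "gbps_per_port": 2},
    "Brocade 48000":  {"ports": 256, "watts": 750,  "gbps_per_port": 4},
}

for name, d in directors.items():
    w_per_port = d["watts"] / d["ports"]
    w_per_gbps = w_per_port / d["gbps_per_port"]
    rack_watts = d["watts"] * 3          # three 14U chassis fill a 42U rack
    print(f"{name}: {w_per_port:.2f}W/port, {w_per_gbps:.2f}W per Gb/s, "
          f"{rack_watts / 1000:.2f}kW for three chassis in a 42U rack")
```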

Brocade seems to offer a far better environmental specification than the other two manufacturers, and that translates to more than just power consumption per port. Power consumption and, more importantly, heat dissipation (i.e. cooling) per rack have a direct impact on how many directors can be placed in a single cabinet, and therefore on the floorspace required to house the storage equipment. All three vendors could physically rack three units in a single 42U cabinet, but could you power and cool that much equipment? With Brocade, probably; with Cisco, I doubt it (and in any case you probably couldn't cable the Cisco directors - imagine nearly 1,600 connections from a single cabinet!). That means either a lot of empty space per rack, or other equipment in the same rack that doesn't need anywhere near as much power.

So what's my point? Well, clearly there's a price to pay for storage connectivity over and above the per-port cost. Datacentre real estate is expensive and you want to make the best use of it. What I'm saying is that the technology choice may not be driven purely by feature set and hardware cost, but by the overall TCO of housing and powering the storage networking equipment. Incidentally, I've also done a similar calculation on enterprise storage frames from IBM, EMC and HDS. I'll post those another time.

Tuesday 2 May 2006

Storage Virtualisation

This week I've been thinking a bit more about storage virtualisation. I'm planning to implement virtualisation using the HDS USP product. The decision to use HDS is based on the need to supply large amounts of lower-tier storage (development data) while retaining the data-mobility functionality of the production tiers. Using a USP or NSC55 as a "head" device allows features such as Truecopy to be provided on cheaper storage. In this instance the data quantities justify using a USP as a gateway; a USP isn't cheap, and in smaller implementations using it in this fashion wouldn't be practical.

So using this as a starting point, what is available in the virtualisation space? OK, start with HDS. The USP and NSC55 (Tagmastore) products both enable external storage (i.e. storage connected to a Tagmastore) to be presented out as if it were internal to the array itself. This means the functionality offered by the array can be retained on cheaper storage, for example the AMS range. The external storage is not limited to HDS products, and therefore the Tagmastore range is being touted as a tool for migration as well as virtualisation.

However, there are some downsides. LUNs from the underlying storage system are passed straight through the Tagmastore, so their size characteristics are retained. This implementation is both good and bad: good because it is possible to take out the Tagmastore and re-present the disk directly to a host; bad because you are restricted to the LUN sizes the cheaper device can present, and if it can't present LUNs exactly as you'd like, Truecopy and ShadowImage functionality may simply not work. There's also the issue of cache management. The Tagmastore will accept I/O and acknowledge it to the host once it is received into cache - the preferred method - but it could receive large volumes of write I/O which then need to be destaged to the underlying storage; if that is a lower-performance device, I/O bottlenecks could occur.

Finally, consider the question that should always be asked: how do I upgrade or get off the virtualisation product? Moving off a Tagmastore is reasonably painless, as the underlying data can be unpicked and re-presented to a new host; obviously that may not be practical if the Tagmastore functionality has been used.

If virtualisation is not done in the array itself, then it can be managed by an intermediate device such as the IBM SVC (SAN Volume Controller). The SVC controls all the storage on the underlying arrays and chooses how virtual LUNs are stored, so data is spread over all the available arrays as the SVC sees fit. This approach, again, is good and bad. Good: data is well spread, giving theoretically even performance. Bad: where is the data, and what happens if the SVC fails? Asking the "how do I get off this product?" question is also a bit more taxing. The only option (unless IBM offer another bespoke solution) is host-based migration from the SVC to new storage, which may be hugely impractical if the SVC is managing tens or hundreds of terabytes.
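As a rough illustration of why "where is the data?" becomes hard to answer, here's a toy model of how an in-band virtualiser of this kind might scatter a virtual LUN's extents round-robin across whatever back-end arrays it manages. The extent size and array names are invented, and a real SVC's allocation policy will differ; this is only a sketch of the principle.

```python
from itertools import cycle

# Toy model of block-level virtualisation: a virtual LUN is carved into
# fixed-size extents and those extents are scattered round-robin across
# the managed back-end arrays. Extent size and array names are invented.
EXTENT_MB = 16
backend_arrays = ["array-A", "array-B", "array-C"]

def map_virtual_lun(size_mb):
    """Return a mapping of virtual extent number -> (array, extent slot)."""
    next_slot = {a: 0 for a in backend_arrays}
    mapping = {}
    targets = cycle(backend_arrays)
    for extent in range(size_mb // EXTENT_MB):
        array = next(targets)
        mapping[extent] = (array, next_slot[array])
        next_slot[array] += 1
    return mapping

# a 128MB virtual LUN, purely for illustration
for extent, (array, slot) in map_virtual_lun(128).items():
    print(f"virtual extent {extent} -> {array}, back-end extent {slot}")
```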

Option 3 is to virtualise in the fabric. This is the direction favoured by the switch manufacturers (no surprise there) and by companies such as EMC with their InVista product. Fabric virtualisation opens up a whole new way of operating: all of the underlying subsystem functionality (remote replication, point-in-time copies) is pushed up to the fabric switches and their associated devices. This raises questions over performance and throughput, and over the ability to operate in a multi-fabric environment, the standard model for enterprise companies.

From an architect's view, all of these options are viable, and the $64,000 question to ask is: "what am I looking to achieve through virtualisation?" Once this question is answered and quantified, the solution becomes more apparent. My recommendation: think about what you're looking to achieve, then match the available solutions to your requirements. Think more about future tie-in with the product than about the current benefits, because in the long run the cost will be in removing a virtualisation solution rather than in migrating to it.