Tuesday 9 May 2006

Managing Hard Disk Capacities

It has been 50 years since IBM introduced the first hard disk drive, which had a capacity of a mere 5MB. Contrast that with today's latest 500GB drive from Hitachi. In 50 years, that's a 100,000-fold increase in capacity, equivalent to compound growth of roughly 26% every year!

There's no doubt that disk drives have also become faster and more reliable as time has passed. However, the recent increases in capacity have brought new challenges with them.

First of all, two streams of hard drive technology have developed, based on the interface type they support: the SCSI/Fibre Channel/Serial Attached SCSI formats and the Serial ATA/Parallel ATA formats. SCSI-based drives are generally more reliable than SATA drives due to the components used and the build quality, but they carry a price premium. In addition, SCSI devices usually have faster rotational speeds and lower capacities than SATA. Compare the Hitachi Ultrastar 15K147 drive, which spins at 15,000rpm with a capacity of 147GB, to the Deskstar 7K500, which spins at 7200rpm with a capacity of 500GB. That's roughly half the speed with more than three times the capacity. Clearly these drive types have very different uses: the Ultrastar is more suited to high-performance random workloads, while the Deskstar is more of a low-cost, low-activity archive device.

The increase in capacity brings the subject of reliability into question. The Ultrastar drive quotes an unrecoverable error rate of 1 in 10^15 bits read. That is about 125TB, or roughly 850 complete reads of a 147GB drive, which a heavily used device could easily reach in a short time. Deskstar drives are quoted at a rate 10 times worse, which works out to an expected unrecoverable error after only about 25 complete reads of a 500GB drive. Obviously this is not acceptable, which is why we've had RAID since 1978 and, since 1988, more advanced versions (including RAID-5) that use multiple disks to protect data in case of a single disk failure.
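The arithmetic behind figures like these is easy to check. The following is only a rough sketch: it assumes the quoted rate means one unrecoverable error per so many bits read and that capacities are in decimal gigabytes, and the function name is mine rather than anything from a vendor datasheet. Different assumptions (per byte rather than per bit, say) give quite different counts.

```python
def full_reads_per_error(capacity_gb: float, bits_per_error: float) -> float:
    """Expected number of complete drive reads per unrecoverable error.

    Assumes the error rate is quoted per bit read and that
    1GB = 10^9 bytes (both are assumptions, not vendor statements).
    """
    bytes_per_error = bits_per_error / 8      # convert bits to bytes
    capacity_bytes = capacity_gb * 1e9        # decimal gigabytes
    return bytes_per_error / capacity_bytes


if __name__ == "__main__":
    drives = [
        ("Ultrastar 15K147", 147, 1e15),
        ("Deskstar 7K500", 500, 1e14),
    ]
    for name, capacity_gb, rate in drives:
        reads = full_reads_per_error(capacity_gb, rate)
        print(f"{name}: about {reads:.0f} complete reads per expected error")
```

The point is less the exact number than the trend: for a fixed error rate, every increase in capacity reduces the number of full passes you can expect to make before hitting an unreadable sector.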

RAID works well, but as disk sizes have increased, so has the risk of a second disk failure during the rebuild of a failed disk in a RAID group. Have a look at Dave Hitz's article on the NetApp website. It seems a little simplistic, but it makes sobering reading, and it is clear why double parity (or RAID 6, or 6+2) is a required evolution of standard RAID-5 technology. Double parity allows a RAID group to survive a second disk failure without data loss, statistically decreasing the failure risk for the group to almost infinitesimal levels. I don't think Dave's calculation is entirely fair, though, as most enterprise arrays will predictively "spare out" a potentially failing disk before it actually fails. The array rebuilds a new disk from the suspect disk itself, while the other disks in the RAID group remain available to rebuild from if the suspect disk does actually fail.
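To get a feel for why rebuilds are the worrying case, here is a minimal sketch of one common simplified model. It is my own illustration and not necessarily the calculation in Dave's article: it assumes bit errors are independent, that every surviving disk must be read in full during the rebuild, and that the error rate is quoted per bit read. The 7+1 group of 500GB disks in the example is hypothetical.

```python
import math


def rebuild_error_probability(surviving_disks: int,
                              capacity_gb: float,
                              bits_per_error: float) -> float:
    """Chance of at least one unrecoverable read error during a rebuild.

    Simplified model (an assumption): independent bit errors, every
    surviving disk read end to end, capacities in decimal gigabytes.
    """
    bits_read = surviving_disks * capacity_gb * 1e9 * 8
    # 1 - (1 - p)^n, evaluated with log1p/expm1 to avoid rounding loss
    # when p is as tiny as 1e-14.
    return -math.expm1(bits_read * math.log1p(-1.0 / bits_per_error))


if __name__ == "__main__":
    # Hypothetical RAID-5 group: one failed disk, seven survivors.
    for rate in (1e14, 1e15):
        p = rebuild_error_probability(7, 500, rate)
        print(f"1 error in {rate:.0e} bits: {p:.1%} chance of a read error")
```

Under these assumptions the risk scales with both the disk size and the number of disks in the group, which is the argument for a second parity disk. Predictive sparing, as described above, is not captured by this model.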

Personally, if I had the option I would choose 6+2 protection, subject to one proviso: with 300GB disks, each RAID group takes up 2.4TB of raw storage (1.8TB usable) at a time. This is not a small amount!

Hard disk drives will continue to grow relentlessly. Seagate have already announced a 750GB drive using perpendicular recording. A single disk subsystem can now hold a huge amount of intellectual capital. I'll discuss this another time.
