Wednesday, 14 March 2007

Standards for Measuring Performance

I had a bit of banter following a post I made on ITToolbox earlier in the week. The question posed was why major disk array vendors (other than Netapp) don't produce performance statistics. There are benchmarks, of course: the Standard Performance Evaluation Corporation (SPEC) and the Storage Performance Council (SPC), for instance. However, SPEC only covers NAS, and neither HDS nor EMC signs up for SPC.

Producing a consistent testing standard is extremely difficult. Each storage array vendor has designed their hardware around unique selling points and, more importantly, no vendor can make their systems too close to a rival's, otherwise there'd be more money made by the lawyers than by the vendors themselves.

Let's pick a few examples. In terms of architecture, DMX (EMC) and USP (HDS) have a similar design. Both have front-end channel adaptor cards, central cache and back-end loops to which (probably identical) hard disk drives are connected. However, the way in which the physical storage is carved up is totally different.

HDS uses the concept of an array or parity group; a 6+2 RAID group has eight disks of the same size, which can be carved up into logical units (LUNs). The address at which these LUNs are mapped is up to the user, but typically LUNs will be dispersed across multiple array groups so that consecutive LUNs are not mapped to the same physical disks, spreading data across as many spindles as possible.
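
To make that concrete, here's a minimal sketch in Python (all names and figures are my own invention, not HDS's actual mapping logic) of dispersing consecutive LUNs round-robin across array groups:

    # Hypothetical sketch of HDS-style LUN dispersal: consecutive LUNs are
    # assigned round-robin across array (parity) groups so that neighbouring
    # LUNs never share the same physical spindles.

    ARRAY_GROUPS = 4   # e.g. four 6+2 parity groups (assumed figure)

    def array_group_for(lun: int) -> int:
        """Return the array group a LUN lands on under simple round-robin."""
        return lun % ARRAY_GROUPS

    for lun in range(8):
        print(f"LUN {lun:02d} -> array group {array_group_for(lun)}")

    # LUN 00 -> array group 0
    # LUN 01 -> array group 1
    # LUN 02 -> array group 2
    # LUN 03 -> array group 3
    # LUN 04 -> array group 0 ... and so on

The point of the round-robin is simply that a host writing to LUNs 0 and 1 in sequence is guaranteed to be hitting two different sets of spindles.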

EMC chooses another method. Each physical drive is divided into hypers, or logical slices, which are then recombined to make LUNs: RAID 1 LUNs have two hypers, RAID 5 LUNs have four. Each hyper is taken from a different back-end drive loop to improve performance and resiliency.
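
A similarly rough sketch (again with invented names, nothing to do with EMC's real placement code) of building a RAID 1 LUN from hypers on distinct back-end loops:

    # Hypothetical sketch of EMC-style hyper placement: a RAID 1 LUN is
    # built from two hypers (slices), each taken from a drive sitting on a
    # different back-end loop.

    drives = [
        {"id": "d0", "loop": 0}, {"id": "d1", "loop": 1},
        {"id": "d2", "loop": 2}, {"id": "d3", "loop": 3},
    ]

    def build_raid1_lun(drives, hypers_needed=2):
        """Pick one hyper per drive, each drive on a distinct back-end loop."""
        chosen, loops_used = [], set()
        for d in drives:
            if d["loop"] not in loops_used:
                chosen.append(d["id"])
                loops_used.add(d["loop"])
            if len(chosen) == hypers_needed:
                return chosen
        raise ValueError("not enough loops for the requested hyper count")

    print(build_raid1_lun(drives))  # e.g. ['d0', 'd1']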

Now, the comments earlier in the week referred to Netapp. Their architecture comes from a NAS file-serving design, which uses RAID 4 and handles all the RAID parity calculations in memory. LUNs are carved out of RAID "volumes" or aggregates.
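
As a toy illustration of why RAID 4 parity is cheap to compute in memory: the dedicated parity disk just holds an XOR across the data disks, and a failed disk is rebuilt with the same XOR over the survivors:

    # Toy illustration of RAID 4 parity: one dedicated parity disk holding
    # the XOR of the corresponding blocks on the data disks.

    data_blocks = [0b10110010, 0b01101100, 0b11100001]  # three data disks

    def parity(blocks):
        p = 0
        for b in blocks:
            p ^= b
        return p

    p = parity(data_blocks)

    # Simulate losing disk 1 and rebuilding it from parity + survivors.
    rebuilt = parity([data_blocks[0], data_blocks[2], p])
    assert rebuilt == data_blocks[1]
    print(f"parity={p:08b}, rebuilt disk 1 = {rebuilt:08b}")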

So what is a fair configuration to use? Should it be the same number of spindles? How much cache? How many back-end loops and how many front-end ports? Each vendor can make their own equipment run at peak performance, but choosing a "standard" configuration that sets a level playing field for all vendors is near impossible. Not only that, some manufacturers may claim their equipment scales better with more servers; how would that be included and tested for?

Perhaps rather than doing direct comparisons, vendors should submit standard-style configurations based on a fixed GB capacity carved into preset LUN sizes, against which testing is performed using common I/O profiles. This, together with list price, would let us make up our own minds.
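
To show the sort of thing I mean, here's an entirely hypothetical submission format; nothing here is a real SPC artefact, and the profiles and numbers are made up:

    # Hypothetical "standard style configuration" submission: a fixed
    # usable capacity carved into preset LUN sizes, exercised with a
    # handful of common I/O profiles, published alongside list price.

    standard_config = {
        "usable_capacity_gb": 10_000,
        "lun_size_gb": 50,           # preset LUN size
        "list_price_usd": None,      # vendor fills this in
    }

    io_profiles = [
        {"name": "OLTP",     "read_pct": 70, "block_kb": 8,   "random": True},
        {"name": "backup",   "read_pct": 0,  "block_kb": 256, "random": False},
        {"name": "decision", "read_pct": 95, "block_kb": 64,  "random": False},
    ]

    luns = standard_config["usable_capacity_gb"] // standard_config["lun_size_gb"]
    print(f"{luns} LUNs of {standard_config['lun_size_gb']} GB each")
    for p in io_profiles:
        print(f"run {p['name']}: {p['read_pct']}% reads, {p['block_kb']} KB blocks")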

Either way, any degree of testing won't and shouldn't stop healthy discussion!
