Monday, 9 July 2007

Performance Part III

Next up for discussion under performance: array groups.

First, the background (and, as usual, apologies to those who already know all this). HDS enterprise arrays lay their disks out in array groups, configured as RAID-1/0, RAID-5 or RAID-6. Variable-size LUNs are then carved out of the array groups for presentation to hosts. It therefore makes sense to ensure that each host has its LUNs drawn from as many array groups as possible. That way, large volumes of read and write I/O are spread across more disks and can be serviced quickly, and on writes, the data written into cache can be destaged to disk quickly.
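As a rough illustration of spreading one host's LUNs, here is a minimal Python sketch that hands LUNs out round-robin across array groups. The function name `spread_luns` and the array group labels are hypothetical, and the sketch assumes every group has room for the LUNs it receives; a real allocation would also have to account for free capacity, RAID level and LUN size in each group.

```python
from itertools import cycle


def spread_luns(lun_count: int, array_groups: list[str]) -> dict[str, list[int]]:
    """Assign LUN indices 0..lun_count-1 to array groups in round-robin order.

    Hypothetical helper: ignores capacity, RAID level and existing load.
    """
    if not array_groups:
        raise ValueError("need at least one array group")
    layout: dict[str, list[int]] = {ag: [] for ag in array_groups}
    # zip stops when range() is exhausted, so the infinite cycle() is safe here.
    for lun, ag in zip(range(lun_count), cycle(array_groups)):
        layout[ag].append(lun)
    return layout


if __name__ == "__main__":
    # Hypothetical array group names, for illustration only.
    groups = ["1-1", "1-2", "2-1", "2-2"]
    for ag, luns in spread_luns(6, groups).items():
        print(ag, luns)
```

With six LUNs and four groups, the first two groups end up with two LUNs each and the others with one, so no group carries more than one LUN more than any other.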

Pick any hour in the production day, look at it with something like Tuning Manager, and you'll likely see (especially on unbalanced systems) the standard exponential curve when array groups are ranked by IOPS or MB/s written. The 80/20 rule applies: around 20% of the array groups will be doing 80% of the I/O. Ideally, workload would be balanced evenly across all array groups, but the effort of rebalancing the data probably doesn't justify the returns unless you have some very busy array groups. Personally, I'd look to ensure that no single array group exceeds 50% active, and I'd want 25-50% read hits (remember that you need to keep some spare capacity to cater for disk sparing). Any array groups exceeding these metrics are candidates for data movement.

I recently used Cruise Control and found it a disappointment. EMC's Optimiser swaps two LUNs, using a third, temporary LUN to manage the exchange; Cruise Control instead expects you to provide a free LUN as the migration target. That may be difficult if a very quiet array group has already been fully allocated. I therefore tend to recommend manual exchanges, especially if hosts have a Logical Volume Manager product installed.
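To make the 80/20 check and the thresholds above concrete, here is a minimal Python sketch that works on per-array-group figures you might export for one sample hour. The `ArrayGroupStats` fields, function names and sample numbers are hypothetical, not output from Tuning Manager. The 50% active limit comes from the text; treating a read hit rate below 25% as a flag is my interpretation of the "25-50% read hits" guideline, so adjust it to taste.

```python
from dataclasses import dataclass

# Threshold from the text: no array group should exceed 50% active.
MAX_BUSY_PCT = 50.0
# Assumption: read hits below the lower end of the 25-50% range are flagged.
MIN_READ_HIT_PCT = 25.0


@dataclass
class ArrayGroupStats:
    name: str            # array group identifier (hypothetical labels below)
    iops: float          # average IOPS over the sample hour
    busy_pct: float      # percentage of time the group's disks were active
    read_hit_pct: float  # percentage of reads satisfied from cache


def top_share(stats: list[ArrayGroupStats], fraction: float = 0.2) -> float:
    """Share of total IOPS handled by the busiest `fraction` of array groups."""
    ranked = sorted(stats, key=lambda s: s.iops, reverse=True)
    top_n = max(1, round(len(ranked) * fraction))
    total = sum(s.iops for s in ranked)
    return sum(s.iops for s in ranked[:top_n]) / total if total else 0.0


def candidates(stats: list[ArrayGroupStats]) -> list[str]:
    """Names of array groups breaching either threshold, busiest first."""
    flagged = [
        s for s in stats
        if s.busy_pct > MAX_BUSY_PCT or s.read_hit_pct < MIN_READ_HIT_PCT
    ]
    return [s.name for s in sorted(flagged, key=lambda s: s.busy_pct, reverse=True)]


if __name__ == "__main__":
    # Illustrative numbers only, not measurements from a real array.
    sample = [
        ArrayGroupStats("1-1", 4000.0, 72.0, 30.0),
        ArrayGroupStats("1-2", 900.0, 20.0, 40.0),
        ArrayGroupStats("2-1", 600.0, 15.0, 35.0),
        ArrayGroupStats("2-2", 300.0, 10.0, 18.0),
        ArrayGroupStats("3-1", 200.0, 5.0, 45.0),
    ]
    print(f"Top 20% of groups handle {top_share(sample):.0%} of IOPS")  # 67%
    print("Candidates for data movement:", candidates(sample))  # ['1-1', '2-2']
```

Ranking by busy percentage rather than raw IOPS is deliberate: the same IOPS figure can be comfortable on a RAID-1/0 group and punishing on a RAID-5 group of the same size.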

Balancing workload means checking array groups and moving any "hot" LUNs away from each other. This is where knowing your data becomes important, along with, if possible, knowing how the data is mapped on the host, to make sure LUNs aren't just busy with transient data (for example, the LUNs that make up a single concatenated volume on a host may each become busy in turn as more data is allocated to the volume). It is equally possible that a single host can overload one array group on its own; in that case, moving the LUN will provide no benefit, and the data layout on the host will need to be addressed.
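The two checks described above can be sketched in Python: find array groups that hold more than one hot LUN, and find groups where a single LUN carries nearly all of the load, where moving that LUN would just move the problem. Everything here is hypothetical: the LDEV IDs, the 1000 IOPS "hot" cut-off, the 80% share and the 1000 IOPS floor for ignoring quiet groups are illustrative choices, not HDS guidance. The code also cannot tell transient hot spots (such as a growing concatenated volume) from persistent ones; that still needs knowledge of the host.

```python
from collections import defaultdict


def colocated_hot_luns(lun_iops: dict[str, tuple[str, float]],
                       hot_iops: float) -> dict[str, list[str]]:
    """Map array group -> hot LUNs, keeping only groups with two or more hot LUNs.

    lun_iops maps a LUN/LDEV ID to (array group, IOPS) for one sample period.
    """
    by_group: dict[str, list[str]] = defaultdict(list)
    for lun, (group, iops) in lun_iops.items():
        if iops >= hot_iops:
            by_group[group].append(lun)
    return {g: sorted(luns) for g, luns in by_group.items() if len(luns) > 1}


def dominated_groups(lun_iops: dict[str, tuple[str, float]],
                     share: float = 0.8,
                     min_group_iops: float = 1000.0) -> list[str]:
    """Busy array groups where one LUN carries at least `share` of the IOPS.

    Moving that LUN only relocates the hot spot; the host layout needs review.
    Groups below min_group_iops are ignored so quiet single-LUN groups are skipped.
    """
    totals: dict[str, float] = defaultdict(float)
    peak: dict[str, float] = defaultdict(float)
    for group, iops in lun_iops.values():
        totals[group] += iops
        peak[group] = max(peak[group], iops)
    return sorted(
        g for g, total in totals.items()
        if total >= min_group_iops and total > 0 and peak[g] / total >= share
    )


if __name__ == "__main__":
    # Hypothetical LDEV IDs mapped to (array group, IOPS); illustrative numbers.
    lun_iops = {
        "00:10": ("1-1", 1500.0),
        "00:11": ("1-1", 1200.0),
        "00:12": ("1-1", 100.0),
        "00:20": ("1-2", 2000.0),
        "00:21": ("1-2", 150.0),
        "00:30": ("2-1", 300.0),
    }
    print(colocated_hot_luns(lun_iops, hot_iops=1000.0))  # {'1-1': ['00:10', '00:11']}
    print(dominated_groups(lun_iops))  # ['1-2']
```

In this made-up data, group 1-1 is a candidate for separating its two hot LUNs, while group 1-2 is the "one LUN overloads the group" case, where the fix belongs on the host.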

It's possible to spend hours looking at array group balancing. The key is to make sure the effort is worth the result.

1 comment:

Alex said...

If you want something to help find those "hot" or "cold" array groups and set up a migration path automatically, HDS's Tiered Storage Manager is your answer. It allows performance metrics such as ArrayGrpBusy% to be used when selecting LDEVs for swapping.