Wednesday 25 July 2007

Getting more from Device Manager

I complained in a previous post about the lack of features in Device Manager. Consequently I've started writing some software to alleviate this situation. Here's the first of what I hope will be a series of tools to plug some of the gaps.

HDvM Checker will query a host running the Device Manager and HDLM agents and return the versions. Just enter a host name and click Check. This is an early tool and I've done only limited testing, so I'd be grateful for any feedback. You can download it here.

*Disclaimer* - if you decide to use this software it is entirely at your own risk.

SATA in Enterprise Arrays

In a previous post on DMX-4 I discussed the use of SATA drives in enterprise arrays. A comment from our Storage Anarchist challenged my view of the resilience of SATA drives compared to FC.

Now unless I've been sleeping under a rock, the storage industry has spent the last five years pummelling us with the warning that SATA drives are not enterprise class, the technology having been derived from PC hardware. They were good for 8/5 rather than 24/7 workloads and not really suited to large volumes of random access.

Has that message now changed? Were we fooled all along or is this a change of tack to suit the marketers?

What do you think? Are SATA drives (and I mean an *entire* array) in an enterprise array acceptable now?

Tuesday 24 July 2007

EMC posts higher earnings

EMC posted higher earnings today. Some 21% up on the same quarter last year. It's amazing they've been able to manage double-digit growth for some 16 quarters.

Interestingly (as reported by The Register) the shares ended the day down. However the shares have risen by over 100% over the last 12 months and since March have risen steadily, so investors can't complain. Wish I'd bought some last year!

VTL Poll Results

The VTL poll is finished; results are:

  • We've had it for years - 41%
  • We've done a limited implementation - 14%
  • We're evaluating the technology - 27%
  • VTL doesn't fit within our strategy - 14%
  • We see no point in VTL technology - 5%

Obviously this poll is highly unscientific, but it seems most people agree that VTL is worth doing and are either doing it or planning to do it.

A new poll is up relating to green storage.

EMC Power - link?

Thanks to Barry/Mark for their posts/comments on power usage. Call me stupid (and people do) but I can't find the EMC Power Calculator for download on Powerlink as 'zilla suggests (although I did find a single reference to it).

Can you post the link, guys? Is it EMC internal only? If so, any chance of sharing it with us? If I can get a copy I'll do more detailed calculations on the components too.

Friday 20 July 2007

DMX-4 Green or not?

After the recent EMC announcements on DMX-4, I promised I would look at the question of whether the new DMX-4 is really as green as it claims to be. I did some research and the results are quite interesting.

Firstly we need to set the boundaries. One of the hardest parts of comparing hardware from different manufacturers is that the products are intrinsically different (if they were too similar, the lawyers would be involved), which makes it difficult to come up with a fair comparison. So, I've divided the comparisons into controller and disk array cabinets. Even this is difficult. The DMX has a central controller cabinet which contains only batteries, power supplies, interface boards and so on; the USP, however, uses half of the central controller cabinet for disks. The DMX has 240 drives per disk cabinet, whereas the USP has 256. This all needs to be taken into consideration when performing the calculations.

Second, I want to explain my sources. I've tried to avoid the marketing figures for two reasons: firstly they usually refer to a fully configured system, and secondly they don't provide enough detail to break down power usage by cabinet and by component. This level of detail is necessary for a more exact comparison. So, for the USP and USP-V, I'm using HDS's own power calculation spreadsheet. This is quite detailed and allows each component in a configuration to be specified in the power calculation. For EMC, I'm using the DMX-3 Physical Planning Guide. I can't find a DMX-4 Planning Guide yet, however the figures on the EMC website for DMX-4 are almost identical to those for DMX-3, so it's as close as I can get.

DMX-3/4

The DMX figures are quite simple: the controller cabinet (fully loaded) draws 6.4KVA and a disk cabinet 6.1KVA. A fully configured controller cabinet has 24 controller slots, with up to 8 global memory directors and 16 front-end director (FED) and back-end director (BED) cards. A typical configuration would have eight 8-port FED cards and 8 BED cards connecting to all 4 disk quadrants. EMC quote the disk cabinet figures based on 10K drives. Looking at Seagate's website, a standard 10K 300GB FC drive requires 18W of power in "normal" operation, so 240 drives require 4.32KVA. The difference between this figure and the EMC value covers periods when the drives are being driven harder, plus the power supplies and other components within a disk cabinet which also need powering. We can therefore work on an assumption of 25.4W per drive on average.
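
To make the arithmetic explicit, here's a quick Python sketch of the DMX disk cabinet sums (treating KVA and KW as interchangeable, as the published figures effectively do):

    # DMX disk cabinet: published figure vs. raw drive power
    DRIVES_PER_CABINET = 240
    SEAGATE_10K_300GB_WATTS = 18       # "normal" operation per Seagate
    EMC_DISK_CABINET_KVA = 6.1         # EMC planning guide figure

    drives_only_kva = DRIVES_PER_CABINET * SEAGATE_10K_300GB_WATTS / 1000.0
    avg_watts_per_drive = EMC_DISK_CABINET_KVA * 1000.0 / DRIVES_PER_CABINET

    print(f"Drives alone: {drives_only_kva:.2f} KVA")                        # 4.32 KVA
    print(f"Average per drive incl. overhead: {avg_watts_per_drive:.1f} W")  # 25.4 W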

Now the figures for the controller cabinet are interesting. Remember EMC put no drives in the controller cabinet, so the entire 6.4KVA goes on the controllers, cache and battery charging - purely on keeping the box running.

USP

The HDS power calculator spreadsheet is quite detailed. It allows specific details of cache, BEDs, FEDs and a mix of 73/144/300GB array groups to be entered. A full USP1100 configuration has 1152 drives, 6 FEDs, 4 BEDs and 256GB of cache. This full configuration draws 38.93KVA (slightly more than the figure quoted on the HDS website). Dropping off 64 array groups (an array cabinet) reduces the power requirement to 31.50KVA, or 7.43KVA for the whole cabinet. This means the controller cabinet draws 9.21KVA, and in fact the spreadsheet shows that a full configuration minus disks draws 5.4KVA. The controller cabinet has up to 128 drives in it, which should translate to about 3.7KVA; this is consistent with the 9.21KVA drawn by a full controller cabinet. The 7.43KVA per cabinet translates to 29W per drive, making the HDS per-drive power cost slightly higher.
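
And the same sums for the USP, derived purely from the spreadsheet figures quoted above:

    # USP1100: derive per-cabinet and per-drive figures from spreadsheet totals
    FULL_CONFIG_KVA = 38.93        # 1152 drives, 6 FEDs, 4 BEDs, 256GB cache
    LESS_ONE_CABINET_KVA = 31.50   # same config minus 64 array groups (256 drives)
    DISKLESS_CONFIG_KVA = 5.4      # full configuration with no disks at all

    array_cabinet_kva = FULL_CONFIG_KVA - LESS_ONE_CABINET_KVA            # 7.43 KVA
    watts_per_drive = array_cabinet_kva * 1000.0 / 256                    # ~29 W
    controller_cabinet_kva = FULL_CONFIG_KVA - 4 * array_cabinet_kva      # 9.21 KVA
    controller_drives_kva = controller_cabinet_kva - DISKLESS_CONFIG_KVA  # ~3.8 KVA for 128 drives

    print(array_cabinet_kva, watts_per_drive, controller_cabinet_kva, controller_drives_kva)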

This is a lot of data, probably not well presented, but it shows a number of things:

  1. There's a baseline power draw per drive which can't be avoided; this equates to about 20W per drive.
  2. The controller frame needs about 6KVA and this varies only slightly depending on the number of controllers and cache.
  3. The HDS controller is slightly more efficient than the EMC.
  4. The HDS disk array is slightly less efficient than the EMC.

Neither vendor can really claim their product to be "green". EMC are playing the green card by using their higher-density drives. There's no doubt this translates into a better capacity-to-power ratio, but the saving comes at a cost: SATA drives are not fibre channel and are not designed for 24/7 workloads. Whilst these drives provide increased capacity, they don't provide the same level of performance, and DMX systems are priced at a premium, so you want the best bang for your buck. If EMC were to price a SATA-based DMX competitively then the model would be compelling, but surely that would take business away from Clariion. What's more likely to happen is customers putting some SATA drives into an array and therefore seeing only modest incremental power savings.

So what's the future? Well, 2.5" drives currently offer up to 146GB at 10K with only half the power demands, which also translates into cooling savings. Is anyone using these to build arrays? Hybrid drives with more cache should allow drives to be spun down periodically, also saving power. Either way, these sorts of features shouldn't come at the cost of the levels of performance and availability we see today.

One final note of interest...HDS are quoting figures for the USP-V. These show a 10% saving over the standard USP, despite the performance improvements...

Tuesday 17 July 2007

DMX-4

I've had a quick look over the specifications of the new DMX-4 compared to the DMX-3. There aren't really a lot of changes. The backend director connectivity has been upped to 4Gb/s and presumably that's where the 30% throughput improvement comes from (with some Enginuity code changes too I guess).

There are a number of references to energy efficiency, however the "old" and "new" DMX cooling figures are the same and the power figures are almost identical. I think the improved energy efficiency is being touted on the back of the 750GB SATA drives for DMX (not available now, but later), but in reality that's not going to be a significant saving unless you fill the entire array with SATA drives. One statement I want to validate is the following:

"Symmetrix DMX-4 is the most energy efficient enterprise storage array in the world, using up to 70 percent less power than competitive offerings."

There are some security enhancements - but there would have to be in order to justify the RSA purchase....

On the positive side, having the option of SATA drives is a good thing - I'd use them for Timefinder copies or dump areas. I wouldn't fill an array with them though.

Perhaps the most surprising announcement is (in green for extra emphasis):

In addition, EMC plans to introduce thin provisioning capabilities for Symmetrix DMX in the first quarter of 2008, enabling customers to further improve storage utilization and simplify storage allocation while continuing to improve energy efficiency.

Whoa there, I thought from all the recent posts (especially this) that Virtualisation/Thin Provisioning was something to be used with care. It will be interesting to see how EMC blogkets this one...

Performance Part V

Here's the last of the performance measurements for now.

Logical Disk Performance - monitoring of LDEVs. There are three main groups Tuning Manager can monitor: IOPS, throughput (transfer) and response time. The first two are specific to particular environments, and the levels for those should be set against local array performance based on historical measurement over a couple of weeks; normal "acceptable" throughput could be anything from 1-20MB/s or 100-1000 IOPS, so it will be necessary to record average figures over time and use these to set preliminary alert levels. What matters more is response time. I would expect reads and writes to 15K drives in a USP to perform at 5-10ms maximum (on average) and 10K drives to perform at up to 15ms maximum. Obviously synchronous write response has a dependency on the latency of writing to the remote array, and that overhead should be added to the above figures. Write responses will also be skewed by block size and the number of IOPS.

Reporting every bad LDEV I/O response could generate a serious number of alerts, especially if tens of thousands of IOPS are going through a busy array. It is sensible to set reporting alerts high and reduce them over time until alerts are generated. These can then be investigated (and resolved as required) and the thresholds reduced further. LDEV monitoring can also benefit from using Damping. This option on an Alert Definition allows an alert to be generated only if a specific number of occurrences are received within a set number of monitoring intervals. So, for instance, an LDEV alert could be raised only when 2 occurrences are received within 5 intervals. Personally I like the idea of Damping, as I've seen plenty of host IOSTAT collections where a single bad I/O (or handful of bad I/Os) is highlighted as a problem while thousands of good, fast I/Os are going through the same host.
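
To illustrate how damping behaves (this is just the idea in Python, not HTnM's actual implementation), two threshold breaches within the rolling window are needed before anything fires:

    from collections import deque

    def damped_alerts(samples_ms, threshold_ms=15, occurrences=2, intervals=5):
        """Return the sample indices at which a damped alert would fire."""
        window = deque(maxlen=intervals)   # rolling view of the most recent intervals
        fired = []
        for i, response in enumerate(samples_ms):
            window.append(response > threshold_ms)
            if sum(window) >= occurrences:
                fired.append(i)
        return fired

    # A single bad I/O among good ones is ignored; two within five intervals fire.
    print(damped_alerts([4, 6, 30, 5, 4, 28, 27, 6]))   # -> [5, 6, 7]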

This is the last performance post for now. I'm going to do some work looking at the agent commands for Tuning Manager, which, as has been pointed out here previously, can provide more granular data and alerting (actually I don't think I should have to run commands on an agent host - it should all be part of the server product itself - but that's another story).

Monday 16 July 2007

Bring it on!!

I'm glad to see EMC have upped the ante with their series of announcements today. I've not had time to digest them all, but I did read Storagezilla's refreshing post summarising a lot of the announcements. Once I've read things in a bit more detail I'll comment; however, I like the fact that EMC have moved the DMX on in performance and, eventually, capacity. I'm not sure though whether this should be DMX-4 or DMX-3.5, as I don't see a lot of new features. Hopefully that's just me not reading things fully.

I think it's about time for a 3-way product comparison. I just need to find the time...!

Performance Part IV

Next on the performance hitlist is port tracking. This one is slightly trickier to collect in Tuning Manager, as HTnM uses absolute values for port throughput (Port IOPS and Port Transfer) in alerting rather than relative values like % busy. This is a problem because the figures for both IOPS and throughput (KB/MB/s) will vary wildly depending on the traffic profile.

For instance, a high number of very small blocksize I/Os will overwhelm the processor managing the port (resulting in seemingly low throughput), whereas a large blocksize could max out the storage port in terms of throughput (i.e. push a 2Gb/s FC link to its limit). As processor busy isn't available for alerting either, I'd set a threshold on Port Transfer based on the capacity of the fibre channel interface. For example, with a 2Gb/s port, throughput should never exceed 50% (if you want 100% redundancy in case of path loss), so I'd set figures of 40% of capacity for warning and 50% for critical in HTnM alerting. That's roughly 80MB/s and 100MB/s respectively. These are high figures, and sustained throughput at this level may require some load balancing or attention. If these levels generate no alerting then it may be possible to reduce them to start highlighting peaks.
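
For what it's worth, the threshold arithmetic as a quick sketch (assuming roughly 200MB/s of usable bandwidth on a 2Gb/s FC link):

    # Port Transfer thresholds derived from link capacity
    USABLE_MB_PER_SEC = 200          # approximate usable bandwidth of a 2Gb/s FC port

    warning_mbps = 0.40 * USABLE_MB_PER_SEC    # 80 MB/s
    critical_mbps = 0.50 * USABLE_MB_PER_SEC   # 100 MB/s - half the link, leaving
                                               # headroom to absorb a failed path

    print(warning_mbps, critical_mbps)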

Choosing a figure for IOPS is more problematic. I'd suggest picking an arbitrary level based on a few weeks' data from all ports. Set a limit based on historical data that's likely to trigger the occasional alert but won't create critical errors continually. Alerts can then be monitored and, if there are alert trends, action can be taken. It should then be possible to reduce the alerting thresholds as more issues are resolved.
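
One way to pick that starting level (my own suggestion, not an HTnM feature) is to take a high percentile of the historical samples and walk it down over time; a minimal sketch:

    def iops_threshold(history, percentile=99):
        """history: per-interval IOPS samples for a port over a few weeks."""
        ordered = sorted(history)
        index = min(len(ordered) - 1, int(len(ordered) * percentile / 100))
        return ordered[index]

    # e.g. start alerting at the 99th percentile of observed port IOPS
    samples = [850, 920, 400, 1100, 760, 1900, 640, 980]
    print(iops_threshold(samples))   # only the most extreme intervals alert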

Although TrueCopy initiators can't be monitored, RCU Targets can, so the same logic can be applied to throughput values.

On a totally unrelated subject, I have a recommendation to never de-seed chillis without gloves. I've just removed the seeds from my first crop of the year (including the wonderful looking purple tiger) and my entire face is glowing where I've wiped it. I have no idea how I'm now going to get my contact lenses out...

Tuesday 10 July 2007

A couple of interesting comments

A couple of comments have been posted which look quite interesting so I think they're worthy of being re-mentioned in case you missed them.

First of all, at www.hds.com/webtech there is a list of upcoming webinars. The list looks a little "lite", however the first should be of interest (getting better alerting from Tuning Manager), bearing in mind my recent postings. Thanks to Rick for that one.

Second, a new site at www.lunmonkey.com. I haven't tried it, but it looks to be a site which will help autogenerate symconfigure commands based on a map of your storage configuration. I'd be interested to see if anyone's tried this.

While I'm on that topic, I think we need more Open Source type storage products. Clearly the vendors aren't going to produce products which provide real cross-system support (for obvious reasons). What we need is a Linux for storage. Despite all the talk, I don't think anyone is really doing it yet.

Monday 9 July 2007

Performance Part III

Next under discussion for performance is array groups.

First the background (and as usual, apologies to those who already know all this). HDS enterprise arrays lay their disks out in array groups, either RAID-1/0, RAID-5 or RAID-6. Variable-size LUNs are then carved out of the array groups for presentation to hosts. Obviously it makes sense to ensure that every host has its LUNs selected from as many array groups as possible. This means that large volumes of read and write data can be serviced quickly and, on writes, that data written into cache can be destaged to disk quickly.

Pick any hour in the production day, use something like Tuning Manager and you'll likely see (especially on unbalanced systems) the standard exponential curve for IOPS or MB/s written. The 80/20 rule applies: around 20% of the array groups will be doing 80% of the I/O. Ideally it would be best to have workload balanced evenly across all array groups, however the effort of rebalancing the data probably doesn't justify the returns, unless of course you have some very busy array groups. Personally I'd look to ensure no single array group exceeds 50% active and I'd want 25-50% read hits (remember that you need to keep some spare capacity to cater for disk sparing). Any array groups exceeding these metrics are candidates for data movement.

I've recently used Cruise Control and found it disappointing. EMC's Optimiser swaps two LUNs using a third temporary LUN to manage the exchange; Cruise Control expects you to provide a free LUN as the target of the migration, which may be difficult if a very quiet array group has been fully allocated. Therefore I tend to recommend manual exchanges, especially if hosts have a Logical Volume Manager product installed.
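
As a trivial illustration of the first check - flagging array groups that breach the 50% active level - here's a short sketch (the group names and figures are made up):

    # Flag array groups that breach the busy threshold discussed above
    group_busy_pct = {"1-1": 72, "1-2": 35, "2-1": 55, "2-2": 12, "3-1": 8}

    candidates = [group for group, busy in group_busy_pct.items() if busy > 50]
    print("Candidates for data movement:", candidates)   # ['1-1', '2-1']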

Balancing workload means checking array groups and moving any "hot" LUNs away from each other. This is where knowing your data becomes important, and if possible knowing how the data is mapped on the host, to make sure LUNs aren't just busy with transient data (for example, multiple LUNs that comprise a single concatenated volume on a host may be busy over time as more data is allocated to the volume). It is equally possible that a single host may be able to overload one array group; in that instance, moving the LUN will provide no benefit and the data layout on the host will need to be addressed.

It's possible to spend hours looking at array group balancing. The key is to make sure the effort is worth the result.

Wednesday 4 July 2007

Performance - Part II

Next on the performance list - Sidefile. Sidefile is only relevant if you are using asynchronous replication. Cache is used to store write I/O requests (which have been committed locally) until they have been confirmed by the remote array in a TrueCopy pair. Both the local and remote arrays use sidefile cache to store replication recordsets, which must be processed in sequence order to maintain consistency. The benefit of sidefile cache is that it minimises the effect of replication latency on TrueCopy write I/O to the local array. Sidefile usage rises and falls with write activity; however, if (for instance) replication is being managed across a shared IP network, other IP traffic can increase latency and push up the amount of sidefile cache in use.

HDS recommend not letting Sidefile cache rise above 10%, however there are a number of parameters which can be set to control sidefile usage. Probably the most serious is Pending Update Data Rate (defaulting to 50%), which, if breached, causes primary array I/O delay and eventually TrueCopy pair suspension. There are also two other parameters, I/O Delay Start and I/O Delay Increase. Breaching these thresholds causes I/O delay, however it isn't clear how much of an impact this has on a host.

Now, I don't know where the 10% threshold comes from when the sidefile controls by default start at 30%. Doing a simple calculation, an array with 20 active ports each averaging 5MB/s of write traffic produces 100MB/s. With 48GB of cache, it takes less than a minute to reach 10% sidefile usage - easily possible if replication goes across a congested network. I imagine HDS are simply recommending that sidefile problems be alerted as early as possible.
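
Here's that calculation in full, as a small Python sketch using the figures above:

    # Time to hit the 10% sidefile threshold if replication stops draining
    CACHE_GB = 48
    SIDEFILE_THRESHOLD = 0.10
    ACTIVE_PORTS = 20
    WRITE_MB_PER_SEC_PER_PORT = 5

    write_rate = ACTIVE_PORTS * WRITE_MB_PER_SEC_PER_PORT        # 100 MB/s
    threshold_mb = CACHE_GB * 1024 * SIDEFILE_THRESHOLD          # ~4915 MB
    print(f"{threshold_mb / write_rate:.0f} seconds to reach 10% sidefile")   # ~49s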

Monday 2 July 2007

Hardware Replacement Lifecycle Update

Marc makes a good comment here about the use of virtualisation within the replacement lifecycle, and indeed products such as USP, SVC and Invista can help in this regard. However, at some stage even the virtualisation tools need replacing and the problem remains, just in a different place. I was hoping that, as part of the USP-V announcement, HDS would indicate how they intend to help customers migrate from an existing USP that is virtualising storage, but alas it didn't happen.

Perhaps we need virtual WWNs which work in a similar way to DNS, where a small, easily replaceable and duplicatable device holds a mapping of logical to physical WWNs. This appliance would be vendor independent, and multiple devices would exist in the fabric just as DNS servers do in IP networks. All allocations would be made once and for all against a virtual WWN (vWWN); whether the storage behind it is then virtualised would be up to the vendor.
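
Purely to illustrate the idea (the WWNs below are made up), the vWWN layer would amount to little more than a replicated lookup table:

    # Toy vWWN resolver: hosts are allocated against stable virtual WWNs,
    # and this table (replicated like DNS) maps each to the physical port
    # that currently backs it. Replace the array and only the table changes.
    vwwn_table = {
        "50:00:00:00:00:00:00:01": "50:06:0e:80:00:c3:8d:04",
        "50:00:00:00:00:00:00:02": "50:06:0e:80:00:c3:8d:14",
    }

    def resolve(vwwn):
        return vwwn_table[vwwn]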

Of course, the difficulty with this model is how to determine which DNS entry points to the current valid copy of data... Oh well, it was only a 30 second idea....

HiCommand CLIEX

I've been using the HiCommand CLIEX (the extended CLI) today. It differs from the normal CLI in that it talks directly to the array via a command device and bypasses both Storage Navigator and Device Manager. HDS have needed this way of communicating with an array for a long time.

On the one hand, CLIEX is a good thing; the functions are more akin to those offered by EMC's Solutions Enabler in that I can perform direct configuration on the array without an intermediate product. Some of the things I've tried include extracting the actual configuration of an array (which can be obtained in XML format) and creating Host Storage Domains and assigning LDEVs. Although the commands aren't lightning quick (no pun intended), they are certainly quicker than the corresponding commands through Device Manager or Storage Navigator and obviously remove the need to install Device Manager in its entirety. As I'm running tests against a new array, I want to create lots of HSDs (one per FEP) and assign a separate LDEV to each. CLIEX saved me a lot of time (especially as I haven't got a Device Manager up yet).
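
To give a feel for the kind of scripting this enables, here's a rough Python sketch that loops over front-end ports and shells out to hdvmcliex once per port. The subcommand and parameter names are placeholders rather than the real CLIEX syntax, so substitute the commands from the CLIEX documentation:

    import subprocess

    # Hypothetical port list: one Host Storage Domain per FEP
    FEPS = ["CL1-A", "CL1-B", "CL2-A", "CL2-B"]

    def run_cliex(*args):
        # hdvmcliex is a batch file, so invoke it via cmd /c
        # (the equivalent of prefixing it with CALL in another batch script)
        result = subprocess.run(["cmd", "/c", "hdvmcliex"] + list(args),
                                capture_output=True, text=True)
        if result.returncode != 0:
            raise RuntimeError(result.stderr)
        return result.stdout

    for port in FEPS:
        # Placeholder arguments - replace with the real CLIEX subcommands
        # for creating an HSD and assigning an LDEV to it
        run_cliex("AddHostStorageDomain", f"port={port}", f"name=HSD_{port}")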

On the negative side, the commands are powerful and the "delete" functions give no warning or final check, so use them at your peril! I tried removing an LDEV while running a lot of I/O to a LUN and the command was bounced; however, when the I/O was reduced (but the LUN still technically in use), I was able to pull the LDEV away, with the obvious consequences. Also, these commands talk directly to the array and appear to bypass the configuration on the SVP. If you make a CLIEX change followed by a Storage Navigator change without first refreshing SN, you simply overwrite the CLIEX changes (I know, I tried it).

One little bugette I found... The CLIEX command hdvmcliex is actually a batch file, so if you want to call it from within a batch script you need to prefix it with the "CALL" command.

I can see the CLIEX interface being extremely useful. I intend to use it to mass-configure a USP from scratch. In addition, I can also see how it could be used to create dynamic failover scripts (more on this in a future post) without the need for Device Manager to be running. However, HDS need to beef up the security around the product to prevent inadvertent allocation and deallocation gotchas. They also need to consider moving the locking mechanism (which allows a reserve to be taken on the SVP and prevents SN, CLIEX and Device Manager configuration changes from clashing) to a specific CLIEX command, rather than relying on the Device Manager locking function.

One final thought... I'm not aware of any additional security to prevent a user with access to a command device from installing and running CLIEX and trashing an entire array. Unless you know otherwise?