The Storage Architect: 2006

Thursday, 21 December 2006

Understanding Statistics

I've been reading a few IDC press releases today. The most interesting (if any ever are) was that relating to Q3 2006 revenue figures for the top vendors. It goes like this:

Top 5 Vendors, Worldwide External Disk Storage Systems Factory Revenue 3Q2006 (millions)

EMC: $927 (21.4%)
HP: $760 (17.6%)
IBM: $591 (13.7%)
Dell: $347 (8.0%)
Hitachi: $340 (7.9%)

So EMC comes out on top, followed by the other usual suspects. EMC gained market share from all other vendors including "The Others" (makes me think of Lost - who are those "others"?). However, IDC also quote the following:

Top 5 Vendors, Worldwide Total Disk Storage Systems Factory Revenue 3Q2006 (millions)

HP: $1406 (22.7%)
IBM: $1250 (20.2%)
EMC: $927 (15.0%)
Dell: $507 (8.2%)
Hitachi: $348 (5.6%)

So what does this mean? IDC defines a Disk Storage System as at least 3 disk drives and the associated cables etc. to connect them to a server. This could mean 3 disks with a RAID controller in a server. Clearly EMC don't ship anything other than external disks, as their figures are the same in each list. HP make only just over half (about 54%) of their disk revenue from external systems; the rest presumably are disks shipped with servers. IBM make even less as a percentage (about 47%), and Dell's non-external revenue is about $160m. The intriguing one was Hitachi - what Disk Storage Systems do they sell (other than external) which made $8m of revenue? The source of my data can be found here: http://www.idc.com/getdoc.jsp?containerId=prUS20457106

What does this tell me? It says there's a hell of a lot of DAS/JBOD stuff still being shipped out there - about 30% of total revenue.
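For anyone who wants to check the sums, here is a rough Python sketch of how the 30% figure falls out of the two IDC tables. The market totals are not quoted by IDC in the lists above; they are backed out from one vendor's revenue and share, so treat them as approximations.

```python
# Revenue in $ millions, copied from the two IDC lists above.
external = {"EMC": 927, "HP": 760, "IBM": 591, "Dell": 347, "Hitachi": 340}
total = {"HP": 1406, "IBM": 1250, "EMC": 927, "Dell": 507, "Hitachi": 348}

# Assumption: estimate each market's size from one vendor's revenue and share.
external_market = external["EMC"] / 0.214   # EMC holds 21.4% of external
total_market = total["HP"] / 0.227          # HP holds 22.7% of total

# Share of total revenue that is NOT external storage (DAS/JBOD and similar).
non_external_share = (total_market - external_market) / total_market


def non_external_revenue(vendor):
    """Revenue a vendor earns from non-external disk systems, in $ millions."""
    return total[vendor] - external[vendor]


print(f"Non-external share of total: {non_external_share:.0%}")
for vendor in total:
    print(vendor, non_external_revenue(vendor))
```

The same dictionaries also confirm the merger quip: EMC plus Dell on the total list comes to $1434m against HP's $1406m.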

Now, if EMC were to buy Dell or the other way around, between them they could (just) pip HP to the post. Are EMC and Dell merging? I don't know, but I don't mind starting a rumour...

Oh, another interesting IDC article I found referred to how big storage virtualisation is about to become. I've been saying the word "virtualisation" for about 12 months in meetings just to get people to even listen to the concept, even when the meeting has nothing to do with the subject. I'm starting to feel vindicated.

Wednesday, 20 December 2006

Modular Storage Products

I’ve read a lot of posts recently on various storage related websites asking for comparisons of modular storage products. By that I’m referring to “dual controller architecture” products such as the HDS AMS, EMC Clariion and HP EVA. The questions come up time and time again, usually comparing IBM to EMC or HP and little comparison to HDS, but lots of people recommending HDS.

So, to be more objective, I’ve started compiling a features comparison of the various models from HDS, IBM, EMC and HP. Before anyone starts, I know there are other vendors out there – 3PAR, Pillar and others come to mind. At some stage, I’ll drag in some comparisons to them too, but to begin with this is simply the “big boys”. The spreadsheet attached is my first attempt. It has a few gaps where I couldn’t determine the comparable data, mainly on whether iSCSI or NAS is a supported option and the obvious problem of performance throughput.

So, from a simple physical perspective, these arrays are pretty simple to compare. EMC and HDS give the highest disk capacity options, EMC, HDS and IBM offer the same maximum levels of cache. Only HDS offers RAID6 (at the moment), most vendors offer a range of disk drives and speeds. Most products offer 4Gb/s front-end connections and there are various options for 2/4Gb/s speeds at the back end.

Choosing a vendor on physical specifications alone is simple using the spreadsheet. However there are plenty of other factors not included here. First, there’s performance. Only IBM (from what I can find) offers their arrays to scrutiny by the Storage Performance Council. Without a consistent testing method, any other figures offered by vendors are completely subjective.

Next, there’s the thorny subject of feature sets. All vendors offer variable LUN sizes, some kind of failover (I think most are active/passive), multiple O/S support, replication and remote copy functionality and so on. Comparing these isn’t simple, though, as the implementation of what should be common features can vary widely.

Lastly there’s reliability and the bugs and gotchas that all products have and which the manufacturers don’t document. I’ll pick an example or two: do FC front-end ports share a microprocessor? If so, what impact does load on one port have on the other shared port? What downtime is required to do maintenance, such as code upgrades? What is the level of support for SNMP or other management/alerting software?

The last set of issues would prove more difficult to track so I’m working on a consistent set of requirements from a product. In the meantime, I hope the spreadsheet is useful and if anyone can fill the gaps or wants to suggest other comparable mid-range/modular products, let me know.

You can download the spreadsheet here: http://www.storagewiki.com/attachments/modular%20products.xls

Tuesday, 19 December 2006

New RAID

Lots of people are talking about how we need a new way to protect our data and that RAID has had it. Agreed, going RAID6 gives some benefits (i.e. puts off the inevitable failure by a factor again), however the single problem to my mind with RAID today is the need to read all the other disks when a real failure occurs. Dave over at Netapp once calculated the risk of re-reading all those disks in terms of the chance of a hard failure.

The problem is, the drive is not involved in the rebuild process - it dumbly responds to the request from the controller to re-read all the data. What we need are more intelligent drives combined with more intelligent controllers. For example, why not have multiple interfaces to a single HDD? Use a hybrid drive with more onboard memory to cache reads while the heads are moving to obtain real data requests. Store that data in memory on the drive to be used for drive rebuilds. Secondly, why do we need to store all the data for all rebuilds across all drives? Why, with a disk array of 16 drives, can't we run multiple instances of 6+2 RAID across different sections of the drive?
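To make that last question a bit more concrete, here is a toy Python sketch of one way it might look. It is purely my illustration, not any vendor's implementation: the 16 drives are cut into sections, each section holds two 6+2 groups, and the group membership is rotated by one drive per section. The function names and the rotation scheme are assumptions made for the example.

```python
DRIVES = 16
GROUP = 8  # one 6+2 set: six data drives plus two parity drives


def groups_for_section(section, rotate=True):
    """Return the two 6+2 groups (lists of drive numbers) used by one section."""
    start = section % DRIVES if rotate else 0
    ring = [(start + i) % DRIVES for i in range(DRIVES)]
    return [ring[:GROUP], ring[GROUP:]]


def rebuild_reads(failed, sections=16, rotate=True):
    """Count, per surviving drive, how many sections it must read to rebuild `failed`."""
    reads = {d: 0 for d in range(DRIVES) if d != failed}
    for section in range(sections):
        for group in groups_for_section(section, rotate):
            if failed in group:
                for drive in group:
                    if drive != failed:
                        reads[drive] += 1
    return reads


fixed = rebuild_reads(0, rotate=False)    # classic layout: two static 6+2 groups
rotated = rebuild_reads(0, rotate=True)   # groups shift by one drive per section
print("fixed  :", fixed)
print("rotated:", rotated)
```

In this toy model the total amount of rebuild reading is the same either way, but with the static layout seven drives each read all 16 sections, whereas with rotation the work is spread over 14 drives and no drive reads more than 14 sections. A smarter placement than simple rotation would presumably spread it more evenly still.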

I'd love to be the person who patents the next version of RAID....

Tuesday, 5 December 2006

How low can they go!

I love this picture. Toshiba announced today that they are producing a new 1.8" disk drive using perpendicular recording techniques. This drive has a capacity of 100GB!

It will be used in portable devices such as music players; it's only 54 x 71 x 8 mm in size, weighs 59g and can transfer data at 100MB/s using an ATA interface.

I thought I'd compare this to some technology I used to use many years ago - the 3380 disk drive.

The model shown on the right is the 3380 CJ2 with a massive 1.26GB per unit and an access time similar to the Toshiba device. However the transfer rate was only about 3MB/s.

I couldn't find any dimensions for the 3380, but from the picture of the lovely lady, I'd estimate it is 1700 x 850 x 500mm which means 23,500 of the Tosh drives could fit in the same space!
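The back-of-an-envelope sum behind that figure, as a short Python sketch. The 3380 dimensions are only my estimate from the picture, and the calculation is pure volume with no allowance for packing, cabling or cooling.

```python
# Dimensions in millimetres.
toshiba_mm3 = 54 * 71 * 8            # Toshiba 1.8" drive
ibm_3380_mm3 = 1700 * 850 * 500      # 3380 cabinet, estimated from the picture

# Whole drives that fit by volume alone.
drives_by_volume = ibm_3380_mm3 // toshiba_mm3

# Capacity of that many 100GB drives, in GB, against 1.26GB for the 3380 CJ2.
capacity_gb = drives_by_volume * 100

print(drives_by_volume, "drives,", capacity_gb, "GB")
```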

Where will we be in the next 20 years? I suspect we'll see more hybrid drives, with NAND memory used to increase HDD cache then more pure NAND drives (there are already some 32GB drives announced). Exciting times...

Monday, 4 December 2006

Is the Revolution Over?

I noticed over the weekend that http://www.storagerevolution.com's website was down. Well it's back up today, but the forums have disappeared. I wonder, is the revolution over for JWT and pals?

Thursday, 30 November 2006

A rival to iSCSI?

I've said before, some things just pass you by, and so it has been with me and Coraid. This company uses ATA-over-Ethernet (AoE), a lightweight Ethernet storage protocol, to connect hosts to their storage products.

It took me a while before I realised how good this protocol could be. iSCSI runs over TCP/IP, encapsulating SCSI commands within TCP packets. This creates an overhead but does allow the protocol to be routed over any distance. AoE doesn't use TCP or IP; it runs its own protocol directly over Ethernet, so it doesn't suffer the TCP overhead, but it can't be routed. This isn't that much of an issue as there are plenty of storage networks which are locally deployed.

Ethernet hardware is cheap. I found a 24 port GigE switch for less than £1000 (£40 a port). NICs can be had for less than £15. Ethernet switches can be stacked providing plenty of redundancy options. This kills fibre channel on cost. iSCSI can use the same hardware; AoE is more efficient.

AoE is already being bundled with Linux, drivers are available for other platforms. Does this mean the end of iSCSI? I don't think so; mainly because AoE is so different and also more importantly it isn't routable. However what AoE does offer is another alternative which can use standard cheap, readily available products. From what I've read, AoE is simple to configure too; certainly with less effort than iSCSI. I hope it has a chance.

Wednesday, 22 November 2006

Brocade on the up

Brocade shares were up 10.79% today. I know they posted good results but this is a big leap for one day. I wish I hadn't sold my shares now! To be fair, I bought at $4 and sold at $8 so I did OK - and it was some time ago.

So does this bode well for the McDATA merger? I hope so. I've been working on Cisco and the use of VSANs and I'm struggling to see what real benefit I can get out of them. Example: I could use VSANs to segregate by: Line of Business; Storage Tier; Host Type (Prod/UAT/DEV). Doing this immediately gives me tens of combinations. Now the issue is I can only assign a single storage port to one VSAN - so I have to decide: do I want to segment my resources to the extent I can't use an almost unallocated UAT storage port when I'm desperate for connectivity for production? At the moment I see VSANs as likely to create more fragmentation than anything else. There's still more thinking to do.
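A quick Python sketch shows how fast the combinations mount up and why a port gets stranded. The line-of-business names, tier names and port names are all made up for illustration; only the Prod/UAT/DEV split comes from the example above.

```python
from itertools import product

lines_of_business = ["Retail", "Wholesale", "Corporate"]   # hypothetical
storage_tiers = ["Tier1", "Tier2", "Tier3"]                # hypothetical
host_types = ["Prod", "UAT", "DEV"]

# One VSAN per combination of the three segregation criteria.
vsans = [
    f"{lob}-{tier}-{env}"
    for lob, tier, env in product(lines_of_business, storage_tiers, host_types)
]
print(len(vsans), "VSANs")

# A storage port can belong to exactly one VSAN (port names are hypothetical).
port_vsan = {
    "FA-1A": "Retail-Tier1-Prod",
    "FA-2A": "Retail-Tier1-UAT",   # almost unallocated, but locked to UAT
}


def ports_available(vsan):
    """Storage ports that hosts in the given VSAN are able to use."""
    return [port for port, assigned in port_vsan.items() if assigned == vsan]


print(ports_available("Retail-Tier1-Prod"))
```

Three values for each of three criteria is already 27 VSANs, and the idle UAT port never shows up in the production list unless it is reassigned.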

I'd like to hear from anyone who has practical standards for VSANs. It would be good to see what best practice is out there.

Tuesday, 14 November 2006

Backup Trawling

I had a thought. As we back up more and more data (which we must be, as the amount of storage being deployed increases at 50-100% per year, depending on who you are), there must be more of an issue finding the data which needs to be restored. Are we going to need better tools to trawl the backup index to find what we want?

Virtually There

VMware looks like it is going to be a challenge; I'm starting to migrate standard storage tools into a VMware infrastructure. First of all, there's a standard VMware build for Windows (2003). Where should the swap file go? How should we configure the VMFS that it sits on? What about the performance issues in aligning the Windows view of the (EMC) storage with the physical device so we get best performance? What about Gatekeeper support? And so on.

It certainly will be a challenge. The first thing I need to solve is this: if I have a LUN presented as a raw device to my VM guest and that LUN ID is over 128, then although the ESX server can see it, the Windows guest can't see it when ESX exports it directly. I've a theory that it could be the generic device driver on Windows 2003 that VMware uses. I can't prove it (yet) and as yet no-one on the VMware forum has answered the question. Either they don't like me (lots of other questions posted after mine have been answered) or people don't know....

Monday, 13 November 2006

Continuous Protection

I've been looking at Continuous Data Protection products today, specifically those which don't need the deployment of an agent and can use Cisco's SANTap technology. EMC have released RecoverPoint (a rebadged product from Kashya) and Unisys have SafeGuard.

SANTap works by creating a virtual initiator, duplicating all of the writes to a standard LUN (target) within a Cisco fabric. This duplicate data stream is directed at the CDP appliance which stores and forwards it to another appliance located at a remote site where it is applied to a copy of the original data.

As the appliances track changed blocks and apply them in order, they allow recovery to any point in time. This potentially is a great feature; imagine being able to back out a single I/O operation or to replay I/O operations onto a vanilla piece of data.
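The any-point-in-time idea can be shown with a toy Python model. This is only a sketch of the general journalling concept, not how RecoverPoint, SafeGuard or SANTap work internally; the journal layout (sequence number, block, data) and the `replay` function are assumptions for illustration.

```python
def replay(base_image, journal, up_to):
    """Return a copy of base_image with journal writes 1..up_to applied in order.

    base_image: dict mapping block number -> data (the vanilla copy)
    journal:    list of (sequence, block, data) tuples captured from the write stream
    up_to:      last sequence number to apply, i.e. the chosen point in time
    """
    image = dict(base_image)
    for sequence, block, data in sorted(journal):
        if sequence > up_to:
            break
        image[block] = data
    return image


base = {0: "A0", 5: "B0"}
journal = [(1, 0, "A1"), (2, 5, "B1"), (3, 0, "A2")]

print(replay(base, journal, up_to=3))   # latest state
print(replay(base, journal, up_to=2))   # state with the last write backed out
```

Backing out a single I/O is then just a replay that stops one sequence number earlier.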

Whilst this CDP technology is good, it introduces a number of obvious questions on performance, availability etc, but for me the question is more about where this technology is being pitched. EMC already has replication in the array in the form of SRDF which comes in many flavours and forms. Will RecoverPoint complement these or will, in time, RecoverPoint be integrated into SRDF as another feature?

Who can say, but at this stage I see another layer of complication and decision on where to place recovery features. Perhaps CDP will simply offer another tool in the armoury of the Storage Manager.

Wednesday, 18 October 2006

Cisco, Microwaves and Virtualisation

So I've not posted in October so far and the month is half over. In my defence I spent a week in South Africa doing an equipment (Cisco) installation. No, it wasn't a holiday, but I did see the fantastic Table Mountain, which is spectacular and well worth a visit.

Part of the installation involved deployment of SRDF between two Symmetrix arrays using Cisco 9216i switches. These have two IP ports and can be used for iSCSI or in this case FCIP as we had to provide the SRDF links across IP. Whilst this all sounds fine, the IP link was in fact over a microwave connection rather than a fixed line installation. Telecoms are prohibitively expensive in SA and sometimes microwave is the only option due to the high cost of fixed line installations.

Installation and configuration of the links was simple, including enabling SRDF. However my big concern is performance (I should note that this is in no way a synchronous implementation, but an async one). The IP link is not dedicated to storage and is being used for other purposes, so fcping responses from the Cisco switches were very variable. At this point we haven't fully enabled SRDF but the next stage is to test performance of the link. As I see it, we need to monitor SRDF stats with ECC Performance Monitor, SRDF/A response times and the lag time of unwritten I/Os not committed to the remote site. The solution needs significant monitoring to ensure all links are active; with SRDF it is possible to monitor the status of a remote SRDF link using the "symrdf ping" command; this needs to be issued regularly, ideally every 5 minutes or less. More updates as I have them.
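The regular link check could be scripted along these lines. This is a minimal Python sketch, not a tested production monitor: it simply runs whatever command it is given and assumes an exit status of 0 means the link responded, which would need confirming for "symrdf ping" in your environment. The function names are mine, and any options the command needs are deliberately left out because they depend on the installation.

```python
import subprocess
import time


def link_is_up(command):
    """Run the check command; treat exit status 0 as 'link responded' (assumption)."""
    try:
        result = subprocess.run(command, capture_output=True, text=True)
    except OSError:
        # Command missing or not executable: report it as a failed check.
        return False
    return result.returncode == 0


def monitor(command, interval_seconds=300, checks=None, on_failure=print):
    """Run the check every interval_seconds and call on_failure for each failure.

    checks=None loops forever; an integer runs that many checks and then
    returns the number of failures seen.
    """
    failures = 0
    count = 0
    while checks is None or count < checks:
        if not link_is_up(command):
            failures += 1
            on_failure("link check failed: " + " ".join(command))
        count += 1
        if checks is None or count < checks:
            time.sleep(interval_seconds)
    return failures
```

Usage would be something like `monitor(["symrdf", "ping"], interval_seconds=300)`, with whatever additional arguments your setup requires added to the list, and `on_failure` swapped for something that raises an alert rather than printing.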

I spoke to an analyst today (it's Storage Expo in the UK) on the subject of virtualisation and in particular HDS' Universal Volume Manager. He was asking my view on where virtualisation is headed. I still think the switch is the right place, long term, for virtualising the storage infrastructure. In the short term though, discrete hardware array virtualisation is a good thing. UVM can provide cost savings (on significant virtualised volumes of data) and additional functionality, however HDS will have to up their game to retain the virtualisation crown as time progresses. Specifically they need to address the issue of a failure in the USP when so much storage could be dependent on one subsystem. A clustered solution could be the answer here. Time will tell.

Thursday, 28 September 2006

To Copy or not to Copy?

Sony have just announced the availability of their next generation of AIT; version 5. More on AIT here http://www.storagewiki.com/ow.asp?Advanced%5FIntelligent%5FTape. Speeds and feeds; 400GB native capacity, 1TB compressed and 24MB/s write speed. The write speed seems a little slow for me but the thing that scares me more are the capacity figures. 1TB - think about it 1TB! That's a shedload of data. Whilst that's great for density in the datacentre, it isn't good if you have a tape error. Imagine a backup failing after writing 95% of the data of a 700GB backup - or more of an issue, finding multiple read errors on a TB of data on a single cartridge.

No-one in their right mind would put 1TB of data onto a disk and hope the disk would never fail. So why do we do it with tape? Well probably because tape was traditionally used as a method of simply recovering data to a point in time. If one backup wasn't usable, you went back to the previous one. However, the world is a different place today. Increased regulation means backups are being used to provide data archiving facilities in the absence of proper application based archival. This means that every backup is essential as it indicates the state of data at a point in time. Data on tape is therefore so much more valuable than it used to be.

So, I would always create duplicate backups of those (probably production) applications which are most valuable and can justify the additional expense. That means talking to your customers and establishing the value of backup data.

Incidentally, you should be looking at your backup software. It should allow restarting backups after hardware failure. It should also allow you to easily recover the data in a backup from a partially readable tape. I mean *easily*, not "oh, it can be done, but it's a hassle and you have to do a lot of work". Alternatively, look at D2D2T (http://www.storagewiki.com/ow.asp?Disk%5FTo%5FDisk%5FTo%5FTape).....

Monday, 25 September 2006

It's not easy being green

OK, not a reference to a song by Kermit the Frog, but a comment relating to an article I read on The Register recently (http://www.theregister.co.uk/2006/09/22/cisco_goes_green/). Cisco are attempting to cut carbon emissions by cutting back on travel, etc. Whilst this is laudable, Cisco would get more green kudos if they simply made their equipment more efficient. A 10% saving in power/cooling for all the Cisco switches in the world would make the reduction on corporate travel look like a drop in the ocean.

Sunday, 24 September 2006

Standards for Shelving

Just found this the other day: http://www.sbbwg.org/home/ - a group of vendors working to get a common standard for disk shelves. Will we see a Clariion shelf attached to a Netapp filer head?

Common Agent Standards

The deployment of multiple tools into a large storage environment does present problems. For example, EMC's ECC product claims to support HDS hardware and it does. However it didn't support the NSC product correctly until version 5.2 SP4. Keeping the agents up to date for each management product to get it to support all hardware is a nightmare. I haven't even discussed host issues. Simply having to deploy multiple agents to lots of hosts presents a series of problems: will they be compatible, how much resource will they all demand from the server, how often will they need upgrading, what level of access will be required?

Now the answer would be to have a common set of agents which all products could use. I thought that's what CIM/SMI was supposed to provide us, but at least 4 years after I read articles from the industry saying the next version of their products would be CIM compatible I still don't see it. For instance, looking at the ECC example I mentioned above, ECC has Symm, SDM, HDS and NAS agents to manage each of the different components. Why can't a single agent collect for each and any subsystem?

Hopefully, someone will correct me and point out how wrong I am, however in the meantime, I'd like to say what I want:

A single set of agents for every product. This would also be a single set of host agents. None of this proxy agent nonsense where another agent has to sit in the way and manage systems on behalf of the system itself.
A consistent upgrade and support model. All agents should work with all software, however if the wrong version of an agent is installed, then it simply reports back on the data available.
The ability to upgrade any agent to introduce new features without direct dependence on software upgrades.

Thursday, 21 September 2006

Pay Attention 007...

Sometimes things you should spot just pass you by when you're not paying attention. So it is for me with N-Port ID Virtualisation. Taking a step back; mainframes had virtualisation in the early 90s. EMIF (ESCON Multi-Image Facility) allowed a single physical connection to be virtualised and shared across multiple LPARs (domains). When I was working on Sun multi-domain machines a few years ago, I was disappointed to see I/O boards couldn't share their devices between domains, so each domain needed dedicated physical HBAs. More recently I've been looking at having large servers with lots of storage using a data and tape connection - hopefully through the same physical connection, but other than using dual port HBAs, it wasn't easily possible without compromising quality of service. Dual port HBAs don't really solve the problems because I still have to pay for extra ports on the SAN.

Now Emulex have announced their support of N-Port ID Virtualisation (NPIV). See the detail here http://www.emulex.com/press/2006/0918-01.html. So what does it mean? For the uninitiated, when a fibre channel device logs into a SAN, it registers its physical address (WWN, World Wide Name) and requests a node ID. This ID is used to zone HBAs to target storage devices. NPIV allows a single HBA to request more than one node ID. This means if the server supports virtual domains (like VMware, XEN, or MS Virtual Server) then each domain can have a unique WWN and be zoned (protected) separately. Also this potentially solves my disk/tape issue, allowing me to have multiple data types through the same physical interface. Tie this with virtual SANs (like Cisco VSANs) and I can put quality of service onto each traffic type at the VSAN level. Voila!

I can't wait to see the first implementation; I really hope it does what it says on the tin.

Tuesday, 19 September 2006

RSS Update

I've always gone on and on about RSS and XML and how good they are as technologies; XML as a data exchange technology and RSS to provide information feeds in a consistent format. My ideal world is to have all vendor information published via RSS. By that I mean not just the nice press releases and how they've sold their 50 millionth hard drive this week to help orphanages, but useful stuff like product news, security advisories and patch information.

Finally vendors are starting to realise this is useful. Cisco so far seems the best, although IBM look good, as do HP. McDATA and Brocade are nowhere, providing their information in POHF (Plain Old HTML Format). Hopefully all vendors of note will catch up; meanwhile I've started a list on my Wiki (http://www.storagewiki.com/ow.asp?Vendor%5FNews%5FFeeds). Feel free to let me know if I've missed anyone.

Friday, 15 September 2006

A Mini/McDATA Adventure?

Apparently one in 6 cars sold by BMW is a Mini. It is amazing to see how they have taken a classic but fading brand and completely re-invented it into a cool and desirable product. McDATA have just released an upgraded version of their i10K product, branded the i10K Xtreme. http://www.mcdata.com/about/news/releases/2006/0913.html This finally brings 4Gb/s speed and VSAN/LPAR technology allowing resources in a single chassis to be segmented for improved manageability and security. I understand that the Brocade acquisition of McDATA is proceeding apace. It will be interesting to see how McDATA's new owners will treat their own McDATA "mini" going forward.

Wednesday, 30 August 2006

More on solid state disks

I'm working on a DMX-3 installation at the moment. For those who aren't aware, EMC moved from the multi-cabinet Symmetrix-5 hardware (8830/8730 type stuff) to the DMX which was fixed size - DMX1000, 2000 and 3000 models of 1, 2 and 3 cabinet installations respectively. I never really understood this move; yes it may have made life easier for EMC Engineering and CEs as the equipment shipped with its final footprint configuration, only cache/cards and disk could be added; but you had to think and commit upfront to the size of the array you wanted, which might not be financially attractive. Perhaps the idea was to create a "storage network" where every bit of data could be moved around with the minimum of effort; unfortunately that never happened and is unlikely to in the short term.

Anyway, I digress; back to DMX-3. So, doing the binfile (configuration) for the new arrays, I notice we lose a small amount of data on each of the first set of disks installed. This is for vaulting. Track back to previous models; if power was lost or likely to be lost, an array would destage all uncommitted tracks to disk after blocking further I/O on FAs. Unfortunately in a maximum configuration of 10 cabinets of 240 disks, it simply wouldn't be possible to provide battery backup for all the hard drives to destage the data. A quick calculation shows a standard HDD consumes 12.5 watts of power (300GB model), so that's 3000W per cabinet or 30,000W for a full configuration. Imagine the batteries needed for this, just on that rare off chance that power is lost. Vaulting simplifies the battery requirements by creating a save area on disk to which the pending tracks are written. When the array powers back up, the contents of cache, the vault and disk are compared to return the array back to the pre-power loss position.

This is a much better solution than simply shoving more batteries in.
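The quick calculation, written out as a Python sketch. The 12.5W per drive and 240 drives per cabinet are the figures from above; the hold-up time is a purely hypothetical parameter, since I don't know how long a full destage would actually take.

```python
WATTS_PER_DRIVE = 12.5       # 300GB model, as quoted above
DRIVES_PER_CABINET = 240
CABINETS = 10                # maximum configuration


def drive_power_watts(cabinets=CABINETS):
    """Power drawn by the hard drives alone, in watts."""
    return WATTS_PER_DRIVE * DRIVES_PER_CABINET * cabinets


def battery_energy_wh(holdup_minutes, cabinets=CABINETS):
    """Watt-hours needed to keep every drive spinning for holdup_minutes.

    holdup_minutes is a hypothetical figure chosen by the caller.
    """
    return drive_power_watts(cabinets) * holdup_minutes / 60


print(drive_power_watts(1), "W per cabinet")
print(drive_power_watts(), "W for a full configuration")
print(battery_energy_wh(5), "Wh to hold every drive up for 5 minutes")
```

That is drives only; controllers, cache and fans would all add to the battery load.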

So moving on from this, I thought more about exactly what 12.5W per drive means. Imagine a full configuration of 10 cabinets of 240 drives (an unlikely prospect, I know) which requires 30,000W of power. In fact the power requirements are almost double that. This is a significant amount of energy and cooling.

Going back to solid state disks I mentioned some time ago, I'd expect that the power usage would come down considerably depending on the percentage of writes that go to NAND cache. NAND devices are already up to 16GB (so could be 5% of a current 300GB drive) and growing. If power savings of 50% can be achieved, then this could have a dramatic effect on datacentre design. Come on Samsung, give us some more details of your flashon drives and we can see what hybrid drives can do......

Monday, 28 August 2006

EMC and storage growth

I had the pleasure of a tour around the EMC manufacturing plant in Cork last week. Other than the obvious interest in attention to quality I experienced, the thing that struck me most was the sheer volume of equipment being shipped out the door. Now I know the plant ships to most of the world bar North America, however seeing all those DMX and Clariion units waiting to be shipped out was amazing.

It was reassuring to see the volume of storage being shipped all over the world - especially from my position as a Storage Consultant. However it brings home more than ever the challenges going forward of managing ever increasing volumes of storage. I also spent some time looking at the latest versions of EMC products such as ECC, SAN Advisor and SMARTS.

ECC has moved on. It used to be a monolithic, ineffectual product in the original releases at version 5; however the product now seems wholly more usable. I went back after the visit and looked at an ECC 5.2 SP4 installation. There were features I should have been using; I will be using ECC more.

SAN Advisor looks potentially good - it matches your installation to the EMC Support Matrix (which used to be a PDF and now is a tool) and highlights any issues of non-conformance. In a large environment SAN Advisor would be extremely useful, however not all environments are EMC only and multi-vendor support for the tool will be essential. Secondly, the interface seems a bit clunky, work needed there. Lastly, I'd want to add my own rules and want to make them complex - so for instance where I was migrating to new storage, I'd want to validate my existing environment against it and to highlight devices not conforming to my own internal support matrix.

SMARTS uses clever technology to perform root cause analysis of faults in an IP or Storage Network. The IP functionality looked good however at this stage I could see limited appeal on the SAN side.

All in all, food for thought and a great trip!

Monday, 14 August 2006

Convergence of IP and Fibre Channel

I'm working on Cisco fibre channel equipment at the moment. As part of a requirement to provide data mobility (data replication over distance) and tape backup over distance, I've deployed a Cisco fabric between a number of sites. Previously I've worked with McDATA equipment and Brocade (although admittedly not the latest Brocade kit) and there was always a clear distinction between IP technologies and FC.

Working with Cisco, the boundaries are blurred. There are a lot of integrated features which start to blur the distinctions between the two technologies. This I guess should be no surprise with the history of Cisco as the kings of IP networking. What is interesting is to see how this expertise is being applied to fibre channel. OK, so some things are certainly non-standard. For example, the Cisco implementation of VSANs is certainly not a ratified protocol. Connect a Cisco switch to another vendor's technology and VSANs will not work. However VSANs are useful for segmenting both resources and traffic.

As things move forward following the Brocade/McDATA takeover (is it BrocDATA or McCade?) the FC world is set to get more interesting. McDATA products were firmly rooted in the FC world (the implementation of IP was restricted to a separate box and not integrated into directors) - Brocade seem a bit more open to embracing integrated infrastructure. Keep watching, it's going to be fun....

Tuesday, 8 August 2006

Tuning Manager continued

OK

I've got Tuning Manager up and running now (this is version 5). I've configured the software to pick up data from 5 arrays, which are a mixture of USP, NSC and 9980. The data is being collected via two servers - each running the RAID agent.

After 5 days of collecting, here are my initial thoughts;

The interface looks good; a drastic improvement on the previous version.
The interface is quick; the graphs are good and the operation is intuitive.
The presented data is logically arranged and easy to follow.

However there are some negatives;

Installation is still a major hassle; you've got to be 100% sure the previous version is totally uninstalled or it doesn't work.
RAID agent configuration (or rather I should call it creation) is cumbersome and has to be done at the command line; the installation doesn't install the tools directory onto your command line path, so you have to trawl to the directory (or install something like "cmdline here").
The database limit is way too small; 16,000 objects simply isn't enough (an object is a LUN, path etc). I don't want to install lots of instances, especially when the RAID agent isn't supported under VMware.

Overall, so far this version is a huge improvement and makes the product totally usable. I've managed to set the graph ranges to show data over a number of days and I'm now spending more time digging into the detail; Tuning Manager aggregates data up to an hourly view. The next step is to use Performance Reporter to look at some realtime detail....

Brocade and McDATA; a marriage made in heaven?

So as everyone is probably aware, Brocade are buying McDATA for a shedload of shares; around $713m for the entire company. That values each share at $4.61, so no surprise McDATA shares are up and Brocade's are down. It looks like McDATA is the poorer partner and Brocade have just bought themselves a set of new customers.

I liked McDATA; the earlier products up to the 6140 product line were great. Unfortunately the i10K for me sealed their demise. It created a brand new product line with no backward compatibility. It was slow to market and OEM vendors took forever to certify it. Then there was the design - physical blade swap out for 4Gb/s; no paddle replacement; a separate box for routing; and different operating systems across the routing, legacy and i10K products. Moving forward, where's the smaller i10K model like a 128 or 64 port version?

The purchase of Sanera and CNT didn't appear to go well; the product roadmap lacked strategy.

So, time to talk to my new friends at Brocade, I bet they've got a smile on their face today...

Wednesday, 2 August 2006

Holiday is over

I've been off on holiday for a week (Spain as it happens), now I'm back. I'm pleased to say I didn't think of storage once. Not true actually; the cars we'd hired didn't have enough storage space for the luggage - why does that always happen?

So back to work and continuing on virtualisation. For those who haven't read the previous posts, I'm presenting AMS storage through a USP. I finished the presentation of the storage; the USP can see the AMS disks. These are all about 400GB each, to ensure I can present all the data I need.

Now, I'm presenting three AMS systems through one USP. A big issue is how cache should be managed. Imagine the write process; a write operation is received by the USP first and confirmed to the host after writing to USP cache. The USP then destages the write afterwards down to the AMS - which also has cache - and then to physical disk. Aside from the obvious question of "where is my data?" if there is ever a hardware or component failure, my current concern is managing cache correctly. The AMS has only 16GB of cache, that's a total of 48GB in all three systems. The USP has 128GB of cache, over twice the total of the AMS systems. It's therefore possible for the USP to accept a significant amount of data - especially if it is the target of TrueCopy, ShadowImage or HUR operations. When this is destaged, the AMS systems run the risk of being overwhelmed.

This is a significant concern. I will be using Tuning Manager to keep on top of the performance stats. In the meantime, I will configure some test LUNs and see what performance is like. The USP also has a single array group (coincidentally also about 400GB) which I will use to perform a comparison test.

This is starting to get interesting...

Wednesday, 19 July 2006

HDS Virtualisation

I may have mentioned before that I'm working on deploying HDS virtualisation. I'm deploying a USP100 with no disk (well 4 drives, the bare minimum) virtualising 3x AMS1000 with 65TB of storage each. So now the tricky part; how to configure the storage and present it through the USP.

The trouble is, with the LUN size that the customer requires (16GB), the AMS units can't present all of their storage. The limit is 2048 devices per AMS (whilst retaining dual pathing), so that means having either only 32TB of usable storage per AMS or increasing the LUN size to 32GB. Now that presents a dilemma; one of the selling points of the HDS solution is the ability to remove the USP and talk directly to the AMS if I so choose to remove virtualisation (unlikely in this instance but as Sean Connery learned, you should Never Say Never). I can't present the final LUN size of 16GB from the AMS, so I'll have to present larger LUNs, carve them up using the USP and forgo the ability to remove the USP in the future. In this instance this may not be a big deal, but bear it in mind; for some customers it may be.
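The arithmetic behind that dilemma can be sketched in a few lines. This is only an illustration: the function names are mine, and it assumes binary units (1TB = 1024GB) and that all 65TB per AMS is available to present.

```python
# Rough sizing sketch for the AMS device limit described above.
# Assumptions: binary units (1 TB = 1024 GB); function names are illustrative.

GB_PER_TB = 1024


def max_presentable_tb(device_limit: int, lun_size_gb: float) -> float:
    """Capacity that can be presented when every device is one LUN of lun_size_gb."""
    return device_limit * lun_size_gb / GB_PER_TB


def min_lun_size_gb(capacity_tb: float, device_limit: int) -> float:
    """Smallest LUN size that lets capacity_tb fit within device_limit devices."""
    return capacity_tb * GB_PER_TB / device_limit


if __name__ == "__main__":
    print(max_presentable_tb(2048, 16))  # 32.0 (TB at the customer's 16GB LUN size)
    print(min_lun_size_gb(65, 2048))     # 32.5 (GB per LUN to present all 65TB)
```

On these assumptions a 32GB LUN gets to 64TB, close to but not quite the full 65TB.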

So, presentation will be a 6+2 array group; 6x 300GB which actually results in 1607GB of usable storage. This is obviously salesman sized disk allocations; my 300GB disk actually gives me 267.83GB... I'll then carve up this 1607GB of storage using the USP. At this point it is very important to consider dispersal groups. A little lesson for the HDS uninitiated here; the USP (and NSC and 99xx before it) divides up disks into array groups (also called RAID groups), which with 6+2 RAID, is 8 drives. It is possible to create LUNs from the storage in an array group in a sequential manner, i.e. LUN 00:00 then 00:01, 00:02 and so on. This is a bad idea as the storage for a single host will probably be allocated out sequentially by the Storage Administrator and then all the I/O for a single host will be hitting a small number of physical spindles. More sensible is to disperse the LUNs across a number of array groups (say 6 or 12) where the 1st LUN comes from the first array group, the second from the second and so on until the series repeats at the 7th (or 13th using our examples) LUN. This way, sequentially allocated LUNs will be dispersed across a number of array groups.
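The dispersal idea is simply round-robin allocation. A minimal sketch, assuming 1-based LUN and array group numbering and made-up function names (this is not HDS tooling):

```python
# Illustrative sketch of dispersing sequentially numbered LUNs across array groups.
# The group counts and names are assumptions for the example, not HDS tooling.
from __future__ import annotations


def array_group_for_lun(lun_index: int, group_count: int) -> int:
    """Return the 1-based array group that the lun_index-th LUN (1-based) comes from."""
    return (lun_index - 1) % group_count + 1


def dispersal_plan(lun_count: int, group_count: int) -> list[tuple[int, int]]:
    """Pair each LUN (1-based) with its array group, round-robin."""
    return [(lun, array_group_for_lun(lun, group_count))
            for lun in range(1, lun_count + 1)]


if __name__ == "__main__":
    for lun, group in dispersal_plan(8, 6):
        print(f"LUN {lun} -> array group {group}")
```

With six groups the 7th LUN lands back on the first array group, and with twelve groups the 13th does, matching the examples above.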

Good, so lesson over; using external storage as presented, it will be even more important to ensure LUNs are dispersed across what are effectively externally presented array groups. If not, performance will be terrible.

Having thought it over, what I'll probably do is divide the AMS RAID group into four and present four LUNs of about 400GB each. This will be equivalent to having a single disk on a disk loop behind the USP, as internal storage would be configured. This will be better than a single 1.6TB LUN. I hope to have some storage configured by the end of the week - and an idea of how it performs; watch this space!

Monday, 17 July 2006

EMC Direction

EMC posted their latest figures. So they're quoting double digit revenue growth again (although I couldn't get the figures to show that). My question is; where is EMC going? The latest DMX and Clariion improvements are just that - performance improvements over the existing systems. I don't see anything new here. The software strategy seems to be to purchase lots of technology, but where's the integration piece, where's the consolidated product line? ECC still looks as poor as ever.

So what is the overall strategy? I think it's to be IBM. Shame IBM couldn't hold on to their dominant position in the market. I can see EMC going the same way as people overtake and improve on their core technologies.

Data Migration Strategies

Everyone loves the idea of a brand-new shiny SAN or NAS infrastructure. However, over time this new infrastructure needs to be maintained. Not just at the driver level, but eventually arrays, fabrics and so on. So, data migration will become a continuous BAU process we'll all have to adopt. Whilst I think further, I've distilled the requirements into some simple criteria:

Migration Scenarios

Within the same array
Between arrays from the same vendor
Between arrays from different vendors
Between similar protocol types
Between different protocol types

Migration Methods

Array based
Host Based
3rd Party Host based

Migration Issues

Performance – time to replicate, ability to keep replica refreshed
Protection – ensuring safety of the primary, secondary and any replicated copies
Platform Support – migration between storage technologies
Application Downtime – ensuring migration has minimum application downtime

Migration Reasons

Removal of old technology
Load balancing
Capacity balancing
Performance improvements (hotspot elimination)

Time to do some more thinking about filling out the detail. Incidentally, when new systems are deployed, a lot of thought goes into how to deploy the new infrastructure and how it will work with existing technology - but how much time goes into planning how that system will be removed in the future?

Tuesday, 11 July 2006

More on Virtual Tape

Last week I talked about virtual tape solutions. On Monday HDS released the news that they're reselling Diligent's VTL solution. From memory, when I last saw this about 12 months ago, it was a software only solution for emulating tape, sure enough that's still the same. Probably the more useful thing though is ProtecTIER, which scans all incoming data and proactively compresses the incoming stream by de-duplicating the data. The algorithm relies on a cache map of previously scanned data held in memory on the product's server, with a claimed supported capacity of 1PB raw, or 25PB at the alleged 25:1 compression ratio.

As Hu Yoshida referenced the release on his blog, I asked the question on compression. 25:1 is a claimed average from customer experience, some customers have seen even greater savings.

So, my questions; what is performance like? What happens if the ProtecTIER server goes titsup (technical expression)? I seem to remember from my previous presentation on the product that the data was stored in a self referential form, allowing the archive index to be easily recreated.

If the compression ratio is true and performance is not compromised, then this could truly be a superb product for bringing disk based backup into the mainstream. Forget the VTL solution; just do straight disk based backups using something like Netbackup disk storage groups and relish in the benefits!!!

Monday, 10 July 2006

Performance End to End

Performance Management is a recurring theme in the storage world. As fibre channel SANs grow and become more complex, the very nature of a shared infrastructure becomes prone to performance bottlenecks. Worse still, without sensible design (e.g. things like not mixing development data in with production) production performance can be unnecessarily compromised.

The problem is, there aren't really the tools to manage and monitor performance to the degree I'd like. Here's why. Back in the "old days" of the mainframe, we could do end to end performance management - issue an I/O and you could break down the I/O transaction into the constituent parts; you could see the connect time (time the data was being transferred), disconnect time (time waiting for the data to be ready so it can be returned to the host) and other things which aren't quite as relevant now like seek time and rotational delay. This was all possible because the mainframe entity was a single infrastructure; there was a single time clock against which the I/O transaction could be measured - also the I/O protocol catered for collecting the I/O data response times.

SANs are somewhat different. Firstly the protocol doesn't cater for collecting in-flight performance statistics, so all performance measurements are based on observations from tracing the entire environment. The vendors will tell you they can do performance measurements and it is true, they can collect whatever the host, storage and SAN components offer - the trouble is, those figures are likely to be averages and it is either not possible or not easy to relate them to specific LUNs on specific hosts.

For you storage and fabric vendors out there, here's what I'd like; first I want to trace an entire I/O from host to storage and back; I want to know at each point in the I/O exchange what the response time was. I still want to total and average those figures. Don't forget replication - TrueCopy/SRDF/PPRC, I still want to know that part of the I/O.

One thought, I have a feeling fabric virtualisation products might be able to produce some of this information. After all, if they are receiving an I/O request for a virtual device and returning it to the host, the environment is there to map the I/O response to the LUN. Perhaps that exists today?

Wednesday, 5 July 2006

Virtual tape libraries

I previously mentioned virtual tape libraries. Two examples of products I've been looking at are Netapp's Nearstore Virtual Tape Library and ADIC's Pathlight products. Both effectively simulate tape drives and allow a virtual tape to be exported to a physical tape. Here are some of the issues as I see them:

1. How many virtual devices can I write to? Products such as Netbackup do well having lots of drives to write to; a separate drive is needed (for instance) for each retention period (monthly/weekly/daily) and for different storage pools. This can cause issues with the ability to make most effective use of drives, especially when multiplexing. So, the more drives, the better.

2. How much data can I stream? OK, it's great having lots of virtual drives, but how much data can I actually write? The Netapp product for instance can have up to 3000 virtual drives but can only sustain 1000MB/s throughput (equivalent to about 30 LTO2 drives).

3. How is compression handled? Data written to tape will usually be compressed by the drive, resulting in variable capacity on each tape, some data will compress well, some won't. A virtual tape system that writes to physical tape must ensure that the level of compression doesn't prevent a virtual tape being written to the physical tape (imagine getting such poor compression that only 80% of data could be written to the tape, what use is that). However the flip side to this is ensuring that all the tape capacity can be utilised - it's easy to simply write 1/2 full tapes to get around the compression issue.

4. How secure is my data? So you now have multiple terabytes of data on a single virtual tape unit. How is that protected? What RAID protection is there? How is the index of data on the VTL protected? Can the index be backed up externally? One of the great benefits of tape is the portability. Can I replicate my VTL to another VTL? If so, how is this managed?

5. What is the TCO? Probably one of the most important questions. Why should I buy a VTL when I could simply buy more tape drives and create an effective media ejection policy? The VTL must be cost effective. I'll touch on TCO another time.

6. The Freedom Factor. How tied will I be to this technology? The solution may not be appropriate, the vendor may go out of business. How quickly can I extricate myself from the product?
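To put some numbers on point 2, here is a back-of-envelope sketch using the figures quoted above. The per-drive LTO2 rate of around 33MB/s is my assumption (it is what makes 1000MB/s come out at "about 30" drives), not a vendor specification.

```python
# Back-of-envelope check on virtual drive count versus sustained throughput.
# Figures come from the post; the LTO2 per-drive rate is an assumption.


def per_drive_mb_s(total_mb_s: float, active_drives: int) -> float:
    """Average throughput each drive gets if all active drives stream at once."""
    return total_mb_s / active_drives


def equivalent_physical_drives(total_mb_s: float, drive_mb_s: float) -> float:
    """How many physical drives at drive_mb_s the total throughput equates to."""
    return total_mb_s / drive_mb_s


if __name__ == "__main__":
    print(round(per_drive_mb_s(1000, 3000), 2))            # 0.33
    print(round(equivalent_physical_drives(1000, 33), 1))  # 30.3
```

In other words, if all 3000 virtual drives were streaming at once, each would average about a third of a megabyte per second.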

Tuesday, 4 July 2006

The Incredible Shrinking Storage Arrays

For those who don't relate to the title, check out IMDB....

You know how it is, you go to buy some more storage. You need, say 10TB. You get the vendor to quote - but how much do you actually end up with? First of all, disk drives never have the capacity they purport to have even if you take into consideration the "binary" TB versus the "decimal" TB. Next there's the effect of RAID. That's a known quantity and expected, so I guess we can't complain about that one. But then there's the effect of carving up the physical storage into logical LUNs; this can easily result in 10% wastage. Plus there's more: EMC on the DMX-3 now uses the first 30 or so disks installed into the box to store a copy of cache memory in case of a power outage; OK, good feature but it carves into the host available space. Apologies to EMC there - you were the first vendor I thought to have a dig at.

Enterprise arrays are not alone in this attritious behaviour - Netapp filers for instance will "lose" 10% of their storage to inodes (the part which keeps track of your files) and another 20% reserved for snapshots - that's after RAID has been accounted for.

What we need is clear unambiguous statements from our vendors - "you want to buy 10TB of storage? Then it will cost you 20....."
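The shrinkage can be roughed out as a chain of multipliers. This is a sketch only: the 6+2 RAID layout, the 10% carving loss and the way the overheads are applied one after another are all assumptions for illustration, and real arrays will differ.

```python
# Illustrative estimate of usable capacity after each overhead described above.
# All percentages are assumptions taken loosely from the post; real arrays differ.
from __future__ import annotations


def usable_tb(raw_decimal_tb: float, data_disks: int, parity_disks: int,
              carve_loss: float = 0.10,
              other_overheads: tuple[float, ...] = ()) -> float:
    """Apply decimal-to-binary conversion, RAID, LUN carving and further overheads."""
    binary_tb = raw_decimal_tb * 10**12 / 2**40   # "decimal" TB to "binary" TB
    after_raid = binary_tb * data_disks / (data_disks + parity_disks)
    usable = after_raid * (1 - carve_loss)
    for overhead in other_overheads:              # e.g. inodes, snapshot reserve
        usable *= 1 - overhead
    return usable


if __name__ == "__main__":
    print(round(usable_tb(20, 6, 2), 2))
    print(round(usable_tb(20, 6, 2, other_overheads=(0.10, 0.20)), 2))
```

On those assumptions, 20TB of quoted raw disk comes out at a little over 12TB usable, and less again once filer-style inode and snapshot reserves are taken off.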

Monday, 3 July 2006

Thin Provisioning and it's Monday and EMC are buying (again)

In the early 1990s StorageTek released Iceberg, a virtualised storage subsystem. Not only was it virtualised but it implemented compression and thin provisioning. For those who don't know, thin provisioning allows the overallocation of storage based on the fact that most disks don't use their entire allocation. Iceberg was great because the MVS operating system was well suited to this kind of storage management. An MVS disk (or volume) could only process a single I/O at any one time (although that has changed with PAVs). Iceberg allowed lots of partially filled virtual volumes, increasing the amount of parallel I/O. On Open Systems this would be a problem as there would be a lot of unused space; Iceberg only reserved the storage for files that were in use.

Thin provisioning is back again; companies like 3Par are bringing products to market that once again allow storage space to be overallocated. Is this a good thing? Well at first glance, the answer has to be yes; why waste money? After all, 10% of a $10m storage investment is still a cool million dollars. However there are some rather significant drawbacks to over-provisioning storage. Think what happens when a disk reaches 100% utilisation and can't store any more data. Any further attempts to write to that device will fail. That could take down an application or perhaps a server. Hopefully growth on that one disk has been monitored and the rate of growth won't be a significant shock; either way, only users of that one device will be affected.

Now consider thin provisioning. When the underlying storage reaches 100% utilisation, then all disks carved out of that storage will receive write failures; worse still these write failures will occur on disks which don't appear to be full! Depending on the way the storage was allocated out then the impact of 100% utilisation could be widespread; if storage is accessed by multiple tiers of storage and different lines of business, then one LOB could affect another; development allocations could affect production. It is likely that storage wouldn't be shared out in this way but the impact is obvious.
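A toy model makes the failure mode concrete. This is a sketch under my own assumptions: the class, its methods and the sizes are invented for illustration and don't reflect how Iceberg, 3Par or anyone else actually implements thin provisioning.

```python
# Toy model of a thin-provisioned pool; names and sizes are illustrative only
# and do not correspond to any vendor's product.
from __future__ import annotations


class ThinPool:
    def __init__(self, physical_gb: int) -> None:
        self.physical_gb = physical_gb
        self.used_gb = 0
        self.volumes: dict[str, dict[str, int]] = {}

    def create_volume(self, name: str, virtual_gb: int) -> None:
        """Allocate a virtual size; no physical space is consumed yet."""
        self.volumes[name] = {"virtual_gb": virtual_gb, "written_gb": 0}

    def write(self, name: str, gb: int) -> bool:
        """Return True if the write fits, False if it fails."""
        vol = self.volumes[name]
        if vol["written_gb"] + gb > vol["virtual_gb"]:
            return False  # the volume itself is full
        if self.used_gb + gb > self.physical_gb:
            return False  # pool exhausted, even though this volume has room
        vol["written_gb"] += gb
        self.used_gb += gb
        return True


if __name__ == "__main__":
    pool = ThinPool(physical_gb=100)
    pool.create_volume("prod", 100)  # 200GB promised against 100GB of disk
    pool.create_volume("dev", 100)
    print(pool.write("dev", 90))     # True
    print(pool.write("prod", 20))    # False: prod is empty but the pool is nearly full
```

Here development has quietly consumed most of the pool, so production's first write fails on a volume that looks completely empty.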

Thin provisioning can make storage savings; but it needs more careful management; I learned that 10 years ago with Iceberg.

EMC have been purchasing again; this time it's RSA Security. I can see the benefit of this; storage and security are becoming tightly linked as storage becomes the mainstay of the Enterprise. All I can say is I hope that the purchase improves the security features of ECC, Solutions Enabler and a number of other lightly protected EMC products.

Tuesday, 27 June 2006

ECC Lite

I've talked previously about management tools. I don't think we'll see a unified "one tool fits all" solution. However in the meantime I think there are some things we can do;

So, ECC, huge product, lots of features. What happens if you want to do a little configuration work but don't want to deploy an entire ECC infrastructure? I suggest ECC Lite, a simple GUI based version of SYMCLI. It lets you visualise your storage, perform the same TimeFinder and SRDF functions, and build and kick off symconfigure. It runs under Windows on your local machine (if you have Gatekeeper access to your Symmetrix) and doesn't require big servers. Lastly it is FREE. It doesn't exist; perhaps I'll write it.

In the interests of equality (certainly with HDS, I can't comment on IBM) if HDS provide command device access to their arrays, I'd write one for that too.

So, we get the basics; the terms that define LUNs, array groups, etc can then be made generic, then we can create a product that spans both vendors using generic storage terms. I'd like that one too.

Saturday, 24 June 2006

Pushing the datacentre distances

I've been wrestling with a couple of architecture issues over the last few weeks. They are interlinked; but stand alone as separate issues.

First is the three-datacentre question. So imagine a scenario where there is a normal datacentre pair, reasonably close to each other. New rules require another datacentre some distance away, which will be out of zone. This means a distance of perhaps 50 or 60 miles. The original datacentre pair were only 5 miles apart. The aim is to be able to replicate between the local pair with a third copy in the remote centre. Not a problem you might say; the local pair can be synchronously replicated, the remote copy can be async. Well, OK, that would work and the major array vendors can do that today. But....I really don't want async and I don't necessarily want to pay for three copies.

How can I do it? At the moment I don't think it is possible; certainly I still need to have three copies and I can't switch the replication to the remote site to synchronous without incurring a latency penalty. There is a possible scenario using a USP to virtualise the disks at the remote site to one of the local sites; however, this may not provide suitable data integrity.
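The size of that penalty can be estimated from distance alone. A rough sketch, assuming light travels at about 200,000 km/s in fibre (around 5 microseconds per km each way) and ignoring switch, protocol and array overheads; the function name and the round-trip parameter are mine.

```python
# Rough estimate of the latency added by synchronous replication over distance.
# Assumption: about 200,000 km/s in fibre; equipment overheads are ignored.

KM_PER_MILE = 1.609344
FIBRE_KM_PER_SECOND = 200_000


def sync_round_trip_ms(distance_miles: float, round_trips_per_write: int = 1) -> float:
    """Minimum extra milliseconds per write for a synchronous copy at this distance."""
    distance_km = distance_miles * KM_PER_MILE
    one_way_seconds = distance_km / FIBRE_KM_PER_SECOND
    return 2 * one_way_seconds * round_trips_per_write * 1000


if __name__ == "__main__":
    print(round(sync_round_trip_ms(5), 3))   # local pair, 5 miles
    print(round(sync_round_trip_ms(60), 3))  # remote site, 60 miles
```

On these assumptions the 5 mile pair adds under a tenth of a millisecond per write, whilst 60 miles adds close to a full millisecond for every round trip the protocol needs, and that is before any equipment delay.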

I'd like to see the vendors provide this solution; in fact I think the best place for this requirement to be implemented is within the SAN as part of a director. I don't think any vendors are working to develop this; if they are they'd be cutting into the storage array vendors' patch, something that might not be too popular.

In case anyone decides to comment, I am aware of EMC InVista but I don't think it offers this degree of functionality and flexibility.

Friday, 23 June 2006

Device Manager TNG & Central Storage Management

Right, I've got Device Manager 5 installed pending installation of Tuning Manager 5. Agreed, the layout looks better (shame it is all still Java, my least favourite programming tool). Now I need to generate some data. But yes it looks better.

Old JWT at DrunkenData seems to think www.storagerevolution.com is still alive. Here's his recent post discussing the site plans. Unfortunately I don't think there's any holy grail here (excuse the DaVinci Code reference - it wasn't intentional - I haven't even seen the film or read the book). Storage is evolving daily, let's face it, EMC is buying nearly a company a day at the moment! Keeping up with this technology is impossible; spending 12 months developing an infrastructure will provide you with a design that is 12 months out of date, a long time in this industry. Just look at how quickly new technology is being introduced; think of how technologies such as virtualisation will skew all the products in terms of accounting and managing data.

Creating a single consistent framework for all vendors and technologies is simply impossible, so we have to think more pragmatically. We need a baseline; that baseline needs to create generic storage terms against which we can map technologies. Let's face it, all the major storage vendors have split mirror technology and COW technology, they just call it a different name. The same applies to other functionality - synchronous/async replication; NAS, iSCSI and so on.

My suggestion; start with the basics. Create a framework that maps against hardware from an agnostic position. Revise and develop it; increase the complexity to encompass application and business requirements. OK, now that will take some time; but at least there will be a working solution in place.
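As a very small illustration of what such a baseline might look like, here is a sketch that maps a handful of vendor product names onto generic terms. The two generic categories and the function names are my own invention, and the table is deliberately coarse and nowhere near complete.

```python
# Sketch of a vendor-agnostic vocabulary. The generic category names are
# assumptions for illustration; the table is intentionally coarse and incomplete.
from __future__ import annotations

GENERIC_TERMS: dict[tuple[str, str], str] = {
    ("EMC", "TimeFinder"): "local mirror copy",
    ("HDS", "ShadowImage"): "local mirror copy",
    ("EMC", "SRDF"): "remote replication",
    ("HDS", "TrueCopy"): "remote replication",
    ("IBM", "PPRC"): "remote replication",
}


def generic_term(vendor: str, product: str) -> str:
    """Translate a vendor product name into the generic term, if we know it."""
    return GENERIC_TERMS.get((vendor, product), "unmapped")


def products_for(term: str) -> list[str]:
    """List every 'vendor product' pair that implements a generic term."""
    return sorted(f"{v} {p}" for (v, p), t in GENERIC_TERMS.items() if t == term)


if __name__ == "__main__":
    print(generic_term("HDS", "TrueCopy"))
    print(products_for("local mirror copy"))
```

The point is only that once the generic terms exist, each new product becomes another row in the table rather than a new framework.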

NO Storage Vendor is going to hold back their product development lifecycle just to make sure they work with generic standards; they will continue to press for market share, bring new features to the market and always look to maintain their USP.

We will have to run just to keep up.

Wednesday, 21 June 2006

The missing storage generation

My Tuning Manager issues go on; there's obviously a problem with the version I'm using - reports don't work correctly even after the re-install. However, there was one small side effect: I picked up the wrong subsystems to monitor after a reboot of the TM server. This was because I didn't follow the instructions and ensure that the LUN binding remained the same after reboot. That meant the physicaldevice assignments from Windows changed after reboot and TM kindly picked up the wrong systems. My fault. My suggestion of the day, RTFM.

I had a very nice man at HDS contact me and explain how much better Version 5 is. Well, I'd agree - nice interface, in fact the whole HiCommand suite has nice colours at the top. Tuning Manager is green I think, Device Manager is blue. From what I know, the aim was to improve the GUI first, hence the first 5.x release - then concentrate on the other features which needed improving. I'll have an evaluation of 5.0.xxx done soon and I'll let you know how it rocks.

I started out working on mainframes. For those who are too young to remember, these were huge machines with football fields of storage and stored about the same amount of data as you can get on an iPod (apologies to those older than me who remember 360, I started on 370 and ESA). Running mainframes has become a problem; see http://www.techweb.com/wire/hardware/170000208 which discusses the shortage of mainframe trained people. Anyway, that wasn't my reason for bringing it up; here's the reason. 17 years ago, IBM released DFSMS - the Storage Management Subsystem (or System Managed Storage). This allowed files to be directed to the most appropriate storage tier based on the importance of the file. These policy settings could be specified using a BASIC-like language which allowed complex decisions on file locations to be made. Data was backed up and moved between tiers using a hierarchical storage manager - ILM.

Best of all, we had SAN in the form of ESCON, Enterprise System Connection, only recently replaced by FICON, but a fibre interconnect solution. One advanced feature offered virtualisation; EMIF - ESCON Multiple Image Facility. This allowed any interface (e.g. HBA) to be used by any domain on the mainframe to connect to the storage. This isn't even possible today - every domain in, say, a Sun F15 or E25 needs dedicated HBA cards.

So what happened to the development of storage features between the late 80's and today? Where did the storage generation go? Personally, I moved to Open Systems and watched all these features being re-invented. Best of all, I loved the way all the "new" features in the O/S world were discovered! Nostalgia is not what it used to be....

Tuesday, 20 June 2006

Tuning Manager Again; Virtualising Tape

Tuning Manager didn't work properly. I needed to re-install it. Unfortunately I got into an infinite loop. The uninstall process requested I start one of the Tuning Manager services, when I did and tried again, the uninstall failed asking me to stop all services...whoa.

So, strategically thinking, where should backup be virtualised? There are virtualising options to completely virtualise the tape drive and the tape cartridge; this can be on disk or eventually to tape, either in a one-to-one relationship or virtualised. These are good but have drawbacks; how do I get off the virtualised media? If the data was written to tape in a 1:1 relationship with the backup product, then I'm OK, if not I need a migration tool.

OK, so I could abandon the concept of tapes and write backups to a disk pool; great, but how do I retain those backups indefinitely? Not really practical on disk.

So, I could abandon the backup software altogether - if I use NAS products I can take snapshot type backups, there are even products which will snapshot SAN (SCSI) devices. However I'm tied to this product for my data backup going forward.

Hmm. It is a quandary. I'm settling on a number of requirements;

1. Retain data in the format of the backup product, wherever the data finally ends up.
2. Use tape as the final destination for data which must be maintained long term.
3. Use disk as an intermediary; depending on the backup product, choose integral disk backups or use a VTL solution.
4. Don't completely virtualise the entire tape subsystem; allow tapes to be written which can still be understood by the backup product.

Seems simple. Anyway, Tuning Manager, sorted it. Needed to uninstall from the console. Shame I lost all my data though.

Thursday, 15 June 2006

Tuning Manager, storage on storage

I've done more work on Tuning Manager today and added a number of systems for collection. I'm not keen on the interface (currently using version 4.1) and so will be getting version 5.0 installed quickly. The proof of the pudding will be in the information I get out of the product. I'll give it a few days then start to see what we're collecting.

How much storage is now used to monitor, ahem, storage? With the creation of databases for collating performance data and configuration data (think of ECC for instance) a significant infrastructure and storage volume is needed simply to manage the storage itself. How much storage is needed before we reach Parkinson's Law and all the storage we have is for monitoring itself?

I shared a few beers with Dan Crain last night. It's an interesting time for the storage fabric vendors. Whilst I can't mention specifics, there's a lot of interesting stuff on the horizon and the storage fabric is moving well and truly away from a basic interconnect. Value add is going to become a big thing; a basic monolithic storage interconnect will no longer cut the mustard.

Wednesday, 14 June 2006

More on ECC and Black Boxes

So, I checked it out. The official position is not clear. There's no direct support under VMware ESX, but some components are supported (things like the fibre channel agent etc) but a lot of stuff won't be considered until ESX 3.0. Still waiting for clarification.

I've also been looking at USP configurations on two fronts; first, Virtual Partition Manager (VPM) allows storage resources on a USP to be placed into individual storage partitions (SLPRs). These can be administered separately. We're looking at this for separation of tier 1 and tier 2 storage. I like the idea that both storage ports and cache can be partitioned along with LDEVs to create a separately administrable storage partition. This not only simplifies the administration of multi-tiered storage in a single frame but also protects resources at each level of tiering.

The second USP configuration work involved LDEV layout. Although we ask our customers to consider the USP as a "black box", it clearly isn't and LUN distribution is paramount to obtain best performance, so I've been looking at where we allocate ShadowImage LDEVs. We plan to use higher capacity disks for SI to reduce cost, so disk layout is important. The conclusion is we should keep the primary disk and the SI copy within the same half of the frame, i.e. either on the left or the right as this will give the best performance when replicating data.

For those of you who have no idea what I'm on about, I'll do an entry on the TagmaStore one day, or you can look at the HDS website.

Tuesday, 13 June 2006

Oversubscribed and undersupported

Been thinking about oversubscription and port usage. I suspect most SANs are seriously underutilising their available bandwidth. Oversubscription is therefore a good thing - but it will be dependent on being able to use that oversubscription. Cisco switches (gen2) allow oversubscription at the port group level. On a 48-port card this is per group of 12 ports but there are restrictions - try putting more than two 4Gb/s ports in dedicated mode in a port group and you'll have to start turning other ports off! Restricting per port group is not enough. McDATA restrict by 32 port blade - fine, but the blades run at 2Gb/s unrestricted anyway. Brocade - not restricted - am I paying for bandwidth I don't need? Give me lots of ports and a reasonable amount of bandwidth for my money - and let me share it everywhere.
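For anyone wanting to play with the numbers, oversubscription is just demand divided by supply. In this sketch the 12Gb/s port group bandwidth is a placeholder I've picked for the example, not a quoted Cisco specification, and the function names are mine.

```python
# Oversubscription ratio for a port group. The group bandwidth used in the
# example calls is a placeholder assumption, not a vendor specification.


def oversubscription_ratio(port_count: int, port_speed_gbps: float,
                           group_bandwidth_gbps: float) -> float:
    """Ratio of what the ports could demand to what the port group can deliver."""
    return port_count * port_speed_gbps / group_bandwidth_gbps


def shared_bandwidth_left(group_bandwidth_gbps: float, dedicated_ports: int,
                          port_speed_gbps: float) -> float:
    """Bandwidth remaining for shared ports once dedicated ports take their slice."""
    return group_bandwidth_gbps - dedicated_ports * port_speed_gbps


if __name__ == "__main__":
    print(oversubscription_ratio(12, 4, 12))  # 4.0
    print(shared_bandwidth_left(12, 3, 4))    # 0
```

With that placeholder figure, twelve 4Gb/s ports are 4:1 oversubscribed, and dedicating a third port leaves nothing for the other nine, which is the "start turning other ports off" effect.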

Why does EMC not support ECC on VMware? OK, I guess the Repository and Server components may cause a problem, but the stores - surely they can. What is going on with EMC and acquisitions? I can't believe it can integrate all of the software from the companies recently purchased - come on, at least ECC on VMware please - they're both your products.

Thursday, 8 June 2006

Intellectual Property

A while back I talked about the size increase in hard disk drives. We see disks growing at Moore's Law rates, predicting a doubling of size every 18 months to 2 years. EMC proclaim a petabyte of storage within a single storage subsystem. But what does that actually mean?

Imagine an average page of text, around 600 words and 3000 characters. Multiply that up to a book of 300 pages, that's around 900KB to store natively, but say 5MB with formatting. So, 1 Petabyte could store 215 million books - more than the entire collection of the British Library!
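The sum above is easy to reproduce. This sketch assumes binary units (1 Petabyte = 2**50 bytes, 1MB = 2**20 bytes) and one byte per character, which is what gets you to the 215 million figure.

```python
# Reproduces the back-of-envelope book count above.
# Assumptions: binary units (1 PB = 2**50 bytes, 1 MB = 2**20 bytes),
# one byte per character.

CHARS_PER_PAGE = 3000
PAGES_PER_BOOK = 300
FORMATTED_BOOK_MB = 5


def raw_book_bytes() -> int:
    """Bytes needed to store one book as plain text."""
    return CHARS_PER_PAGE * PAGES_PER_BOOK


def books_per_petabyte(book_mb: int = FORMATTED_BOOK_MB) -> int:
    """Whole books of book_mb megabytes that fit into one petabyte."""
    return 2**50 // (book_mb * 2**20)


if __name__ == "__main__":
    print(raw_book_bytes())      # 900000, i.e. about 900KB
    print(books_per_petabyte())  # 214748364, i.e. roughly 215 million
```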

OK, this is hardly a scientific example, however what it does show is the amount of intellectual capital that exists in a single disk subsystem and therefore the subsequent responsibility on storage managers and storage architects to protect and provide timely access to this data.

How can we do this?

Within the subsystem itself we can guard against hardware failure with resiliency - RAID5/6, redundant components, multiple power sources - and picking a trusted vendor with a track record in producing solid products.

Outside the subsystem, we can prevent loss due to more catastrophic reasons by remote replication of data to another datacentre which could be located near or a long distance away. It is possible to take multiple replicas to be completely sure.

To protect against data corruption rather than equipment failure, we can take regular backups - these could be snapshots on disk or tape-based backups secured offsite.

So there are lots of options. The starting point however should always be to evaluate the value of the data to be secured and to base investment in data protection on the value of that data. Don't bother synchronously replicating data which will be discarded and never reused or which can easily be recreated (for example User Test Data). Conversely, use replication and snapshot technologies on production databases used to deliver key company services.

Remember that data is any company's intellectual capital and it is our responsibility to ensure its safety.

Thursday, 1 June 2006

Planning for SAN resilience

One aspect of storage design must consider the issues of resilience. All infrastructure components are subject to failure; even five 9's of reliability means an outage of just over 5 minutes per year. How do we plan for that?

Multipathing

This is a simple one; two or more entire fabrics connecting hosts to storage. If one fabric fails, then the other can take over. This design consideration is not just for recovery, it assists in maintenance, so one fabric can be upgraded whilst the other maintains operation. Multipathing is of course expensive; doubling up on all equipment. But it does reduce the risk of failure to an almost negligible number.

Director Class versus Switch

As mentioned, director class switches offer at least five 9's availability. Departmental switches on the other hand offer more like three 9's, which makes them considerably less resilient pieces of equipment. So, for a resilient SAN architecture, don't put departmental switches into the infrastructure at points of criticality.
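To put the "nines" into perspective, here is a small Python sketch that converts an availability figure into expected downtime per year. The function name is purely illustrative, and it assumes a 365-day year.

```python
# Convert "N nines" of availability into expected downtime per year.
# Assumption: a 365-day year (leap years ignored).
MINUTES_PER_YEAR = 365 * 24 * 60   # 525,600


def downtime_minutes_per_year(nines: int) -> float:
    """Expected downtime in minutes per year for N nines of availability."""
    unavailability = 10 ** -nines   # e.g. five 9's -> 0.00001
    return MINUTES_PER_YEAR * unavailability


if __name__ == "__main__":
    for nines in (3, 5):
        minutes = downtime_minutes_per_year(nines)
        print(f"{nines} nines: {minutes:.2f} minutes/year ({minutes / 60:.2f} hours)")
```

Five 9's works out at just over 5 minutes a year, while three 9's is nearer 8.76 hours, which is the gap between a director and a departmental switch.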

Component Failure

Director class five 9's refers to the failure of an entire switch. It doesn't refer to the resilience of an individual component. So, plan to spread risk across multiple components. That may mean separate switches, it may mean across separate blades on switches. Hardware capacity growth means blades have moved from 4-port (e.g. McDATA) to 32 and 48 port blades (Cisco), reconcentrating the risk back into a single blade. So, spread ISLs across blades, spread clustered servers across switches and so on.

In summary, look at the failure points of your hardware. Where they can't be remedied with redundant components, plan to spread the risk across multiple components instead. If you can afford it then duplicate the equipment with multiple fabrics, HBAs and storage ports.

Sunday, 21 May 2006

Hybrid disk drives

Samsung have announced a hybrid hard disk for use with Windows Vista. The announcement is here http://www.samsung.com/PressCenter/PressRelease/PressRelease.asp?seq=20060517_0000255266

Basically, the disk just has more cache memory, 128MB or 256MB, but crucially, can use the cache as an extension of the hard disk as a staging area. This allows (dependent on traffic type) for the hard disk to be left spun down for longer periods. Samsung are pitching the device as offering benefits to laptops and as a way to reduce boot times for Windows systems.

That's all great for Windows, but what about wider usage? It seems to me that the benefit of having a larger more intelligent cache is great for a PC based operating system where the working set of data on a large disk may only be a few hundred megabytes. Larger systems will probably have access profiles of more random read/write or even significant sequential access.

It may be that with intelligent microcode, midrange and enterprise arrays can benefit from the ability to leave devices spun down, potentially saving power and cooling. That would be a great benefit in today's datacentres.

Thursday, 18 May 2006

The Case for 10Gb/s

Fibre channel storage speeds are now up to 10Gb/s as I'm sure we're all aware. Brocade, McDATA and Cisco all have 4Gb/s products. Question is, are they really necessary?

Pushing a full 1 or 2Gb/s of data across a fibre channel connection at a sustained rate requires some decent processing power, so why move things on to 4 and 10? Well, 4Gb/s and 10Gb/s certainly prove useful as ISL connections. They reduce the number of ports required and subsequently the cabling. But with faster connections comes a price; cabling distance for multimode fibre drops significantly. Check it out here : http://storagewiki.brookend.dyndns.org/ow.asp?Fibre%5FChannel

So faster speeds yes, but shorter distances. Wasn't one of the benefits of fibre channel to remove us from the 25m SCSI cable restriction?

One other thing to bear in mind. If we start consolidating traffic into fewer but faster ISLs, I'd want to be very sure that the aggregation of traffic gives me adequate quality of service for each of the traffic types sharing the link. Cisco can do that; McDATA are talking about it; Brocade, not sure.

So what do I think, are the faster speeds necessary? Well, I can see the benefit of 4Gb/s for storage ports, less so for host ports. I can also see the limited benefit of using 10Gb/s for ISLs between locally placed switches, but I think that's where it ends for 4 & 10Gb/s. Hosts and storage systems can't push that degree of data around and switch backplanes can't move the data around either. So for me, 1/2Gb/s will be the norm, 4/10Gb/s will be for special occasions.

Tuesday, 9 May 2006

Managing Hard Disk Capacities

It has been 50 years since the first hard disk drive was invented by IBM. It had a capacity of a mere 5MB. Contrast that today with the latest 500GB drive from Hitachi. In 50 years, that's a 100,000-fold increase in capacity, or an average of 2000 times the original capacity added every year!
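The growth figures can be checked with a short Python sketch. It assumes decimal units (500GB = 500,000MB) and the names are illustrative only. Dividing the total increase by 50 years gives the 2000-per-year figure, which is a straight-line average; the compound rate is a better way to compare against Moore's Law.

```python
import math

# Capacity growth from a 5MB drive to a 500GB drive over 50 years.
# Assumption: decimal units, so 500GB = 500,000MB.
FIRST_DRIVE_MB = 5
LATEST_DRIVE_MB = 500_000
YEARS = 50


def total_growth(first=FIRST_DRIVE_MB, latest=LATEST_DRIVE_MB):
    """Overall multiple between the first and latest capacity."""
    return latest / first


def compound_annual_growth(multiple, years=YEARS):
    """Constant yearly growth rate that produces the given multiple."""
    return multiple ** (1 / years) - 1


def doubling_time_years(annual_rate):
    """Years needed to double capacity at a constant annual growth rate."""
    return math.log(2) / math.log(1 + annual_rate)


if __name__ == "__main__":
    multiple = total_growth()
    rate = compound_annual_growth(multiple)
    print(f"total increase: {multiple:,.0f}x")
    print(f"straight-line average: {multiple / YEARS:,.0f}x of the original per year")
    print(f"compound growth: {rate:.1%} per year")
    print(f"doubling time: {doubling_time_years(rate):.2f} years")
```

Averaged over the whole 50 years that is roughly 26% compound growth a year, or a doubling about every three years.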

There's no doubting disk drives have also become faster and more reliable as time has passed, however the recent increases in capacities have brought with them new challenges.

First of all, two streams of hard drive technology have developed based on the interface type they support; the SCSI/Fibre Channel/Serial Attached SCSI and the Serial ATA/Parallel ATA formats. SCSI-based are generally more reliable than SATA drives due to the components used and build quality, however SCSI-based drives carry a price premium. In addition, SCSI devices usually have faster rotational speed and lower capacities than SATA. Compare the Hitachi Ultrastar 15K147 drive which spins at 15,000rpm with a capacity of 147GB to the Deskstar 7K500 which spins at 7200rpm with a capacity of 500GB. That's half the speed with three times the capacity. Clearly these drive types have very different uses; the Ultrastar is more suited to high performance random workloads while the Deskstar is more of a low-cost low-activity archive type device.

The increase in capacity brings into question the subject of reliability. The Ultrastar drive quotes an unrecoverable error rate of 1 in 10^15 bits read. That is about 125TB, or roughly 850 complete reads of a 147GB drive (and only 250 of a 500GB drive at the same rate), which for a heavily used device could easily be reached in a short time. Deskstar drives are 10 times worse and could expect an error after only 25 or so complete reads. Obviously this is not acceptable, and since 1978 we've had RAID, and since 1988 more advanced versions (including RAID-5), which use multiple disks to protect data in case of a single disk failure.
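As a rough check on those numbers, the sketch below works out how many complete reads of a drive it takes to reach one expected error. It assumes the quoted rates are per bit read and that capacities are decimal gigabytes; the function name is made up for illustration.

```python
# How many full reads of a drive before one unrecoverable error is expected?
# Assumptions: error rates are per bit read; capacities are decimal GB.
def full_reads_per_error(capacity_gb: float, bits_per_error: int) -> float:
    """Complete drive reads per expected unrecoverable error."""
    bytes_per_error = bits_per_error / 8
    return bytes_per_error / (capacity_gb * 10**9)


if __name__ == "__main__":
    print(f"147GB drive at 1 in 10^15: {full_reads_per_error(147, 10**15):.0f} reads")
    print(f"500GB drive at 1 in 10^15: {full_reads_per_error(500, 10**15):.0f} reads")
    print(f"500GB drive at 1 in 10^14: {full_reads_per_error(500, 10**14):.0f} reads")
```

On these assumptions the result is very sensitive to capacity: the same 1 in 10^15 rate gives about 850 reads of a 147GB drive but only 250 of a 500GB drive, and a 500GB drive at 1 in 10^14 is down to 25.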

RAID works well, but as disk sizes have increased, so has the risk of a second disk failure during the rebuild of a failed disk in a RAID group. Have a look at Dave Hitz's article on the Netapp website. This seems a little simplistic, however it is sobering reading and it is clear why double parity (or RAID 6, or 6+2) is a required evolution of standard RAID-5 technology. This provides for a second disk failure without data loss in a RAID group, statistically decreasing the failure risk for a RAID group to almost infinitesimal levels. I don't think Dave's calculation is entirely fair, as most enterprise arrays will predictively "spare out" a potentially failing disk before it actually fails. This rebuilds a new disk from the suspected failing disk itself, still leaving the other disks in the RAID group to use if the disk does actually fail.

Personally if I had the option I would choose 6+2 protection, subject to one proviso; each RAID array group on 300GB disks will be 2.4TB of storage at a time. This is not a small amount!
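For reference, the size of a 6+2 group can be worked out as below. This is a simple sketch with illustrative names; it assumes 6 data disks plus 2 parity disks and ignores spares and formatting overhead.

```python
# Raw and usable capacity of a RAID group.
# Assumptions: 6 data + 2 parity disks, no spares or formatting overhead.
def raid_group_capacity(data_disks: int, parity_disks: int, disk_gb: int):
    """Return (raw_gb, usable_gb, efficiency) for one RAID group."""
    raw_gb = (data_disks + parity_disks) * disk_gb
    usable_gb = data_disks * disk_gb
    return raw_gb, usable_gb, usable_gb / raw_gb


if __name__ == "__main__":
    raw, usable, efficiency = raid_group_capacity(6, 2, 300)
    print(f"raw: {raw}GB, usable: {usable}GB, efficiency: {efficiency:.0%}")
```

On 300GB disks the 2.4TB figure is the raw size of the group; usable capacity after double parity is 1.8TB.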

Hard disk drives will continue to grow relentlessly. Seagate have already announced a 750GB drive using perpendicular recording. A single disk subsystem can hold a huge amount of intellectual capital. I'll discuss this another time.

The Green Datacentre

I've been thinking about green issues this week after we received our "bottle box" from the local council to encourage us to recycle glass bottles and jars. In particular, my thoughts were drawn to the environmental friendliness of datacentres, or not as the case may be.

Datacentres are getting more demanding in their power and cooling requirements. It's now possible for more space to be set aside for plant covering the provision of electricity and air conditioning than actual datacentre space, and the usage of space is having to be carefully planned to ensure high demand equipment can be catered for. Take for example fabric switches, or more specifically directors, as they are larger scale and more environmentally hungry. Let's start with the new Cisco beast, the 9513. In a full configuration, this has 528 ports in a 14U frame but takes 6000 watts of power and outputs the same amount of heat requiring cooling. That's 11.36W per port, or 11.36W per 1Gb/s of bandwidth (the Cisco in a full configuration provides 48Gb/s per 48-port card).

For McDATA, compare the i10K. This offers up to 256 ports, again in 14U of space and requires 2500W of power. That equates to 9.77W per port, but as the i10K offers full bandwidth of 2Gb/s on those ports, that's 4.88W per 1Gb/s of bandwidth, twice as good as Cisco.

Finally, Brocade. Looking at the Silkworm 48000, this offers up to 256 ports in a 14U chassis all at up to 4Gb/s bandwidth. Power demand is quoted in VA (volt amps) and assuming 1VA=1W (and OK, that's a big assumption but I'm assuming the power factor is 100% here), then the maximum power requirement is 750W for a full chassis, or a remarkable 2.93W per port or 0.73W per 1Gb/s of bandwidth.
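The per-port comparison for the three directors can be reproduced with the Python sketch below. The port counts, power figures and per-port bandwidth are the ones quoted above (with 1VA taken as 1W for Brocade); the structure and names are just for illustration.

```python
# Power per port and per Gb/s for the three directors discussed above.
# Figures are as quoted in the post; Brocade assumes 1VA = 1W.
# (name, ports, watts, Gb/s of bandwidth per port)
DIRECTORS = [
    ("Cisco 9513", 528, 6000, 1),
    ("McDATA i10K", 256, 2500, 2),
    ("Brocade Silkworm 48000", 256, 750, 4),
]


def watts_per_port(ports: int, watts: float) -> float:
    """Power draw divided evenly across all ports."""
    return watts / ports


def watts_per_gbps(ports: int, watts: float, gbps_per_port: float) -> float:
    """Power draw per 1Gb/s of available port bandwidth."""
    return watts / (ports * gbps_per_port)


if __name__ == "__main__":
    for name, ports, watts, gbps in DIRECTORS:
        print(f"{name}: {watts_per_port(ports, watts):.2f}W per port, "
              f"{watts_per_gbps(ports, watts, gbps):.2f}W per Gb/s")
```

Swapping in a measured power factor for the Brocade figure, or different port speeds, only needs a change to the table at the top.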

Brocade seems to offer a far better environmental specification than the other two manufacturers and that translates to more than just power consumption per port. Power consumption (and more importantly power dissipation, or cooling) per rack has a direct impact on how many directors could be placed in a single cabinet and therefore the amount of floorspace required to house the storage equipment. All three vendors could rack up 3 units in a single 42U cabinet but could you power and cool that much equipment? With Brocade probably, with Cisco I doubt it (in any case you probably couldn't cable the Cisco directors - imagine nearly 1600 connections from a single cabinet!). What that means is either a lot of empty space per rack or other equipment in the same rack that doesn't need anywhere near as much power.

So what's my point? Well clearly there's a price to pay for storage connectivity over and above the per port cost. Datacentre real estate is expensive and you want to make best use of it. What I'm saying is that the technology choice may not purely be driven by feature and hardware cost but by the overall TCO of housing and powering storage networking equipment (and obviously the feature set the products offer). Incidentally, I've also done a similar calculation on enterprise storage frames from IBM, EMC and HDS. I'll post those another time.

Tuesday, 2 May 2006

Storage Virtualisation

This week I've been thinking a bit more about storage virtualisation. I'm planning to implement virtualisation using the HDS USP product. The decision to use HDS has been based on the need to supply large amounts of lower tiered storage (development data) but retain the functionality to production tiers for data mobility. Using a USP or NSC55 as a "head" device enables the features of Truecopy to be provided on cheaper storage. In this instance the data quantities can justify using a USP as a gateway. A USP isn't cheap and in smaller implementations using the USP in this fashion wouldn't be practical.

So using this as a starting point, what is available in the virtualisation space? OK, start with HDS. The USP and NSC55 (Tagmastore) products both enable external storage (i.e. storage connected to a Tagmastore) to be presented out as if it was internal to the array itself. This means the functionality offered by the array can be retained on cheaper storage, for example the AMS range. The presentable storage is not limited to HDS products and therefore the Tagmastore range is being touted as a system for migration as well as virtualisation. However there are some downsides. LUNs from the underlying storage system are passed straight through the Tagmastore so the size characteristics are retained. This implementation is both good and bad; good because it is possible to take out the Tagmastore and re-present the disk directly to a host, bad because you are restricted to the underlying LUN size of the cheaper device, which if it can't present LUNs exactly as you'd like, could mean Truecopy and ShadowImage functionality just won't work. There's also the issue of cache management. The Tagmastore device will accept I/O and confirm it to the host after it is received into cache - the preferred method. It could be possible for the Tagmastore to receive large volumes of write I/O which needs to be destaged to the underlying storage. If this is a lower performance device, then I/O bottlenecks could occur. Finally consider the question that should always be asked; how do I upgrade or get off the virtualisation product? Moving from Tagmastore is reasonably painless as the underlying data can be unpicked and re-presented to a new host; obviously that may not be practical if the Tagmastore functionality has been used.

If virtualisation is not done in the array itself, then it could be managed by an intermediate device such as the IBM SVC (SAN Volume Controller). The SVC controls all the storage on the underlying arrays and chooses how virtual LUNs are stored. Data is therefore spread over all the available arrays as the SVC sees fit. This approach, again, is good and bad. Good; data is well spread giving theoretically even performance. Bad; where is the data? What happens if the SVC fails? Asking the "how do I get off this product" question is a bit more taxing. The only option (unless IBM offer another bespoke solution) is to do host-based migration from the SVC to new storage. This may be hugely impractical if the SVC storage usage is tens or hundreds of terabytes.

Option 3 is to virtualise in the fabric. This is the direction favoured by the switch manufacturers (no surprise there) and companies such as EMC with their InVista product. Fabric virtualisation opens up a whole new way of operating. All of the underlying subsystem functionality (remote replication, PIT copies) is pushed up to the fabric switches and associated devices. This creates issues over performance and throughput and also the ability to operate in a multi-fabric environment, the standard model for all Enterprise companies.

From an architect's view, all of these options are viable and the $64,000 question to ask is; "what am I looking to achieve through virtualisation?" Once this question is answered and quantified then the solution becomes more apparent. My recommendation for virtualisation is; think what you're looking to achieve, then match the available solution to your requirements. Think more about the future tie-in with the product than the current benefits, as the costs in the long run will be in removing a virtualisation solution rather than migrating to it.

Wednesday, 15 March 2006

Fit for purpose

I've been away skiing during the last week and whilst I'm sure no-one wants to hear the details of my skiing skills (which aren't good), it did make me think about how my trip related to storage.

For the uninitiated, the difficulty of ski slopes is graded by colour; in order from easiest to hardest: green, blue, red and black. With my abilities I managed green and blue runs, but red and black were too hard. For plenty of skilled people, however, the red and black runs must have provided a real thrill, and it was impressive to see them effortlessly skiing down the slopes. Clearly I was fit for purpose on green and blue, and red and black were unnecessary.

This is how the world is moving in storage. Systems are being designed with storage that is fit for the purpose required. Whilst this has gone on for years (think of DFHSM and SMS on the mainframe), the technology has become more complex. The old hierarchical storage management tools of the mainframe could move data between tiers of storage (mainly disk and tape) but data moved to a lower tier couldn't be directly accessed until returned to the initial tier.

Virtualisation changes this, enabling data to be accessed from and moved between multiple tiers of storage directly and seamlessly. As a consequence, more complex storage environments can be developed and data tiering can be as complex as required. Storage infrastructures can therefore be designed to provide the best price point on data usage versus the features and functionality required for the integrity of the data.

So, look at virtualisation and see whether it can make your storage infrastructure fit for purpose.

Monday, 27 February 2006

The technical solution is not always the best

As an architect, I'd like to think the best technical solution should win out every time. This week I've come to realise that the technical solution isn't always best. I've had to deal with not putting forward the technical solution I know is the most appropriate in favour of a technically inferior solution which is more politically acceptable.

I pondered for a while, should I accept the compromise or not? In the end I decided that it would be best to state my principles and agree to accept a compromise technical design. Let's hope it's acceptable. If not, I can always say I told you so.

Thursday, 23 February 2006

Back to Blogging

I'm sure some people think writing a blog is some sort of personal journey to self expression. Some probably think it more akin to self promotion. Well, whatever the reason is, I'm back writing again. This time I've chosen a new, more neutral home.

So what am I going to talk about? OK, I'm a Storage Architect. What the hell is that then? It means I design (or try to design) large scale data storage infrastructures. It doesn't mean I design big yellow buildings you can use to store all your old furniture or stuff you can't be bothered to sell or throw away.

So here I am. I hope you will find what I write interesting and worthwhile. I hope I find it worthwhile too....