Saturday 28 June 2008

Keep Your Data for 200 Years - Why?

Courtesy of The Register, I followed their link to a company called Delkin (the data Belkin?) who are touting premium Blu-Ray disks (BD-R) with a claimed lifetime of 200 years (and 100 years for their DVD-R disks).

Now, this all sounds wonderful: "guaranteed protection over time" (whatever that means) for your "wedding photos, tax documents etc". The trouble is, and we've been down this road before, that having media which survives 200 years is great, but (a) what's going to be around to read it, and (b) will the data format still be understandable by the latest software?

Attacking the first point, it is conceivable that Blu-Ray compatible drives will be around in 10-20 years' time. After all, we can still read CD-ROMs 20 years after they were introduced, and Blu-Ray is already a mass-market storage platform, not just for data but for media content too. However, 200 years is a bit hopeful. The 20 years since the introduction of the CD format have seen DVD, Blu-Ray, HD-DVD (!) plus countless solid state formats besides.

Data format is more of an issue, and one I discussed in a recent post.

Another great example of hardware and data compatibility issues can be seen with the BBC's Domesday Project, which used Laserdiscs and non-standard graphical images for displaying information. Who's to say that in 50 years' time we won't think JPEG just as archaic?

So, don't waste your money on $27 BD-R disks. Buy them cheap, keep multiple copies and refresh your data regularly.
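To put the "multiple copies, refreshed regularly" advice into practice, a simple checksum sweep is really all you need. Below is a minimal sketch (the mount points and the choice of SHA-256 are my own assumptions, nothing Delkin-specific) that compares every file across your copies and flags anything that has gone missing or started to rot, which is your cue to re-burn that copy.

```python
import hashlib
from pathlib import Path

def sha256_of(path):
    """Return the SHA-256 digest of a file, read in 1 MB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_copies(copies):
    """Treat the first copy as the reference and check the others against it."""
    root = copies[0]
    reference = {p.relative_to(root): sha256_of(p)
                 for p in root.rglob("*") if p.is_file()}
    for copy in copies[1:]:
        for rel, digest in reference.items():
            candidate = copy / rel
            if not candidate.is_file():
                print(f"MISSING  {candidate}")
            elif sha256_of(candidate) != digest:
                print(f"CORRUPT  {candidate}")

# Hypothetical mount points for three cheap copies of the same archive.
verify_copies([Path("/mnt/archive-a"), Path("/mnt/archive-b"), Path("/mnt/archive-c")])
```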

Monday 23 June 2008

Incipient Revisited

You will remember that I recently posted a comment about migration costs, specifically in relation to Incipient. My view was (and still is) that the majority of migration costs come from preparatory and remedial work rather than the execution of the migration itself. Well, Incipient asked for the right of reply and I had a call last week with Robert Infantino, their Senior VP of Marketing and Alliances.

The $5000/TB figure they were quoting was an average they had seen in the industry for certain vendors' professional services time to come in and perform the migration work on behalf of the customer. Incipient's take was that they could provide their appliance/software expertise to deliver the same service at a significantly reduced cost (I won't quote specific numbers here, but the figure mentioned was much lower than the equivalent cost from "a vendor"). So, with that clarification, it is clearer that Incipient were comparing vendor costs against their product costs and not including any internal customer costs (project management, preparation work etc) in the calculation. This seems a more appropriate comparison in my opinion.

Getting back to the vendor discussion, there's a real issue here. If vendor X wants to sell you their latest technology, they need to accept and take the hit on helping with migration to their new array. This applies even more where the vendor doesn't change, as migration should then be a "no-brainer" built into the technology.

In a world where hardware is becoming a commodity, one differentiator will be the vendor who can minimise the effort/cost and impact of migrating from one technology to another. Until then, products like SVC and those from Incipient will continue to have a market position - oh and humble consultants like yours truly!

Tuesday 17 June 2008

The Rise of SSDs

Sun recently announced that they will be putting solid state disks across their entire range of server and storage hardware. EMC already have solid state drives for the DMX-4, announced in January this year. EMC have also stated that they think SSDs will reach price parity with high-end FC drives by 2010.

All of a sudden (and I'm sure plenty of people will claim it isn't sudden) solid state disks are all the rage. For servers, I can see the logic. It's another step in keeping the power and cooling demands of servers down; it also extends primary memory further and will definitely increase performance.

But what about storage arrays? I can see the benefit of putting a tier of SSD drives into DMX arrays, especially in the way EMC have chosen to implement it. It allows those targeted applications to get the performance they require at a manageable price point without a drastic reconfiguration of the array. But an entire array of SSD? That's just the same as existing products like Tera-RamSan.

If SSD prices are driven down, then the price of standard hard drives will surely follow. HDD manufacturers aren't going to lie down and let solid state take away their business. We've seen their response already with Seagate taking STEC to court over patent infringements.

So where will it end? Well, tape didn't go away as many forecast it would, and I don't see spinning drives going away any time soon either. What I'd like to see is the rise of intelligent storage systems that learn the busy and quiet blocks and move the data between SSD and HDD to keep performance optimal. In the meantime, HDD prices will continue to fall and the battle will be one of balancing the cost of cheap (but fast) HDDs against the power and cooling they need.
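To illustrate the sort of thing I have in mind (a toy sketch of my own devising, not how EMC or anyone else actually implements tiering), the core of such an engine is just counting accesses per block over a sampling window and shuffling the busiest blocks onto the SSD tier:

```python
from collections import Counter

SSD_CAPACITY_BLOCKS = 4     # how many blocks fit on the fast tier (made-up figure)

class TieringEngine:
    """Toy hot/cold tracker: promote the busiest blocks to SSD, demote the rest."""

    def __init__(self):
        self.access_counts = Counter()   # block id -> hits in the current window
        self.on_ssd = set()              # blocks currently on the fast tier

    def record_io(self, block_id):
        self.access_counts[block_id] += 1

    def rebalance(self):
        """Run at the end of each sampling window."""
        hottest = {blk for blk, _ in
                   self.access_counts.most_common(SSD_CAPACITY_BLOCKS)}
        for blk in self.on_ssd - hottest:
            print(f"demote block {blk} to HDD")
        for blk in hottest - self.on_ssd:
            print(f"promote block {blk} to SSD")
        self.on_ssd = hottest
        self.access_counts.clear()       # start a fresh window

engine = TieringEngine()
for blk in [1, 1, 1, 2, 2, 3, 4, 4, 4, 4, 5]:   # simulated I/O trace
    engine.record_io(blk)
engine.rebalance()
```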

Friday 13 June 2008

FC Enhancements

A comment posted to my previous blog entry reminds me of a requirement I've had for some time from Fibre Channel. In the "Good Old Days" of my first working life as a mainframe systems programmer, I could very easily see a breakdown of response time against each storage device on an LPAR. Now, the passing years may have given me "rose-tinted spectacles" (or, more accurately now, contact lenses) about that time, but I seem to remember the reason I could see seek, disconnect and connect times was the design of the (then) MVS I/O subsystem. As each I/O (CCW) was processed, the hardware must have been adding a consistent timestamp to each part of the process: the I/O initiation, the connect, the disconnect and subsequent seek, and then the reconnect and data transfer time to complete the I/O (if none of this makes sense, don't worry, it probably means you are under 40 years old and never wore sandals to work).

Nowadays, the I/O infrastructure is a different kettle of fish. Each part of the infrastructure (host, HBA, fabric, array) is provided by a different vendor and has no consistent time reference, so tracking the time taken to execute a storage "exchange" is very difficult. There is (as far as I am aware) nowhere within a Fibre Channel frame to track this response time at each stage of the journey from host to storage.

If we want the next generation of storage networks to scale, then without a doubt we need to be able to track the journey of the I/O at each stage and use this information to provide better I/O profiling.
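To show the sort of output I'm after (the stage names and timestamps below are entirely hypothetical; nothing like this exists in today's FC frames), imagine each hop stamped the exchange as it passed, so that a collector could break the response time down by stage, much like the old connect/disconnect figures:

```python
# Hypothetical per-hop timestamps (microseconds) collected for one I/O exchange,
# in the order the exchange passes through the infrastructure.
exchange_timestamps = [
    ("host issues I/O",                0),
    ("HBA puts frame on the wire",    12),
    ("fabric delivers to array",      45),
    ("array completes back-end read", 5100),
    ("fabric returns data",           5140),
    ("host receives completion",      5160),
]

# Break the total response time down into the time spent between each stage.
for (start_stage, t0), (end_stage, t1) in zip(exchange_timestamps,
                                              exchange_timestamps[1:]):
    print(f"{start_stage} -> {end_stage}: {t1 - t0} us")

total = exchange_timestamps[-1][1] - exchange_timestamps[0][1]
print(f"total response time: {total} us")
```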

Now, just how do I become a member of the T11 committee.....

Thursday 12 June 2008

Storage Migration Costs

I’ve not paid much attention to Incipient (their news page doesn’t provide an RSS feed, so there’s no chance of me seeing their press releases easily), but my attention was recently drawn to a press release relating to their iADM and iNSP products (catchy names, those).

Now, if you want to know about their products, have a look at their website for yourself. Rather, my interest was sparked by a claim in their press release, quoted below:


The High Cost of Today's Data Migration

Industry estimates and field data captured by Incipient indicate that SAN storage is growing at 40 - 60 percent annually and 25 percent of data under management is moved annually at an average cost of $5,000 per terabyte. Based on these estimates, a data center with one petabyte of storage under management today spends $1.25 million annually on data migration operations. Two years later, the data center is likely to grow to nearly two petabytes increasing the annual data migration cost to nearly $2.5 million.

Source: Incipient Press Release 11 June 2008

So the estimate is $5000 per TB of data movement, with 25% of data being moved each year. I can understand the latter; it’s simple logic that if you have a 3-4 year lifecycle on technology then on average 25% of your estate will be refreshed each year (although that figure is slightly distorted by the fact that you’re also deploying an additional 40-60% each year). Now, how to get to a $5000 per TB calculation...
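Before asking where that $5000 actually goes, the growth arithmetic in the quote is easy enough to check (my own back-of-the-envelope, using only the press release’s figures):

```python
# Reproduce the figures quoted in the Incipient press release.
capacity_tb = 1000        # 1 PB under management today
moved_share = 0.25        # 25% of data migrated each year
cost_per_tb = 5000        # dollars per TB migrated

annual_cost = capacity_tb * moved_share * cost_per_tb
print(f"year 0: ${annual_cost:,.0f}")    # $1,250,000 -- matches the $1.25 million claim

# With 40-60% annual growth, two years out the estate is roughly 2 PB.
for growth in (0.40, 0.60):
    future_tb = capacity_tb * (1 + growth) ** 2
    future_cost = future_tb * moved_share * cost_per_tb
    print(f"{growth:.0%} growth -> {future_tb:,.0f} TB, ${future_cost:,.0f} per year")

# 40% growth gives 1,960 TB (about $2.45M a year); 60% gives 2,560 TB ($3.2M),
# so the "nearly $2.5 million" claim sits at the low end of the quoted growth range.
```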

Excluding new storage acquisition, network bandwidth, etc, I’d assume that the majority of migration costs will be people time, covering both the planning and the execution of migrations. In environments of 1PB or more, I could (almost) bet my house on the fact that a significant amount of the storage infrastructure will be (a) not understood, (b) badly deployed or (c) back-level, amongst many other issues. $5000/TB would therefore seem quite reasonable, based on the amount of work needed to refresh. The only problem is that the majority of that manpower cannot be removed by software alone: it includes documenting the environment, bringing server O/S, firmware and drivers up to date, negotiating with customers for data migrations, migration schedule planning, clearing up wastage, new server hardware and so on.

It would be an interesting exercise to determine what percentage of the $5000/TB cost is actually attributable to data movement work (i.e. having someone sitting at a screen issuing data replication commands). I suspect it is quite low. From experience, I’ve been able to move large volumes of data in quite short timespans. In fact, assuming sensible preparation and planning, most of the time spent doing migrations is sitting around (previous employers, disregard this statement).

So how much money would Incipient save? My bet is not much.

Wednesday 11 June 2008

Ah this is so accurate...

As a contractor/consultant I can totally relate to this list; I'm sure many of you can too.

Simple is Good

I've been doing a lot of travelling recently (rather a lot in fact), mostly in Europe, with a little in the UK between airports. European trains are much better than their UK counterparts - they are reliable, clean, comfortable (note I didn't claim they were fast) and their cost structure is simple to understand. No restrictions on time of travel, advance booking or all that nonsense. No. Simply turn up at the station and buy a ticket.

The UK, on the other hand, must have one of the most complex ticketing systems, especially around London. As an example, if I travel into London from where I live and want to return home between 4pm and 7pm, then I can't buy a cheap day return. Presumably that's because they can fleece travellers who don't realise this rule exists. However, if I am already in London and want to travel out, I can buy a cheap day single and travel on it between 4pm and 7pm! Even the people selling the tickets think it is crazy. I could give you dozens of other similar examples, but life's too short.

So it is with storage. Keep it simple. Take tiering as an example. You could spend days and weeks developing the most finely detailed tiering strategy, but in reality you will find most data sits on a small number of tiers, the bulk of it in the middle range. Developing complex tiering structures, just like complicated train pricing, leads to confusion and, in the end, additional cost. All that's needed is a simple strategy with most of the data on cost-efficient storage.

Remember - simple is good.

Tuesday 3 June 2008

Dealing With The Consequences

In a remarkable piece of coincidence, two WWII unexploded bombs were found today and caused air traffic delays. The first, at Stratford in east London, temporarily closed London City Airport. The second closed a runway at Amsterdam’s Schiphol airport. It’s amazing that over 60 years after these bombs were dropped, they are still being found; fortunately this time with no injury to anyone. I found out about both incidents because colleagues of mine flying from both City Airport and into Schiphol were delayed or had to alter their plans.

On a less serious level, but important nonetheless, we are having to live with the consequences of our data storage policies. Data being stored now will need to be accessible in 60 years’ time. This includes medical and financial information, directly affecting individuals if it cannot be retrieved. Obviously good data management practices are essential, but they will have to go beyond the normal storage of data we’ve been used to up until now.

Take the use of medical imagery: x-rays, CAT and PET scans, MRI scans. These all now produce complex digital images. If we store them using today’s formats, how will we present those images in the future as we move to more complex display technology, higher resolutions and possibly 3D television screens?

If you want examples of what I mean, think of your old word processing documents. They could have been stored in an early version of Microsoft Word, WordPerfect, WordStar or some other similar, now-defunct technology. You may be lucky and still be able to read them; you may not. Fortunately, apart from some formatting codes, most word processing documents can be opened in Notepad, WordPad or a raw file viewer, which will at least allow you to recover the content. Things won’t be so easy with imaging files, as their binary nature will render them useless if the software to open them isn’t retained.

I can see one of our future storage challenges will be to ensure all of our data is retained in a readable format for the future. XML addresses some of these problems and the adoption of file format standards will help. However, data will need to be refreshed as it ages. More metadata referring to the content format of files will have to be produced and software written to detect and convert unstructured files as file formats become defunct.
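As a rough idea of what such detection software might look like (a toy sketch of my own; the magic numbers shown are real signatures, but the "defunct formats" policy list is entirely made up for illustration), a scanner only needs to sniff each file's signature and flag anything whose format is on the way out:

```python
from pathlib import Path

# A few well-known file signatures (magic numbers) mapped to a format name.
SIGNATURES = {
    b"\xff\xd8\xff":     "JPEG",
    b"\x89PNG":          "PNG",
    b"%PDF":             "PDF",
    b"\xd0\xcf\x11\xe0": "Legacy Office (OLE2)",
}

# Formats we have decided need converting before the readers disappear
# (an entirely made-up policy list for the sake of the example).
DEFUNCT = {"Legacy Office (OLE2)"}

def identify(path):
    """Return a best-guess format name based on the file's first few bytes."""
    with path.open("rb") as f:
        header = f.read(8)
    for magic, name in SIGNATURES.items():
        if header.startswith(magic):
            return name
    return "unknown"

def scan(archive_root):
    """Flag files whose format is defunct or unrecognised for conversion/review."""
    for path in archive_root.rglob("*"):
        if path.is_file():
            fmt = identify(path)
            if fmt in DEFUNCT or fmt == "unknown":
                print(f"flag for conversion/review: {path} ({fmt})")

scan(Path("/data/archive"))   # hypothetical archive location
```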

Many of us can afford to lose a few files here and there, or perhaps print out the most important of our documents. For large organisations, the data management lifecycle has only just begun.