Tuesday, 23 December 2008

Storage Predictions for 2009

It's the end of another year and of course time for the obligatory posts on predictions in the industry for the next 12 months. True to form, I've spent some time thinking and here are my top 5 ruminations for the coming year.

  • EMC join SPC. EMC will finally have the epiphany we've all been dreaming of and embrace the world of generalised benchmarking. DMX will prove to be underpowered (due to the lack of SATA drives and the discovery that SSD is in fact the slowest disk technology) and be outperformed by Tony Asaro and his new Drobo appliance.
  • HDS Win At Web Awards. HDS embrace new media in a game changing way and Hu wins first prize at the Weblog Awards. Barry Burke is so impressed, he defects to HDS from EMC, becoming www.thestoragedefeatist.com.
  • XIV Conquers the World. IBM releases XIV-1000, by sending Moshe Yanai back in time from 2032, the time when Atmos became self-aware and took over storage for the entire world. EMC counter, sending StorageZilla (now aged 32) back to defeat Moshe in a monumental battle pitting SATA against SSD.
  • JWT joins Star Trek. Jon Toigo bows out of storage to follow a career in acting. His first role as the father of William Riker in Star Trek XI is critically acclaimed as a work of genius.
  • Convoy II: SWCSA is Released. The life of Marc Farley is brought to the big screen as he attempts to outwit the authorities and drive from the west to east coast using nothing but his blogging skills. Farley is portrayed on screen by Kris Kristofferson, reprising his role from the original cult movie.

Check back next year to see how many of these predictions did in fact come true.

Thursday, 18 December 2008

Do You Really Need a SAN - Of Course You Do!

The wacky boys at Forrester have a great new article posted relating to the requirement to have a Storage Area Network. Here's a link to their post. Tony Asaro and Hu Yoshida have both posted on the subject already but I couldn't resist having my 2 cents.

SANs evolved for very good reasons; the need to consolidate storage, the need to provide additional connectivity to storage arrays and the need to remove the requirement to closely couple the storage to the server (remember 25m limits on SCSI cabling). SANs and most notably fibre channel, enable that (as does iSCSI for the record).

Some of the Forrester objections to deploying SAN include;

Low Capacity Utilization - I did some work a couple of weeks ago at a client with purely DAS. They had 30% utilisation on their disks and after excluding boot drives, still had 30% utilisation. I've never seen a SAN array only 30% full, unless it was a new array onto which data was being deployed.

Inability to Prioritize Application Performance - Hmm, this seems a bit odd. DMX has Dynamic Cache Partitioning, Optimiser, USP has I/O prioritisation, Compellent can dynamically move data between tiers of storage; 3PAR has similar features which allow performance to be tweaked dynamically. There's also the same options in the fabric, especially with Cisco equipment. DAS has no such benefit, in fact if you have performance issues on a single server then you're potentially in a world of pain to fix it.

Long Provisioning Times - this is not a technology issue but one of process. I can provision terabytes of storage in minutes, however I have to guarantee that within a shared environment I don't take down another production environment - that's the nature of shared systems. In addition, users think they can just demand more and more storage without any consequences. Storage resources are finite - even more so in a non-scalable DAS solution. With sensible process, SAN storage can be turned around in hours - not the case for DAS unless you intend keeping spare disks onsite (at a price).

Soaring Costs - again, another conundrum. If you focus on pure hardware then SANs are inevitably more expensive, however TCO for storage is very rarely done. Don't forget SAN also includes iSCSI, which can be implemented across any IP hardware - hardly expensive.

So, there are other benefits that SAN easily wins over DAS.

Disaster Recovery. SAN-based replication is essential in large environments where the requirement to manually recover each server would be totally impractical. Imagine trying to recover 100+ database servers where each server requires the DBA to log in and perform forward recovery of shipped logs - all in a 2 hour recovery window.
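
To put rough numbers on that scenario, here's a minimal sketch of the time budget per server. The function name and the idea of spreading the work evenly across a team of DBAs are my own assumptions for illustration; real recoveries won't divide this neatly.

```python
def minutes_per_server(servers: int, window_hours: float, dbas: int = 1) -> float:
    """Average minutes available per server if `dbas` people recover in parallel."""
    if servers <= 0 or dbas <= 0:
        raise ValueError("servers and dbas must be positive")
    return window_hours * 60 * dbas / servers


# 100 servers in a 2 hour window, one DBA working through them serially
print(minutes_per_server(100, 2))           # 1.2
# the same estate shared between ten DBAs
print(minutes_per_server(100, 2, dbas=10))  # 12.0
```

Even with ten people on the job, twelve minutes per server to log in and roll forward shipped logs leaves no room for anything to go wrong.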

Storage Tiering. SANs allow easy access to multiple storage tiers, either within the same array or across multiple arrays. Without SAN, tiering would be wasteful, as most servers would not be able to utilise multiple tiers fully.

SANs also provide high scalability and availability, simply not achievable with DAS.

There was a reason we moved to SAN. SAN has delivered, despite what Forrester say. However like all technologies, they need to be managed correctly. With sensible planning, standards and process, SANs knock DAS into a cocked hat.

Tuesday, 16 December 2008

Redundant Array of Inexpensive Clouds - Pt II

In my previous post I started the discussion on how cloud storage could actually be useful to organisations and not be simply for consumer use.


One of the big issues that will arise is the subject of standards. To my knowledge, there is no standard so far which determines how cloud storage should be accessed and how objects should be stored. Looking at the two main infrastructure providers, Amazon and Nirvanix, the following services are offered:

S3 (Simple Storage Service) - storage of data objects up to 5GB in size. These objects are basically files with metadata and can be accessed via HTTP or BitTorrent protocols. The application programming interface (API) uses REST/SOAP (which is standard) but follows Amazon's own standards in terms of functions to store and retrieve data.

Elastic Block Store (EBS) - this feature offers block-level storage to Amazon EC2 instances (elastic compute cloud) to store persistent data outside of the compute instance itself. Data is accessed at the block level, with point-in-time snapshots of a volume stored in S3.


Storage Delivery Network (SDN) - provides file-based access to store and retrieve data on Nirvanix's Internet Media File System. Access is via HTTP(S) using standard REST/SOAP protocols but follows Nirvanix's proprietary API. Nirvanix also offer access to files with their CloudNAS and FTP Proxy services.

The protocols from both Amazon and Nirvanix follow standard access methods (i.e. REST/SOAP) but the format of the APIs is proprietary in nature. This means the terminology is different, command structures are different, the method of storing and retrieving objects is different and the metadata format for referencing those objects is different.

Lack of standards is a problem. Without a consistent method for storing and retrieving data, it will become necessary to program to each service provider implementation, effectively causing lock-in to that solution or creating significant overhead for development.

What about availability? Some customers may choose not to use one service provider in isolation, in order to improve the availability of data. Unfortunately this means programming to two (or potentially more) interfaces and investing time to standardise data access to those features available in both products.

What's required is middleware to sit between the service providers and the customer. The middleware would provide a set of standardized services, which would allow data to be stored in either cloud, or both depending on the requirement. This is where RAIC comes in:

RAIC-0 - data is striped across multiple Cloud Storage infrastructure providers. No redundancy is provided, however data can be stored selectively based on cost or performance.

RAIC-1 - data is replicated across multiple Cloud Storage infrastructure providers. Redundancy is provided by multiple copies (as many as required by the customer) and data can be retrieved using the cheapest or fastest service provider.
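
To make the two levels a little more concrete, here's a minimal sketch of what the middleware's core might look like. Everything in it is hypothetical: `CloudStore`, `InMemoryStore`, `Raic0` and `Raic1` are names I've made up, the in-memory class is only a stand-in for a real provider adapter, and nothing here calls the actual Amazon or Nirvanix APIs.

```python
from abc import ABC, abstractmethod


class CloudStore(ABC):
    """Provider-neutral interface (an assumption; real provider APIs differ)."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...


class InMemoryStore(CloudStore):
    """Stand-in for a provider adapter; keeps objects in a dict, no network."""

    def __init__(self, name):
        self.name = name
        self.objects = {}

    def put(self, key, data):
        self.objects[key] = data

    def get(self, key):
        return self.objects[key]


class Raic1:
    """RAIC-1: write each object to every provider, read from the first holding it."""

    def __init__(self, stores):
        self.stores = list(stores)  # order could reflect cost or speed

    def put(self, key, data):
        for store in self.stores:
            store.put(key, data)

    def get(self, key):
        for store in self.stores:
            try:
                return store.get(key)
            except KeyError:
                continue  # this provider lost the object, try the next
        raise KeyError(key)


class Raic0:
    """RAIC-0: split an object into chunks placed round-robin, no redundancy."""

    def __init__(self, stores, chunk_size=4):
        self.stores = list(stores)
        self.chunk_size = chunk_size
        self.chunk_counts = {}  # key -> number of chunks written

    def put(self, key, data):
        chunks = [data[i:i + self.chunk_size]
                  for i in range(0, len(data), self.chunk_size)]
        for n, chunk in enumerate(chunks):
            self.stores[n % len(self.stores)].put(f"{key}.{n}", chunk)
        self.chunk_counts[key] = len(chunks)

    def get(self, key):
        count = self.chunk_counts[key]
        return b"".join(self.stores[n % len(self.stores)].get(f"{key}.{n}")
                        for n in range(count))


a, b = InMemoryStore("provider-a"), InMemoryStore("provider-b")
mirror = Raic1([a, b])
mirror.put("report", b"quarterly figures")
del a.objects["report"]        # simulate one provider losing its copy
print(mirror.get("report"))    # b'quarterly figures'
```

The point of the common interface is that the application only ever talks to `put` and `get`; which provider (or providers) sits behind it becomes a configuration decision rather than a programming one.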

Now there are already service providers out there offering services that store data on Amazon S3 and Nirvanix SDN; companies like FreeDrive and JungleDisk, however these companies are providing cloud storage as a service rather than offering a tool which integrates the datacentre directly with S3 and SDN.

I'm proposing middleware which sits on the customer's infrastructure and provides the bridge between the internal systems and the infrastructure providers. How this middleware should work, I haven't formulated yet. Perhaps it sits on a server, perhaps it is integrated into a NAS application, or a fabric device. I guess it depends on the data itself.

At this stage there are only two cloud storage infrastructure providers (CSIPs), however barriers to entry in the market are low; just get yourself some kit and an API and off you go. I envisage that we'll see lots of companies entering the CSIP space (EMC have already set their stall out by offering Atmos as a product, they just need to now offer it as a service via Decho) and if that's the case, then competition will be fierce. As the offering count grows, then the ability to differentiate and access multiple suppliers becomes critical. When costs are forced down and access becomes transparent, then we'll truly have usable cloud storage.

Monday, 15 December 2008

HDS Play Catch Up

Second post today!

So HDS have announced solid state disks. Here's the post. They're only 11 months behind EMC and once they've actually become available it will be nearly a full 12 months "late".

It's interesting to note that HP haven't (yet) made an equivalent announcement on XP. I imagine it will follow in the fullness of time, although it's odd as Hitachi product announcements tend to be released by HDS and HP at the same time.

I wonder what HDS really think about this announcement? Part of the press release says:

"Flash-based SSDs in the USP V will help differentiate Hitachi Data Systems high-end offerings when deployed in combination with the company’s virtualization, thin provisioning and integrated management features."

Well Duh! As no other vendor (excluding the obvious HP) has virtualisation then clearly USP will always be differentiated, regardless of SSD support.

None of the HDS bloggers have co-ordinated a post with the announcement so there's no depth behind the press release, for instance to explain exactly how SSD and virtualisation create a differentiator - Tony??

Redundant Array of Inexpensive Clouds - Pt I

Storagezilla was quick to turn a Twitter conversation into a PR opportunity for EMC this week. Have a read. As one of the originators of this conversation, I'd intended to blog on it but was slightly beaten to print. Never mind, I've got more content to add to the discussion.

The original question was whether IT departments with purely DAS environments should consider going straight to cloud storage rather than implement traditional NAS or SAN.

For me the answer at the moment is a resounding no. Cloud computing is far too unreliable to commit production/operational data to it. However that's not to say the cloud can't be used for some things.

First of all, consideration needs to be given to the fact that all storage environments have a working set of data and that this forms only a small part of the overall quantity of data deployed across an enterprise. Most data is created and very quickly becomes inactive. This includes structured data, email, unstructured files and so on.

In some organisations, inactive data is retained - sometimes indefinitely, especially if it relates to content deemed "too hard" to process or legally sensitive. This inactive data is the perfect candidate for migration into the cloud, for a number of reasons;

  • It gets the data out of expensive datacentres, where the cost of maintaining that data is not just about the cost of the storage hardware, but also the whole TCO relating to data retention; power/cooling/floorspace, backup, technology refresh and so on.
  • It moves the data into a location where the cost of maintenance is simple to calculate as the cloud providers simply charge per GB per month.
  • It puts the data in a place where cloud providers could offer value added services.

Now, by value added services, I'm referring to a number of things. There's the possibility to offer simple services like automated virus scanning, content conversion and so on. There's also the option for the cloud providers to offer more advanced services.

Imagine you've terabytes of unstructured content that's been too difficult to process; perhaps there's copyrighted material in there, perhaps there's commercially useful data. Whatever it is, you don't have the time or the inclination to manage it, so up to now the data has been left, moved to cheaper storage and simply dumped in the storage landfill. Enter the cloud providers. For a fee, they will take this data off your hands and pick over it like parasites, removing illegal content, deleting irrelevant data and returning to you the gems in the rough that you should be re-using.

The cloud guys are in a perfect position to do it as they get to see *lots* of data and can build models of the content which allow them to automate the analysis process.

Now if data is pushed into the cloud, you may want to (a) guarantee security of the data and (b) standardise access to these providers. More on this in the next 2 posts.

Thursday, 11 December 2008

2V Or Not 2V (vendors that is)

Over on ITToolbox the age old subject of dual versus single vendor strategy has raised its head again.

This time, the consensus, apart from yours truly, was that a single vendor strategy was best - mostly because it is easier to implement.

I'm still of the opinion that a correctly executed dual-vendor strategy works well and can be achieved without the headache people think is involved. Here's some pointers as a recap.

  1. Standardise Your Services. I've seen many sites where a particular vendor is chosen over another for certain services - for instance using EMC for remotely replicated storage and HDS for non-replicated. If you want a real dual-vendor environment, each platform should offer the same services (unless by real exception).

  2. Standardise Your Support Matrix. Here's another issue; using one vendor for Windows and another for Unix because of things like driver or multi-pathing support.

  3. Standardise Your Configuration. Keep things consistent. Create a design which you treat as consistent between vendors; for instance, in an Enterprise array, create a standard model which shows front-end port/cache/disk ratios and set a "module" size. This may be 8 FEPs/100 hosts/100GB. This becomes your purchasing unit when requesting quotes.

  4. Standardise Your Provisioning. Lots gets said about having to train staff twice or maintain two teams. This just isn't necessary. What is important is to document how storage is selected and provisioned (port choice, masking, LUN size etc).

  5. Standardise Your Offering. Give your customers no reason to question where their storage comes from. All they care about is availability and performance.
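
The "module" idea in point 3 is easy to turn into a sizing calculation. Here's a minimal sketch, assuming the example ratio of 8 FEPs/100 hosts/100GB; the names `Module`, `modules_needed` and `quote_request` are hypothetical, and I've carried the 100GB figure through as a plain per-module quantity without assuming which resource it measures.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Module:
    """One vendor-neutral purchasing unit (figures from the example ratio)."""
    fe_ports: int = 8        # front-end ports per module
    hosts: int = 100         # hosts served per module
    capacity_gb: int = 100   # the "100GB" figure, carried through as-is


def modules_needed(host_count: int, module: Module = Module()) -> int:
    """Round up: a part-filled module still has to be bought whole."""
    return math.ceil(host_count / module.hosts)


def quote_request(host_count: int, module: Module = Module()) -> dict:
    """Totals to put in front of each vendor for a like-for-like quote."""
    n = modules_needed(host_count, module)
    return {
        "modules": n,
        "fe_ports": n * module.fe_ports,
        "capacity_gb": n * module.capacity_gb,
    }


print(quote_request(250))
# {'modules': 3, 'fe_ports': 24, 'capacity_gb': 300}
```

Sending the same totals to both vendors is what keeps the quotes comparable, whatever each one calls the underlying hardware.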

Ok there are some problems with dual-vendor'ing.

  1. Implementing a common tool set. No-one really fully supports multi-vendor provisioning. You will have to use more than one tool. Accept it. You can mitigate the problem however, by sensible scripting where necessary. This includes creating scripts which will do failover/replication support on HDS and EMC equipment. It can be done but needs to be thought through.

  2. Migration. Moving data from one platform to another will be problematic cross-vendor. However there are tools out there to do it - host and fabric based (even some array-based tools). Migration techniques need to be given serious thought before you spread data far and wide.

  3. Functionality. Not all vendors are the same and functionality is an issue. For instance, until recently the "big-boys" didn't do thin provisioning. You may have to compromise on your functionality or accept a limited amount of "one vendor only" functions.

Dual vendor is not for everyone. Size and complexity of environment will determine whether you feel comfortable with investing the time to manage dual (or even multi) vendors. However it can work and save you a shed load of cash into the bargain.

Wednesday, 10 December 2008

Storage Waterfall Revisited

A while back I presented a diagram showing how storage is lost throughout the provisioning process.

I've added a few more items onto the diagram and here's version 2. The additions show reasons why storage is lost at various points in the cycle, for example, disks not in use, hot spares, not using all the remaining space on the disk etc.

If anyone has additional reasons I've missed, then please let me know.

The next step is to look at ways of retrieving this storage and improving efficiency.

Tuesday, 9 December 2008

All I Want For Christmas...

In the words of Mariah Carey, I don't want a lot for Christmas, I've got everything I need, but possibly not everything I want. Here's my Crimble list for this year (in no particular order):

  1. MacBook Pro - I guess I should see what all the fuss is about. I've never been an Apple fan (I guess it's a Marmite thing, you love them or hate them). Obviously I'll make sure I have VMware Fusion to run some decent Windows apps.
  2. Sony Ericsson Bluetooth Headphones. Can't get a pair of these in the UK, despite trying and having confirmed orders cancelled. I already have the iPod Bluetooth broadcaster so just need something to send it to!
  3. Seagate FreeAgent Go. You can *never* have too much personal storage and it's hard to turn down brushed metal and a docking station. Preferred colour: green.
  4. Jaguar XK60. Slightly off-track, but desirable nonetheless. Don't actually care about the colour (although if push comes to shove I probably would). I expect this is the least likely item to be in my stocking on Christmas morning (unless it is a model one).

What's on your list?

Monday, 8 December 2008

Testing Out IT Principles - With Children

I was driving my youngest son back from Beavers this evening and we were talking computers and games (he's 6). He reminded me that some of his games won't run on Vista, which is why I'd set up a dual boot with XP on the kids' machine. He asked me if I'd played "Age of Mythology" when I was young. I had trouble explaining to him the concept of an audio cassette tape and how my Sinclair Spectrum took 15 minutes to load games. I tried LPs as a starting point to try and explain cassettes and he said "oh yes, one of our teachers bought one in once". To him it was a piece of ancient history.

The interesting point we reached was me trying to explain why I'd even upgraded the kids' PC to Vista in the first place. I couldn't come up with a good reason at all (so I made up some lame excuse about future support).

Perhaps we should explain all of our upgrade/purchase decisions to our children and try and justify it with them - it might help us to understand it ourselves!

Thursday, 4 December 2008


In case you don't always go back and look at comments (and let's face it, it's not easy to track them), then have a look at Tony's comment from my post yesterday. It's good to see a bit of banter going on and that's what HDS could do with.

HDS have hardly developed a good blogging prowess, it's more a case of "oh well, better do that" than taking a lead in new media.

Look at EMC.

There's geeky leaky Storagezilla with his uber-technical posts and sneaky advance notice of EMC technology.

Next The Storage Anarchist with his acerbic character and product assassinations of the competition.

And who can forget Chuck, EMC's futurologist with his head in the cloud.

There's others of course, filling in the technical detail. Apologies if I haven't mentioned you by name. EMC have certainly grabbed Web 2.0 and given it a good shake.

Sadly HDS don't seem to have the same enthusiasm for marketing to their customers. Blog posts are few and far between from the small slew of bloggers they have to date. Content is shallow and that's a big problem.

We *all* know USP is faster than DMX. Anyone who's had the products in their lab knows exactly what I'm talking about. Unfortunately unless HDS make a song and dance about it, they're going to be the Betamax of the Enterprise storage world.

Tony, keep the posts coming! Give us some real substance to beat up the competition with!!

Wednesday, 3 December 2008

2 Days, 2 Bod Posts

For the second time in two days I find myself drawn to comment on a Storagebod related post.

The subject today is Tony Asaro's rant on one of StorageBod's recent posts denigrating virtualisation.

Now let's get things clear - I like HDS's implementation of virtualisation. I've deployed it, I'd recommend it again, but some of Tony's comments are way off base.

"The cost per GB for your DMX FATA and SATA drives is much higher than using other tiered storage solutions." - yes, but UVM ain't free - there's a licence charge. When you virtualise an array, you're not paying for just JBOD, you're paying for extra stuff like the controllers. Also on the USP array you have to reserve out ports for virtualisation; if you connect the storage together through a fabric then you'll be taking up fabric ports too. The point is, the cost of HDS virtualisation means there's a break even point in the TBs of storage - from my experience, that was a big number.

"Storagebod does not want to have applications span multiple storage systems but other IT professionals are open to doing this. And storage virtualization is a powerful technology to enable this. That is the point of any virtualization technology - to overcome the physical limitations of IT infrastructure." - there are very good reasons for not spanning arrays with applications, like ensuring replicated storage is consistent, for instance. Whilst virtualisation allows a virtual array to grow in size to almost limitless amounts (247PB in USP V) it also means there's a concentration of risk; multiple components to fail, multiple places where data can be pinned in cache when things go wrong. In fact, placing data on lower availability arrays will increase risk.

"That may be true for Storagebod but that is not my experience in most data centers. We are shifting from transactional data being the “big space-hogs” to unstructured data consuming the lion’s share." - this may be true, but USP LUN-based virtualisation isn't going to help here. Overlaying file-level granularity data migration onto LUN-based arrays would require a particularly complicated scheme for ensuring data for migration was concentrated onto exactly the right LUNs so they could be moved to another tier. Anyway, why put unstructured data on expensive enterprise arrays?

I think we all expected Tony would have something better to talk about than technology HDS brought to the market 4+ years ago. We need to hear something new, something game-changing (oh and not me-too stuff like HDS putting SSDs into their arrays).

Tomorrow I *promise* I'll talk about something else.

Tuesday, 2 December 2008

The SRM Conundrum

Martin (Storagebod) has an interesting post today. Rather than post a long reply, I've chosen to steal his thunder and post specifically on the subject - of SRM tools.

Apart from when I worked in the mainframe storage arena, I've always struggled with SRM tools. Just for reference, the mainframe was great - SMS did the job, although there were a few shortcomings like the lack of quota tools. In the open world, things are so, so different. I think the reason open systems is a problem relates to the fact that although standards exist, technology is all different.

Look back at my recent post; there are two fundamental issues happening here. First of all, each vendor has a different implementation of technology - EMC/HDS/IBM/3Par/Pillar/Equallogic, the list goes on. Why are they different? Because there has to be something to create a USP, a differentiator. Sure, front-end technology might be consistent; each vendor will implement LUNs and the fibre channel standards, but in reality the back-end deployment will be different as each manufacturer competes on features and functionality. The same applies for the switch vendors, NAS vendors, and so on.

SMI-S was meant to address these problems but never would as it basically dumbs down each vendor to a single set of features and doesn't address the platform specific functionality. Try using IBM and HDS arrays from ECC (in fact, try managing EMC arrays like Clariion from ECC) and you'll fall at the first post. I won't even suggest trying to use any other product like HiCommand...

Some software vendors have tried to do cross-platform SRM. Think of Creekpath. It failed miserably to offer cross platform support because (as Martin rightly states) they never understood how Storage Admins did their work.

The answer to the lack of an SRM tool would be for an independent to develop one. However there's one major barrier to entry and that's the vendors themselves. All the major vendors make a tidy profit (Martin's cash cow) from their SRM tools - software without which you could do *nothing* but for which you are obliged to pay. Why would those vendors give up that monopoly position?

I've been working on a tool for some months (see here) which will provide cross-platform reporting, but full SRM is another step again. Without full vendor support, and by that I mean full knowledge of the APIs and interfaces to their products, not just the standard SMI-S providers - and advance notice and access to new features - then developing an SRM tool will be impossible.

However if anyone is prepared to pony up the cash, I'm still up for it!!

Monday, 1 December 2008

Home Storage Management #1

My first-pass cleanup has focused on my laptop, which is my main work device.

I've already mentioned I segment data from applications by having a separate partition, in my case labelled L:\ for local. I also use offline files to map most of my data from a personal network share on my main file server.

The Offline Files feature enables files from network file servers to be cached locally on a desktop or laptop for access when the PC is not connected to the network. As I travel a lot, Offline Files are essential for me and my local cache is quite large. However like a lot of people I choose to sync the whole of my network drive.

Using Treesize, I browsed the Offline Files cache, which is located by default in the CSC directory under the systemroot folder - in my case C:\Windows\CSC (CSC stands for Client Side Caching). A nice feature of Treesize is its ability to traverse the offline files folder directly as if it were a standard file system. That quickly allowed me to sort the offline files by size and type and immediately highlight some issues. I found;

  1. A directory called BackupToDiskTest which I'd used to test a backup product in 2005 (12GB of unwanted data).
  2. A large number of ISO files for software installation, which I moved to an archive directory on the main server.
  3. 2.7GB of home movie AVI files, subsequently moved to the main server.
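
If you don't have Treesize to hand, the same sort-by-size-and-type view can be approximated with a few lines of script. This is only a minimal sketch: the `scan` function is my own illustration, it works on any ordinary directory you can read, and I'm not claiming it can walk the Offline Files cache the way Treesize does.

```python
import os
from collections import Counter


def scan(root):
    """Return (bytes per file extension, [(size, path), ...] largest first)."""
    by_ext = Counter()
    files = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # skip files that vanish or can't be read
            ext = os.path.splitext(name)[1].lower() or "(none)"
            by_ext[ext] += size
            files.append((size, path))
    files.sort(reverse=True)
    return by_ext, files


def report(root, top=10):
    """Build printable lines: biggest file types first, then biggest files."""
    by_ext, files = scan(root)
    lines = [f"{ext:10} {total / 1024**3:8.2f} GB"
             for ext, total in by_ext.most_common(top)]
    lines += [f"{size / 1024**3:8.2f} GB  {path}" for size, path in files[:top]]
    return lines
```

Calling `print("\n".join(report("some/directory")))` lists the ten largest file types and the ten largest files, which is usually enough to spot forgotten ISOs and home movies.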

Obviously I've been lazy in dumping everything into my own directory including data which I don't need offline. Now I didn't delete all of these files, however I did save space on my laptop drive, which is pretty limited at just over 103GB.

Rescanning the C:\ drive, I now found "System Volume Information". This is an area of disk used by Windows to store recovery information in the event that you need to restore your laptop to a previous known "good configuration". In my case, Windows was using 12.6GB of storage to retain my previous configuration details. Now, touch wood, I've never bothered to use the restore feature of Windows. I keep my machines pretty tidy and don't install a lot of test or junk software. The last restore point appeared to have been created by my virus scanner so I felt confident to delete the restore information. I did this by simply unchecking, applying and rechecking the drive letter in Control Panel -> System -> System Protection.

I also found a few other bits and pieces - some content in BBC iPlayer that had expired and could be deleted; 3.5GB of temp files in my local profile; another 5GB of home movie WMVs on my L: drive which I moved to the server.

So at the end of pass #1, things stand as follows;

Laptop C:\ Drive - capacity 103GB - allocated reduced from 75.4GB to 63.8GB (15%)

Laptop L:\ Drive - capacity 38.7GB - allocated reduced from 34.85GB to 24.1GB (31%)
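
As a quick sanity check on those percentages, here's a minimal sketch of the calculation using the before and after figures above. The function name is just for illustration.

```python
def reduction_pct(before_gb, after_gb):
    """Space freed as a percentage of what was allocated before the cleanup."""
    return (before_gb - after_gb) / before_gb * 100


for drive, before, after in [("C:", 75.4, 63.8), ("L:", 34.85, 24.1)]:
    freed = before - after
    print(f"{drive} freed {freed:.2f}GB ({reduction_pct(before, after):.0f}%)")
# C: freed 11.60GB (15%)
# L: freed 10.75GB (31%)
```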

I'm pleased with the savings, however there's a lot more to do. Each cleanup highlights new issues and I don't believe the Offline Files has reclaimed all of the files I moved. In money terms, the recovered space doesn't equate to anything of value, however it does mean as I move to consider online backups that I have only the relevant data being backed up - and that does translate into money.

Sunday, 30 November 2008

I Hate iTunes!

iTunes has to be one of the worst applications Apple make. It is truly awful.

I find it incredibly difficult to track where files came from, what's on my iPod Touch and not, what are duplicates and so on. The poor interface means I have files littered about my hard drive and on my iPod which I can't be sure I've listened to.

For a long time I used a MobiBlu cube. This is a fantastic device. Mine has 2GB of memory, a tiny screen and simple USB interface. By numbering my MP3 podcasts, I could easily see how many I'd listened to and delete them by number. The process worked - the numbering system I assigned meant I could easily change the order I listen to the files. A simple and effective process.

iTunes ruins all that. It insists on referring to the files by their MP3 ID3 tags, regardless of how I rename them. It fails to delete duplicates, it doesn't let me easily delete files which I've moved to my iPod.

I wanted to sort out an alternative to the MobiBlu for the car. I've got an iPod connector which broadcasts my iPod on the radio, but obviously it has a standard iPod plug and won't connect to the MobiBlu. However I've also got a standard 3.5mm jack one too - perhaps I should go back to the MobiBlu.

Has anyone seen any competing software to iTunes or do Apple simply not allow it?

Wednesday, 26 November 2008

Home Storage Management - Week 1

So after discussions on home storage, I'm going to do a weekly cleanup/report on what I've achieved. Here's the baseline;

Main Server; 927GB of usable storage (via Drobo) - 768GB in use. (82.84%). In fact I've consolidated a pair of mirrored 400GB drives onto the Drobo to make the full 768GB, so I've already freed these drives to be removed.


Laptop;

C: - 103GB total, 75.4GB in use (73.2%)

L: - 38.7GB, 34.85GB in use (90%)

I've included both C: (O/S) and L: (data) as my offline folder is on the C: drive


C: - 57.2GB - 34.3GB used (60%)

D: - 97.6GB - 4GB used (4.1%)

E: - 274GB - 133GB used (48.5%)

So that's the baseline. The first saving is to delete the Exchange backup - 314GB. More to follow.

Tuesday, 25 November 2008

Thin Provisioning or Good Practice, which is best? There's only one way to find out - Fight!

Marc Farley makes some interesting comparisons to storage purchasing decisions in a recent post. For the sake of disclosure, I do go to Costco and buy in bulk - no not 200lbs of chicken wings, but those things that can be divided and/or frozen (like salmon and coffee) - and more crucially things that don't become cheaper in price over time.

That is effectively Marc's argument; don't buy stuff you don't need yet because it will be cheaper in the future (not so with my salmon and coffee, I suggest). That's certainly true as we see a year on year reduction in storage per GB cost.

There are a number of reasons why people buy more than they need;

  1. New hardware deployment time is excessive due to datacentre restrictions and change control. In some sites this delay could be 3-6 months, so people work on the assumption that it's better to have more on the floor than be in a panic to deploy at the last minute.
  2. Business customers can't plan. It's a truism that everyone knows. Add on top the fact that Chinese whispers inflate the original business requirement to two, three or four times more storage than actually needed.
  3. Vendors give discounts. Yes, shock! Vendors will sell you storage cheaper if you buy more. I know many places that buy complete arrays up front (even DMX-4 with 1920 drives!) to avoid the deploy time and get a better price.

There are many more reasons than this but you get the idea.

I've deliberately left off one issue - the inflexibility of some storage systems in their deployment method. Although this isn't directly a reason to buy more storage, it is certainly a reason why users hoard more storage on their servers. Monolithic arrays are way too slow at executing on configuration tasks and on dynamic rebalancing, requiring too much planning and thinking time to avoid bad layout and configuration issues.

So Marc, you should have stated that thin provisioning is only one aspect of reducing storage hoarding. Good practice is another. Flexible technology is an undoubted third.

Oh and 10 house points to the first non-UK person who can explain my post title!

Monday, 24 November 2008

Eating My Own Dog Food

After my last post relating to personal data storage, I thought I'd spend some time and check out what I was currently using.

On my main server, I store my data on a Drobo unit. The Drobo's visualisation of data is to present a 2TB LUN, regardless of the installed drives. This is slightly misleading when analysing storage utilisation as I only have 2x 1TB drives actually installed, which is around 1TB of usable space.

However, that problem aside, on analysis I see I have about 794GB of storage in use - around 79% of my capacity, which in a business environment I would consider to be close to the margin where I'd purchase more storage (depending on growth rate and deployment lead time).

Using Treesize Professional I did some initial analysis. Treesize is really quick and provides data in lots of different formats, including a bizarre format called Tree Map which uses cascading squares to indicate data types and capacities.

Immediately I realised that my Exchange backups have been writing to a single file as appended backups and that the file has grown to 321GB! I only have around 2GB of actual email data, so I've never bothered to archive and all backups are full. Archiving won't save me that much at this point (although I could archive and reduce the daily backup size), however I will now start a new backup file and delete the old one in a few days. That gives me an instant 300GB back.

Digging further, my next biggest usage is media files. Many are home video which need processing, many are films or digitised music. I know these files need more work to get organised and there are some files which can be deleted, so I just need to put the effort in to sort them.

After that the remainder of files are ISO and EXE installation files (like various copies of Solaris and Linux distros) and could be archived to DVD. The rest are Excel, Word, Powerpoint and other miscellaneous office files which comprise only a few GB.

So in reality, my core data is probably less than 10GB. I could even consider putting this into an online service. Unfortunately the process by which I could easily isolate and selectively backup those files isn't easy - which is why a lot of the time we resort to just backing up everything.

What has all this taught me? Well, I've saved 300GB of storage at a stroke, I know how to prioritise my next group of storage saves and I know I need to selectively move out data I'd like on an online backup service.

All good. I reckon simple organisation should only take me an hour a week - the problem is finding that hour!!

Tuesday, 18 November 2008

Just Delete It Claus, Just Delete It

Claus Mikkelsen has woken up recently and started posting after a large break. Perhaps he's preparing for all those impending Christmas deliveries. Anyway, the crux of his post is to explain how he's moved from 2TB to 4TB of home storage rather than take the time to sort out the mess of his home data. He then goes on to detail lots of clever technology which allows more data to be stored with less.

As I've posted many times before, we're just storing ourselves up a heap of trouble by not addressing the underlying issue here - delete the unwanted data.

We're creating storage landfills which will still need to be sorted out in the future. Like toxic waste in a rubbish dump, losing that critical file will eventually cost dearly.

Think of Claus' problem. Moving from 2TB to 4TB doubles the amount of data that needs to be backed up (how do you back up 2TB of storage at home?), means any restores take longer, means spending more time searching for that file you know you had once, but can't remember what you called it - and if you use an online service for backup means you are paying unnecessarily each month.

Take my advice: spend the time (a) developing a good naming standard for your home files, (b) developing a good standard for the directories that store them and (c) deleting the stuff you don't need. Immediately. Period.

Monday, 17 November 2008

Decho - EMC Takes Over The World

Chris Mellor just announced the news that EMC have bundled their Pi and Mozy acquisitions into a single entity, branded as Decho. I was far too slow and Storagezilla beat me to the mandatory EMC post.

So, with Mozy and Pi we now have our data and backups online in the EMC cloud - which conveniently arrived last week as Atmos.

I may have been somewhat overly negative towards EMC in previous posts (they're big boys, I'm sure they can take it), however the layering of cloud storage offerings with Atmos as the foundation (assuming they eat their own dog food and use it) and content/backup over the top does move EMC into a new and interesting market segment in offering storage services rather than just tin (or software for that matter).

Where's the logical conclusion as to where EMC are headed? Is the move to Storage-as-a-Service an implicit acceptance that, over time, hardware will become even more commoditised and that services are the future? In the long term, surely that's the ideal scenario for the end user; all data and services in "the cloud" somewhere with no need to know where/how the data is stored other than service level and performance guarantees. It's not likely to happen in the near future but as a long term trend, it is certainly compelling.

Thursday, 13 November 2008

Obligatory Atmos Post

I feel drawn to post on the details of Atmos and give my opinion whether it is good, bad, innovative or not. However there's one small problem. Normally I comment on things that I've touched - installed/used/configured/broken etc, but Atmos doesn't fit this model so my comments are based on the marketing information EMC have provided to date. Unfortunately the devil is in the detail and without the ability to "kick the tyres", so to speak, my opinions can only be limited and somewhat biased by the information I have. Nevertheless, let's have a go.


From a hardware perspective, there's nothing radical here. Drives are all SATA-II 7.2K 1TB capacity. This is the same as the much maligned IBM/XIV Nextra, which also only offers one drive size (I seem to remember EMC a while back picking this up as an issue with XIV). In terms of density, the highest configuration (WS1-360) offers 360 drives in a single 44U rack. Compare this with Copan which provides up to 896 drives maximum (although you're not restricted to this size).

To quote Storagezilla: "There are no LUNs. There is no RAID." So exactly how is data stored on disk? What methods are deployed for ensuring data is not lost due to a physical issue? What is the storage overhead of that deployment?

Steve Todd tells us:

"Atmos contains five "built-in" policies that can be attached to content:

  • Replication
  • Compression
  • Spin-down
  • Object de-dup
  • Versioning

When any of these policies are attached to Atmos, COS techniques are used to automatically move the content around the globe to the locations that provide those services."

So, does that mean Atmos is relying on replication of data to another node as a replacement for hardware protection? I would feel mighty uncomfortable to think I needed to wait for data to replicate before I had some form of hardware-based redundancy - even XIV has that. Worse still, do I need to buy at least 2 arrays to guarantee data protection?

Front-end connectivity is all IP based, which presumably includes replication too, although there are no details of replication port counts or even IP port counts, other than the indication of 10Gb availability, if required.

One feature quoted on all the literature is Spin Down. Presumably this means spinning down drives to reduce power consumption; but spin down depends on data layout. There are two issues; if you've designed your system for performance, data from a single file may be spread across many spindles. How do you spin down drives when they all potentially contain active data? If you've laid out data on single drives, then you need to move all the inactive data to specific spindles to spin them down - that means putting the active data on a smaller number of spindles - impacting performance and redundancy in the case of a disk failure. The way in which Atmos does its data layout is something you should know - because if Barry is right, then his XIV issue could equally apply to Atmos too.
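To make the layout point concrete, here's a toy Python sketch. It is emphatically not how Atmos places data (that's exactly what we don't know); the two placement functions, the 12-drive shelf and the list of active files are all assumptions made up for illustration. It simply counts how many drives hold no active data under a wide-striped layout versus a one-file-per-spindle layout.

```python
def idle_drives(num_drives, active_files, placement):
    """Count drives holding no active data, i.e. candidates for spin down."""
    busy = set()
    for file_id in active_files:
        busy.update(placement(file_id, num_drives))
    return num_drives - len(busy)

def wide_striped(file_id, num_drives):
    # Performance layout: every file has a stripe on every spindle.
    return range(num_drives)

def single_drive(file_id, num_drives):
    # Capacity layout: each file lives whole on one spindle.
    return [file_id % num_drives]

active = [0, 5, 10]  # hypothetical IDs of the files currently being accessed

print(idle_drives(12, active, wide_striped))  # 0 - nothing can spin down
print(idle_drives(12, active, single_drive))  # 9 - but each file gets one spindle's performance
```

The trade-off falls straight out: the layout that allows spin down is the one that gives up parallel I/O.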

So to summarise, there's nothing radical in the hardware at all. It's all commodity-type hardware - just big quantities of storage. Obviously this is by design and perhaps it's a good thing as unstructured data doesn't need performance. Certainly as quoted by 'zilla, the aim was to provide large volumes of low cost storage and compared to the competition, Atmos does an average job of that.


This is where things get more interesting and to be fair, the EMC message is that this is a software play. Here are some of the highlights;

Unified Namespace

To quote 'zilla again:

"There is a unified namespace. Atmos operates not on individual information silos but as a single repository regardless of how many Petabytes containing how many billions of objects are in use spread across whatever number of locations available to who knows how many users."

I've highlighted a few words here because I think this quote is interesting; the implication is that there is no impact on the volume of data or its geographical dispersion. If that's the case (a) how big is this metadata repository (b) how can I replicate it (c) how can I trust that it is concurrent and accurate in each location.

I agree that a unified name space is essential, however there are already plenty of implementations of this technology out there, so what's new with the Atmos version? I would want to really test the premise that EMC can provide a concurrent, consistent name space across the globe without significant performance or capacity impact.

Metadata & Policies

It is true that the major hassle with unstructured data is the ability to manage it using metadata based policies and this feature of Atmos is a good thing. What's not clear to me is where this metadata comes from. I can get plenty of metadata today from my unstructured data; file name, file type, size, creation date, last accessed, file extension and so on. There are plenty of products on the market today which can apply rules and policies based on this metadata, however to do anything useful, then more detailed metadata is needed. Presumably this is what the statement from Steve means: "COS also implies that rich metadata glues everything together". But where does this rich metadata come from? Centera effectively required programming their API and that's where REST/SOAP would come in with Atmos. Unfortunately unless there's a good method for creating the rich metadata, then Atmos is no better than the other unstructured data technology out there. To quote Steve again:

"Rich metadata in the form of policies is the special sauce behind Atmos and is the reason for the creation of a new class of storage system."

Yes, it sure is, but where is this going to come from?

Finally, let's talk again about some of the built-in policies Atmos has:

  • Replication
  • Compression
  • Spin-down
  • Object de-dup
  • Versioning

All of these exist in other products and are not innovative. However extending policies is more interesting; although I suspect this is not a unique feature either.

On reflection I may be being a little harsh on Atmos, however EMC have stated that Atmos represents a new paradigm in the storage of data. If you make a claim like that, then you need to back it up. So, still to be answered;

  • What resiliency is there to cope with component (i.e. HDD) failure?
  • What is the real throughput for replication between nodes?
  • Where is the metadata stored and how is it kept concurrent?
  • Where is the rich metadata going to come from?

Oh, and I'd be happy to kick the tyres if the offer was made.

Tuesday, 11 November 2008

You Sunk My Battleship!

I spent some time today with the good folks at 3Par, as offered by Marc Farley a few posts ago. It was good to get more of a background on the product and also see what's happening in the future (although I can't talk about that!).

I think most of their technology is fairly well known (thin provisioning, wide striping etc), but two features stood out for me.

Dynamic Optimisation

Dynamic Optimisation allows a LUN to be moved around the array based on a number of parameters. One of the most interesting is the ability to change the RAID type without any outage or downtime. Think about it; you create LUNs as RAID-10 devices then realise you don't need to have that level of performance. With a couple of clicks, your LUN is changed and re-laid out as anything from RAID-5 2+1 to 8+1. The key factor here is that this is seamless, needing no outage and no host-based migration.

Compare and contrast this to a traditional "monolithic" array like a DMX-4 or USP. Well, the DMX-4 uses hypers (slices of a physical disk) which are combined up to create RAID devices. Naturally, the hypers used to create a RAID-5 device are a different size to those used on a RAID-1 device when creating a LUN of the same usable size. Re-purposing a LUN simply isn't possible; a RAID-1 LUN is a RAID-1 LUN for its lifetime. In fact, releasing storage and recreating different LUNs, although technically possible, is very rarely done as it's a complete nightmare to accomplish. If you like analogies (and I do), re-structuring an existing DMX array is a little like going back in time to a WWII RAF or Navy Operations room with lots of little models of ships being pushed around by wooden paddles, compared to today's modern ATC systems.

The way forward for monolithic or enterprise arrays has to be flexibility, especially with the need to provide storage for virtualised environments.

Thin Built In

So thin provisioning is on everyone's lips as a must have feature and 3Par claim to be the grandfather of the technology. I say claim, as TP has been done before, but that's not what I'm concerned about. What interests me more is the issue of Fat->Thin LUN migration. That is, copying "full size" LUNs onto a thin provisioned array. TP is great for new data. Create the LUNs, provision out the storage and voila!, more than 100% allocation on your array! TP relies on the fact that writing data to a LUN is a block-level process, and blocks are not necessarily written to sequentially, so there can be plenty of unwritten space on the LUN. However, copying a LUN from a standard array to a TP array will write all of the data blocks, negating the TP benefit.

3Par's arrays now have "Thin Built In", a custom ASIC which can detect unused space as data is written. This means fat->thin can be achieved as data is moved onto the array without any other intervention. It's worth thinking this one through; perhaps someone can answer whether EMC's TP implementation in 73 code and HDS's TP implementation on USP-V can do that and if not, how they expect to migrate existing data onto their arrays and still see the TP benefit.
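To illustrate why detecting unused space at write time matters for fat->thin migration, here's a toy Python model. It is only a sketch: I'm assuming "unused" means an all-zero block and doing the check in software, whereas 3Par do it in an ASIC and I have no knowledge of their actual implementation. The class, function names and block size are all invented for the example.

```python
BLOCK_SIZE = 512
ZERO_BLOCK = bytes(BLOCK_SIZE)

class ThinLun:
    """Toy thin LUN: backing space is allocated only for blocks that hold data."""

    def __init__(self, detect_zeros):
        self.detect_zeros = detect_zeros
        self.blocks = {}  # logical block address -> data

    def write(self, lba, data):
        if self.detect_zeros and data == ZERO_BLOCK:
            # An all-zero block carries no information: release it rather than store it.
            self.blocks.pop(lba, None)
            return
        self.blocks[lba] = data

    def read(self, lba):
        # Unallocated blocks read back as zeros.
        return self.blocks.get(lba, ZERO_BLOCK)

    def allocated(self):
        return len(self.blocks)

def migrate(fat_lun, target):
    """Block-for-block copy, as a simple LUN migration would do."""
    for lba, data in enumerate(fat_lun):
        target.write(lba, data)

# A "fat" 1,000-block LUN of which the host only ever wrote 100 blocks.
fat = [ZERO_BLOCK] * 1000
for lba in range(0, 1000, 10):
    fat[lba] = b"\x01" * BLOCK_SIZE

plain, detecting = ThinLun(detect_zeros=False), ThinLun(detect_zeros=True)
migrate(fat, plain)
migrate(fat, detecting)
print(plain.allocated(), detecting.allocated())  # 1000 100
```

Without detection the thin target ends up fully allocated after the copy; with it, only the blocks that really contain data consume space.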

While I'm on the subject, what happens if you defrag a TP volume? Well, data will be consolidated onto contiguous blocks of space and the location where data is moved from will get logically freed, but I don't believe products such as Diskeeper will write anything over the old data. What if it did? Well if that space was cleared, a 3Par array could reclaim this storage. So defragmenters out there - do you do this?

Anyway, thanks to the 3Par guys for the heads up on their technology; the next few years certainly are going to be interesting in this space!

Monday, 10 November 2008

EMC Announces Hulk/Maui (well almost)

Yes, it's almost here folks. The blogosphere tells us so. First of all, there's Chuck Hollis' latest post pondering the issue of how the storage cloud works and why it's really difficult to pick up data quickly in geographically dispersed areas. He leaves us with the cliffhanger;

"The magic isn't in the hardware, it's in the software ..."

So, Hulk/Maui's a software product then...

Next there's StorageZilla, with his viral marketing approach. No technical details here, just cryptic comments relating to trendy cultural references - Dr Who - and some bloke in a space helmet. Clearly I'm not worthy as I didn't understand the second one at all.

This morning we have 'Zilla's full disclosure with his latest post.

All of this is prior to an official announcement - nothing on EMC's press release site yet.

What's next? So, expect Barry Burke to post a technical assassination of the opposition over at the Storage Anarchist. Then we can have other bloggers putting their spin on it too. I can't be bothered to list them all; I'm sure you know who they are.

But wait - have I not just fallen into the viral marketing trap too by helping out EMC? D'oh, perhaps those folks at Hopkinton Towers are more clever than we think....

Wednesday, 5 November 2008

New Seagate Savvio Drives

Seagate have announced the availability of the next generation of Savvio 2.5" drives running at 15K. Capacity is increased to 146GB (I'm waiting for confirmation this is the case as there are no data sheets online yet).

The capacity increase is overdue to keep up with the roadmap of 3.5" drives and coincidentally I'm in the process of reviewing the existing Savvio model at the moment, more on this next week.

Meantime, last December I posted on the subject of 2.5" drives in Enterprise arrays and created this spreadsheet comparing different models. The reason for creating the sheet was to see if the physical density of 2.5" drives would exceed that of traditional 3.5" models. At the time, the best 2.5" drive offered 0.702GB/cm3 compared to a slightly better 0.796GB/cm3 for the 3.5" equivalent (73GB versus 300GB drives respectively).

With the release of 450GB 15K drives, the 3.5" pushed the lead further to 1.194GB/cm3. The latest Savvio has grabbed that lead back with 1.404GB/cm3!

OK, so the maths is not perfect and I'm talking about fractional differences which could be absorbed by the connectivity and interface attachments needed to hot plug these devices into arrays, but consider this; each 450GB 3.5" drive can be replaced by three 146GB 2.5" equivalents, giving 3 times as much parallel I/O capability. In storage arrays this is bound to have a benefit on throughput.
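For anyone who wants to check the sums, here's a small Python sketch. I haven't reproduced the spreadsheet's drive dimensions here, so the sketch back-calculates the implied drive volume from the older capacity and density figures quoted above and then applies it to the newer capacities; the variable names are mine and the volumes are derived values, not data sheet measurements.

```python
# Capacity (GB) and density (GB/cm3) quoted above for the older drives.
OLD = {'2.5"': (73, 0.702), '3.5"': (300, 0.796)}
NEW_CAPACITY_GB = {'2.5"': 146, '3.5"': 450}

# Implied physical volume of each form factor: capacity / density.
volumes = {ff: cap / dens for ff, (cap, dens) in OLD.items()}

# Density of the newer drives, assuming the form factor volume is unchanged.
densities = {ff: NEW_CAPACITY_GB[ff] / volumes[ff] for ff in volumes}
for ff, d in densities.items():
    print(f"{ff}: {volumes[ff]:.1f}cm3 implied, {d:.3f}GB/cm3")

# How many whole 2.5" drives fit within the capacity of one 450GB 3.5" drive?
spindles = NEW_CAPACITY_GB['3.5"'] // NEW_CAPACITY_GB['2.5"']
print(f'{spindles} x 2.5" = {spindles * NEW_CAPACITY_GB[chr(50) + ".5" + chr(34)]}GB '
      f'in {spindles * volumes[chr(50) + ".5" + chr(34)]:.0f}cm3')
```

Three 146GB drives give 438GB rather than the full 450GB, so the replacement isn't quite like-for-like on capacity, but it does fit in less physical volume than the single 3.5" drive.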

Now as to why 2.5" adoption hasn't occurred so far, word on the street is that it's due to the lack of multiple vendor streams. For the record, I could only find Fujitsu and Seagate doing 2.5" 15K drives today.

Tuesday, 4 November 2008

LUN Stacker

A recent post from Martin "The Bod" Glassborow got me thinking about the whole process of LUN consolidation. I've done lots of migrations where people quake at the thought of changing the LUN size from one array to another. Now, I almost always want to change LUN sizes, as the vendor specific ones - 8.43GB/13.59GB etc are pretty painful and wasteful at the same time.

There's another good reason to standardise on LUNs. If you've implemented a good dual-vendor strategy and sorted your firmware/driver stack out, then you can position to take storage from any of your preferred vendors. There's nothing better than having all of your vendors sweating on that next 500TB purchase when they know you can take your storage from any of EMC/HDS/HP/IBM.

If LUNs and the I/O stack are all standardised, you can move data around too. The difficult part as alluded to in Martin's post is achieving the restacking of data.

Here's the problem; SAN storage is inherently block based and the underlying hardware has no idea of how you will lay out your data. Have a look at the following diagram. Each LUN from a SAN perspective is divided into blocks and each block has a logical block address. The array just services requests from the host for a block of data and reads/writes it on demand. It is the operating system which determines how the file system should be laid out on the underlying storage. Each volume will have a standard location (or standard method of calculating the location) for what was called the VTOC (Volume Table of Contents), also known as the FAT (File Allocation Table) in DOS and MFT (Master File Table) in NTFS. There are similar constructs for other O/S versions like Linux but I'm not 100% certain of the terminology so won't risk the wrath of getting it wrong.

The layout of data on a file system is not a trivial task. Apart from keeping track of files, there's the requirement to keep track of free space and to be able to recreate the file index in the case of corruption, so some kind of journalling is likely to be implemented. There are also features such as compression, Single Instancing, Encryption, etc which all add to the mix of understanding exactly how file data is laid out on disk.

Now think of how multiple LUNs are currently connected together. This will be achieved with either a Volume Manager (like VxVM), supplied as a separate product, or a native LVM (logical volume manager). All of these tools will spread the "logical" volume across multiple LUNs and will format the LUN with information to enable the volume to be recreated if the LUNs are moved to another host. VxVM achieves this by having a private area on each LUN which contains metadata to rebuild the logical volume. Each LUN can be divided into sub-disks and then recombined into a logical volume, as shown in this diagram.
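As a rough illustration of the sub-disk idea, here's a Python sketch of a simple concatenated volume that maps a logical volume block to a (LUN, LBA) pair. This is not VxVM's on-disk format or API; the class, the LUN names and the 2048-block offset (standing in for a private metadata area being skipped) are all assumptions of mine.

```python
from dataclasses import dataclass

@dataclass
class SubDisk:
    lun: str        # which array LUN the extent lives on
    start_lba: int  # first block of the extent on that LUN
    length: int     # extent size in blocks

def locate(volume, logical_block):
    """Map a logical volume block to (LUN, LBA) for a simple concatenated volume."""
    if logical_block < 0:
        raise IndexError("negative block address")
    offset = logical_block
    for sd in volume:
        if offset < sd.length:
            return sd.lun, sd.start_lba + offset
        offset -= sd.length
    raise IndexError("block beyond end of volume")

# One logical volume built from three sub-disks spread over two LUNs.
volume = [
    SubDisk("LUN_A", start_lba=2048, length=1000),
    SubDisk("LUN_B", start_lba=2048, length=500),
    SubDisk("LUN_A", start_lba=3048, length=250),
]

print(locate(volume, 0))     # ('LUN_A', 2048)
print(locate(volume, 1200))  # ('LUN_B', 2248)
print(locate(volume, 1600))  # ('LUN_A', 3148)
```

The point to take away is that this mapping lives on the host; the array only ever sees reads and writes against LUN_A and LUN_B and has no idea which blocks belong to which volume.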

So a physical LUN from an array may contain a whole or partial segment of a host volume, including LVM metadata. Determining what part, whether all the parts are on this array (and where) is a tricky task - and we're expecting that the transmission protocol (i.e. the fabric) can determine all of this information "on the fly" as it were.

My thought would be - why bother with a fabric-based consolidation tool? Products like VxVM provide a wide set of commands for volume migration; although not automated, they certainly make the migration task simpler. I've seen some horrendous VxVM implementations, which would require some pretty impressive logic to be developed in order to understand how to deconstruct and reconstruct a volume. However life is not that simple, and host-based migrations aren't always easy to execute on, so potentially a product would be commercially viable, even if the first implementation was an offline version which couldn't cope with host I/O at the same time.

Funny, what's required sounds a bit like a virtualisation product - perhaps the essence of this is already coded in SVC, UVM or Incipient?

Monday, 3 November 2008


"Innovative - featuring new methods or original ideas - creative in thinking" - Oxford English Dictionary of English, 11th edition.

There have been some interesting comments over the weekend, specifically from EMC in regard to this post which I wrote on Benchmarketing started by Barry Burke and followed by Barry Whyte.

"Mark" from EMC points me to this link regarding EMC's pedigree on innovation. Now that's like a red rag to a bull to me and I couldn't help myself going through every entry and summarising them.

There are 114 entries, out of which I've classified 44 as marketing - for example appointing Joe Tucci (twice) and Mike Ruettgers (twice) and being inducted into the IT Hall of Fame hardly count as innovation! Some 18 entries relate directly to Symmetrix, another 18 to acquisition (not really innovation if you use the definition above) and another 7 to Clariion (also an acquisition).

From the list, I've picked out a handful I'd classify as innovating.

  • 1987 - EMC introduce solid state disks - yes, but hang on, haven't they just claimed to have "invented" Enterprise Flash Drives?
  • SRDF & Timefinder - yes I'd agree these are innovative. SRDF still beats the competition today.
  • First cached disk array - yes innovation.

Here's the full list taken from the link above. Decide for yourself whether you think these things are innovative or not. Acquisitions in RED, Marketing in GREEN. Oh and if anyone thinks I'm being biased, I'm happy to do the same analysis for IBM, HP, HDS etc. Just point me at their timelines.

  • Clariion CX4 - latest drives, thin provisioning?
  • Mozy - acquisition
  • Flash Drives - 1980's technology.
  • DMX4 - SATA II drives and 4Gb/s
  • Berkeley Systems etc - acquisition
  • EMC Documentum - acquisition
  • EMC study on storage growth - not innovation
  • EMC floats VMware - acquisition
  • EMC & RSA - acquisition
  • EMC R&D in China
  • EMC Clariion - Ultrascale
  • EMC Smarts - acquisition
  • Symmetrix DMX3
  • Smarts, Rainfinity, Captiva - acquisitions
  • EMC - CDP - acquisition
  • EMC Clariion - Ultrapoint
  • EMC DMX3 - 1PB
  • EMC Invista - where is it now?
  • EMC Documentum - acquisition
  • Clariion AX100 - innovative? incremental product
  • Clariion Disk Library (2004) - was anyone already doing this?
  • DMX-2 Improvements - incremental change
  • EMC VMware - acquisition
  • EMC R&D India - not innovative to open an office
  • EMC Centera - acquisition - FilePool
  • EMC Legato & Documentum - acquisitions
  • Clariion ATA and FC drives
  • EMC DMX (again)
  • EMC ILM - dead
  • EMC Imaging System? Never heard of it
  • IT Hall of Fame - hardly innovation
  • Clariion CX
  • Information Solutions Consulting Group - where are they now?
  • EMC Centera - acquisition
  • Replication Manager & StorageScope - still don't work today.
  • Dell/EMC Alliance - marketing not innovation
  • ECC/OE - still doesn't work right today.
  • Symmetrix Product of the Year - same product again
  • Joe Tucci becomes president - marketing
  • SAN & NAS into single network - what is this?
  • EMC Berkeley study - marketing
  • EMC E-lab
  • Symmetrix 8000 & Clariion FC4700 - same products again
  • EMC/Microsoft alliance - marketing
  • EMC stock of the decade - marketing
  • Joe Tucci - president and COO - marketing
  • EMC & Data General - acquisition
  • ControlCenter SRM
  • EMC Connectrix - from acquisition
  • Software sales rise - how much can be attributed to Symmetrix licences
  • Oracle Global Alliance Partner - marketing
  • EMC PowerPath
  • Symmetrix capacity record
  • EMC in 50 highest performing companies - marketing
  • EMC multiplatform FC systems
  • Timefinder software introduced
  • Company named to business week 50 - marketing
  • EMC - 3TB in an array!!
  • Celerra NAS Gateway
  • Oracle selects Symmetrix - marketing
  • SAP selects Symmetrix - marketing
  • EMC Customer Support Centre Ireland - marketing
  • Symmetrix 1 Quadrillion bytes served - McDonalds of the storage world?
  • EMC acquires McDATA - acquisition
  • EMC tops IBM mainframe storage (Symmetrix)
  • Symmetrix 5100 array
  • EMC 3000 array
  • EMC BusinessWeek top score - marketing
  • Egan named Master Entrepreneur - marketing
  • EMC 5500 - 1TB array
  • EMC joins Fortune 500 - marketing
  • SRDF - innovation - yes.
  • Customer Council - marketing
  • EMC expands Symmetrix
  • EMC acquires Epoch Systems - basis for ECC?
  • EMC acquires Magna Computer Corporation (AS/400)
  • EMC R&D Israel opens - marketing
  • Symmetrix 5500 announced
  • Harmonix for AS/400?
  • EMC ISO9001 certification - marketing
  • Mike Ruettgers named president and CEO - marketing
  • Symmetrix arrays for Unisys
  • Cache tape system for AS/400
  • EMC implements product design and simulation system - marketing
  • Product lineup for Unisys statement - marketing
  • DASD subsystem for AS/400
  • EMC MOSAIC:2000 architecture
  • EMC introduces Symmetrix
  • First storage system warranty protection - marketing
  • EMC Falcon with rapid cache
  • First solid state disk system for Prime (1989)
  • Ruettgers improvement program - marketing
  • First DASD alternative to IBM system
  • Allegro Orion disk subsystems - both solid state (1988)
  • EMC in top 1000 business - marketing
  • EMC joins NYSE - marketing
  • First cached disk controller - innovation - yes
  • Manufacturing expands to Europe - marketing
  • EMC increases presence in Europe and APAC - marketing
  • Archeion introduced for data archiving to optical (1987)
  • More people working on DASD than IBM - marketing
  • EMC introduces solid state disks (1987)
  • Storage capacity increases - marketing
  • EMC doubles in size - marketing
  • Product introductions advance computing power - marketing
  • HP memory upgrades
  • EMC goes public - marketing
  • EMC announces 16MB array for VAX
  • Memory, storage products boost minicomputer performance
  • EMC offers 24 hour support
  • Testing improves quality - marketing
  • Onsite spares program - marketing
  • EMC delivers first product - marketing
  • EMC founded - marketing

Friday, 31 October 2008

Get the Balance Right

It's not very often I side with one vendor or another; however, after BarryB's recent post regarding "Benchmarketing" I feel obliged to comment. Have a read of Barry Whyte's rebuttal too.

We see technology advancements because "concept" devices are used to drive innovation but don't necessarily translate directly to end-user products. Look at the fashion industry - some of the most outrageous outfits are paraded down the catwalk but the same dress, coat, hat or whatever isn't sold in the shops. Instead it influences the next fashion season.

Look at the motor industry - concept cars appear well before actual consumer products. We may laugh at some and marvel at others - take the Bugatti Veyron. It is well known that Volkswagen make a loss on each car produced, however what counters this is the publicity, the research, the kudos of being able to claim Veyron technology (disputably the fastest car in the world) is deployed in the standard VW range. Lexus is another good example of a brand created by Toyota to perform the same function. Much the same can be said for Formula 1.

Now, I'm not endorsing IBM per se here, however I don't see the harm with IBM marketing a "concept" piece of technology which could lead to innovation in the future. After all, IBM is well known for research of this kind; the disk drive and the tape drive spring to mind.

Even EMC's own bloggers question whether EMC is known for innovation and other than Symmetrix, I can't think of one thing I view as an EMC "idea".

Anyway, 'nuff said. As previously offered - I would love to take the position of moderator in developing real world benchmarking - bring it on!!

Who Ya Gonna Call?

Here's a quality piece of reporting from TechCrunch on the state of Facebook and their data problems. I mentioned just last week in this post about their data growth. It's incredible that they're purchasing a new Netapp 3070 filer each week!

I'm surprised that Facebook would be continually purchasing NAS filers to grow their content. There must be a rolling set of pictures, thumbnails and so on that are frequently looked at, but there also must be a significant amount that aren't and could be archived to super-dense nearline type technology akin to the Copan products.

Unfortunately when data growth is so intense, it isn't always easy to see the wood for the trees and from previous and current experience, using Netapp creates the risk of wasted resources.

In my experience, looking at just block-based arrays, I've always seen around 10-15% of orphan or unused resources and sometimes higher. When host-based wastage is taken into consideration, the figure can be much worse, although host reclamation is a much more intense process.

I'm willing to offer to anyone out there who has more than 50TB of storage on storage arrays a free analysis of their environment - for a 50:50 split of any savings that can be made. As budgets tighten, I think there will be more and more focus on this kind of work.

Pillar Crumbles

I picked this up last night on Mike Workman's blog over at Pillar. Looks like they're suffering the downturn. Storagezilla thinks this could be 30% of the workforce. I'm sure this is going to be one of many bad news stories from the storage industry we hear over the next few months.

I've never understood the point of Pillar's offering. Differentiating performance tiers based on the specific place on a disk seems a dead end idea. Firstly, disks might appear to be random access devices but if you're accessing one cylinder on a drive you can't be accessing another at the same time. You need some pretty clever code to ensure that lower tiered I/O requests don't overwhelm tier 1 requests in this kind of shared environment. In addition, everyone says SSDs are the future. Once these devices are mainstream, Pillar's tiering model is defunct (unless SSDs have some performance variant across different parts of the silicon!) as there's no differential in performance across an SSD device.

For me, a Compellent type architecture still seems best - granular access to each storage tier (including SSD) with dynamic relocation of data.

** disclaimer - I have no affiliation with Pillar or Compellent **

Thursday, 30 October 2008

SMI-S Is Dead

Take it from me, SMI-S is a thing of the past. If there's one thing the last few months have taught me it's how different each vendor's products really are. I've been working on a tool called SRA (see the link here) which will report on storage in a consistent manner. Let me tell you that isn't easy...

  • EMC Symmetrix/DMX - Physical disks are carved into smaller segments called hypers. These are then recombined into LUNs which then might be recombined into composite devices (metas) and replicated, cloned or snapped. The hypers that make up a LUN can come from anywhere within an array and can be moved around at will by a tool designed to improve performance, completely ruining your original well-planned configuration. Combining hypers gives you RAID, which wasn't RAID before and was something called mirrors but is now, and is even RAID-6! Devices have personalities which survive their presentation or removal from a port. A device can have multiple personalities at the same time. LUNs use a nice numbering system based on hex - but don't expect them to number nicely if you destroy and create devices. Bit settings (flags) are used to ensure host SCSI commands work correctly.
  • HDS USP/HP XP - Physical disks are grouped into RAID groups from which LUNs are carved. Until recently you couldn't span RAID groups easily (unless you were combining some free space in each RAID group). Devices don't have a personality until they're presented to a host on a port, but they can have multiple personalities. HDS use a form of punishment known as CCI for anyone foolish enough to think they had made their arrays easy to manage. LUNs are numbered using a relic of the mainframe and yes, you can move things around to balance performance, but don't think you can do it unless there are spare LUNs (sorry, LDEVs) around. Different host types are supported by a setting on a host group which lets you confuse the hell out of everyone by telling them their LUN numbers are all the same but unique. Oh, and the storage the user sees doesn't actually have to be in the array itself.
  • HP EVA - Phew! Physical disks are managed in groups (which it's recommended to only have one of, but you can have more if you really must) but they don't use RAID at the group level because that would be too easy. Instead disks are grouped into Redundancy Storage Sets, which reduce the *risk* of disk failures but don't protect directly against them. LUNs are created only when they need to be presented to a host and they don't have simple LUN numbers, but rather 32 digit UUIDs. RAID protection is done at the LUN level, making it more difficult to conceptualise than either of the previous two examples.
  • Pillar Axiom - now we're getting really abstract. With Axiom, you can tier data on different levels of performance, but wait for it - they will be on the same drive, but utilising different parts of the same spindle! Argh! Enough!

Clearly every vendor wants to differentiate their product so you'll buy from them and not the competition. In some respects they *have* to differentiate otherwise all the vendors would spend their time in litigation with each other over patent copyright! (wait a minute, they already are). So SMI-S or any other standard is going to have a near impossible time creating a single reference point. Add to the mix the need to retain some competitive advantage (a bit like Microsoft holding back the really useful API calls in Windows) and to sell their own management tools and you can see why SMI-S will be at best a watered down generic interface.

So why bother? There's no benefit. Every vendor will pay lip service to the standard and implement just what they can get away with.

The question is, what would replace it? There's no doubt something is needed. Most SRM tools are either overbloated, poorly implemented, expensive, or plainly don't work so some light touch software is a must.

I think the interim solution is to get vendors to conform to a standard API format, for example XML via an IP connection to the array. Then leave it to the vendor how to code up commands for querying or modifying the array. At least the access method would be consistent. We don't even see that today. All we need now is an acronym. How about Common Resource Access Protocol?
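To make the idea a little more concrete, here's a minimal sketch of what a consistent XML access method might look like from the client side. Everything in it is hypothetical: the element names (`request`, `resource`, `response`, `lun`), the attributes and the sample reply are invented for illustration and don't correspond to any vendor's real interface. The point is only that the envelope stays the same while each vendor decides what goes inside it.

```python
import xml.etree.ElementTree as ET


def build_query(resource):
    """Build a query document in a made-up, vendor-neutral envelope."""
    root = ET.Element("request", {"action": "query"})
    ET.SubElement(root, "resource").text = resource
    return ET.tostring(root, encoding="unicode")


def parse_luns(response_xml):
    """Extract LUN details from a (hypothetical) array response."""
    root = ET.fromstring(response_xml)
    return [
        {
            "id": lun.get("id"),
            "size_gb": int(lun.get("sizeGB")),
            "raid": lun.get("raid"),
        }
        for lun in root.iter("lun")
    ]


# Invented example of what an array might send back over the IP connection.
SAMPLE_RESPONSE = """
<response>
  <lun id="00:1A" sizeGB="50" raid="RAID-5"/>
  <lun id="00:1B" sizeGB="100" raid="RAID-1"/>
</response>
"""

if __name__ == "__main__":
    print(build_query("luns"))
    for lun in parse_luns(SAMPLE_RESPONSE):
        print(lun)
```

The transport (how the XML actually gets to and from the array) is deliberately left out; the sketch only covers the part that would need to be common across vendors.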

Wednesday, 29 October 2008

Understanding EVA - revisited

Thanks to all those who posted in response to Understanding EVA earlier this week, especially Cleanur who added a lot of detail. Based on the additional knowledge, I'd summarise again:

  • EVA disks are placed in groups - usually recommended to be one single group unless there's a compelling reason not to (like different disk types e.g. FC/FATA).

  • Disk groups are logically divided into Redundancy Storage Sets, which can be from 6-11 disks in size, depending on the number of disks in the group, but ideally 8 drives.

  • Virtual LUNs are created across all disks in a group, however to minimise the risk of data loss from disk failure, equal slices of LUNs (called PSEGs) are created in each RSS with additional parity to recreate the data within the RSS if a disk failure occurs. PSEGs are 2MB in size.

  • In the event of a drive failure, data is moved dynamically/automagically to spare space reserved on each remaining disk.

I've created a new diagram to show this relationship. The vRAID1 devices are pretty much as before, although now numbered as 1-1 & 1-2 to show the two mirrors of each PSEG. For vRAID5, there are 4 data and 1 parity PSEG, which initially hits RSS1, then RSS2 then back to RSS1 again. I haven't shown it, but presumably the EVA does a calculation to ensure that the data resides evenly on each disk.

So here's some maths on the numbers. There are many good links worth reading; try here and here. I've taken the simplest formula and churned the numbers on a 168-drive array with a realistic MTBF (mean time between failures) of 100,000 hours. Before people leap in and quote the manufacturers' numbers that Seagate et al provide, which are higher figures, remember arrays will predictively fail a drive and in any case with temperature variation, heavy workload, manufacturing defects etc, the real-world MTBF is lower than the manufacturers' figures (as Google have already pointed out).

I've also assumed a repair (i.e. replace) time of 8 hours, which seems reasonable for arrays unattended overnight. If disks are not grouped, then the MTTDL (mean time to data loss) is about 44553 hours, or just over five years. This is for a single array - imagine if you had 70-80 of them - the risk would be increased. Now, with the disks in groups of 8 (meaning that data will be written across only 8 disks at a time), the double disk failure becomes 1,062,925 hours or just over 121 years. This is without any parity.
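For anyone who wants to rerun the sums, here's a short sketch. The formula isn't named above, so this assumes the usual double-failure approximation, MTTDL = MTBF² / (N × (N − 1) × MTTR) per group, divided by the number of groups; it does reproduce both figures quoted. The function and variable names are my own.

```python
HOURS_PER_YEAR = 24 * 365


def mttdl_hours(disks, mtbf_hours, repair_hours, group_size=None):
    """Approximate mean time to a double disk failure, in hours.

    With no group_size, all disks are treated as one big group.
    Assumes the disk count divides evenly into groups.
    """
    if group_size is None:
        group_size = disks
    groups = disks // group_size
    # Mean time until a second disk in the same group fails during a repair.
    per_group = mtbf_hours ** 2 / (group_size * (group_size - 1) * repair_hours)
    # More independent groups means proportionally more chances of a loss.
    return per_group / groups


ungrouped = mttdl_hours(168, 100_000, 8)
grouped = mttdl_hours(168, 100_000, 8, group_size=8)

if __name__ == "__main__":
    print(f"ungrouped: {ungrouped:,.0f} hours ({ungrouped / HOURS_PER_YEAR:.1f} years)")
    print(f"groups of 8: {grouped:,.0f} hours ({grouped / HOURS_PER_YEAR:.1f} years)")
```

Changing `repair_hours` or `mtbf_hours` shows how sensitive the result is to those two assumptions; the MTBF term is squared, so it dominates.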

Clearly grouping disks into RSSs does improve things and quite considerably so, even if no parity is implemented, so thumbs up to RSSs from a mathematical perspective. However if a double disk failure does occur then every LUN in the disk group is impacted as data is spread across the whole disk group. So it's a case of very low probability, very high impact.

Mark & Marc commented on 3Par's implementation being similar to EVA. I think XIV sounds similar too. I'll do more investigation on this as I'd like to understand the implications of double disk failures on all array types.

Tuesday, 28 October 2008

Virtualisation: LeftHand VSA Appliance - Part Two

In my previous post covering LeftHand's Virtual Storage Appliance, I discussed deploying a VSA guest under VMware. This post discusses performance of the VSA itself.

Deciding how to measure a virtual storage appliance's performance wasn't particularly difficult. VMware provides performance monitoring through the Virtual Infrastructure Client and gives some nice pretty graphs to work with. So from the appliance (called VSA1 in my configuration) I can see CPU, disk, memory and network throughput.

The tricky part comes in determining what to test. Initially I configured an RDM LUN from my Clariion array and ran the tests against that. Performance was poor and when I checked out the Clariion I found it was running degraded with a single SP and therefore no write cache. In addition, I also used a test Windows 2003 VM on the same VMware server - D'oh! That clearly wasn't going to give fair results as the iSCSI I/O would be going straight through the hypervisor and potentially VSA1 and the test W2K3 box would contend for hardware resources.

So, on to test plan 2, using another VMware server with only one single W2K3 guest, talking to VSA1 on the initial VMware hardware. So far so good - separate hardware for each component and a proper network in between (which is gigabit). To run the tests I decided to use Iometer. It's free, easy to use and you can perform a variety of tests with sequential and random I/O at different block sizes.

The first test was for 4K blocks, 50% sequential read/writes to an internal VMFS LUN on a SATA drive. The following two graphics show the VMware throughput; CPU wasn't max'd out and sat at an average of 80%. Data throughput averaged around 7MB/s for reads and only 704KB/s for writes.

I'm not sure why write performance is so poor compared to reads; however, I suspect there's a bit of caching going on somewhere. That's evident from looking at the network traffic, which is equivalent to the amount of write traffic. The read traffic doesn't add up. There's more read traffic on VSA1 than expected, which is shown in the figures from Iometer. It indicates around 700KB/s for both reads and writes.

I performed a few other tests, including a thin provisioned LUN. That showed a CPU increase for the same throughput - no surprise there. There's also a significant decrease in throughput when using 32KB blocks compared to 4KB and 512 bytes.

So, here's the $64,000 question - what kind of throughput can I expect per GHz of CPU and per GB of memory? Because remember there's no supplied hardware here from LeftHand, just the software. Perhaps with a 2TB limit per VSA the performance isn't that much of an issue, but it would be good to know if there's a formula to use. This throughput versus CPU versus memory is the only indicator I can see that could be used to compare future virtual SANs against each other and when you're paying for the hardware, it's a good thing to know!
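As a rough illustration of the kind of metric I mean, here's a sketch that normalises measured throughput against the CPU actually consumed and the memory allocated. The throughput and CPU utilisation are the figures from the 4K test; the 2.0GHz clock speed and 1GB of memory are placeholders I've made up for the example, not the real specification of my test server, and the function name is hypothetical too.

```python
def normalised_throughput(read_kb_s, write_kb_s, cpu_ghz, cpu_utilisation, memory_gb):
    """Return total throughput per GHz of CPU consumed and per GB of memory."""
    total_kb_s = read_kb_s + write_kb_s
    # Only count the share of the CPU the appliance actually used.
    ghz_used = cpu_ghz * cpu_utilisation
    return {
        "kb_per_s_per_ghz": total_kb_s / ghz_used,
        "kb_per_s_per_gb": total_kb_s / memory_gb,
    }


# Measured: ~7MB/s reads, 704KB/s writes, CPU averaging 80%.
# Assumed (placeholders): a single 2.0GHz core and 1GB of memory.
result = normalised_throughput(
    read_kb_s=7 * 1024,
    write_kb_s=704,
    cpu_ghz=2.0,
    cpu_utilisation=0.80,
    memory_gb=1.0,
)

if __name__ == "__main__":
    for name, value in result.items():
        print(f"{name}: {value:.0f}")
```

A single pair of numbers like this hides a lot (block size, read/write mix, thin provisioning), so any comparison between virtual SANs would need the same workload on both sides.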

Monday, 27 October 2008

Understanding EVA

I've not had much exposure to HP EVA storage; however, recently I've had a need (as part of a software tool project) to get into the depths of EVA and understand how it all works. The following is my understanding as I see it, plus some comments of my own. I'd be grateful for any feedback which helps improve my enlightenment or equally, knocks me back for plain stupidity!

So, here goes. EVA arrays place disks into disk groups. The EVA system automagically sub-groups the disks into redundant storage sets (RSS). An RSS is simply a logical grouping of disks rather than some RAID implementation as there's no underlying RAID deployment at the disk level.

Within each disk group, it is possible to assign a protection level. This figure is "none", "one" or "two", indicating the amount of storage to reserve for disk failure rebuilds. The figure doesn't represent an actual disk, but rather an amount of disk capacity that will be reserved across the whole pool. So, setting "one" in a pool of 16 disks will reserve 1/16th of each disk for rebuilds.

Now we get to LUNs themselves and it is at this point that RAID protection comes in. A LUN can be created in a group with either vRAID0 (no protection), vRAID1 (mirrored) or vRAID5 (RAID-5) protection. vRAID5 uses a RAID5 (4+1) configuration with 4-data and 1-parity.
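To put some numbers on the three protection types, here's a quick sketch of the raw capacity each one consumes for a given LUN size. It uses only the standard overheads implied above (no overhead for vRAID0, two copies for vRAID1, 4 data plus 1 parity for vRAID5) and ignores the protection-level reserve and any EVA metadata, so treat it as an approximation rather than what the array will actually report.

```python
# Fraction of raw capacity that ends up as usable LUN space.
VRAID_EFFICIENCY = {
    "vRAID0": 1.0,    # no protection, no overhead
    "vRAID1": 0.5,    # every segment is mirrored
    "vRAID5": 4 / 5,  # 4 data + 1 parity
}


def raw_capacity_gb(usable_gb, level):
    """Raw disk capacity consumed by a LUN of the given usable size."""
    return usable_gb / VRAID_EFFICIENCY[level]


if __name__ == "__main__":
    for level in VRAID_EFFICIENCY:
        print(f"100GB {level} LUN uses about {raw_capacity_gb(100, level):.0f}GB raw")
```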

From the literature I've read and playing with the EVA simulator, it appears that the EVA spreads a LUN across all volumes within a disk group. I've tried to show this allocation in the diagram on the right, using a different colour for each protection type, within a disk pool of 16 drives.

The picture shows two RSSs and a LUN of each RAID protection type (vRAID0, vRAID1, vRAID5). Understanding vRAID0 is simple; the capacity of the LUN is striped across all physical disks, providing no protection against the loss of any disk within the configuration. In large disk groups, vRAID0 is clearly pointless as it will almost always lead to data loss in the event of a physical failure.

vRAID1 mirrors each segment of the LUN, which is striped across all volumes twice, one for each mirror. I've shown these as A & B and assumed they will be allocated on separate RSS groups. In the event that a disk fails, then a vRAID1 LUN can be recreated from the other mirror, using the spare space set aside on the remaining drives to achieve this.

Question: Does the EVA actively re-create failed mirrors immediately on failure of a physical disk? If so, does the EVA then actively rebuild the failed disk, once it has been replaced?

Now, vRAID5, a little more tricky. My understanding is that EVA uses RAID-5 (4+1), so there will never be an even layout of data and parity stripes across the disk group. I haven't shown it on the diagram, but I presume as data is written to a vRAID5 LUN it is split into smaller chunks (I think 128KB) and striped across the physical disks. In this way, there will be as close to an even distribution of data and parity as possible. In the event of a disk failure, the lost data can be recreated from the other data and parity components that make up that stripe.

Question: What size block does the EVA use for RAID-5 stripes?

At this point, I'm not sure of the benefit of Redundant Storage Sets. They aren't RAID groups, so there's no inherent protection if a disk in an RSS fails. If LUNs are created within the same RSS, then perhaps this minimises the impact of a disk failure to just that group of disks; see the second diagram.

The upshot is, I think the technique of dispersing the LUN across all disks is good for performance, but bad for availability - especially as it isn't easy to see what the impact of a double disk failure can be - my assumption is that it means *all* data will be affected if a double disk failure occurs within the same RSS group. I may be wrong but that doesn't sound good.

Feel free to correct me if I've got any of this wrong!