It's not often I side with one vendor or another; however, after BarryB's recent post regarding "Benchmarketing", I feel obliged to comment. Have a read of Barry Whyte's rebuttal too.
We see technology advancements because "concept" devices are used to drive innovation but don't necessarily translate directly to end-user products. Look at the fashion industry - some of the most outrageous outfits are paraded down the catwalk but the same dress, coat, hat or whatever isn't sold in the shops. Instead it influences the next fashion season.
Look at the motor industry - concept cars appear well before actual consumer products. We may laugh at some and marvel at others - take the Bugatti Veyron. It is well known that Volkswagen make a loss on each car produced; what counters this is the publicity, the research, and the kudos of being able to claim that Veyron technology (arguably the fastest car in the world) is deployed in the standard VW range. Lexus is another good example of a brand created by Toyota to perform the same function. Much the same can be said for Formula 1.
Now, I'm not endorsing IBM per se here; however, I don't see the harm in IBM marketing a "concept" piece of technology which could lead to innovation in the future. After all, IBM is well known for research of this kind; the disk drive and the tape drive spring to mind.
Even EMC's own bloggers question whether EMC is known for innovation and other than Symmetrix, I can't think of one thing I view as an EMC "idea".
Anyway, 'nuff said. As previously offered - I would love to take the position of moderator in developing real world benchmarking - bring it on!!
Friday, 31 October 2008
Here's a quality piece of reporting from TechCrunch on the state of Facebook and their data problems. I mentioned their data growth in this post just last week. It's incredible that they're purchasing a new Netapp 3070 filer each week!
I'm surprised that Facebook would be continually purchasing NAS filers to grow their content. There must be a rolling set of pictures, thumbnails and so on that are frequently looked at, but there also must be a significant amount that aren't and could be archived to super-dense nearline type technology akin to the Copan products.
Unfortunately, when data growth is so intense, it isn't always easy to see the wood for the trees, and from previous and current experience, using Netapp creates the risk of wasted resources.
In my experience, looking at just block-based arrays, I've always seen around 10-15% of orphan or unused resources and sometimes higher. When host-based wastage is taken into consideration, the figure can be much worse, although host reclamation is a much more intense process.
I'm willing to offer to anyone out there who has more than 50TB of storage on storage arrays a free analysis of their environment - for a 50:50 split of any savings that can be made. As budgets tighten, I think there will be more and more focus on this kind of work.
Posted by Chris M Evans at 8:44 am
Thursday, 30 October 2008
Take it from me, SMI-S is a thing of the past. If there's one thing the last few months have taught me it's how different each vendor's products really are. I've been working on a tool called SRA (see the link here) which will report on storage in a consistent manner. Let me tell you that isn't easy...
- EMC Symmetrix/DMX - Physical disks are carved into smaller segments called hypers. These are then recombined into LUNs, which in turn might be recombined into composite devices (metas) and replicated, cloned or snapped. The hypers that make up a LUN can come from anywhere within an array and can be moved around at will by a tool designed to improve performance, completely ruining your original well-planned configuration. Combining hypers gives you RAID; what used to be called mirrors is now RAID, and even RAID-6 is supported! Devices have personalities which survive their presentation to or removal from a port. A device can have multiple personalities at the same time. LUNs use a nice numbering system based on hex - but don't expect them to number nicely if you destroy and create devices. Bit settings (flags) are used to ensure host SCSI commands work correctly.
- HDS USP/HP XP - Physical disks are grouped into RAID groups from which LUNs are carved. Until recently you couldn't span RAID groups easily (unless you were combining some free space in each RAID group). Devices don't have a personality until they're presented to a host on a port, but they can have multiple personalities. HDS use a form of punishment known as CCI for anyone foolish enough to think they had made their arrays easy to manage. LUNs are numbered using a relic of the mainframe and yes, you can move things around to balance performance, but don't think you can do it unless there are spare LUNs (sorry, LDEVs) around. Different host types are supported by a setting on a host group, which lets you confuse the hell out of everyone by telling them their LUN numbers are all the same but unique. Oh, and the storage the user sees doesn't actually have to be in the array itself.
- HP EVA - Phew! Physical disks are managed in groups (which it's recommended to only have one of, but you can have more if you really must) but they don't use RAID at the group level because that would be too easy. Instead disks are grouped into Redundancy Storage Sets, which reduce the *risk* of disk failures but don't protect directly against them. LUNs are created only when they need to be presented to a host and they don't have simple LUN numbers, but rather 32 digit UUIDs. RAID protection is done at the LUN level, making it more difficult to conceptualise than either of the previous two examples.
- Pillar Axiom - now we're getting really abstract. With Axiom, you can tier data on different levels of performance, but wait for it - they will be on the same drive, but utilising different parts of the same spindle! Argh! Enough!
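To make the contrast concrete, here's a toy sketch of the first two LUN-construction models described above: Symmetrix-style (assemble a LUN from small hypers scattered across the array) versus USP-style (carve a LUN from free space within one RAID group). All names and sizes here are invented for illustration; neither vendor's real internals are this simple.

```python
# Toy contrast of two LUN-construction models (illustrative only).

def symmetrix_lun(hypers):
    """A LUN assembled from hypers, each a (disk, size_gb) pair
    that can live anywhere in the array."""
    return sum(size_gb for _disk, size_gb in hypers)

def usp_lun(raid_group_free_gb, requested_gb):
    """A LUN carved from free space inside a single RAID group;
    it cannot (easily) span groups."""
    if requested_gb > raid_group_free_gb:
        raise ValueError("LUN cannot span RAID groups")
    return requested_gb
```

The point of the sketch is simply that a generic model has to represent both "a LUN is a bag of fragments" and "a LUN is a slice of a fixed container" at the same time.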
Clearly every vendor wants to differentiate their product so you'll buy from them and not the competition. In some respects they *have* to differentiate, otherwise all the vendors would spend their time in litigation with each other over patents! (wait a minute, they already are). So SMI-S or any other standard is going to have a near impossible time creating a single reference point. Add to the mix the need to retain some competitive advantage (a bit like Microsoft holding back the really useful API calls in Windows) and to sell their own management tools, and you can see why SMI-S will be at best a watered-down generic interface.
So why bother? There's no benefit. Every vendor will pay lip service to the standard and implement just what they can get away with.
The question is, what would replace it? There's no doubt something is needed. Most SRM tools are either bloated, poorly implemented, expensive, or plainly don't work, so some light-touch software is a must.
I think the interim solution is to get vendors to conform to a standard API format, for example XML via an IP connection to the array. Then leave it to the vendor how to code up commands for querying or modifying the array. At least the access method would be consistent. We don't even see that today. All we need now is an acronym. How about Common Resource Access Protocol?
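To show the shape of what I mean, here's a sketch of how a management tool might talk to such an interface. Everything here is hypothetical: the element names, attributes and the sample response are invented to illustrate the idea of a consistent XML-over-IP access method, not any real vendor protocol.

```python
import xml.etree.ElementTree as ET

# Hypothetical request a management tool might send to any array,
# regardless of vendor. Element and attribute names are invented.
def build_query(array_id, resource):
    root = ET.Element("query", {"array": array_id, "resource": resource})
    return ET.tostring(root, encoding="unicode")

# Hypothetical vendor response to the query above (invented format).
SAMPLE_RESPONSE = """
<result array="ARY0001" resource="luns">
  <lun id="00A1" capacity_gb="50" raid="RAID5"/>
  <lun id="00A2" capacity_gb="100" raid="RAID1"/>
</result>
"""

def parse_luns(xml_text):
    """Extract (id, capacity) pairs from a response document."""
    root = ET.fromstring(xml_text)
    return [(lun.get("id"), int(lun.get("capacity_gb")))
            for lun in root.findall("lun")]
```

The vendor still decides what commands exist and what the payloads mean; the win is only that every tool would use the same transport and document format to get at them.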
Wednesday, 29 October 2008
Thanks to all those who posted in response to Understanding EVA earlier this week, especially Cleanur who added a lot of detail. Based on the additional knowledge, I'd summarise again:
- EVA disks are placed in groups - usually recommended to be one single group unless there's a compelling reason not to (like different disk types e.g. FC/FATA).
- Disk groups are logically divided into Redundancy Storage Sets, which can be from 6-11 disks in size, depending on the number of disks in the group, but ideally 8 drives.
- Virtual LUNs are created across all disks in a group, however to minimise the risk of data loss from disk failure, equal slices of LUNs (called PSEGs) are created in each RSS with additional parity to recreate the data within the RSS if a disk failure occurs. PSEGs are 2MB in size.
- In the event of a drive failure, data is moved dynamically/automagically to spare space reserved on each remaining disk.
I've created a new diagram to show this relationship. The vRAID1 devices are pretty much as before, although now numbered as 1-1 & 1-2 to show the two mirrors of each PSEG. For vRAID5, there are 4 data and 1 parity PSEG, which initially hits RSS1, then RSS2 then back to RSS1 again. I haven't shown it, but presumably the EVA does a calculation to ensure that the data resides evenly on each disk.
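The layout described above can be sketched in a few lines. This is only an illustration of the shape of the scheme, assuming each stripe of 4 data + 1 parity PSEGs lands in one RSS and successive stripes alternate round-robin across RSSs; the EVA's actual placement algorithm is proprietary and certainly more subtle.

```python
# Sketch of vRAID5 PSEG placement (illustrative, not HP's algorithm).
PSEG_MB = 2          # each PSEG is 2MB
DATA_PER_STRIPE = 4  # plus one parity PSEG per stripe

def vraid5_layout(lun_mb, num_rss):
    """Return (stripe, rss_index) pairs for a virtual LUN."""
    data_psegs = lun_mb // PSEG_MB
    stripes = (data_psegs + DATA_PER_STRIPE - 1) // DATA_PER_STRIPE
    # Assume each 4+1 stripe sits wholly within one RSS, round-robin.
    return [(stripe, stripe % num_rss) for stripe in range(stripes)]

def physical_mb(lun_mb):
    """vRAID5 overhead: 5 PSEGs stored for every 4 PSEGs of data."""
    return lun_mb * 5 // 4
```

So a 16MB LUN across two RSSs would produce two stripes, one per RSS, and consume 20MB of physical space.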
So here's some maths on the numbers. There are many good links worth reading; try here and here. I've taken the simplest formula and churned the numbers on a 168-drive array with a realistic MTBF (mean time between failures) of 100,000 hours. Before people leap in and quote the figures that Seagate et al provide, which are higher, remember that arrays will predictively fail a drive, and in any case with temperature variation, heavy workload, manufacturing defects etc, the real-world MTBF is lower than the manufacturers' figures (as Google have already pointed out).
I've also assumed a repair (i.e. replace) time of 8 hours, which seems reasonable for arrays unattended overnight. If disks are not grouped, then the MTTDL (mean time to data loss) from a double disk failure is about 44,553 hours, or just over five years. This is for a single array - imagine if you had 70-80 of them - the risk would be multiplied. Now, with the disks in groups of 8 (meaning that data will be written across only 8 disks at a time), the MTTDL becomes 1,062,925 hours, or just over 121 years. This is without any parity.
Clearly grouping disks into RSSs does improve things and quite considerably so, even if no parity is implemented, so thumbs up to RSSs from a mathematical perspective. However if a double disk failure does occur then every LUN in the disk group is impacted as data is spread across the whole disk group. So it's a case of very low probability, very high impact.
Mark & Marc commented on 3Par's implementation being similar to EVA. I think XIV sounds similar too. I'll do more investigation on this as I'd like to understand the implications of double disk failures on all array types.
Tuesday, 28 October 2008
In my previous post covering LeftHand's Virtual Storage Appliance, I discussed deploying a VSA guest under VMware. This post discusses performance of the VSA itself.
Monday, 27 October 2008
I've not had much exposure to HP EVA storage; however, recently I've had a need (as part of a software tool project) to get into the depths of EVA and understand how it all works. The following is my understanding as I see it, plus some comments of my own. I'd be grateful for any feedback which helps improve my enlightenment or, equally, knocks me back for plain stupidity!
Wednesday, 22 October 2008
I've been running the LeftHand Networks Virtual SAN Appliance for a while now. As I previously mentioned, I can see virtual storage appliances as a great new category, worthy of investigation for the flexibility of being able to provide functionality (replication, snapshots etc) without having to deploy appliance hardware.
This post is one of a number covering the deployment and configuration of VSA.
Tuesday, 21 October 2008
I've been working on getting my "home SAN" into a usable configuration over the last few weeks. One hassle has been VMware (and I won't even mention Hyper-V again) and its support for Fibre Channel.
I guess, thinking logically about it, VMware can't support every type of HBA out there and the line has to be drawn somewhere, but that meant my QLogic and JNI cards were no use to me. Hurrah for eBay, as I was able to pick up Emulex LP9002L HBA cards for £10 each! I remember when these cards retailed at £600 or more.
Now I have two VMware instances accessing Clariion storage through McDATA switches and it all works perfectly.
That leads me on to a couple of thoughts. How many thousands of HBA cards are out there that have been ditched as servers are upgraded to the latest models? Most of them are perfectly serviceable devices that would continue to give years of useful service, but "progress" to 4/8Gb/s Fibre Channel and FCoE dictates we must ditch these old devices and move on.
Why? It's not as if they cause a "green" issue - they're not power hungry and they don't take up lots of space. I would also challenge anyone who claims they need more than 2Gb/s of bandwidth on all but a small subset of their servers. (Just for the record, I see the case for using 4/8Gb/s for large virtual server farms and the like, as you've concentrated the I/O nicely and optimised resources.)
So we need two things: (a) a central repository for returning old and unwanted HBAs; (b) vendors to "open source" the code for their older HBA models to allow enthusiast programmers to develop drivers for the latest O/S releases.
If we can reduce the amount of IT waste this way, then that should be a key strategy for any company claiming to be green.
Monday, 20 October 2008
Have a look at this link from Facebook - http://www.facebook.com/note.php?note_id=30695603919
They're now serving up 15 billion images a day! From the figures quoted, Facebook host 40 billion files as 1PB of storage, or 25KB per image. Peak load is 300,000 images a second, or 7.5GB per second of bandwidth.
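The quoted figures hang together arithmetically (using decimal units, as the post does):

```python
# Sanity-checking the Facebook figures quoted above.
files = 40e9              # 40 billion images
capacity_bytes = 1e15     # 1PB of storage
per_image = capacity_bytes / files           # bytes per image
peak_images_per_sec = 300_000
bandwidth = peak_images_per_sec * per_image  # bytes per second
# per_image comes out at 25,000 bytes (25KB); bandwidth at 7.5e9 B/s (7.5GB/s)
```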
Now I suspect (and it's not rocket science to guess) that Facebook don't store all their images in one place and they are distributed for performance and redundancy, so storage usage must be significantly higher than the quoted figure.
Growth rate is 2-3TB a day! That's up to a petabyte a year at the current rates. What's interesting is that potentially all of these images must be available to view (although most of the time the smaller preview clips will get shown first) so as Facebook grows, they must be hitting some serious issues maintaining reasonable response times.
So, how many of these images are also located elsewhere on other sites like Flickr? How many are sitting on memory sticks, hard drives and so on? I guess we'll never know, but maybe when we've One Cloud to Rule Them All, we'll have a shed load of spare disks lying around.
It annoys me when vendors release new products and don't update their global website correctly.
Like this link for HDS; http://www.hds.com/products/storage-systems/adaptable-modular-storage-2000-family/index.html?WT.ac=prodssams2000
Which should follow through (via a link at the bottom) to a page of AMS2000 Specifications. Except it doesn't - it only displays the older models. Doesn't a product release of the level of the AMS2000 warrant someone checking the website (i.e. the shop window) to ensure all the data is there?
Come on guys!!
Friday, 17 October 2008
Here's this Friday's list of bloggers I follow. Today it's the turn of EMC.
- Andrew's Blog - Andrew Cohen (EMC General Counsel) - HomePage - RSS Feed
- Chuck's Blog - Chuck Hollis, EMC Marketing - HomePage - RSS Feed
- Never Talk When You Can Nod - Andrew Chapman, SharePoint GM - HomePage - RSS Feed
- Confessions of an Ebiz Junkie - Len Devanna - HomePage - RSS Feed
- Cornelia Davis's Weblog - Cornelia Davis - HomePage - RSS Feed
- Craig's Musing's - Craig Randall - HomePage - RSS Feed
- Dave Graham's Weblog - Dave Graham - HomePage - RSS Feed
- Energy Matters - Dick Sullivan - HomePage - RSS Feed
- Information Playground - Steve Todd - HomePage - RSS Feed
- Mark's Blog - Mark Lewis - HomePage - RSS Feed
- Oracle Storage Guy - Jeff Browning - HomePage - RSS Feed
- Storagezilla - Mark Twomey - HomePage - RSS Feed
- The Storage Anarchist - Barry Burke - HomePage - RSS Feed
Thursday, 16 October 2008
Well, I wasted 3 hours of my life last night trying to get Hyper-V working on one of my PC/servers. Admittedly it's ancient at two years old, only has PCI-Express, SATA-II support and up to 4-core Intel processors, but for some reason my attempts to install Hyper-V would get just so far and then fail with a cryptic 0x8007045D error.
As a seasoned professional, I tried the obvious - shouting at the PC, kicking the PC, snapping at my children as they came in to ask innocent questions, then as a last resort I tried using different installation media, screwing about with BIOS settings and so on.
None of it worked. The error code, according to Google, seems to be hardware related, but I've no idea where, and Hyper-V, being a complex high-quality piece of software, gave me no clues. Perhaps if the installation hadn't taken up to 30 minutes at a time (goodness knows what it was doing) I could have got back to Heroes an hour earlier.
After giving up, I re-installed VMware ESXi - an installation which, no kidding, took only 10 minutes end to end.
I have been planning a review of virtualisation technologies, especially with respect to storage; clearly Hyper-V is going to make this a challenge.
Microsoft - you're not on my Christmas card list this year (which they weren't on in the first place as my wife writes all the cards in our house) - VMware welcome back.
Tuesday, 14 October 2008
Thanks to Chuck who pointed out to me SVC's ability to move virtual WWNs between nodes during replacement. At some stage in the future I may get to play with SVC but I haven't so far, so this feature eluded me. Question: is SVC the *only* block virtualisation appliance to offer this functionality and is it a seamless operation or does it require downtime?
How about InVista or Incipient (or any other vendor I may have missed off - we will assume USP doesn't have the facility)? Answers on a postcard comment please.
There's been a lot of talk this week about Compellent and their support for solid state drives. See the press release here. So now we have two vendors offering SSD devices in their arrays, Compellent join the club with EMC. Which is best?
At a meeting I had last week, we discussed SSD drives and EMC's implementation in particular. The consensus was that SSDs (or should I be calling them EFDs?) in the existing DMX arrays were more of an "also supports" than a mainline feature. The reason for that thinking was that DMX was never engineered specifically to support EFDs; rather, they've been added on as a recent value-add option. What's not clear is whether this bolt-on approach really means you get the best from the drives themselves, something that's important at the price point they sit at. Consider that EFDs sit behind a shared architecture of director ports, memory, front-end ports and queues. Do EFDs get priority access? (I know they have to be placed in specific slots in the DMX storage cabinet, so presumably they are affected by their position on the back-end directors.)
The other problem with the EMC approach is that entire EFD LUNs must be given up to a host. With large databases, how do you predict which parts of the database at any one time are the hot parts? How does a re-org or reload affect the layout of the data? Either you need to put all of your database on EFD or spend a lot more time with the DBAs and Sys Admins creating a design that segments out active areas (and possibly repeating this process often).
If Compellent's technology works as described, then LUNs will be analysed at the block level and the active blocks will remain on the fastest storage with the least active moved to lower tiers of disk (or to other parts of the disk) within the same array.
This should offer a more granular approach to using SSDs for active data. In addition, if data can dynamically move up/down the stack of storage tiers, then as data profiles change over time, no application re-mapping or layout should be necessary. Hopefully this means that SSDs are used as efficiently as possible, justifying their inflated cost.
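If the tiering works roughly as described, a toy model gives a feel for it: track access counts per block and keep the hottest blocks on the fast tier. To be clear, this is my illustration of the concept, not Compellent's actual (proprietary) Data Progression algorithm, and real implementations work on coarser pages over longer time windows.

```python
from collections import Counter

# Toy model of block-level tiering: hottest blocks stay on SSD,
# the rest fall to slower disk. Illustrative only.
class TieringModel:
    def __init__(self, ssd_blocks):
        self.ssd_blocks = ssd_blocks  # capacity of the fast tier in blocks
        self.heat = Counter()         # access count per block

    def access(self, block):
        self.heat[block] += 1

    def tier_of(self, block):
        hot = {b for b, _count in self.heat.most_common(self.ssd_blocks)}
        return "ssd" if block in hot else "sata"
```

The attraction is visible even in the toy: placement follows observed access, so when the hot spots move, the blocks move, and nobody has to re-lay-out the database.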
Just to conclude, I'm not saying Compellent have the perfect solution for using SSDs but it is a step in the right direction for making storage usage as efficient as possible.
Monday, 13 October 2008
There's no doubting that storage virtualisation will prove to be a key component of IT architecture in the future. The overall benefit is to abstract the physical storage layer from servers either in the fabric, or through the use of a dedicated appliance or even in the array itself.
Over time, storage resources can be upgraded and replaced, potentially without any impact to the host. In fact, products such as USP from HDS are sold on the virtues of their migration features.
However at some stage the virtualisation platform itself needs to be replaced. So how do we do that?
The essential concept of virtual storage is the presentation of a virtual world wide name (WWN). Each WWN then provides virtual LUNs to the host. The virtualisation appliance manages the redirection of I/O to the physical device, which also includes responding to SCSI LUN information queries (like the size of the LUN).
Ultimately, the host believes the virtual WWN is the physical device and any change to the underlying storage is achieved without affecting this configuration. If the virtualisation appliance must be replaced, then the virtual WWN could change and this means host changes, negating the benefit of deploying a virtual infrastructure.
As an example, HDS and HP allow USP/XP arrays to re-present externally connected storage as if it is part of the array itself. LUNs can be moved between either physical storage medium (internal or external) without impact to the host. However, the WWN used by the host to access the storage is a WWN directly associated with the USP/XP array (in fact, decoding the WWN shows it is based on the array's serial number). If the USP is to be replaced, then some method of moving the data to another physical array is needed. At the same time, the host->WWN relationship has to change. This is not easy to achieve without (a) an outage, (b) host reconfiguration or (c) using the host as the data mover.
There isn't an easy solution to the issue of replacing the virtualisation tool. Stealing an idea from networking, it could be possible to provide a DNS-style reference for the WWN, with a "name server" to look up the actual physical WWN from the "DNS WWN". Unfortunately, whilst this would be relatively easy to implement (a name server already exists in Fibre Channel), the major problem would be maintaining data integrity as a DNS WWN entry is changed and reads/writes start occurring from a new device. What we'd need is a universal synchronous replicator to ensure all I/Os written to an array are also written to any other planned target WWN, so the WWN DNS entry can't go live until a guaranteed synchronous mirror exists. I can't see many vendors agreeing to open up their replication technology to enable this; perhaps they could offer an API for "replication lite", used solely for migration purposes, while the main replication product handles the full feature set.
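The idea above can be sketched as a tiny state machine: hosts address a stable virtual WWN, a name service maps it to whichever physical WWN currently backs it, and cutover is refused until the synchronous-mirror guarantee is in place. All the class and WWN names here are invented for illustration; nothing like this exists in Fibre Channel today.

```python
# Sketch of a "DNS for WWNs" with a mirror-before-cutover rule.
class WWNNameServer:
    def __init__(self):
        self.map = {}          # virtual WWN -> current physical WWN
        self.in_sync = set()   # (virtual, candidate) pairs known to be
                               # synchronously mirrored

    def register(self, vwwn, pwwn):
        self.map[vwwn] = pwwn

    def resolve(self, vwwn):
        """What hosts would ask: which physical device backs this name?"""
        return self.map[vwwn]

    def mark_synchronised(self, vwwn, new_pwwn):
        """Called once a guaranteed synchronous mirror exists."""
        self.in_sync.add((vwwn, new_pwwn))

    def cutover(self, vwwn, new_pwwn):
        """Repoint the name; refuse unless the mirror is confirmed."""
        if (vwwn, new_pwwn) not in self.in_sync:
            raise RuntimeError("no synchronous mirror: cutover refused")
        self.map[vwwn] = new_pwwn
```

The hard part, as noted, isn't the lookup table; it's getting vendors to provide the "mark_synchronised" guarantee across each other's arrays.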
In the short term, we're going to have to accept that replacing the virtualisation engine is going to have some impact and just learn to work around it.
Monday, 6 October 2008
As well as storage, one area of IT I find really interesting is virtualisation. Over the years I've used VM (e.g. the IBM mainframe platform), MVS (now morphed into z/OS) as well as products such as Iceberg. More recently I've been using VMware since it was first released and finally have managed to deploy a permanent VMware ESX installation in my home/office datacentre. That has given me the opportunity to install and test virtual SAN appliances, such as VSA from LeftHand Networks and Network Storage Server Virtual Appliance from FalconStor. I'll publish more on these in a week or so once I've done some homework, but for now I want to discuss Netapp.
As many of you will know, Netapp have offered a simulator for ONTAP to their customers for some time (BTW, Dave and the crew, although I'm not a customer, I would be grateful for an up-to-date copy). The simulator is great for script testing and learning new commands without totally wrecking your production operations. However, I think it is about time Netapp took the plunge and offered ONTAP as a virtual appliance.
It shouldn't be hard to do for two reasons (a) the code is mostly Unix anyway and (b) most if not all the code exists in the simulator. It also seems to me to be an easy win; there are many organisations who wouldn't consider placing a Netapp filer into a branch office due to cost, but would deploy VMware for other services. A virtual filer could provide File & Print, iSCSI, SAN *and* most usefully, replicate that data back to core using standard Netapp protocols such as Snapmirror and Snapvault.
Perhaps Netapp haven't done it as they don't want to cut into their generous hardware margin on disk, but with a virtual offering to complement their physical ones, Netapp could retain their position as NAS vendor of choice.
Thursday, 2 October 2008
HP announced today their intention to acquire LeftHand Networks, an iSCSI and virtualised SAN player.
Now, I doubt HP needed to buy LeftHand for their iSCSI technology. I suspect the bigger play here is the virtualised SAN technology they have - also known as the Virtual SAN Appliance. This allows a SAN to be created in a VMware guest, utilising the storage of the underlying VMware server itself.
I think we have a new technology sector starting to mature; virtual storage appliances.
At first glance you might ask why virtualise the SAN and initially I was skeptical until I gave it some thought (especially with reference to a client I'm dealing with at the moment). Imagine you have lots of branch offices. Previously you may have deployed a DNS/Active Directory server, perhaps a file server and some storage, the amount of storage being dependent on demand within the branch. Deploying the storage becomes a scalability and support nightmare if you have lots of branches. But how does a virtual SAN help?
Well, it allows you to provide SAN capability out of the resilient architecture you've already deployed in that location. Chances are you've deployed more than one physical server for failover purposes. You may also not need a large amount of storage, but want advanced features like replication, snapshots etc. Deploying a virtual SAN lets you utilise these features while leveraging both the hardware and storage of the ESX infrastructure you've deployed. The crucial point here is that you get the functionality you require without deploying bespoke hardware.
So you reduce costs while still maintaining a resilient infrastructure and providing scalable support for small and medium branches. The challenge moves from supporting hardware (which has become a commodity) to supporting software as part of a virtual infrastructure, and that's a different issue. What you've gained is a consistent set of functional SAN operations which can be overlaid on different hardware - hardware which can be changed and upgraded without impacting the virtual SAN configuration.
I've downloaded VSA to test as I now have a resilient VMware environment. I'm looking forward to discovering more.
Wednesday, 1 October 2008
Yesterday I discussed beating the credit crunch by getting your house in order. This was picked up by Marc Farley over at StorageRap, and he posted accordingly. Marc, thanks for the additional comments; I will be reviewing the diagram based on your thoughts.
Moving on, think to yourself does this sound familiar?
Storage requests come in over time at a steady but unpredictable rate. When they arrive, you just provision them. Perhaps you check that the requestor can "pay" for their storage (i.e. is authorised to request it), but generally, storage is provisioned pretty much on demand. When you run out of storage, there's a minor panic and a rush to place new hardware orders, and then in a few weeks you're back in the game and provisioning again.
Welcome to Fast Food Storage Provisioning! I was going to use a brand name in this post, but then decided against it. After all, as these guys know, you're just asking for trouble.
How does this compare to storage? Easy. You walk into a fast food place and they're just there waiting to serve you, no questions asked, as long as you pay. They may have what you require ready, but if not, there's a panic in the kitchen area to cook what you want and so a delay in the delivery of your request. Those customers who eat food/storage every day become "overprovisioned" in both senses of the word.
Clearly Fast Food establishments have a vested interest in acquiring more customers as it builds their profits, however unless you are selling a service, storage growth is bad for the bottom line.
So, how about taking a few steps to make sure that storage is really needed?
- When do you need the storage by? Poor project planning means storage requests can be placed long before servers and HBAs have even been delivered, never mind racked and configured.
- Can the storage be delivered to you in increments? Most users who request 20TB immediately won't actually use it for days or weeks (and in extreme cases may never use it all).
- Have you checked your existing server to see if you have free storage? You would be amazed how many users have free LUNs on their servers they didn't know were there.
- What exactly is your requirement in detail? How will you use what we give you? By questioning the request, you can find out if users have simply doubled the estimate of required storage given to them by the DBA. Get to the real deal on what growth is anticipated.
I'm not advocating saying no to customers, just to be confident that what you're deploying is what you need. Then you won't have that guilty feeling ordering another burger - I mean array...