Wednesday 31 January 2007

Storage protocols for VMware

I've been doing more VMware work recently. The deployment I'm working on uses SAN-presented disk. The storage started as 50GB LUNs, quickly grew to 100GB, and now we're deploying on 200GB LUNs, using VMFS and placing multiple VM guests on each meta volume.

Now, this presents a number of problems. First, it was clear the LUN sizes weren't big enough in the first place. Second, migrating guests to larger LUNs had to be an offline process: present the new LUNs, shut down the guest, clone the guest, restart the guest, blow the old guest away. A time-intensive process, especially if it has to be repeated regularly.

Using FC-presented LUNs/metas raises another problem: if we choose to use remote replication (TrueCopy/SRDF) to provide DR failover, then all the VM guests on a meta have to fail over together. This may not be practical (almost certainly won't be!).

Add in the lack of true active/active multipathing and the restrictions on the number of LUNs presentable to an ESX server, and FC LUNs don't seem that compelling.

The options are to consider iSCSI or to store data on CIFS/NFS. I'm not keen on the CIFS/NFS option; iSCSI seems more attractive. It pushes storage management away from the ESX Server and onto the VM guest; security is managed at the array level, rather than within ESX. Personally I think this is preferable - let ESX (system) administrators do their job, and so on. One last benefit: I can present as many iSCSI LUNs as I like, of whatever size. It also means I can stripe across multiple LUNs, something I'm unlikely to do on VMFS-presented devices.

Therefore I think iSCSI could be a great option. Then I thought of one curve ball: what if I could do thin provisioning on FC? Here's the benefit. Imagine creating 20 VM guests on a server, all running Win2K3. The standard deployment is 10GB for the root/boot disk, but I'm only actually using about 5GB. The remainder is left to allow for maintenance/patching/temporary space (we don't want to have to rebuild servers) - applications and data go on separate volumes. I'll use a 200GB meta; unfortunately, it's 50% wasted. But bring in thin provisioning and I can allocate 10GB drives with impunity - I can allocate 20 or 30 or 40! FC is back on the menu (the arithmetic is sketched below). Incidentally, I'm more than aware that iSCSI devices can already be presented thin-provisioned.
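Here's that arithmetic as a tiny Python sketch (the figures are from the example above; the variable names are mine):

```python
# Back-of-envelope maths for the 20-guest example above.
GUESTS = 20
ALLOCATED_GB = 10   # standard root/boot disk per guest
USED_GB = 5         # what each guest actually consumes

fat = GUESTS * ALLOCATED_GB   # 200GB meta, fully reserved up front
used = GUESTS * USED_GB       # 100GB actually written

print(f"Fat provisioned: {fat}GB, in use: {used}GB, "
      f"wasted: {100 * (fat - used) / fat:.0f}%")
# Thin provisioned, the same 200GB of physical disk could back 40 such guests.
print(f"Guests a thin-provisioned {fat}GB could support: {fat // USED_GB}")
```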

Lots of people ask me why bother with thin provisioning. In VMware, I think I've found the perfect use for it.

Monday 29 January 2007

Buy Buy Buy

More acquisition news; Brocade and McDATA are finally just Brocade once the purchase completes. There's a snazzy new logo, which I can't decide looks like a pair of red angel wings or some convoluted join of the M and B from the two companies. Personally, I preferred the old logo depicting a fabric; it worked for me. I even had a lovely "personal" email from Michael Klayko (as I'm sure lots of other people did).

Anyway, moving on, what's more interesting is that IBM has bought Softek. IBM now has a great product in TDMF, a piece of software that really made me go "wow!" when I first saw it in action. It literally is a transparent data mover. With the mainframe version (I was a proper storage admin when I started out on the mainframe) you could sync up many volumes in real time and perform an instant swap on them, allowing migration of terabytes of storage in seconds. I used the product to migrate data from an old storage array to a new one without any user impact or input at all. I hope IBM uses its new jewel well.

Sunday 28 January 2007

iSCSI Security Part 2

To tie down my iSCSI test environment I've implemented IPsec between client and server. This lets me protect either the payload or just the headers of my IP stream. I chose Encapsulating Security Payload (ESP), which gives full confidentiality for my data, rather than Authentication Header (AH), which only provides integrity and authentication of the data's source.

Implementation on my Netapp simulator and Windows client was easy: the ipsec command and a file setting on the filer, plus configuration of the IPsec settings through MMC on Windows. Once I'd logged the iSCSI targets out and back in, I did some testing.

Now before I go further, let me stress that this testing was only slightly scientific. Everything I'm using is virtual and on the same physical machine with one hard disk. The figures I'm quoting are indicative of the performance I received and not directly comparable with real-world systems, although you'll get the idea.

First I tested with HD Tach and no IPsec enabled. Response time for a standard test was about 0.2ms, and throughput ranged from 2-32MB/s over the course of the test (an average of around 17MB/s). With IPsec enabled, response time doubled to 0.4ms and throughput dropped significantly to 2-7MB/s, with an average of 4.2MB/s. I repeated the tests a number of times with IPsec on and off and saw consistent results.

Just to be sure, I checked for other bottlenecks in my system. The virtualising PC wasn't the bottleneck, and neither was the filer nor the client. I can therefore put the performance change down simply to "more work to do".

Unsurprisingly, I couldn't inspect the IP packets with IPsec enabled. That's secure, but it presents issues when diagnosing problems; I don't know whether tools exist to let encrypted streams be analysed, but Ethereal couldn't do it. My IPsec implementation was also rather simple - just a shared key for encrypting the traffic. If I was implementing this for real, I'd be discussing with the network guys the best way to implement security.

Friday 26 January 2007

Is it me?

...or do some people not like comments? Here's another iSCSI post from Marc Farley at EqualLogic. I think he's saying iSCSI is good. I agree - good in the right circumstances.

Marc - either tell me how, or enable comments - please!!

iSCSI Security Part 1

I've done some more work on iSCSI. That meant rebuilding one of my iSCSI client environments, as I totalled it by trying to add another drive to the VM guest. Anyway, panic over: I've got the client working again and I've been running traces to see how easy it is to locate relevant data in an unprotected iSCSI stream.

I mentioned before that I'm using iSCSI on Win2K3 talking to a Netapp filer (simulator), with Ethereal on the host Win2K3 box to monitor I/O activity. Today I tried writing to a test file with Ethereal running and tracing I/O activity. The tool is splendid in its interpretation of the data it sees, formatting the packets based on their content. My save of a file containing some test data yielded some interesting results.

Firstly, I found Ethereal detected the NTFS logfile entries being saved before my actual file. These are the RCRD and RSTR records which NTFS uses to recover the filesystem should anything untoward happen before the data is actually committed to disk.

After this, I tracked the MFT entries being written. These are the actual file saves which contain my data. Since nothing is encrypted, I can see the content. Using Ethereal and a tool to map NTFS records, I could easily spy on data being stored on iSCSI volumes. Here are some screenshots:

The first shows the MFT entry that NTFS stores for the file save. I used Winhex to examine the blocks on disk. Note the content of the file ("The Quick Red......."), plus the file name "TestFile1.txt" in Unicode, so each character takes two bytes. Contrast this with the Ethereal output, which shows the same data captured from the network; Ethereal is even generous enough to format the packets so I can locate the iSCSI payload. This is no good for any organisation which must offer data security. Part II will discuss what can be done.

iSCSI Nonsense

I've just read Mark Lewis's latest blog entry. Unfortunately he doesn't seem to permit comments so I'll just have to take issue here.

For those who haven't read it, he talks about the recent talking up of iSCSI.

So, I have done that, and so have others. He misses a huge point: IT is driven by technologists. The world is driven by technologists. Let me cite an example. Who knows about the standards relating to television broadcasts? To be fair, who cares? Well, the people who care are those who want higher and better resolutions: the early adopters, the technologists. They tell us 1080p (well done if you know what that means) is the best, and so we all want it.

So, we all want to consider iSCSI because it will bring benefit to our businesses - the technologists tell us so. Absolutely no-one with an ounce of sense is saying iSCSI is going to kick FC's ass. On the contrary, what's being said is that iSCSI complements FC to such a degree that customer choice will be the deciding factor.

iSCSI is maturing as a technology to complement existing offerings. It is a lower-level technology play. SOA sits far above the protocol layer and is another discussion entirely. Whatever happens, something has to move the SOA data - I'd bet that either iSCSI or FC will be doing that for a long time to come.

Thursday 25 January 2007

Brocade/McDATA Merge Approved

The Brocade purchase of McDATA has been approved by both shareholders. The expected completion of the merger is 29 January.

I'm interested to see the merged product lines and how BrocDATA intends to support both product sets (especially the director class devices). There will be a lot of customers out there looking to see what bridging technology the merged company will produce and how the roadmap will look.

Whatever happens, Mc-cade needs to come out with something quickly or Cisco will be in and mercilessly stealing their market share.

Thursday 18 January 2007

A confession

Nigel commented on my slow provisioning post. I have to agree that Device Manager has its faults. I'm doing some work on Device and Tuning Manager over the coming weeks, as I'm installing both on VMware guests, so I should have some performance feedback.

The comment highlights how this software still has a long way to go.

A few (well, quite a few) years ago I needed to purchase a storage management product to implement storage quotas on the mainframe. For those who remember DFSMS, this was just at the time SMS was being introduced as a product on MVS/ESA.

So, I asked my boss if I could purchase the software. He said I needed to do a competitive analysis with other products and explain why my choice was the best. Unfortunately there were no other equivalent products out there, so I pondered my options and decided to invent one.

I called the product "Loads-a-space" after "Loadsamoney", a popular comedy character at the time. Loadsamoney was a crass plasterer who always boasted about how much money he made - it was the '80s boom time. He was a creation of Harry Enfield, so I called my fictitious software house Enfield Software.

I added Loads-a-space from Enfield Software to my proposal as the inferior product, and got my approval and my purchase. It helped that my boss was Canadian and didn't know much about British comedy.

OK, now I've got that off my chest: Nigel, I'd be interested to know how big your Device Manager deployment is and what you run it on, just to do some kind of comparison.

Tuesday 16 January 2007

iSCSI Continued (2)

After my previous post on iSCSI testing I promised some more detail. My test environment is based on the Netapp Simulator version 7.2.1, which you can download if you're a Netapp customer. I'm not sure if it's available to non-customers (it should be, as it's a great marketing tool), but I guess if you want to find out you could ask Dave.


Netapp filers export LUNs as iSCSI devices. As far as I can tell, the LUN implementation on ONTAP is effectively a qtree (i.e. a share), based on the way I created it. Anyway, once created, I associated the LUN with an initiator group, and the initiator group has the access associated with it (hope you're all following this). The screenshot here shows the output from the lun show and igroup show commands, which list the LUNs I created and their associations. You can see the igroup can be used to provide access to a number of servers based on the IQN, or iSCSI Qualified Name, which is used to reference an iSCSI target or initiator device. The IQN seems to be a "gentlemen's agreement" format, based on the reverse DNS name of the organisation that produced the device, plus the year and month that domain was registered. In this case, the Microsoft initiator on my test server generated "iqn.1991-05.com.microsoft:" followed by the specific server identifier "vmware2.vmware.brookend.com", the name of the server itself.
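As an aside, here's a minimal sketch in Python of how that naming convention splits apart - my own illustrative helper, not an official parser:

```python
# Split an iSCSI Qualified Name into its conventional parts:
#   iqn.<yyyy-mm>.<reversed domain>:<unique identifier>
def parse_iqn(iqn: str):
    prefix, rest = iqn.split(".", 1)
    assert prefix == "iqn", "not an iqn-format name"
    date, rest = rest.split(".", 1)        # year-month the domain was registered
    authority, _, identifier = rest.partition(":")
    return {"date": date, "authority": authority, "identifier": identifier}

print(parse_iqn("iqn.1991-05.com.microsoft:vmware2.vmware.brookend.com"))
# {'date': '1991-05', 'authority': 'com.microsoft',
#  'identifier': 'vmware2.vmware.brookend.com'}
```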
My first inclination (being a hacker of old) was to spoof this, so I configured another server (XP this time) with the iSCSI initiator and changed its IQN. Voila - I could access the same disks, albeit from another IP address. Being iSCSI and block data, the filer simulator didn't care about multiple access (which is fine), and I spent some time trying to break the shared LUN by writing and reading data from both sources.
Needless to say, this example highlighted how simple security does (or doesn't) work. I know CHAP authentication is available and I suspect there are many more security options I need to investigate, so that's my next task. I think getting standards and security right will be more important than making sure the network performs.
More details to come.

Slow Provisioning

Poor provisioning tools annoy me, and I've been annoyed today. I've been changing some VMware metas from 100GB to 200GB on a DMX. Unfortunately they were already presented (but not used) and replicated with SRDF. So I had to:

  1. "Not Ready" the R1 and R2 drives
  2. Unmask the LUNs from the FA
  3. Split the SRDF relationship
  4. Break the SRDF relationship
  5. Unmap the LUNs from their FAs
  6. Dissolve the metas
  7. Create the metas
  8. Re-establish and resync SRDF
  9. Map the LUNs to the FA
  10. Mask the LUNs to the hosts

Ten steps which take considerable time to write, validate and execute. I don't do this stuff often enough to justify writing scripts to help me out, but I think this should be a vendor thing - a software tool with various configuration options that creates the symconfigure and associated commands for you and indicates the steps you'll have to perform. ECC is *supposed* to do it, but it doesn't. Roll on some good software.
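To show what I mean, here's a sketch in Python of the sort of helper I'd want - it just turns a device list into the ordered plan above. The step wording mirrors my list and the device names are hypothetical; deliberately, no real symconfigure syntax is generated, since getting that right is exactly the bit the vendor should own:

```python
# Sketch of the provisioning helper I wish existed: emit the ten-step
# plan for a set of devices, ready for review before any real
# symconfigure/symrdf commands get written. No real syntax is produced.
STEPS = [
    '"Not Ready" the R1 and R2 drives: {devs}',
    "Unmask the LUNs from the FA: {devs}",
    "Split the SRDF relationship for: {devs}",
    "Break the SRDF relationship for: {devs}",
    "Unmap the LUNs from their FAs: {devs}",
    "Dissolve the metas: {devs}",
    "Create the new {size}GB metas: {devs}",
    "Re-establish and resync SRDF for: {devs}",
    "Map the LUNs to the FA: {devs}",
    "Mask the LUNs to the hosts: {devs}",
]

def plan(devices, new_size_gb):
    devs = ", ".join(devices)
    for n, step in enumerate(STEPS, 1):
        print(f"{n:2}. {step.format(devs=devs, size=new_size_gb)}")

plan(["0A1", "0A2"], 200)  # hypothetical device names
```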

Sunday 14 January 2007

more about iSCSI

I mentioned as a "Storage Resolution" that I'd look more in-depth at iSCSI. Well, I've started doing just that today.

The first thing I thought I needed was a working environment. I'm not keen on investing in an entire storage array (at this stage) to do the testing (unless some *very* generous vendor out there wants to "loan" me one), so I've built a virtual environment based on a number of free components.

I've a dedicated VMware testing machine, recently built, with a dual-core Intel processor, 2GB of RAM and a SATA drive. Nice and simple. It runs Win2K3 with the free VMware Server, onto which I've created another Win2K3 R2 partition and a Linux partition running Fedora Core 6. This is where my iSCSI "target" will sit.

For those unfamiliar with SCSI terminology, the source disk or disk system presenting LUNs is referred to as the target. The host accessing those LUNs is the initiator; simply put, the host initiates a connection to a target device, hence the names. My iSCSI target in this instance is a copy of the Netapp simulator running on Linux.

Most people are probably aware of the simulator; if not, Dave Hitz talks about it here. I've created a number of disks in an ONTAP volume and out of that created a LUN. LUNs can be presented out over FC or iSCSI; in this instance I've presented mine as iSCSI.

By default the simulator doesn't enable iSCSI, so I enabled it with the standard settings. This means my target's iSCSI address is based entirely on Netapp defaults; I'm going to work out the best practices for these settings over the coming days. Anyway, I've presented two LUNs, numbered LUN 4 and LUN 9.

At the initiator (host) end, I've used my Win2K3 server and installed the iSCSI initiator software from Microsoft, which gave me a desktop icon to configure the settings. Again, I've ended up with the default names for my iSCSI initiator, but that doesn't matter; all I had to do was specify the IP address of my target in the initiator settings and log on, and it found the LUNs (oh, one small point: I had to authorise the initiator on the simulator first). Voila, I now have two disks configured on my Windows host which can be formatted like standard local disks.

As a performance test, I ran HD Tach against the iSCSI LUNs on Win2K3. I got a respectable 45MB/s throughput, which isn't bad bearing in mind this environment is all virtual on the same physical machine.

All the above sounds a bit complicated, so over the coming days I'll break down what I had to do; I'll also explain the iSCSI settings I needed to make and my experiments with dual pathing and taking the iSCSI devices away from Windows mid-operation.

Thursday 11 January 2007

WWN Decoding

The WWN Decoder page on my main site is updated. It's at http://www.brookend.com/html/main/wwndecoder.asp

If I've done it correctly, it now handles EMC up to the DMX-3 and HDS USP/NSC/AMS arrays, and also indicates the model type of the discovered device.
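For anyone curious what the decoding involves, it mostly boils down to pulling the NAA type and the 24-bit vendor OUI out of the 64-bit name. Here's an illustrative sketch in Python; treat the OUI table (and the example WWN) as assumptions to verify rather than reference data:

```python
# Decode the interesting fields of an IEEE Registered (NAA type 5) WWN:
#   4 bits NAA | 24 bits vendor OUI | 36 bits vendor-specific
OUIS = {0x006048: "EMC", 0x0060E8: "Hitachi"}  # illustrative only - verify!

def decode_wwn(wwn: str):
    bits = int(wwn.replace(":", ""), 16)
    naa = bits >> 60                       # top nibble: name address authority
    oui = (bits >> 36) & 0xFFFFFF          # next 24 bits: IEEE vendor OUI
    vendor_bits = bits & 0xFFFFFFFFF       # low 36 bits: array/port detail
    return naa, OUIS.get(oui, f"unknown ({oui:06X})"), hex(vendor_bits)

print(decode_wwn("50:06:04:8A:CC:C8:66:04"))  # a made-up EMC-style WWN
# (5, 'EMC', '0xaccc86604')
```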

If anyone has samples from their arrays which they're willing to share, let me know and I'll use them to validate and extend the decoder.

Friday 5 January 2007

Manic Miner and Storage Resource Management

I tried a bit of nostalgia the other day. From a "freebie" CD-ROM I installed an emulator for the ZX Spectrum, a personal computer that was hugely popular in the '80s. The game I installed was Manic Miner, one of the original platform games. At the time (1983) it was a classic, and (shamefully) I even hacked my copy to remove the copy protection (you had to enter a 4 or 6 digit code from a sheet of blue paper, which couldn't be photocopied). When my children saw the game, they fell about laughing - not surprising when you compare it to their latest play, Star Wars Battlefront.

It made me think how things have changed in 20 years: from 32x24 character graphics to 1280x1024 with advanced polygon shading and the rest. What has this to do with storage? Well, I ponder what will happen to Storage Resource Management in the next 20 years.

I think what we'll see is artificial intelligence-based software managing our data. The software will proactively fix hardware faults, relocate data based on our usage/value policies, provide CDP and CDR, deliver optimum performance and make all storage administrators obsolete.

Er, well, all except the last one. Yes, I do think the worries we have about SRM tools will be resolved; however, with the growth in capacity, complexity and features of today's storage, I think Storage Administrators will be needed for a long time to come.

Tower of Tera

Lots of talk today about the 1 terabyte drive from Hitachi. In fact the drive is more likely to show as about 931GB, thanks to the dubious practice of quoting decimal 1000s rather than binary (while we're on that subject, decimal versus binary does annoy me - what with that and overhead, on some of the AMS arrays I've installed, a 300GB drive comes out at 267GB).
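The arithmetic, for anyone who wants to check the 931GB claim:

```python
# Drive makers sell decimal bytes; operating systems report binary GB (2^30).
print(10**12 / 2**30)  # ~931.3, so a "1TB" drive shows up as about 931GB
```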

So, yes, I want a 1TB drive - no idea what I want to put on it, or how I'll back it up - but I want one.

Thursday 4 January 2007

Hybrid Storage Alliance

Fujitsu, Hitachi, Toshiba, Samsung and Seagate are getting together as the Hybrid Storage Alliance to promote hybrid drives - HDDs with large amounts of additional onboard cache. They've set up a (not so snazzy) website at www.hybridstorage.org (check out the builder on the features page - when did you last see a builder using a laptop, never mind understanding what a hard disk is?).

It's good news; I love the idea and have mentioned my thoughts before. I also think that onboard cache provides more options for developing a successor to RAID, although I'm still thinking about how that could be done.

SanDisk also announced a solid state HDD. After 50 years, the hard disk is seeing some exciting changes.

Wednesday 3 January 2007

My Favourite De-Duplication Technology

Here's one of my favourite websites: www.shazam.com. In fact it isn't the website itself that's the favourite thing, it's what Shazam does. In the UK (apologies to US readers, I don't know your number), dialling 2580 (straight down the middle of your phone keypad) and holding your mobile up to a music source for 30 seconds gets you a text back with the track title and the artist. Seeing this for the first time is amazing; as long as the track is reasonably clear, any 30-second clip will usually work. I've astounded (and bored) dozens of friends, and it only costs me 50p each time.

So, Shazam got me thinking: how can they index the millions of music tracks in existence today and match them to a random clip of music provided over a tinny link from my mobile phone? The most obvious issues are those of quality; I've almost only ever used the service in a bar with a lot of background noise (mostly drunken colleagues). That aside, I tried to work out how they could have indexed all the tracks while still allowing me to provide a random piece of the track to match against.

I started thinking about pattern matching and data de-duplication as it exists today. Most de-dupe technology seems to rely on identifying common patterns within data and indexing them against a generated "hash" code which (hopefully) uniquely references each piece of data. With suitable data containing lots of similar content, a lot of duplication can be removed from storage and replaced with pointers. Good examples would be backups of email data (where a user, or a group of users, shares the same content) and database backups where only a small percentage of the database has changed. Clever de-dupe technology would be able to identify variable-length patterns and determine variable start positions (i.e. byte-level granularity) when indexing content. This would be extremely important where database compression re-aligns data on non-uniform boundaries.
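As a toy illustration of that hash-and-pointer idea, here's a minimal fixed-block de-dupe sketch in Python. Real products use the variable-length, byte-aligned chunking just described; this fixed-block version only shows the indexing principle:

```python
import hashlib

BLOCK = 4096  # fixed-size chunks; real de-dupe uses variable-length ones

def dedupe(data: bytes):
    """Store each unique block once; represent the stream as hash pointers."""
    store, pointers = {}, []
    for i in range(0, len(data), BLOCK):
        chunk = data[i:i + BLOCK]
        key = hashlib.sha256(chunk).hexdigest()  # "hopefully unique" hash
        store.setdefault(key, chunk)             # keep one copy per pattern
        pointers.append(key)
    return store, pointers

# Two identical block-aligned "backups": the repeat costs nothing to keep.
monday = bytes(BLOCK) + b"x" * BLOCK      # two distinct 4KB blocks
tuesday = monday                          # unchanged data, backed up again
store, ptrs = dedupe(monday + tuesday)
print(f"{len(ptrs)} blocks referenced, {len(store)} actually stored")  # 4 vs 2
```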

Now this is where I failed to understand how Shazam could work. OK, so the source content could be de-duped and indexed, but how could they determine where my sample occurred in the music track? A simple internet search located a presentation from the guy who developed the technology, Avery Wang; the detail is here: http://ismir2003.ismir.net/presentations/Wang.PDF. The process actually generates a fingerprint for each track, highlighting specific unique spectrogram peaks in the sounds of the music, then uses a number of these to generate hash tokens via a technique called "combinatorial hashing". This uniquely identifies a track, but also answers how any clip can be used to identify it: the relative offsets of the hash tokens are used to match the track, so the absolute offset of the sample isn't important.
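A toy version of that relative-offset trick, again in Python: index (token, offset) pairs per track, then vote on the difference between track offsets and clip offsets - a genuine match produces one dominant difference. The small integers here stand in for real spectrogram-peak hash tokens:

```python
from collections import Counter

index = {}  # hash token -> [(track, offset within track), ...]

def add_track(name, tokens):
    for offset, tok in enumerate(tokens):
        index.setdefault(tok, []).append((name, offset))

def identify(clip_tokens):
    votes = Counter()
    for clip_off, tok in enumerate(clip_tokens):
        for track, track_off in index.get(tok, []):
            votes[(track, track_off - clip_off)] += 1  # relative offset only
    (track, _), score = votes.most_common(1)[0]
    return track, score

add_track("track A", [7, 3, 9, 1, 4, 4, 8, 2])
add_track("track B", [5, 1, 1, 6, 3, 9, 2, 7])
print(identify([9, 1, 4, 4]))  # a clip from mid-track -> ('track A', 4)
```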

Anyway, enough of the techie talk, try Shazam - amaze your friends!

Tuesday 2 January 2007

Storage Resolutions

The new year is here, and everyone loves to make resolutions about how they're going to improve their lives. Personally I think it's nonsense; if you want to change, you can do it at any time, rather than at the arbitrary point of the new year.

Anyway, enough of my humbug. Here are a few storage resolutions I hope to maintain:

  • iSCSI - I haven't paid enough attention to this. I think iSCSI is due to hit its tipping point this year and get much more widespread adoption.
  • WAFS - Wide Area File Systems interest me. Any opportunity to reduce the volume of data being moved across networks while centralising the gold copy strikes me as a sensible idea.
  • CDP - There are some interesting products around providing continuous data protection. They aren't scalable yet, but when they are I can see them being big.
  • NAS Virtualisation - OK, I know how it works and what the products are; I just need to get into more detail.
  • CAS - I've always seen CAS as pointless. It's time to give it a second chance.

So that's the technology side covered. What about process?

  • ILM - I think it is time to harp on about proper ILM - i.e. that which is integrated into the application rather than the poor efforts we've seen to date. I think application development needs to be addressed to cover this.
  • SRM - how about some proper tools which actually do the job of managing the (large scale) process of storage deployment? More thought required here.
  • Cost Management - I believe there are lots of options for managing and reducing cost, I should expound on them more.
  • Technology Refresh - Always a problem and certainly needs more thought for mature datacentres.

Hmm, funny how each year's resolutions end up sounding just like the ones from the year before.

Netapp 1 EMC 0

My RSS reader just picked up a lovely report from Netapp countering an EMC report which showed that with MS Exchange workloads EMC beat Netapp (CX3-40 v 3050). It just shows how, when vendors are challenged to defend their products, they can make them work much better than the "standard" configuration. I can't help thinking it would be better if these products did this without needing the vendor to do a lot of configuration work.

The original report is here: http://www.netapp.com/library/tr/3521.pdf