Tuesday, 4 November 2008

LUN Stacker

A recent post from Martin "The Bod" Glassborow got me thinking about the whole process of LUN consolidation. I've done lots of migrations where people quake at the thought of changing the LUN size from one array to another. Now, I almost always want to change LUN sizes, as the vendor-specific ones (8.43GB, 13.59GB and so on) are painful to work with and wasteful at the same time.

There's another good reason to standardise on LUN sizes. If you've implemented a good dual-vendor strategy and sorted out your firmware and driver stack, then you are positioned to take storage from any of your preferred vendors. There's nothing better than having all of your vendors sweating on that next 500TB purchase when they know you can take your storage from any of EMC/HDS/HP/IBM.

If LUNs and the I/O stack are all standardised, you can move data around too. The difficult part, as alluded to in Martin's post, is achieving the restacking of data.

Here's the problem: SAN storage is inherently block based and the underlying hardware has no idea of how you will lay out your data. Have a look at the following diagram. Each LUN from a SAN perspective is divided into blocks and each block has a logical block address. The array just services requests from the host for a block of data and reads/writes it on demand. It is the operating system which determines how the file system should be laid out on the underlying storage. Each volume will have a standard location (or standard method of calculating the location) for what was called the VTOC (Volume Table of Contents), also known as the FAT (File Allocation Table) in DOS and MFT (Master File Table) in NTFS. There are similar constructs for other O/S versions like Linux but I'm not 100% certain of the terminology so won't risk the wrath of getting it wrong.
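To make the array's view concrete, here is a minimal sketch of the only arithmetic the array side really deals in: turning a byte range into a run of logical block addresses. The 512-byte block size and the function names are my own assumptions for illustration, not anything taken from a particular array.

```python
BLOCK_SIZE = 512  # bytes per logical block (an assumed, common value)


def lba_to_byte_offset(lba, block_size=BLOCK_SIZE):
    """Return the byte offset at which the given logical block starts."""
    if lba < 0:
        raise ValueError("LBA must be non-negative")
    return lba * block_size


def blocks_for_range(offset, length, block_size=BLOCK_SIZE):
    """Return (first_lba, block_count) covering a byte range on the LUN."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return first, last - first + 1
```

Nothing in here knows whether a block holds file data, free space or file system metadata; that knowledge lives entirely on the host.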

Laying out data on a file system is not a trivial task. Apart from keeping track of files, there's the requirement to keep track of free space and to be able to recreate the file index in the case of corruption, so some kind of journalling is likely to be implemented. There are also features such as compression, single instancing and encryption, all of which add to the difficulty of understanding exactly how file data is laid out on disk.

Now think of how multiple LUNs are currently connected together. This will be achieved with either a Volume Manager (like VxVM), supplied as a separate product, or a native LVM (logical volume manager). All of these tools will spread the "logical" volume across multiple LUNs and will format the LUN with information to enable the volume to be recreated if the LUNs are moved to another host. VxVM achieves this by having a private area on each LUN which contains metadata to rebuild the logical volume. Each LUN can be divided into sub-disks and then recombined into a logical volume, as shown in this diagram.
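As a rough illustration of the sub-disk idea, the sketch below models a logical volume as a simple concatenation of sub-disks and resolves a volume block address to a LUN and an address on that LUN. This is not VxVM's actual on-disk format; the `SubDisk` class, the LUN names and the 2048-block offset (standing in for space reserved for a private area) are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SubDisk:
    lun: str        # hypothetical name of the LUN holding this sub-disk
    start_lba: int  # first block of the sub-disk on that LUN
    length: int     # size of the sub-disk in blocks


def resolve(volume: List[SubDisk], volume_lba: int) -> Tuple[str, int]:
    """Map a block address in the logical volume to (lun, lba on that lun)."""
    if volume_lba < 0:
        raise ValueError("volume LBA must be non-negative")
    remaining = volume_lba
    for sd in volume:
        if remaining < sd.length:
            return sd.lun, sd.start_lba + remaining
        remaining -= sd.length
    raise ValueError("address is beyond the end of the volume")


# A concatenated volume built from three sub-disks on two LUNs.
volume = [
    SubDisk("lun_a", 2048, 1000),
    SubDisk("lun_b", 2048, 500),
    SubDisk("lun_a", 3048, 250),
]
```

With this layout, `resolve(volume, 1000)` lands on the first block of the sub-disk on `lun_b`, even though the host just sees one contiguous 1750-block volume. A striped or mirrored layout would need a different mapping function.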

So a physical LUN from an array may contain a whole or partial segment of a host volume, including LVM metadata. Determining which part it holds, and whether all the other parts are on this array (and where), is a tricky task - and we're expecting that the transmission protocol (i.e. the fabric) can determine all of this information "on the fly", as it were.
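Once the LVM metadata has actually been read and understood, the question reduces to bookkeeping, as in the hedged sketch below. The hard part, which this deliberately skips, is parsing that metadata from raw blocks in the first place. The segment list, the LUN-to-array map and the function names are all made up for illustration.

```python
from collections import defaultdict


def blocks_per_array(segments, lun_to_array):
    """Sum the volume's blocks held on each array.

    segments is a list of (lun_name, length_in_blocks) pairs.
    """
    totals = defaultdict(int)
    for lun, length in segments:
        totals[lun_to_array[lun]] += length
    return dict(totals)


def is_self_contained(segments, lun_to_array, array):
    """True only if every segment of the volume lives on the given array."""
    return all(lun_to_array[lun] == array for lun, _ in segments)


# Hypothetical volume spread over three LUNs on two arrays.
segments = [("lun_a", 1000), ("lun_b", 500), ("lun_c", 250)]
lun_to_array = {"lun_a": "array1", "lun_b": "array2", "lun_c": "array1"}
```

Here `is_self_contained(segments, lun_to_array, "array1")` is false, which is exactly the case where a tool that sees only one array's LUNs cannot restack the volume on its own.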

My thought would be - why bother with a fabric-based consolidation tool? Products like VxVM provide a wide set of commands for volume migration; although not automated, they certainly make the migration task simpler. I've seen some horrendous VxVM implementations, which would require some pretty impressive logic in any fabric-based tool in order to understand how to deconstruct and reconstruct a volume. However, life is not that simple, and host-based migrations aren't always easy to execute, so potentially a product would be commercially viable, even if the first implementation was an offline version which couldn't cope with host I/O at the same time.

Funny, what's required sounds a bit like a virtualisation product - perhaps the essence of this is already coded in SVC, UVM or Incipient?


BarryWhyte said...

I see I need to get you down here sooner rather than later ;)

So what you describe is kind of applicable to SVC. The beauty is you simply create arrays and LUNs and present them to SVC. You then pool those LUNs with the same characteristics, performance, redundancy etc.

Now the "host" LUN provisioning is from the pool, and you don't really know where the actual blocks are - somewhere on that pool - but since they all act the same, it doesn't matter. You can then offload the "striping / pooling" from the LVM layer to the SVC, and simply create a large enough LUN for the given application. At any time you decide to migrate from one pool to another you simply say 'do it' and behind the scenes the data gets moved - the host sees no change, just the "host LUN" that was presented.

Unless I'm missing the point you were making?

Chris M Evans said...

Barry, yes, sort of; I was meaning that SVC sees all the blocks that comprise a LUN and therefore it has to read/write them in the same way that the Brocade DMM product would do in the fabric, so SVC is ideally positioned (if it knew how) to dynamically stack a LUN onto a new virtual one. Obviously the practicalities of executing on that are much, much more complicated, but my point was that SVC sees all the data coming through it.

Globe Treader™ - © Kiran Ghag said...

We try to stick to a standard LUN "block size" to make data movement easy. VxDMP is used to move blocks around as discussed. This also allows us to stack LUNs from different storage arrays to create a bigger volume on the host if one array runs out of space.

This requires that the arrays in question are supported by Veritas. We didn't go for SVC/USPV because it adds another compatibility level between host and storage array.

Most vendors do not suggest/support mixing drive speeds in a RAID group or meta LUN combination. The primary reason is that performance is driven by the lowest performer. In a similar manner, having blocks stacked from different storage arrays means the lowest-performing array drives the performance again :) unless the storage-level cache is able to absorb the I/O peaks most of the time...

Martin said...

The ability to concatenate multiple LUNs into a single LUN would be incredibly useful; I can do it at the host level but it's a bit of a ball-ache for the sysadmins.

It would be neat to do the stacking in the SAN; can't see it ever happening and I imagine it will always be done with host tools. Why? Once you've done it and moved it into our new wide-striped world, you can resize LUNs etc pretty much at will. I'm hoping that the sheer proliferation of LUNs to support misguided DBAs will slowly vanish as wide-striping and automated movement of blocks becomes more and more mainstream.