Monday 25 June 2007

Responding to Comments

I'm not quite sure of the right way to respond to posts; if I comment after them, there's a chance that the responses might be missed. Anyway, I will attempt to go back and check all those comments I've not responded to. First, here's a response to Cedric's question on VMware:

What I was referring to was (a) the way VMware lays storage out over LUNs, and (b) the way storage systems do failover at the LUN level.

For standard arrays that present LUNs, the lowest unit of granularity for TrueCopy/SRDF and the like is the LUN. That's fine for systems that have lots of LUNs and then recombine them at the host level to create volumes. VMware works with a smaller number of larger LUNs (in my experience, 50-200GB metas or LUSEs) and then divides this storage across multiple guests. So failover of a LUN could mean failover of a number of guests. The trade-off is therefore how to lay out the storage so that failover works without affecting multiple guests, and how to present additional storage when a host needs it.

The issue occurs because FC SAN storage is presented through the VMware hypervisor. iSCSI LUNs can instead be presented directly to the guest, so the original granularity is retained and each LUN can still be replicated individually. DR on iSCSI-presented LUNs can therefore easily be achieved.
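To make the granularity point concrete, here's a minimal Python sketch (all names invented; nothing here is a real VMware or array interface): on FC, many guests share one large VMFS LUN, so failing that LUN over drags every guest on it along, whereas with guest-attached iSCSI a LUN maps to a single guest.

# Sketch of replication "blast radius" at LUN granularity.
# All LUN and guest names are hypothetical.

# FC layout: a few large LUNs, each carved up across many guests.
fc_luns = {
    "lun_01": ["guest_a", "guest_b", "guest_c"],
    "lun_02": ["guest_d", "guest_e"],
}

# iSCSI layout: each guest owns its own LUN via its own initiator.
iscsi_luns = {
    "lun_10": ["guest_a"],
    "lun_11": ["guest_b"],
}

def failover_blast_radius(luns, lun_id):
    """Guests affected if this LUN is failed over (e.g. via SRDF/TrueCopy)."""
    return luns[lun_id]

print(failover_blast_radius(fc_luns, "lun_01"))     # ['guest_a', 'guest_b', 'guest_c']
print(failover_blast_radius(iscsi_luns, "lun_10"))  # ['guest_a']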

Under the current versions of VMware (and please correct me if I'm wrong) it is possible to boot a guest from an iSCSI LUN, and therefore theoretically possible to fail this over to another array. Personally, I'd not do that, as I think it would be significantly complicated to achieve; I'd prefer to present just data LUNs for replication and keep O/S data local.

I hope this clarifies my thinking; feel free to post if you want me to clarify further.

2 comments:

DCed said...

Hi Chris,

I'm not quite sure of the right way to answer your answer to my post ;).

I understand your comment regarding the FC-attached disks a little better, but I still see an issue whether it is FC or iSCSI.

With VMware, you consolidate more and more servers onto one platform that relies on FC or iSCSI to provide all the sexy features from VMware like VMotion and DR. However, so far I haven't found a simple solution to ensure good redundancy at the storage array level. What happens today if you need to carry out maintenance work on your array and want to move active LUNs to your secondary array? What happens if you have a site failure and need to switch to the second one? The only answer so far is: crash all your guest OSes, switch your SRDF/TrueCopy pair and restart your guests on the remote site. VMotion is useless in this case. That's far from a good solution for me. What's your opinion?

Chris M Evans said...

Cedric, yes, I see your problem and agree that currently, if a site failure occurs, the only way to move a guest to another site is to crash it and restart it.

What you're asking for, I think, requires some fairly fancy processing, which could be achieved with VMotion if it understood SRDF/TrueCopy; however, I think timing would be an issue. I don't think an SRDF split could be completed fast enough to make the VMotion process as seamless as it is today. Somehow, VMware would have to cache the I/O activity while the SRDF split/failover took place and then apply the writes at the remote target. Going back to previous discussions, this would absolutely not work with iSCSI, as the VMware hypervisor would need to be in control of the I/O path, which it clearly is not when using iSCSI LUNs.
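As a thought experiment only, here's a rough Python sketch of that caching idea (every name is hypothetical; this is not how VMotion or SRDF actually behave): guest writes are queued while the pair splits, then replayed against the remote target in order.

# Hypothetical write cache covering a replication split/failover window.

from collections import deque

class FailoverWriteCache:
    def __init__(self):
        self.pending = deque()
        self.in_failover = False

    def write(self, block, data, remote_disk):
        if self.in_failover:
            # Split in progress: queue the write instead of blocking the guest.
            self.pending.append((block, data))
        else:
            remote_disk[block] = data

    def complete_failover(self, remote_disk):
        # Split/failover done: drain queued writes to the remote target in order.
        while self.pending:
            block, data = self.pending.popleft()
            remote_disk[block] = data
        self.in_failover = False

remote = {}
cache = FailoverWriteCache()
cache.in_failover = True             # SRDF-style split begins
cache.write(0, b"journal", remote)   # guest keeps writing meanwhile
cache.complete_failover(remote)      # queued writes land at the remote side
print(remote)                        # {0: b'journal'}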

The alternative is some kind of lazy dual writing by the hypervisor. Imagine, for instance, that the VMware server maintains two disks, one on a local system and one remote (over DWDM, say), with writes pushed to both copies in a checkpointed fashion. VMotion could then fail a guest over to the remote location and immediately run it off the remote LUN.
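Again purely as a sketch with invented names, here's what that checkpointed lazy dual writing might look like: local writes complete immediately, and the remote copy catches up in batches at each checkpoint, so it is always consistent as of the last checkpoint.

# Hypothetical lazy dual writer: synchronous local writes,
# batched remote writes flushed at checkpoints.

class LazyDualWriter:
    def __init__(self, checkpoint_every=4):
        self.local, self.remote = {}, {}
        self.batch = []                      # writes since the last checkpoint
        self.checkpoint_every = checkpoint_every

    def write(self, block, data):
        self.local[block] = data             # local write is immediate
        self.batch.append((block, data))     # remote write is deferred
        if len(self.batch) >= self.checkpoint_every:
            self.checkpoint()

    def checkpoint(self):
        # Push the batch to the remote copy (a real system would also need
        # ordering and crash-consistency guarantees here).
        for block, data in self.batch:
            self.remote[block] = data
        self.batch.clear()

w = LazyDualWriter(checkpoint_every=2)
w.write(0, b"a")
w.write(1, b"b")     # checkpoint fires: the remote copy catches up
print(w.remote)      # {0: b'a', 1: b'b'}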

All of these options are very much controlled migrations. I don't see at this stage how unplanned outages could be managed without the server effectively appearing to have crashed - but I'm fairly certain VMware will be working on it.

The whole VM proposition and how it relates to storage is going to be very interesting, and it will need some cunning minds to ensure the implementations are done correctly!