Thursday 13 November 2008

Obligatory Atmos Post

I feel drawn to post on the details of Atmos and give my opinion on whether it is good, bad, innovative or not. However, there's one small problem. Normally I comment on things that I've touched - installed/used/configured/broken etc. - but Atmos doesn't fit this model, so my comments are based on the marketing information EMC have provided to date. Unfortunately the devil is in the detail and without the ability to "kick the tyres", so to speak, my opinions can only be limited and somewhat biased by the information I have. Nevertheless, let's have a go.

Hardware

From a hardware perspective, there's nothing radical here. Drives are all SATA-II 7.2K 1TB capacity. This is the same as the much maligned IBM/XIV Nextra, which also offers only one drive size (I seem to remember EMC a while back picking this up as an issue with XIV). In terms of density, the highest configuration (WS1-360) offers 360 drives in a single 44U rack. Compare this with Copan, which provides up to 896 drives (although you're not restricted to this size).

To quote Storagezilla: "There are no LUNs. There is no RAID." So exactly how is data stored on disk? What methods are deployed for ensuring data is not lost due to a physical issue? What is the storage overhead of that deployment?

Steve Todd tells us:

"Atmos contains five "built-in" policies that can be attached to content:

  • Replication
  • Compression
  • Spin-down
  • Object de-dup
  • Versioning


When any of these policies are attached to Atmos, COS techniques are used to automatically move the content around the globe to the locations that provide those services."
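
Reading between the lines, here's what I imagine that might look like. To be clear, this is purely my own sketch - the names and the placement logic are invented, not anything EMC have published:

    # Hypothetical illustration only - not EMC's implementation or API.
    # A policy attached to an object drives where copies of it are placed.
    SITES = {"london": {}, "new_york": {}, "shanghai": {}}

    class Policy:
        def __init__(self, name, sync_sites, async_sites):
            self.name = name
            self.sync_sites = sync_sites    # written before the write completes
            self.async_sites = async_sites  # propagated later, in the background

    GOLD = Policy("gold", sync_sites=["london", "new_york"],
                  async_sites=["shanghai"])

    def write_object(obj_id, data, policy):
        # Synchronous copies: the client's write only completes once these exist.
        for site in policy.sync_sites:
            SITES[site][obj_id] = data
        # Asynchronous copies are queued, so until the queue drains there is
        # a window in which fewer copies of the object exist.
        return [(site, obj_id, data) for site in policy.async_sites]

    pending = write_object("obj-001", b"some content", GOLD)
    print("pending async copies:", [(s, o) for s, o, _ in pending])

Note the window at the end of that sketch - it's exactly what worries me.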

So, does that mean Atmos is relying on replication of data to another node as a replacement for hardware protection? I would feel mighty uncomfortable to think I needed to wait for data to replicate before I had some form of hardware-based redundancy - even XIV has that. Worse still, do I need to buy at least 2 arrays to guarantee data protection?

Front-end connectivity is all IP based, which presumably includes replication too, although there are no details of replication port counts or even IP port counts, other than the indication of 10Gb availability, if required.

One feature quoted in all the literature is Spin Down. Presumably this means spinning down drives to reduce power consumption, but spin down depends on data layout, and there are two issues. If you've designed your system for performance, data from a single file may be spread across many spindles - how do you spin down drives when they all potentially contain active data? If you've laid out data on single drives, then you need to move all the inactive data to specific spindles in order to spin them down - which means concentrating the active data on a smaller number of spindles, impacting performance and redundancy in the case of a disk failure. The way in which Atmos does its data layout is something you should know, because if Barry is right, then his XIV issue could equally apply to Atmos too.
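
To make the trade-off concrete, here's a toy model - mine, nothing to do with how Atmos actually lays out data. A drive can only spin down if it holds no active data:

    # Toy model: count the drives that hold no active data and so could spin down.
    def idle_drives(placement, active_objects, n_drives):
        busy = set()
        for obj, drives in placement.items():
            if obj in active_objects:
                busy.update(drives)
        return n_drives - len(busy)

    N = 8
    objects = [f"obj{i}" for i in range(16)]
    active = set(objects[:4])  # only a quarter of the data is active

    # Performance layout: every object striped across all drives.
    striped = {o: set(range(N)) for o in objects}
    # Consolidated layout: one drive per pair of objects, active data packed together.
    packed = {o: {i // 2} for i, o in enumerate(objects)}

    print(idle_drives(striped, active, N))  # 0 - no drive can ever spin down
    print(idle_drives(packed, active, N))   # 6 - but all I/O now hits 2 spindles

Striped, nothing can spin down; consolidated, most drives can, but every request lands on the same couple of spindles.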

So to summarise, there's nothing radical in the hardware at all. It's all commodity-type hardware - just big quantities of storage. Obviously this is by design and perhaps it's a good thing, as unstructured data doesn't need performance. Certainly, as quoted by 'zilla, the aim was to provide large volumes of low-cost storage and, compared to the competition, Atmos does an average job of that.

Software

This is where things get more interesting and, to be fair, the EMC message is that this is a software play. Here are some of the highlights:

Unified Namespace

To quote 'zilla again:

"There is a unified namespace. Atmos operates not on individual information silos but as a single repository regardless of how many Petabytes containing how many billions of objects are in use spread across whatever number of locations available to who knows how many users."

I've highlighted a few words here because I think this quote is interesting; the implication is that neither the volume of data, nor the number of objects, nor their geographical dispersion has any impact. If that's the case, (a) how big is this metadata repository, (b) how can I replicate it, and (c) how can I trust that it is concurrent and accurate in each location?

I agree that a unified namespace is essential; however, there are already plenty of implementations of this technology out there, so what's new with the Atmos version? I would want to really test the premise that EMC can provide a concurrent, consistent namespace across the globe without significant performance or capacity impact.
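
Some back-of-envelope numbers on (a), using figures I've simply assumed - EMC haven't published anything like this:

    # Assumed figures throughout - nothing here comes from EMC.
    objects = 10_000_000_000      # "billions of objects"
    bytes_per_entry = 1_024       # name, location map, policy refs, rich metadata
    sites = 4                     # sites each holding a full copy of the namespace

    per_site_tb = objects * bytes_per_entry / 1e12
    print(f"metadata per site: {per_site_tb:.1f} TB")         # ~10.2 TB
    print(f"metadata globally: {per_site_tb * sites:.1f} TB")

Ten terabytes or so per site isn't unmanageable, but every update to it has to be kept concurrent and accurate across all those copies - which is exactly questions (b) and (c).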

Metadata & Policies

It is true that the major hassle with unstructured data is managing it using metadata-based policies, and this feature of Atmos is a good thing. What's not clear to me is where this metadata comes from. I can get plenty of metadata today from my unstructured data: file name, file type, size, creation date, last accessed, file extension and so on. There are plenty of products on the market today which can apply rules and policies based on this metadata; however, to do anything useful, more detailed metadata is needed. Presumably this is what the statement from Steve means: "COS also implies that rich metadata glues everything together". But where does this rich metadata come from? Centera effectively required programming against its API, and that's where REST/SOAP would come in with Atmos (more on that below). Unfortunately, unless there's a good method for creating the rich metadata, Atmos is no better than the other unstructured data technologies out there. To quote Steve again:

"Rich metadata in the form of policies is the special sauce behind Atmos and is the reason for the creation of a new class of storage system."

Yes, it sure is, but where is this going to come from?
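
My guess is that it has to come from the application, via that REST/SOAP interface. EMC haven't published the API details, so everything in this sketch - the endpoint, the header name, the auth scheme - is invented purely to illustrate the point:

    # Invented illustration - not the real Atmos REST API.
    import urllib.request

    def put_object(endpoint, path, data, metadata, token):
        # Hypothetical convention: the application ships its own rich metadata
        # as a header of key=value pairs alongside the object payload.
        meta = ",".join(f"{k}={v}" for k, v in metadata.items())
        req = urllib.request.Request(
            url=f"{endpoint}/objects/{path}",
            data=data,
            method="PUT",
            headers={
                "x-object-meta": meta,               # invented header name
                "Authorization": f"Bearer {token}",  # invented auth scheme
                "Content-Type": "application/octet-stream",
            },
        )
        return urllib.request.urlopen(req)

    # put_object("https://atmos.example.com", "invoices/2008/inv-42.pdf",
    #            data=b"...", token="...",
    #            metadata={"department": "finance", "retention": "7y"})

Note where the burden sits: the application has to supply "department", "retention" and the rest. Nothing is derived automatically, which is precisely my concern.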

Finally, let's talk again about some of the built-in policies Atmos has:

  • Replication
  • Compression
  • Spin-down
  • Object de-dup
  • Versioning

All of these exist in other products and are not innovative. However, extending policies is more interesting, although I suspect this is not a unique feature either.
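
For what it's worth, here's how I'd imagine an extended policy might be expressed - composing the built-in primitives with your own trigger. The structure is my speculation, not a documented Atmos feature:

    # Speculative sketch of a user-defined policy - not a documented feature.
    BUILT_INS = {"replicate", "compress", "spin_down", "dedup", "version"}

    def make_policy(name, when, actions):
        unknown = set(actions) - BUILT_INS
        if unknown:
            raise ValueError(f"not built-in primitives: {unknown}")
        return {"name": name, "when": when, "actions": actions}

    # e.g. squeeze and park anything untouched for 90 days.
    age_out = make_policy(
        "age-out",
        when=lambda meta: meta["days_since_access"] > 90,
        actions=["compress", "dedup", "spin_down"],
    )

    meta = {"days_since_access": 120}
    if age_out["when"](meta):
        print("apply:", age_out["actions"])

Useful, certainly - but ILM products have been expressing rules like this for years.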

On reflection I may be being a little harsh on Atmos; however, EMC have stated that Atmos represents a new paradigm in the storage of data. If you make a claim like that, then you need to back it up. So, still to be answered:

  • What resiliency is there to cope with component (i.e. HDD) failure?
  • What is the real throughput for replication between nodes?
  • Where is the metadata stored and how is it kept concurrent?
  • Where is the rich metadata going to come from?

Oh, and I'd be happy to kick the tyres if the offer was made.

7 comments:

Rob said...

You picked up on Steve Todd's explanation of Policies and then state:

"So, does that mean Atmos is relying on replication of data to another node as a replacement for hardware protection? I would feel mighty uncomfortable to think I needed to wait for data to replicate before I had some form of hardware-based redundancy - even XIV has that. Worse still, do I need to buy at least 2 arrays to guarantee data protection?"

You missed his overview of a gold policy, where it is synchronous in 2 places and async to Shanghai. I'd think that would more than make up for hardware failures. Still paranoid and rich? Make up your own Platinum policy: synchronous in 4 sites. I've seen that discussed elsewhere.

Chris M Evans said...

Rob

I see what you mean - however, I also raised the issue of replication connectivity, which appears to be all IP. There's no documentation to indicate whether replication is sync, async or otherwise. So assume sync exists - on the basic model it will be achieved over 1Gb IP. Is 1Gb/s enough to replicate my data to a remote site synchronously? If not, how much cache is there in each node? Is this all battery-backed write cache in case I experience a hardware failure? I'm not trying to be awkward - there are lots of assumptions being made here - I want to understand how it all works to help me see how I'd be exposed in the case of component failure.

Chris M Evans said...

...and another thing - say I am relying on array-to-array replication to protect me from data loss and I lose a disk. I then have to re-create that data from the replica, preferably as quickly as possible - so how long will it take to replicate a 1TB drive over IP? Around 2.2 hours at a full 1Gb/s if I am lucky, and longer in practice. Is it really the best approach to use expensive network bandwidth to protect against disk failure?
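
For the record, the sums behind that figure, assuming the ideal case of a full 1TB drive and the whole 1Gb/s link dedicated to the rebuild:

    # Idealised assumptions: full drive, 100% of the link, no protocol overhead.
    drive_bytes = 1e12          # 1 TB, decimal, as drives are sold
    link_bits_per_sec = 1e9     # 1 Gb/s

    hours = drive_bytes * 8 / link_bits_per_sec / 3600
    print(f"{hours:.1f} hours")  # ~2.2 hours, before overhead or other traffic

In practice it will be worse - that link is also carrying everything else.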

Unknown said...

Hi Chris,

There is no "array-based replication" in Atmos. As new data gets written into the Atmos infrastructure it gets synchronously mirrored to N locations (depending on the policy). Any async mirroring is done later (and also does not rely on array-based replication).

You raise a lot of good issues in your post; I am sure that over time I and others will do our best to answer them thoroughly, which is better done in a new post (as opposed to commenting here).

Steve

Chris M Evans said...

Steve

Thanks for the clarification. It seems to me to be a bit of a risk - you can't deploy one array on its own without the possibility of losing data. Am I missing something here? I realise Atmos is meant to be a scalable solution and it's targeted at large customers who will deploy multi-petabyte installations. Perhaps future explanations will help make this clearer.

Stephen Foskett said...

Great questions! Despite the voluminous posts by all the usual EMC folks, we still have a lot to learn about Atmos.

Unknown said...

Chris,
That's right, this solution is not meant to be deployed as "one array". It is meant to globally span data centers. Your use of the word "array" conjures up images (for me) of our Symm/CX-style products; Atmos is a different mindset.
Steve