The Storage Architect

Monday 9 February 2009

The Storage Architect Has Moved!

I've decided to move the blog over to WordPress and there's a new direct URL: http://www.thestoragearchitect.com. Please check me out in the new location. There's a new feed too: http://thestoragearchitect.com/feed/. The FeedBurner feed stays the same and redirects. Please update your bookmarks!

Thursday 5 February 2009

Personal Computing: The Whole Of Twitter In Your Hand

A quick check on Twitter this morning shows me they're up to message number 1,179,118,180 or just over the 1.1 billion mark. That's a pretty big number - or so it seems, but in the context of data storage devices, it's not that big. Let me explain...

Assume Twitter messages are all the full 140 characters long, at one byte per character. That means, assuming all messages are being retained, that the whole of Twitter is approximately 153GB in size (counting in binary gigabytes). OK, so there will be data structures needed to store that data, plus space for all the user details; however, I doubt whether the whole of Twitter exceeds 400GB. That fits comfortably on my Seagate FreeAgent Go!
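As a rough check on that figure, here's a minimal sketch of the arithmetic. It assumes one byte per character, which is an assumption on the storage format rather than anything Twitter has published, and it reports the result in both binary and decimal gigabytes, since the 153GB figure only appears when counting in binary units.

```python
# Back-of-envelope size of every message, assuming one byte per character
# (an assumption; real encodings and metadata will differ).
MESSAGE_COUNT = 1_179_118_180   # message number quoted above
MAX_CHARS = 140                 # every message assumed to be full length
BYTES_PER_CHAR = 1              # assumption, not a Twitter fact

total_bytes = MESSAGE_COUNT * MAX_CHARS * BYTES_PER_CHAR
size_gib = total_bytes / 2**30  # binary gigabytes
size_gb = total_bytes / 10**9   # decimal gigabytes

print(f"{size_gib:.1f} GiB (binary), {size_gb:.1f} GB (decimal)")
```

Either way, the raw message text lands well under the 400GB upper bound.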

If every message ever sent on Twitter can be stored on a single portable hard drive, then what on earth are we storing on the millions of hard drives that get sold each year?

I suspect the answer is simply that we don't know. The focus in data storage is to provide the facility to store more and more data, rather than rationalise what we do have. For example, a quick sweep of my hard drives (which I'm trying to do regularly) showed half a dozen copies of the WinZip installer, the Adobe Acrobat installer plus various other software products that are regularly updated, such as the 2.2.1 update of the iPhone software at 246MB!

What we need is (a) common sense standards for how we store our data (I'm working on those) and (b) better search and indexing functionality that can make decisions based on the content of files - like the automated deletion of defunct software installers.
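The kind of sweep that turns up half a dozen copies of the same installer can be sketched in a few lines. This is only an illustration of content-based duplicate detection, not a finished tool: the function names are made up for the example, it compares files by size first and then by SHA-256 hash, and it reports duplicates rather than deleting anything.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Group files under root by content; return only groups of two or more."""
    by_size = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file():
            by_size[path.stat().st_size].append(path)

    by_hash = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a file with a unique size cannot have a duplicate
        for path in paths:
            by_hash[file_digest(path)].append(path)

    return [sorted(group) for group in by_hash.values() if len(group) > 1]
```

Calling `find_duplicates` on a directory returns a list of groups, each group being the paths that hold identical content. Deciding which copy is "defunct" is the harder problem and is deliberately left out.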

There's also one other angle and that's when network speeds become so fast that storing a download is irrelevant. Then our data can all be cloud-based and data cleansing becomes a value add service and someone else's problem!

Wednesday 4 February 2009

Enterprise Computing: Seagate Announces new Constellation Hard Drives

Seagate announced this week the release of their new Constellation hard drives. Compared to the Savvio range (which are high-performance, small form-factor drives), these drives are aimed at lower tier archiving solutions and will scale to 2TB.

I had a briefing on these drives a couple of weeks ago and there are the usual capacity and performance increase metrics to drool over (let's face it, who doesn't want a 2TB drive?). However, impressive as they are, pure capacity increases don't cut it any more for me. What's more relevant are the other less obvious features.

Power Reduction

With PowerTrim, Seagate are claiming a 2.8W consumption (idle) for the 2.5" form-factor drive. This compares to 5.2W for the Savvio 10K 146GB - almost half. This reduction is relevant not just for the power saving, but for the benefits in reduced cooling requirements and consequently the ability to stack more of these drives in a small space.
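To put a number on "almost half", here's a quick sketch using the two idle figures above. The drive count is purely hypothetical and is there only to show how the per-drive saving scales; cooling savings are not modelled.

```python
CONSTELLATION_IDLE_W = 2.8  # 2.5" Constellation at idle, as claimed by Seagate
SAVVIO_IDLE_W = 5.2         # Savvio 10K 146GB at idle

saving_w = SAVVIO_IDLE_W - CONSTELLATION_IDLE_W
saving_pct = 100 * saving_w / SAVVIO_IDLE_W

DRIVES = 100  # hypothetical drive count, for illustration only
print(f"Per drive: {saving_w:.1f}W less at idle ({saving_pct:.0f}% lower)")
print(f"Across {DRIVES} drives: {saving_w * DRIVES:.0f}W less at idle")
```

That works out at roughly a 46% reduction in idle power per drive.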

Constellation also provides PowerChoice, which will allow drives to be progressively spun down to reduce power. I've included a couple of graphics courtesy of Seagate which show the benefits of the different power-down levels.

In a previous discussion with COPAN, they indicated to me that their power-down solution had seen an increase in the life of hard drives, so I would expect Constellation to see the same benefits, although Seagate haven't indicated that.

Encryption

Although encryption isn't new, what's good to see is that it is becoming a standard feature on enterprise drives and will be available on SAS Constellation drives later this year (Seagate Secure SED).

Security breaches are unacceptable; destroying soft-fail drives because they can't be recycled with "sensitive" material on them is also irresponsible. Hopefully encryption can tackle both issues head-on.

Summary

So where and how will these drives be used? Well, I hope the major vendors are looking to bring out 2.5" form-factor products and potentially blended products as well. It's not unreasonable to expect these guys to be using 2.5" drives to make their products lighter and more efficient. Also, for modular and monolithic arrays, exchangeable canisters or enclosures could easily allow 2.5" drives to be incorporated into existing hardware.

Oh and before anyone comments, yes I am aware that the "multiple supplier" argument will be used as an excuse not to adopt this technology...

Of course, we shouldn't forget the underlying reason why we've reached the position of 2TB in a single drive - we are keeping too much data. We all need to pay as much attention to optimising our existing assets as we do to installing new and shiny ones.

Wednesday 28 January 2009

Storage Management: Aperi - It's all over

It looks like the open storage management project Aperi has finally been put to rest. See this link.

Storage Resource Management is in a woeful state. SNIA with their SMI-S initiative have failed to deliver anything of value. I've posted multiple times here and here about how bad things are. I'm not the only one: Martin's recent post discussed it; if I could be bothered I'm sure I could find more!

Previously I've discussed writing SRM software and I've done just that with a company I've been working with for some months: http://www.storagefusion.com/. Whilst this might be a shameless plug, I can honestly say that as a product (in the reporting space at least) SRA will harmonise storage reporting more than anything else out there today. Here's why:

It doesn't rely on generic standards for reporting, but gets the full detail on each platform.
It uses element managers or management console/CLIs to retrieve data.
It doesn't need additional servers or effort to deploy or manage.
It normalises all data to provide a simple consistent framework for capacity reporting.

Now reporting is good, but management is hard by comparison. Reporting on hardware doesn't necessarily break it, but SRM software which changes the array could. It therefore needs to know exactly how to interact with an array, which requires decent API access.

Vendors aren't going to give this out to each other, so here's a proposal:

Vendors fund a single organisation to develop a unified global SRM tool. They provide API access under licence which doesn't permit sharing of that API with competitors. As the product is licensed to end users, each vendor gets paid a fee per array per GB managed so they have some financial recompense for putting skin into the game.

Anyone interested?

Monday 26 January 2009

Personal Computing: Mangatars

It seems to be all the rage to change your Twitter image to a Manga avatar or Mangatar. Well, here's mine.

No doubt there will be plenty of people who will claim I've taken some artistic liberties, but I can't answer for the lack of "features" in the software to fully capture my natural essence.

Enjoy.

Saturday 24 January 2009

Enterprise Computing: Using USP for Migrations

Thanks to Hu Yoshida for the reference to a previous post of mine which mentioned using virtualisation (USP, SVC, take your pick) for performing data migrations. As Hu rightly points out, the USP, USP-V, NSC55 and USP-VM can all be used to virtualise other arrays and migrate data into the USP as part of a new deployment. However nothing is ever as straightforward as it seems. This post will discuss the considerations in using a USP to virtualise and migrate data into a USP array from external sources.

Background

The Universal Volume Manager (UVM) feature on the USP enables LUN virtualisation. To access external storage, storage ports on the USP are configured as "External" and connected either directly or through a fabric to the external storage. See the first diagram as an example of how this works.

As far as the external storage is concerned, the USP is a Windows host and the settings on the array should match this. Within Storage Navigator, each externally presented LUN appears as a RAID group. This RAID group can then be presented as a single LUN or if required, carved up into multiple individual LUNs.

The ability to subdivide external storage isn't often mentioned by HDS; it's usually assumed that external storage will be passed through the USP on a 1:1 basis, and if the external storage is to be detached in the future then this is essential. However, if a configuration is being built from scratch, external storage could be presented as larger LUNs and subdivided within the USP. This is highlighted in the second diagram.

At this point, external storage is being passed through the USP but the data still resides on the external array. The next step is to move the data onto LUNs within the USP itself. Here's the tricky part. The target LUNs in the USP need to be exactly the same size as the source LUNs on the external array. What's more, they need to match the size at which the USP views them - which is *not* necessarily the same as the size on the external storage itself. This LUN size issue occurs because the USP represents storage in units of tracks. From experience, the best way to solve this problem was to actually present the LUN to the USP and see what size it appears as. When I first used UVM, HDS were unable to provide a definitive method to calculate the size at which a LUN would appear within Storage Navigator.
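Since the sizes have to match exactly, it's worth checking a migration plan against the sizes the USP actually reports before starting any copy. The sketch below is a hypothetical planning aid, not an HDS tool: the function name, the LUN identifiers and the block counts are all invented for illustration, and the sizes are assumed to have been read back from Storage Navigator by hand after presenting the external LUNs.

```python
def find_size_mismatches(external_luns, target_luns, mapping):
    """Return (source, target, source_size, target_size) for each pair whose sizes differ.

    external_luns: {lun_id: size as reported by the USP}
    target_luns:   {lun_id: size of the internal LUN}
    mapping:       {external_lun_id: target_lun_id}
    All sizes must be in the same unit (blocks in this example).
    """
    mismatches = []
    for source, target in mapping.items():
        source_size = external_luns[source]
        target_size = target_luns[target]
        if source_size != target_size:
            mismatches.append((source, target, source_size, target_size))
    return mismatches


# Hypothetical identifiers and sizes in blocks, for illustration only.
external = {"E0:00": 20_971_200, "E0:01": 41_942_400}
internal = {"00:10": 20_971_200, "00:11": 41_943_040}
plan = {"E0:00": "00:10", "E0:01": "00:11"}

for source, target, src_size, tgt_size in find_size_mismatches(external, internal, plan):
    print(f"{source} -> {target}: {src_size} != {tgt_size} blocks")
```

In this made-up plan the second pair is flagged, because the internal LUN was sized from the external array's own figure rather than from the size the USP reported.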

The benefits of virtualisation for migration can fall down at this point. If the source array is particularly badly laid out, the target array will retain the multiple LUN sizes. In addition, a lot of planning needs to be performed to ensure the migration of the LUNs into the USP doesn't suffer from performance issues.

Data is migrated into the USP using Volume Migration, ShadowImage or TSM. This clones the source LUN within the USP to a LUN on an internal RAID group. At this point, depending on the migration tool, it may be necessary to stop the host to remap to the new LUNs. This completes the migration process. See the additional diagrams, which conceptualise migration with TSM.

Now, this example is simple; imagine the complexities if the source array is replicated. Replication has to be broken, potentially requiring an outage for the host. Replication needs to be re-established within the USP but data has to be fully replicated to the remote location before the host data can be confirmed as consistent for recovery. This process could take some time.

Summary

In summary, here are the points that must be considered when using USP virtualisation for migration:

Configuring the external array to the USP requires licensing Universal Volume Manager.
UVM is not free!
Storage ports on the USP have to be reserved for connecting to the external storage.
LUN sizes from the source array have to be retained.
LUN sizes aren't guaranteed to be exactly the same as the source array.
Once "externalised", LUNs are replicated into the USP using ShadowImage, TSM or Volume Migration.
A host outage may be required to re-zone and present the new LUNs to the host.
If the source array is replicated, this adds further complication.

I'll be writing this blog up as a white paper on my consulting company's website at www.brookend.com. Once it's up, I'll post a link on the blog. If anyone needs help with this kind of migration, then please let me know!

Friday 23 January 2009

Cloud Storage: Review - Dropbox

Over the last few weeks I've been using a product called Dropbox. This nifty little tool lets you sync up your files from anywhere and across multiple platforms. It's a perfect example of Cloud Storage in action.

The Problem

Keeping data in sync between multiple devices is a real issue. I use two laptops, a MacBook and a main PC on an almost daily basis. I don't always take the same machine with me but I always have a common set of files I need access to. I've tried various ways to solve my data synchronisation issues: Windows Offline Folders, portable hard drives and so on. These solutions fail for various reasons, the main one being the lack of a central automated repository where my data is kept. This is the issue Dropbox solves.

How It Works

Dropbox provides up to 50GB of shared "cloud" storage for storing data. This space is replicated onto a local directory (usually called "My Dropbox") on each machine that needs to access the shared storage. As files are created and amended, Dropbox keeps them in sync, uploading and downloading changes as needed. Obviously you need an Internet connection to make this work, but these days most PCs and laptops are 'net connected and changes are kept locally on disk until an Internet connection is detected.

The interface itself is pretty cool. Have a look at the screenshots. There's an icon for the system tray and if you click it, you get a mini-menu showing the status of uploads and downloads. In my screenshot you can see I'm up to date and using about 81% of the free 2GB allocation.

What's really neat though are the things you don't notice; uploads and downloads happen automatically and as groups of files are synchronised, a reminder window appears in the top right of the screen. Also, each file in the Dropbox is overlaid with a tick or a circle symbol to show whether or not it has been synchronised into the "cloud". See the third screenshot where one file is sync'd and the other is being uploaded. If the same file is edited on multiple machines before being synchronised, then they are stored with a suffix indicating the machine they came from, letting the user sort out update clashes.
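The clash handling described above can be illustrated with a small sketch. This is not Dropbox's implementation and the suffix format is invented for the example; it simply shows the idea of keeping every clashing version under a name that records which machine it came from, so nothing is overwritten.

```python
from pathlib import PurePath


def conflict_name(filename, machine):
    """Add a machine suffix before the extension (hypothetical naming format)."""
    path = PurePath(filename)
    return str(path.with_name(f"{path.stem} (from {machine}){path.suffix}"))


def store_versions(filename, edits):
    """Decide the names under which edited copies are kept.

    edits: {machine_name: content} for every machine that changed the file
    before it was synchronised. A single edit keeps the original name;
    clashing edits are each kept under a machine-suffixed name.
    """
    if len(edits) == 1:
        (content,) = edits.values()
        return {filename: content}
    return {conflict_name(filename, machine): content
            for machine, content in edits.items()}


print(store_versions("report.doc", {"laptop": "v2", "macbook": "v3"}))
```

The user then compares the suffixed copies and decides which changes to keep, which matches the "sort out update clashes" step.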

Without boring you with the detail, there's support for Windows, Mac and Linux plus a web interface; files are transmitted via SSL and stored using AES-256 encryption; you can also share files publicly and publish photos.

Why This is Good

I've installed Dropbox across 5 machines (one of which is a virtual Windows host under VMware Fusion) and have found the synchronisation pretty much flawless. At first I used only a limited subset of my main files, but as I've used the product more, I'm gaining more confidence in putting more and more data online.

Drawbacks

The only flaw I've found so far is that I can't put my Dropbox folder onto a network share. This is pretty annoying as my main machine at home only stores data on a central file server. I did however manage to get a Dropbox folder onto a machine where the network share was also an offline folder.

Online services like this are not a replacement for sensible data management; in addition to storing data on a service like this you should still be keeping regular backups - (a) in case the service goes away and (b) in case you corrupt your data.

Wishlist

I'd like to see network shares supported as a Dropbox location. I'd also like to see the equivalent of VSS for Dropbox - so as I change or delete files I can recover previous versions.

The free version of Dropbox comes with 2GB of storage. I've just about reached that and I'm considering upgrading to the paid service, which gives 50GB for a monthly fee (although I might do a bit of data re-organisation first!). Go try it for yourself and see how it improves your productivity; you can do cool stuff like move your "favourites" folder into Dropbox and share IE bookmarks between machines - the same goes for any standard folder link.

This is the first of a number of products I'm reviewing. I'll be keeping a comparison checklist of all of them which will get posted as each product is reviewed.
