Monday 9 February 2009

The Storage Architect Has Moved!

I've decided to move the blog over to Wordpress and there's a new direct URL too; http://www.thestoragearchitect.com. Please check me out in the new location. In addition, there's a new feed too; http://thestoragearchitect.com/feed/ - the feedburner feed stays the same and redirects. Please update your bookmarks!

Thursday 5 February 2009

Personal Computing: The Whole Of Twitter In Your Hand

A quick check on Twitter this morning shows me they're up to message number 1,179,118,180 or just over the 1.1 billion mark. That's a pretty big number - or so it seems, but in the context of data storage devices, it's not that big. Let me explain...

Assume Twitter messages are all the full 140 characters long. That means, assuming all messages are retained, that the whole of Twitter is approximately 153GB in size. OK, so there will be data structures needed to store that data, plus space for all the user details, but I doubt whether the whole of Twitter exceeds 400GB. That fits comfortably on my Seagate FreeAgent Go!
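As a sanity check, the sum works out like this (a sketch assuming one byte per character and that the quoted message ID equals the message count):

```python
# Back-of-envelope sizing for the whole of Twitter.
messages = 1_179_118_180       # the message ID quoted above
bytes_per_message = 140        # assume every message is the full 140 characters
total_bytes = messages * bytes_per_message
gib = total_bytes / 2**30
print(f"{gib:.1f} GiB")        # roughly 153.7 GiB
```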

If every message ever sent on Twitter can be stored on a single portable hard drive, then what on earth are we storing on the millions of hard drives that get sold each year?

I suspect the answer is simply that we don't know. The focus in data storage is to provide the facility to store more and more data, rather than rationalise what we do have. For example, a quick sweep of my hard drives (which I'm trying to do regularly) showed half a dozen copies of the Winzip installer, the Adobe Acrobat installer plus various other software products that are regularly updated, for example the 2.2.1 update of the iPhone software at 246MB!

What we need is (a) common sense standards for how we store our data (I'm working on those) and (b) better search and indexing functionality that can make decisions based on the content of files - like the automated deletion of defunct software installers.
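As a sketch of the kind of tooling I mean, here's a minimal duplicate-finder that hashes file contents - it flags identical copies (like those stray installers) but leaves the deciding and deleting to a human:

```python
import hashlib
import os
from collections import defaultdict

def find_duplicates(root):
    """Group files under root by content hash; return only groups with copies."""
    by_hash = defaultdict(list)
    for dirpath, _subdirs, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            digest = hashlib.sha256()
            with open(path, "rb") as f:
                # Hash in 1MB chunks so large installers don't need to fit in RAM.
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    digest.update(chunk)
            by_hash[digest.hexdigest()].append(path)
    return {h: paths for h, paths in by_hash.items() if len(paths) > 1}
```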

There's also one other angle and that's when network speeds become so fast that storing a download is irrelevant. Then our data can all be cloud-based and data cleansing becomes a value add service and someone else's problem!

Wednesday 4 February 2009

Enterprise Computing: Seagate Announces new Constellation Hard Drives

Seagate announced this week the release of their new Constellation hard drives. Compared to the Savvio range (which are high-performance, small form-factor drives), these are aimed at lower-tier archiving solutions and will scale to 2TB.


I had a briefing on these drives a couple of weeks ago and there's the usual capacity and performance increase metrics to drool over (let's face it, who doesn't want a 2TB drive), however, impressive as it is, pure capacity increases don't cut it any more for me. What's more relevant are the other less obvious features.

Power Reduction

With PowerTrim, Seagate are claiming a 2.8W consumption (idle) for the 2.5" form-factor drive. This compares to 5.2W for the Savvio 10K 146GB - almost half. This reduction is relevant not just for the power saving, but for the benefits in reduced cooling requirements and consequently the ability to stack more of these drives in a small space.


Constellation also provides PowerChoice, which will allow drives to be progressively spun down to reduce power. I've included a couple of graphics courtesy of Seagate which show the benefits of the different power-down levels.

In a previous discussion with COPAN, they indicated to me that their power-down solution had seen an increase in the life of hard drives, so I would expect Constellation to see the same benefits, although Seagate haven't indicated that.

Encryption

Although encryption isn't new, what's good to see is that it is becoming a standard feature on enterprise drives and will be available on SAS Constellation drives later this year (Seagate Secure SED).

Security breaches are unacceptable; destroying soft-fail drives because they can't be recycled with "sensitive" material on them is also irresponsible. Hopefully encryption can tackle both issues head-on.
Summary
So where and how will these drives be used? Well, I hope the major vendors are looking to bring out 2.5" form-factor products and potentially blended products as well. It's not unreasonable to expect them to be using 2.5" drives to make their products lighter and more efficient. Also, for modular and monolithic arrays, exchangeable canisters or enclosures could easily allow 2.5" drives to be incorporated into existing hardware.
Oh and before anyone comments, yes I am aware that the "multiple supplier" argument will be used as an excuse not to adopt this technology...
Of course, we shouldn't forget the underlying reason why we've reached the position of 2TB in a single drive - we are keeping too much data. We all need to pay as much attention to optimising our existing assets as we do to installing new and shiny ones.

Wednesday 28 January 2009

Storage Management: Aperi - It's all over

It looks like the open storage management project Aperi has finally been put to rest. See this link.


Storage Resource Management is in a woeful state. SNIA with their SMI-S initiative have failed to deliver anything of value. I've posted multiple times here and here about how bad things are. I'm not the only one: Martin's Recent Post discussed it; if I could be bothered I'm sure I could find more!

Previously I've discussed writing SRM software and I've done just that with a company I've been working with for some months: http://www.storagefusion.com/. Whilst this might be a shameless plug, I can honestly say that as a product (in the reporting space at least) SRA will harmonise storage reporting more than anything else out there today. Here's why:


  1. It doesn't rely on generic standards for reporting, but gets the full detail on each platform.
  2. It uses element managers or management console/CLIs to retrieve data.
  3. It doesn't need additional servers or effort to deploy or manage.
  4. It normalises all data to provide a simple consistent framework for capacity reporting.
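To illustrate what normalisation (point 4) means in practice, here's a toy sketch - the vendor names, field names and units below are invented for the example, not SRA's actual data model:

```python
# Hypothetical sketch: map vendor-specific capacity records onto one common,
# GB-based schema so reports from different arrays can be compared directly.
def normalise(vendor, raw):
    if vendor == "vendor_a":        # assume this one reports usable MB
        return {"array": raw["serial"], "usable_gb": raw["usable_mb"] / 1024}
    if vendor == "vendor_b":        # assume this one reports 512-byte blocks
        return {"array": raw["serial"], "usable_gb": raw["blocks"] * 512 / 10**9}
    raise ValueError(f"no parser for {vendor}")
```

The point is that each platform gets its own parser working from full native detail, rather than forcing every vendor through one generic standard.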

Now reporting is good, but management is hard by comparison. Reporting on hardware doesn't break it - SRM software that changes the array could - so a management tool needs to know exactly how to interact with an array, which in turn requires decent API access.

Vendors aren't going to give this out to each other, so here's a proposal:

Vendors fund a single organisation to develop a unified global SRM tool. They provide API access under licence which doesn't permit sharing of that API with competitors. As the product is licensed to end users, each vendor gets paid a fee per array per GB managed, so they have some financial recompense for putting skin in the game.

Anyone interested?

Monday 26 January 2009

Personal Computing: Mangatars

It seems to be all the rage to change your Twitter image to a Manga avatar or Mangatar. Well, here's mine.


No doubt there will be plenty of people who will claim I've taken some artistic liberties, but I can't answer for the lack of "features" in the software to fully capture my natural essence.

Enjoy.

Saturday 24 January 2009

Enterprise Computing: Using USP for Migrations

Thanks to Hu Yoshida for the reference to a previous post of mine which mentioned using virtualisation (USP, SVC, take your pick) for performing data migrations. As Hu rightly points out, the USP, USP-V, NSC55 and USP-VM can all be used to virtualise other arrays and migrate data into the USP as part of a new deployment. However nothing is ever as straightforward as it seems. This post will discuss the considerations in using a USP to virtualise and migrate data into a USP array from external sources.


Background

The Universal Volume Manager (UVM) feature on the USP enables LUN virtualisation. To access external storage, storage ports on the USP are configured as "External" and connected either directly or through a fabric to the external storage. See the first diagram as an example of how this works.

As far as the external storage is concerned, the USP is a Windows host and the settings on the array should match this. Within Storage Navigator, each externally presented LUN appears as a RAID group. This RAID group can then be presented as a single LUN or if required, carved up into multiple individual LUNs.

The ability to subdivide external storage isn't often mentioned by HDS; it's usually assumed that external storage will be passed through the USP on a 1:1 basis and if the external storage is to be detached in the future then this is essential. However if a configuration is being built from scratch then external storage could be presented as larger LUNs and subdivided within the USP. This is highlighted in the second diagram.

At this point, external storage is being passed through the USP but the data still resides on the external array. The next step is to move the data onto LUNs within the USP itself. Here's the tricky part. The target LUNs in the USP need to be exactly the same size as the source LUNs on the external array. What's more, they need to be the same size as the way the USP views them - which is *not* necessarily the same as the size on the external storage itself. This LUN size issue occurs because of the way the USP represents storage in units of tracks. From experience, the best way to solve this problem was to actually present the LUN to the USP and see what size the LUN appears as. When I first used UVM, HDS were unable to provide a definitive method to calculate the size a LUN would appear within Storage Navigator.
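To illustrate the rounding problem (with a deliberately made-up track size, since as I say there was no definitive formula for how the USP sizes an external LUN), something like this is going on:

```python
# Illustration only: the track size is an invented figure, NOT a real USP value.
TRACK_BYTES = 48 * 1024

def visible_size(external_lun_bytes, track_bytes=TRACK_BYTES):
    """Size the array might report: only whole tracks of the external LUN count."""
    return (external_lun_bytes // track_bytes) * track_bytes
```

The practical consequence, as described above, is that the reported size has to be observed rather than calculated, and target LUNs matched to it exactly.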

The benefits of virtualisation for migration can fall down at this point. If the source array is particularly badly laid out, the target array will retain the multiple LUN sizes. In addition, a lot of planning needs to be performed to ensure the migration of the LUNs into the USP doesn't suffer from performance issues.

Data is migrated into the USP using Volume Migration, ShadowImage (or TSM). This clones the source LUN within the USP to a LUN on an internal RAID group. At this point, depending on the migration tool it may be necessary to stop the host to remap to the new LUNs. This completes the migration process. See the additional diagrams, which conceptualise migration with TSM.
Now, this example is simple; imagine the complexities if the source array is replicated. Replication has to be broken, potentially requiring an outage for the host. Replication needs to be re-established within the USP but data has to be fully replicated to the remote location before the host data can be confirmed as consistent for recovery. This process could take some time.



Summary

In summary, here are the points that must be considered when using USP virtualisation for migration:

  1. Configuring the external array to the USP requires licensing Universal Volume Manager.
  2. UVM is not free!
  3. Storage ports on the USP have to be reserved for connecting to the external storage.
  4. LUN sizes from the source array have to be retained.
  5. LUN sizes aren't guaranteed to be exactly the same as the source array.
  6. Once "externalised", LUNs are replicated into the USP using ShadowImage/TSM/Volume Migration.
  7. A host outage may be required to re-zone and present the new LUNs to the host.
  8. If the source array is replicated, this adds additional complication.
I'll be writing this blog up as a white paper on my consulting company's website at www.brookend.com. Once it's up, I'll post a link on the blog. If anyone needs help with this kind of migration, then please let me know!

Friday 23 January 2009

Cloud Storage: Review - Dropbox


Over the last few weeks I've been using a product called Dropbox. This nifty little tool lets you sync up your files from anywhere and across multiple platforms. It's a perfect example of Cloud Storage in action.


The Problem

Keeping data in sync between multiple devices is a real issue. I use two laptops, a MacBook and a main PC on an almost daily basis. I don't always take the same machine with me but I always have a common set of files I need access to. I've tried various ways to solve my data synchronisation issues; Windows Offline Folders, portable hard drives and so on. These solutions fail for various reasons, the main one being the lack of a central automated repository where my data is kept. This is the issue Dropbox solves.

How It Works

Dropbox provides up to 50GB of shared "cloud" storage for storing data. This space is replicated onto a local directory (usually called "My Dropbox") on each machine that needs to access the shared storage. As files are created and amended, Dropbox keeps them in sync, uploading and downloading changes as needed. Obviously you need an Internet connection to make this work, but these days most PCs and laptops are 'net connected and changes are kept locally on disk until an Internet connection is detected.

The interface itself is pretty cool. Have a look at the screenshots. There's an icon for the system tray and if you click it, you get a mini-menu showing the status of uploads and downloads. In my screenshot you can see I'm up to date and using about 81% of the free 2GB allocation.

What's really neat though are the things you don't notice; uploads and downloads happen automatically and as groups of files are synchronised, a reminder window appears in the top right of the screen. Also, each file in the Dropbox is overlaid with a tick or a circle symbol to show whether or not it has been synchronised into the "cloud". See the third screenshot where one file is sync'd and the other is being uploaded. If the same file is edited on multiple machines before being synchronised, then they are stored with a suffix indicating the machine they came from, letting the user sort out update clashes.
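The clash handling might look something like this sketch (the exact naming scheme here is my invention, not Dropbox's):

```python
# Hypothetical sketch of the update-clash handling described above: when the
# same file changes on two machines before syncing, keep both copies and
# suffix one with the machine it came from, so the user can sort it out.
def conflict_name(filename, machine):
    stem, dot, ext = filename.rpartition(".")
    if not dot:  # no extension
        return f"{filename} ({machine}'s conflicted copy)"
    return f"{stem} ({machine}'s conflicted copy).{ext}"
```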
Without boring you in the detail, there's support for Windows, Mac and Linux plus a web interface; files are transmitted via SSL and stored using AES-256 security; you can also share files publicly and publish photos.

Why This is Good

I've installed Dropbox across 5 machines (one of which is a virtual Windows host under VMware Fusion) and have found the synchronisation pretty much flawless. At first I used only a limited subset of my main files, but as I've used the product more, I'm gaining more confidence in putting more and more data online.

Drawbacks
The only flaw I've found so far is that I can't put my Dropbox folder onto a network share. This is pretty annoying as my main machine at home only stores data on a central file server. I did however manage to get a Dropbox folder onto a machine where the network share was also an offline folder.
Online services like this are not a replacement for sensible data management; in addition to storing data on a service like this you should still keep regular backups - (a) in case the service goes away and (b) in case you corrupt your data.
Wishlist
I'd like to see network shares supported as a Dropbox location. I'd also like to see the equivalent of VSS for Dropbox - so as I change or delete files I can recover previous versions.
The free version of Dropbox comes with 2GB of storage. I've just about reached that and I'm considering upgrading to the paid service, which gives 50GB for a monthly fee (although I might do a bit of data re-organisation first!). Go try it for yourself and see how it improves your productivity; you can do cool stuff like move your "favourites" folder into Dropbox and share IE bookmarks between machines - the same goes for any standard folder link.
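The favourites trick generalises: move a folder into the Dropbox directory and leave a link behind so applications still find it where they expect. A rough sketch (the paths are whatever your setup uses; on Windows you'd want a junction via mklink /J rather than the symlink below):

```python
import os
import shutil

def relocate_into_dropbox(folder, dropbox_dir):
    """Move a folder into Dropbox, leaving a symlink at the old location."""
    target = os.path.join(dropbox_dir, os.path.basename(folder))
    shutil.move(folder, target)     # physically relocate the data
    os.symlink(target, folder)      # old path still works for applications
    return target
```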
This is the first of a number of products I'm reviewing. I'll be keeping a comparison checklist of all of them which will get posted as each product is reviewed.

Wednesday 21 January 2009

Off Topic: Dropping TweetDeck Refresh Rate


Slightly off topic and apologies for it, but I've been using Twitter for some time now and I normally use TweetDeck rather than the standard interface (although on the iPhone I use Tweetie).


As follower numbers have increased, I'm finding one minute updates a distraction so I've dropped my refresh rate to something more manageable.
General updates are now 15 minutes, 5 minutes for replies and 10 for DMs. This has made things easier to cope with, but I may drop the rates lower when getting "real" work done!
I had 624 tweets this morning to go through. I can't imagine how people like Stephen Fry cope!

Monday 19 January 2009

Personal Computing: MacBook Day 3

I've only just picked up my MacBook for the day; too much real work to do!


Seriously though, my next issue is to decide how to edit my standard Word and spreadsheet documents. I've installed the latest version of OpenOffice and it works fine. At least, it appears to work fine on simple documents. Who knows how it would cope with some of the more complex documents I work on. So what are the options:

  • Office 2008 for Mac - chargeable
  • OpenOffice
  • Office 2007 for Windows under Fusion (or other)
  • iWorks
Any other options? I'm happy to take other suggestions, but at first glance, I think Office under Windows seems the best choice.


Enterprise Computing: Migrating Petabyte Arrays

Background

The physical capacity of storage arrays continues to grow at an enormous rate, year on year. Using EMC as a benchmark, we can see that a single array has grown over the years;

  • Symmetrix 3430 - 96 drives, 0.84TB
  • Symmetrix 5500 - 128 drives, 1.1TB
  • Symmetrix 8830 - 384 drives, 69.5TB
  • DMX3000 - 576 drives, 76.5TB
  • DMX-4 - 1920 drives, 1054TB

Note: these figures are indicative only!

DMX-3 and DMX-4 introduced arrays which scale to petabytes (1000TB) of available raw capacity. At some point, these petabyte arrays will need to be replaced and will represent a unique challenge to today's storage managers. Here's why.


Doing The Maths

From my experience, storage migrations from array to array can be complex and time consuming. Issues include:


  • Identifying all hosts for migration
  • Identifying all owners for storage
  • Negotiating migration windows
  • Gap analysis on driver, firmware, O/S, patch levels
  • Change Control
  • Migration Planning
  • Migration Execution
  • Cleanup

With all of the above work to do, it's not surprising that, realistically, around 10 servers per week is a good estimate of the capability of a single FTE (Full Time Equivalent, e.g. a storage guy). Some organisations may find this figure can be pushed higher, but I'm talking about one person, day in, day out, performing this work, so I'll stick with my 10/week figure.

Assume an array has 250 hosts, each with an average of 500GB; this equates to about 125TB of data and almost 6 months' effort for our single FTE! In addition, the weekly migration schedule requires moving on average 5TB of data. If the target array differs from the source (e.g. a new vendor, different LUN sizes) then the migration task can be time consuming and complex to execute.
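Here's the arithmetic for that worked example, for anyone who wants to rerun it with their own figures:

```python
# Migration effort model from the example above.
hosts = 250
avg_gb_per_host = 500
hosts_per_week_per_fte = 10         # my 10/week estimate for a single FTE

total_tb = hosts * avg_gb_per_host / 1000   # 125 TB of data to move
weeks = hosts / hosts_per_week_per_fte      # 25 weeks, roughly 6 months
tb_per_week = total_tb / weeks              # 5 TB migrated every week
```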

Look at the following diagram. It shows the lifecycle of physical storage in an array over time. Initially the array is deployed and storage configured. Over the lifetime of the array, more storage is added and presented to hosts until the array reaches either its maximum physical capacity or an acceptable capacity threshold. This remains until migrations start to take place to another array. Up to the point migrations take place, storage is added and paid for as required, however once migrations start, there is no refund from the vendor for the unused resources (those represented in green). They have been purchased but remain unused until the entire array is decommissioned. If the decommissioning process is lengthy then the amount of unused resources becomes high, especially on petabyte arrays. Imagine a typical 4-year lifecycle; up to 1 year could be spent moving hosts to new arrays - at significant cost in terms of manpower and impact to the business.
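A toy model of that stranded capacity (the figures here are invented): sum the gap between purchased and used capacity over each month of the wind-down.

```python
# "Green" capacity in the diagram: purchased but idle during decommissioning.
def stranded_tb_months(purchased_tb, used_tb_by_month):
    """Total TB-months of purchased-but-unused capacity over the wind-down."""
    return sum(purchased_tb - used for used in used_tb_by_month)
```

A 1PB array wound down over four months, with usage falling 900, 600, 300, 0 TB, strands 2,200 TB-months of capacity you've already paid for.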

Solutions

So how should we adapt migration processes to handle the issue of migrating these monster arrays?

  • Establish Standards.  This is an age old issue but one that comes up time and time again.  Get your standards right.  These include consistent LUN sizes, naming standards and support matrix (compatibility) standards.
  • Consider Virtualisation. Products including SVC, USP, InVista (EMC) and iNSP (Incipient) all allow the storage layer to be virtualised.  This can assist in the migration process.
  • Keep Accurate Records.  This may seem a bit obvious but it is amazing the number of sites who don't know how to contact the owner of some of the servers connected to their storage.
  • Talk to Your Customers.  Migrations inevitably result in server changes and potentially an outage.  Knowing your customer and keeping them in the loop regarding change planning saves a significant amount of hassle.
Technology replacement is now part of standard operational work.  Replacing hardware is not all about technology; procedures and common sense will form a more and more important part of the process.

Sunday 18 January 2009

Personal Computing: MacBook Day 2

So, second day with my MacBook and I've started to look at application transparency between Mac and Windows.


On a positive note, I managed to get DropBox working (easily) and MindManager for Mac - all my mind maps are directly compatible.  I didn't expect I would have a problem but it's good to see compatibility sensibly implemented.

For major Office documents, I'm looking at OpenOffice or MS Office for Mac (however, I don't like the idea of paying for the same product on multiple platforms).

If I can get consistency in my standard data formats, then I can see me being more confident about using cloud applications to store data.  That's a good thing as I'm trialling a number of products and would want to see the platform become irrelevant as long as the applications are good.  The only issue with that is having a continual reliance on an Internet connection; but that's achievable anyway.  

So far, products I've got happily working cross-platform;

  • DropBox
  • NetNewsWire/FeedDemon
  • MindManager
  • TweetDeck
  • Exchange Email
Next challenges will be the Office installation, address book and calendar (and decent synchronisation) and my favourite challenge, P2V'ing my "work" laptop (i.e. the machine I have to use to access a corporate network).  If I can virtualise that machine and transport it around, I may be able to bin my Windows laptop entirely.

Saturday 17 January 2009

Personal Computing: MacBook Day 1

For those who don't follow me on Twitter, today I "upgraded" my laptop to a shiny new MacBook.  If you are interested, it's the 2.4GHz version with 4GB of RAM.  Enough of the specifications; how am I finding it so far?


I'm reminded of the time, many years ago (15+), when I started to use Unix.  Previously I was a mainframe guy (professionally) and had used many different home computers - Amiga, Spectrum, Oric, BBC Micro, ZX-81, to name but a few - so change was never an issue.  However, in a work environment, Unix was radically different.  In particular, I loathed the command line and the awful vi editor.

Curiously I find myself in a similar position today.  I'm confident to say I know Windows pretty well.  In fact, after using the MacBook for a couple of hours, I realise I understand Windows intimately.  It is slightly unnerving that (a) I don't know where to find things (thx Storagebod for a few pointers) (b) I have no idea how best to organise the device I have just purchased.

This is an interesting situation to be in and in many ways reflects on my professional work.

Think about when you buy a new storage array; at that point you probably don't understand the intricacies of how it should be configured when taking into consideration best practices, standards, performance and so on.  That's one of the benefits of having help when deploying new technology.  It shortcuts that learning process.

Another thing that strikes me is how much data I already store online: NewsGator, RTM and Dropbox, to name but a few.  Perhaps cloud storage has been around for a lot longer than we think.

Tuesday 13 January 2009

Where's All the Data Gone?

Eric Savitz over at Tech Trader has an interesting article today.


Demand at Seagate is down and consolidation of the industry is expected. However as recently as March last year EMC was telling us how storage growth just keeps on spiralling upwards.


So what's happening? Are we becoming inherently more efficient at storing our data all of a sudden, now that a credit crunch is upon us? Somehow I don't think so.


Demand ebbs and flows as finances dictate the ability to purchase new equipment, but growth remains steady. Technology is replaced constantly but just like you or I might hold on to our car for another year or so before replacement, so will IT departments, preferring to pay maintenance on existing kit rather than rip and replace to the latest and greatest. I can see two consequences from this;

  • More time and effort will need to be paid to using current resources more efficiently.
  • Migration to new hardware will need to be even more slick and quick to reduce the overhead of migration wastage.

I'll discuss these subjects in more detail this week.

Job Losses: EMC Joins The Club

EMC have finally announced that they will be following the industry trend and cutting staff. Approximately 7% (2400) of the workforce will go. The cuts are widely reported (here for instance) and at their earliest were forecast by Stephen Foskett in his December post.

Have a look at this list of tech layoffs. Those storage related are Seagate, Dell, EMC, WD, Pillar Data, Sun, SanDisk, HP. Not on the list are Quantum and COPAN.

Do you know of any others?

2009 will be the year of rationalisation and optimisation. The only prediction to make for the next 12 months is that end-users will be looking to do more with less.

Wednesday 7 January 2009

Redundant Array of Inexpensive Clouds - Pt III

In my two previous articles I discussed Cloud Storage and the concept of using middleware to store multiple copies of data across different service providers. In this final part, I'd like to discuss the whole issue of security.


Using "the cloud" to store data requires a major shift in thinking; traditionally all your information would be stored locally and therefore benefit from the advantage of physical security. Not only would someone need to hack your firewall to get network access, they would then have to obtain system access too, and likely as not would be spotted (hopefully) quite quickly. So, retaining physical access to data has been a significant benefit.

Now we've obviously been trusting a form of cloud storage for some time. Email systems like Gmail, Hotmail and Yahoo have always had access to our email data and have provided limited storage capabilities but they haven't really been the foundation for running a business (although I'm sure there are organisations that have done it). Putting data into the cloud means there's always a risk of someone else getting to your data. You make someone else the guardian or gatekeeper of that data access and rely on the quality of their encryption and access controls. So, it is important to understand what facilities each infrastructure provider offers.

Amazon Web Services

Amazon have a great whitepaper on security, which can be found here. It highlights the level of physical security offered (which is high) plus details of the logical security of data. It may seem surprising that Amazon don't routinely back up data on AWS but rely instead on multiple copies in remote locations, however backup and archive should be thought of as distinct requirements. In addition, data at rest in AWS is not encrypted; users of AWS should therefore ensure their service provider offers this capability at source.

Nirvanix

Nirvanix have two white papers which discuss data security. They can be found here (registration required). As with Amazon, Nirvanix are keen to highlight the security of their facilities and adherence to Statement on Auditing Standard (SAS 70) certification. They also go further in indicating that data is stored using RAID-6 and RAID-10 protection, with backups in place too.

Summary

Both AWS and Nirvanix offer good physical security and SSL encryption for data in flight. Encryption at rest and backups are not routinely offered and therefore a cloud user should weigh up how these features are to be implemented. This takes us back to the original premise of these postings, the idea of using multiple cloud providers to add resilience and availability to cloud stored data. It also demands a set of standards for cloud storage use, which I am working on even as I write this post. Watch this space.

Tuesday 6 January 2009

Personal Storage: Goodbye to Old Friends

I like to use the Christmas holidays as an excuse for a good old-fashioned cleanout. This invariably means burning (shredding takes too long and we don't have a hamster) old paperwork and junking lots of defunct technology.


I tend to hoard stuff, however I have got my miscellaneous technology down to four crates. Being thrown this year was a lot of storage related technology including;


  • Two 3.5" floppy drives - I don't actually have any floppy media, so the drives are no longer useful

  • Philips DVDRW208 - one of my earliest DVD writers

  • Toshiba DVD-RAM SD-W1101

  • Creative 52x CD Drive CD5233E

  • Pioneer CD-ROM DR-U06S

  • Exabyte EXB-4200CT DAT drive

  • Seagate STT320000A DAT drive

The early DVD writers were a pain to get working with different media types. The quality of the media sure made a difference. I never got on with DVD-RAM, especially with the cartridge loading format; and as for the DAT drives...


At the time this technology seemed new and cutting edge. Now it seems so old hat. I wonder what I'll be throwing out next year!

Monday 5 January 2009

Enterprise Computing: RAID Is Not Enough


Happy New Year and welcome back to all my readers!
:-)

I've been messing about with some old hard drives this week and unusually for me, one is sounding decidedly sickly. I've never had a personal hard drive go on me (I guess I always upgrade/move on before it happens), but rest assured I've had plenty "fail" in the Enterprise arena. Usually those failures are pre-emptive microcode soft-fails and the array seamlessly rebuilds onto another spare device and no data is lost.


Pity poor JournalSpace who managed to total their business this week by relying purely on RAID within their main database server.


How the data was lost is not clear - the server had a RAID-1 configuration. Follow the link and have a read, but I quote:


"There was no hardware failure. Both drives are operating fine; DriveSavers had no problem in making images of the drives. The data was simply gone. Overwritten."


Now RAID is a great technology for recovering from physical drive failure, and that is all it is - a mechanism to reduce the risk of data loss from the failure of a hard drive. It is not a solution for managing data correctly. In this instance JournalSpace must have suffered from one of the other things all good storage admins think (worry) about;



  • Sabotage

  • Server failure

  • Catastrophic array failure

  • Software bug

  • Site failure

  • User stupidity

If data is the lifeblood of your organisation then you *must* replicate it onto another online copy or at least onto a backup and have multiple copies in multiple locations.
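At its simplest, the principle is just this sketch - one copy is no copy (and real protection means different media and different sites, which a local copy script alone doesn't give you):

```python
import os
import shutil

def replicate(path, destinations):
    """Copy one file to several independent locations; return the copy paths."""
    copies = []
    for dest in destinations:
        os.makedirs(dest, exist_ok=True)
        copies.append(shutil.copy2(path, dest))  # copy2 preserves timestamps
    return copies
```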


If anyone out there is not sure they're protecting their data properly - then give me a call!