Sunday, 30 November 2008
Wednesday, 26 November 2008
So after discussions on home storage, I'm going to do a weekly cleanup/report on what I've achieved. Here's the baseline;
Main Server; 927GB of usable storage (via Drobo) - 768GB in use. (82.84%). In fact I've consolidated a pair of mirrored 400GB drives onto the Drobo to make the full 768GB, so I've already freed these drives to be removed.
C: - 103GB total, 75.4GB in use (73.2%)
L: - 38.7GB, 34.85GB in use (90%)
I've included both C: (O/S) and L: (data) as my offline folder is on the C: drive
C: - 57.2GB - 34.3GB used (60%)
D: - 97.6GB - 4GB used (4.1%)
E: - 274GB - 133GB used (48.5%)
So that's the baseline. The first saving is to delete the Exchange backup - 314GB. More to follow.
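For anyone checking the arithmetic, the percentages above are simply used capacity divided by total. A quick Python sketch (figures copied from the baseline; which drives belong to which machine is my guess):

```python
# Sanity-check the utilisation figures quoted above: used / total * 100.
drives = [
    ("Drobo (main server)", 927, 768),
    ("C:", 103, 75.4),
    ("L:", 38.7, 34.85),
    ("C: (2nd machine)", 57.2, 34.3),
    ("D:", 97.6, 4),
    ("E:", 274, 133),
]

for name, total_gb, used_gb in drives:
    print(f"{name}: {used_gb}GB of {total_gb}GB ({used_gb / total_gb * 100:.1f}%)")
```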
Tuesday, 25 November 2008
Marc Farley makes some interesting comparisons to storage purchasing decisions in a recent post. For the sake of disclosure, I do go to Costco and buy in bulk - no, not 200lbs of chicken wings, but those things that can be divided and/or frozen (like salmon and coffee) - and more crucially things that don't become cheaper in price over time.
That is effectively Marc's argument; don't buy stuff you don't need yet because it will be cheaper in the future (not so with my salmon and coffee, I suggest). That's certainly true as we see a year on year reduction in storage per GB cost.
There are a number of reasons why people buy more than they need;
- New hardware deployment time is excessive due to datacentre restrictions and change control. In some sites this delay could be 3-6 months, so people work on the assumption that it's better to have more on the floor than be in a panic to deploy at the last minute.
- Business customers can't plan. It's a truism that everyone knows. Add on top the fact that Chinese whispers inflate the original business requirement to two, three or four times more storage than actually needed.
- Vendors give discounts. Yes, shock! Vendors will sell you storage cheaper if you buy more. I know many places that buy complete arrays up front (even DMX-4 with 1920 drives!) to avoid the deploy time and get a better price.
There are many more reasons than this but you get the idea.
I've deliberately left off one issue - the inflexibility of some storage systems in their deployment method. Although this isn't directly a reason to buy more storage, it is certainly a reason why users hoard more storage on their servers. Monolithic arrays are way too slow at executing on configuration tasks and on dynamic rebalancing, requiring too much planning and thinking time to avoid bad layout and configuration issues.
So Marc, you should have stated that thin provisioning is only one aspect of reducing storage hoarding. Good practice is another. Flexible technology is an undoubted third.
Oh and 10 house points to the first non-UK person who can explain my post title!
Monday, 24 November 2008
Tuesday, 18 November 2008
Claus Mikkelsen has woken up recently and started posting after a large break. Perhaps he's preparing for all those impending Christmas deliveries. Anyway, the crux of his post is to explain how he's moved from 2TB to 4TB of home storage rather than take the time to sort out the mess of his home data. He then goes on to detail lots of clever technology which allows more data to be stored with less.
As I've posted many times before, we're just storing ourselves up a heap of trouble by not addressing the underlying issue here - delete the unwanted data.
We're creating storage landfills which will still need to be sorted out in the future. Like toxic waste in a rubbish dump, losing that critical file will eventually cost dearly.
Think of Claus' problem. Moving from 2TB to 4TB doubles the amount of data that needs to be backed up (how do you back up 2TB of storage at home?), means any restores take longer, means spending more time searching for that file you know you had once, but can't remember what you called it - and if you use an online service for backup means you are paying unnecessarily each month.
Take my advice, spend the time in developing (a) a good naming standard for your home files (b) a good standard for directories for storing your home files (c) delete the stuff you don't need. Immediately. Period.
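Point (c) is the one people skip, so here's a minimal sketch of the idea: walk the home-files tree and report anything untouched for a year as a deletion candidate. The root path and one-year threshold are my own illustrative values, not a recommendation:

```python
# A minimal sketch of the housekeeping advice above: walk a home-files
# tree and report files not accessed for a year - candidates for deletion.
import os
import time

HOME_FILES = "/home/user/documents"   # hypothetical root - change to suit
CUTOFF = time.time() - 365 * 86400    # one year ago

def stale_files(root, cutoff):
    """Yield paths under root whose last-access time is older than cutoff."""
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if os.path.getatime(path) < cutoff:
                yield path

for path in stale_files(HOME_FILES, CUTOFF):
    print(path)
```

Review the list before deleting anything, obviously - last-access times can lie if a backup or virus scanner has touched everything.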
Monday, 17 November 2008
Chris Mellor just announced the news that EMC have bundled their Pi and Mozy acquisitions into a single entity, branded as Decho. I was far too slow and Storagezilla beat me to the mandatory EMC post.
So, with Mozy and Pi we now have our data and backups online in the EMC cloud - which conveniently arrived last week as Atmos.
I may have been somewhat overly negative towards EMC in previous posts (they're big boys, I'm sure they can take it), however the layering of cloud storage offerings with Atmos as the foundation (assuming they eat their own dog food and use it) and content/backup over the top does move EMC into a new and interesting market segment in offering storage services rather than just tin (or software for that matter).
What's the logical conclusion as to where EMC are headed? Is the move to Storage-as-a-Service an implicit acceptance that, over time, hardware will become even more commoditised and that services are the future? In the long term, surely that's the ideal scenario for the end user; all data and services in "the cloud" somewhere with no need to know where/how the data is stored other than service level and performance guarantees. It's not likely to happen in the near future but as a long term trend, it is certainly compelling.
Thursday, 13 November 2008
I feel drawn to post on the details of Atmos and give my opinion whether it is good, bad, innovative or not. However there's one small problem. Normally I comment on things that I've touched - installed/used/configured/broken etc, but Atmos doesn't fit this model so my comments are based on the marketing information EMC have provided to date. Unfortunately the devil is in the detail and without the ability to "kick the tyres", so to speak, my opinions can only be limited and somewhat biased by the information I have. Nevertheless, let's have a go.
From a hardware perspective, there's nothing radical here. Drives are all SATA-II 7.2K 1TB capacity. This is the same as the much maligned IBM/XIV Nextra, which also only offers one drive size (I seem to remember EMC a while back picking this up as an issue with XIV). In terms of density, the highest configuration (WS1-360) offers 360 drives in a single 44U rack. Compare this with Copan which provides up to 896 drives maximum (although you're not restricted to this size).
To quote Storagezilla: "There are no LUNs. There is no RAID." So exactly how is data stored on disk? What methods are deployed for ensuring data is not lost due to a physical issue? What is the storage overhead of that deployment?
Steve Todd tells us:
"Atmos contains five "built-in" policies that can be attached to content:
- Object de-dup
When any of these policies are attached to Atmos, COS techniques are used to automatically move the content around the globe to the locations that provide those services."
So, does that mean Atmos is relying on replication of data to another node as a replacement for hardware protection? I would feel mighty uncomfortable to think I needed to wait for data to replicate before I had some form of hardware-based redundancy - even XIV has that. Worse still, do I need to buy at least 2 arrays to guarantee data protection?
Front-end connectivity is all IP based, which presumably includes replication too, although there are no details of replication port counts or even IP port counts, other than the indication of 10Gb availability, if required.
One feature quoted on all the literature is Spin Down. Presumably this means spinning down drives to reduce power consumption; but spin down depends on data layout. There are two issues; if you've designed your system for performance, data from a single file may be spread across many spindles. How do you spin down drives when they all potentially contain active data? If you've laid out data on single drives, then you need to move all the inactive data to specific spindles to spin them down - that means putting the active data on a smaller number of spindles - impacting performance and redundancy in the case of a disk failure. The way in which Atmos does its data layout is something you should know - because if Barry is right, then his XIV issue could equally apply to Atmos too.
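The striping conflict is easy to demonstrate. This toy sketch (my own illustrative numbers, nothing to do with Atmos internals) shows why a performance-oriented layout defeats spin down: stripe a file round-robin across the spindles and even one active file keeps every disk awake:

```python
# Toy illustration of the spin-down problem: with round-robin striping,
# one active file touches every spindle; with whole-file placement it
# would touch just one. Disk count and stripe size are invented.
NUM_DISKS = 8
STRIPE_KB = 64

def striped_disks(file_size_kb):
    """Return the set of disks touched by a file striped in STRIPE_KB chunks."""
    chunks = -(-file_size_kb // STRIPE_KB)  # ceiling division
    return {chunk % NUM_DISKS for chunk in range(chunks)}

print(sorted(striped_disks(1024)))  # a 1MB file touches all 8 spindles
print(sorted(striped_disks(64)))    # a single stripe touches only one
```

So to spin anything down you either accept that almost nothing qualifies, or you migrate inactive data onto dedicated spindles first - which is exactly the layout trade-off described above.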
So to summarise, there's nothing radical in the hardware at all. It's all commodity-type hardware - just big quantities of storage. Obviously this is by design and perhaps it's a good thing as unstructured data doesn't need performance. Certainly as quoted by 'zilla, the aim was to provide large volumes of low cost storage and compared to the competition, Atmos does an average job of that.
This is where things get more interesting and to be fair, the EMC message is that this is a software play. Here are some of the highlights;
To quote 'zilla again:
"There is a unified namespace. Atmos operates not on individual information silos but as a single repository regardless of how many Petabytes containing how many billions of objects are in use spread across whatever number of locations available to who knows how many users."
I've highlighted a few words here because I think this quote is interesting; the implication is that there is no impact on the volume of data or its geographical dispersion. If that's the case (a) how big is this metadata repository (b) how can I replicate it (c) how can I trust that it is concurrent and accurate in each location?
I agree that a unified name space is essential, however there are already plenty of implementations of this technology out there, so what's new with the Atmos version? I would want to really test the premise that EMC can provide a concurrent, consistent name space across the globe without significant performance or capacity impact.
Metadata & Policies
It is true that the major hassle with unstructured data is the ability to manage it using metadata based policies and this feature of Atmos is a good thing. What's not clear to me is where this metadata comes from. I can get plenty of metadata today from my unstructured data; file name, file type, size, creation date, last accessed, file extension and so on. There are plenty of products on the market today which can apply rules and policies based on this metadata, however to do anything useful, more detailed metadata is needed. Presumably this is what the statement from Steve means: "COS also implies that rich metadata glues everything together". But where does this rich metadata come from? Centera effectively required programming their API and that's where REST/SOAP would come in with Atmos. Unfortunately unless there's a good method for creating the rich metadata, then Atmos is no better than the other unstructured data technology out there. To quote Steve again:
"Rich metadata in the form of policies is the special sauce behind Atmos and is the reason for the creation of a new class of storage system."
Yes, it sure is, but where is this going to come from?
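To make the distinction concrete, here's a sketch of the sort of policy you can already drive from basic filesystem metadata alone; the rules themselves are hypothetical. Anything beyond this - business context, retention class - is the "rich" metadata whose origin is the open question:

```python
# Policies keyed purely on what a filesystem already exposes: name,
# extension, size, age. The rules below are invented examples.
import os
import time

def basic_metadata(path):
    """Collect the metadata any filesystem gives you for free."""
    st = os.stat(path)
    return {
        "name": os.path.basename(path),
        "ext": os.path.splitext(path)[1].lower(),
        "size": st.st_size,
        "age_days": (time.time() - st.st_mtime) / 86400,
    }

def policy(meta):
    """Hypothetical rules driven by basic metadata only."""
    if meta["ext"] in (".iso", ".vmdk") and meta["age_days"] > 90:
        return "archive"
    if meta["size"] == 0:
        return "delete"
    return "keep"
```

Note how quickly you hit the ceiling: nothing here can tell a contract from a holiday snap, which is precisely what "rich" metadata would need to do.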
Finally, let's talk again about some of the built-in policies Atmos has:
- Object de-dup
On reflection I may be being a little harsh on Atmos, however EMC have stated that Atmos represents a new paradigm in the storage of data. If you make a claim like that, then you need to back it up. So, still to be answered;
- What resiliency is there to cope with component (i.e. HDD) failure?
- What is the real throughput for replication between nodes?
- Where is the metadata stored and how is it kept concurrent?
- Where is the rich metadata going to come from?
Oh, and I'd be happy to kick the tyres if the offer was made.
Tuesday, 11 November 2008
Monday, 10 November 2008
Yes, it's almost here folks. The blogosphere tells us so. First of all, there's Chuck Hollis' latest post pondering the issue of how the storage cloud works and why it's really difficult to pick up data quickly in geographically dispersed areas. He leaves us with the cliffhanger;
"The magic isn't in the hardware, it's in the software ..."
So, Hulk/Maui's a software product then...
Next there's StorageZilla, with his viral marketing approach. No technical details here, just cryptic comments relating to trendy cultural references - Dr Who - and some bloke in a space helmet. Clearly I'm not worthy as I didn't understand the second one at all.
This morning we have 'Zilla's full disclosure with his latest post.
All of this is prior to an official announcement - nothing on EMC's press release site yet.
What's next? So, expect Barry Burke to post a technical assassination of the opposition over at the Storage Anarchist. Then we can have other bloggers putting their spin on it too. I can't be bothered to list them all; I'm sure you know who they are.
But wait - have I not just fallen into the viral marketing trap too by helping out EMC? D'oh, perhaps those folks at Hopkinton Towers are more clever than we think....
Wednesday, 5 November 2008
Tuesday, 4 November 2008
A recent post from Martin "The Bod" Glassborow got me thinking about the whole process of LUN consolidation. I've done lots of migrations where people quake at the thought of changing the LUN size from one array to another. Now, I almost always want to change LUN sizes, as the vendor-specific ones - 8.43GB, 13.59GB etc. - are pretty painful and wasteful at the same time.
There's another good reason to standardise on LUNs. If you've implemented a good dual-vendor strategy and sorted your firmware/driver stack out, then you can position to take storage from any of your preferred vendors. There's nothing better than having all of your vendors sweating on that next 500TB purchase when they know you can take your storage from any of EMC/HDS/HP/IBM.
If LUNs and the I/O stack are all standardised, you can move data around too. The difficult part as alluded to in Martin's post is achieving the restacking of data.
Here's the problem; SAN storage is inherently block based and the underlying hardware has no idea of how you will lay out your data. Have a look at the following diagram. Each LUN from a SAN perspective is divided into blocks and each block has a logical block address. The array just services requests from the host for a block of data and reads/writes it on demand. It is the operating system which determines how the file system should be laid out on the underlying storage. Each volume will have a standard location (or standard method of calculating the location) for what was called the VTOC (Volume Table of Contents), also known as the FAT (File Allocation Table) in DOS and MFT (Master File Table) in NTFS. There are similar constructs for other O/S versions like Linux but I'm not 100% certain of the terminology so won't risk the wrath of getting it wrong.
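To illustrate the block-level view: the array only ever sees "read or write the block at logical block address X". A toy sketch, treating a LUN as a flat file of 512-byte blocks (a block size I've assumed for illustration, as on most disks of the era):

```python
# The array's-eye view of a LUN: a flat run of fixed-size blocks,
# addressed by logical block address (LBA). No files, no structure.
import io

BLOCK_SIZE = 512

def lba_to_offset(lba):
    """A logical block address is just a multiple of the block size."""
    return lba * BLOCK_SIZE

def read_block(lun, lba):
    """Read one block at the given LBA from a seekable LUN image."""
    lun.seek(lba_to_offset(lba))
    return lun.read(BLOCK_SIZE)

# Demo with an in-memory "LUN": block 2 starts at byte offset 1024.
lun = io.BytesIO(b"\x00" * 1024 + b"DATA".ljust(512, b"\x00"))
print(read_block(lun, 2)[:4])  # -> b'DATA'
```

Everything above this - which blocks form a file, where the free space is - lives in the operating system's structures, invisible to the array.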
The layout of data on a file system is not a trivial task. Apart from keeping track of files, there's the requirement to keep track of free space and to be able to recreate the file index in the case of corruption, so some kind of journalling is likely to be implemented. There are also features such as compression, Single Instancing, Encryption, etc which all add to the mix of understanding exactly how file data is laid out on disk.
Now think of how multiple LUNs are currently connected together. This will be achieved with either a Volume Manager (like VxVM), supplied as a separate product, or a native LVM (logical volume manager). All of these tools will spread the "logical" volume across multiple LUNs and will format the LUN with information to enable the volume to be recreated if the LUNs are moved to another host. VxVM achieves this by having a private area on each LUN which contains metadata to rebuild the logical volume. Each LUN can be divided into sub-disks and then recombined into a logical volume, as shown in this diagram.
So a physical LUN from an array may contain a whole or partial segment of a host volume, including LVM metadata. Determining what part, whether all the parts are on this array (and where) is a tricky task - and we're expecting that the transmission protocol (i.e. the fabric) can determine all of this information "on the fly" as it were.
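Here's a toy of what any consolidation tool would have to reverse-engineer: resolving a logical volume offset back to a (LUN, physical offset) pair from the sub-disk map. The layout below is invented for illustration - real VxVM holds the equivalent information in each LUN's private region:

```python
# A logical volume concatenated from sub-disks on several LUNs.
# Each entry: (lun_name, start_offset_on_lun, length) - invented layout.
SUBDISKS = [
    ("lun0", 0, 100),
    ("lun1", 50, 200),
    ("lun0", 100, 100),
]

def resolve(logical_offset):
    """Map a logical volume offset to (LUN, physical offset on that LUN)."""
    base = 0
    for lun, start, length in SUBDISKS:
        if base <= logical_offset < base + length:
            return lun, start + (logical_offset - base)
        base += length
    raise ValueError("offset beyond end of volume")

print(resolve(0))    # -> ('lun0', 0)
print(resolve(150))  # -> ('lun1', 100): 50 blocks into the second sub-disk
```

A fabric device sees only the block traffic, not this map - which is why deconstructing a volume "on the fly" is such a tall order.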
My thought would be - why bother with a fabric-based consolidation tool? Products like VxVM provide a wide set of commands for volume migration; although not automated, they certainly make the migration task simpler. I've seen some horrendous VxVM implementations, which would require some pretty impressive logic to be developed in order to understand how to deconstruct and reconstruct a volume. However life is not that simple, and host-based migrations aren't always easy to execute on, so potentially a product would be commercially viable, even if the first implementation was an offline version which couldn't cope with host I/O at the same time.
Funny, what's required sounds a bit like a virtualisation product - perhaps the essence of this is already coded in SVC, UVM or Incipient?
Monday, 3 November 2008
"Innovative - featuring new methods or original ideas - creative in thinking" - Oxford Dictionary of English, 11th edition.
There have been some interesting comments over the weekend, specifically from EMC in regard to this post which I wrote on Benchmarketing started by Barry Burke and followed by Barry Whyte.
"Mark" from EMC points me to this link regarding EMC's pedigree on innovation. Now that's like a red rag to a bull to me and I couldn't help myself going through every entry and summarising them.
There are 114 entries, out of which, I've classified 44 as marketing - for example appointing Joe Tucci (twice) and Mike Ruettgers (twice) and being inducted into the IT Hall of Fame hardly count as innovation! Some 18 entries relate directly to Symmetrix, another 18 to acquisitions (not really innovation if you use the definition above) and another 7 to Clariion (also an acquisition).
From the list, I've picked out a handful I'd classify as innovating.
- 1987 - EMC introduce solid state disks - yes, but hang on, haven't they just claimed to have "invented" Enterprise Flash Drives?
- SRDF & Timefinder - yes I'd agree these are innovative. SRDF still beats the competition today.
- First cached disk array - yes innovation.
Here's the full list taken from the link above. Decide for yourself whether you think these things are innovative or not. Acquisitions in RED, Marketing in GREEN. Oh and if anyone thinks I'm being biased, I'm happy to do the same analysis for IBM, HP, HDS etc. Just point me at their timelines.
- Clariion CX4 - latest drives, thin provisioning?
- Mozy - acquisition
- Flash Drives - 1980's technology.
- DMX4 - SATA II drives and 4Gb/s
- Berkeley Systems etc - acquisition
- EMC Documentum - acquisition
- EMC study on storage growth - not innovation
- EMC floats VMware - acquisition
- EMC & RSA - acquisition
- EMC R&D in China
- EMC Clariion -Ultrascale
- EMC Smarts - acquisition
- Symmetrix DMX3
- Smarts, Rainfinity, Captiva - acquisitions
- EMC - CDP - acquisition
- EMC Clariion - Ultrapoint
- EMC DMX3 - 1PB
- EMC Invista - where is it now?
- EMC Documentum - acquisition
- Clariion AX100 - innovative? incremental product
- Clariion Disk Library (2004) - was anyone already doing this?
- DMX-2 Improvements - incremental change
- EMC VMware - acquisition
- EMC R&D India - not innovative to open an office
- EMC Centera - acquisition - FilePool
- EMC Legato & Documentum - acquisitions
- Clariion ATA and FC drives
- EMC DMX (again)
- EMC ILM - dead
- EMC Imaging System? Never heard of it
- IT Hall of Fame - hardly innovation
- Clariion CX
- Information Solutions Consulting Group - where are they now?
- EMC Centera - acquisition
- Replication Manager & StorageScope - still don't work today.
- Dell/EMC Alliance - marketing not innovation
- ECC/OE - still doesn't work right today.
- Symmetrix Product of the Year - same product again
- Joe Tucci becomes president - marketing
- SAN & NAS into single network - what is this?
- EMC Berkeley study - marketing
- EMC E-lab
- Symmetrix 8000 & Clariion FC4700 - same products again
- EMC/Microsoft alliance - marketing
- EMC stock of the decade - marketing
- Joe Tucci - president and COO - marketing
- EMC & Data General - acquisition
- ControlCenter SRM
- EMC Connectrix - from acquisition
- Software sales rise - how much can be attributed to Symmetrix licences
- Oracle Global Alliance Partner - marketing
- EMC PowerPath
- Symmetrix capacity record
- EMC in 50 highest performing companies - marketing
- EMC multiplatform FC systems
- Timefinder software introduced
- Company named to business week 50 - marketing
- EMC - 3TB in an array!!
- Celerra NAS Gateway
- Oracle selects Symmetrix - marketing
- SAP selects Symmetrix - marketing
- EMC Customer Support Centre Ireland - marketing
- Symmetrix 1 Quadrillion bytes served - McDonalds of the storage world?
- EMC acquires McDATA - acquisition
- EMC tops IBM mainframe storage (Symmetrix)
- Symmetrix 5100 array
- EMC 3000 array
- EMC BusinessWeek top score - marketing
- Egan named Master Entrepreneur - marketing
- EMC 5500 - 1TB array
- EMC joins Fortune 500 - marketing
- SRDF - innovation - yes.
- Customer Council - marketing
- EMC expands Symmetrix
- EMC acquires Epoch Systems - basis for ECC?
- EMC acquires Magna Computer Corporation (AS/400)
- EMC R&D Israel opens - marketing
- Symmetrix 5500 announced
- Harmonix for AS/400?
- EMC ISO9001 certification - marketing
- Mike Ruettgers named president and CEO - marketing
- Symmetrix arrays for Unisys
- Cache tape system for AS/400
- EMC implements product design and simulation system - marketing
- Product lineup for Unisys statement - marketing
- DASD subsystem for AS/400
- EMC MOSAIC:2000 architecture
- EMC introduces Symmetrix
- First storage system warranty protection - marketing
- EMC Falcon with rapid cache
- First solid state disk system for Prime (1989)
- Reuttgers improvement program - marketing
- First DASD alternative to IBM system
- Allegro Orion disk subsystems - both solid state (1988)
- EMC in top 1000 business - marketing
- EMC joins NYSE - marketing
- First cached disk controller - innovation - yes
- Manufacturing expands to Europe - marketing
- EMC increases presence in Europe and APAC - marketing
- Archeion introduced for data archiving to optical (1987)
- More people working on DASD than IBM - marketing
- EMC introduces solid state disks (1987)
- Storage capacity increases - marketing
- EMC doubles in size - marketing
- Product introductions advance computing power - marketing
- HP memory upgrades
- EMC goes public - marketing
- EMC announces 16MB array for VAX
- Memory, storage products boost minicomputer performance
- EMC offers 24 hour support
- Testing improves quality - marketing
- Onsite spares program - marketing
- EMC delivers first product - marketing
- EMC founded - marketing