The Storage Architect

Friday, 11 May 2007

Mine's bigger than yours - do we care?

Our resident storage anarchist has been vigorously defending DMX - here. It's all in response to discussions with Hu regarding whether USP is better than DMX. Or, should I say DMX-3 and Broadway (whoops, I mentioned the unmentionable, you'll have to shoot me now).

I have to say I enjoyed the technical detail of the exchanges and I hope there will be a lot more to come. Any insight into how to make what are very expensive storage subsystems work more effectively has to be a good thing.

But here's the rub. Do we care about how much faster DMX-3 is over USP? I doubt the differences are more than incremental and as I've both installed, configured and provisioned storage on 9980V/USP/8730/8830/DMX/DMX2/DMX3, I think I've enough practical experience to qualify it. (By the way, I loved StorArch's comment about how flexible BIN file changes are now. Well, they may be, but in reality I've found EMC cumbersome to release configuration changes).

Finally I'll get round to the point of this post; most large enterprise subsystems are of the same order of magnitude of performance. However I've yet to see any deployment where performance management is executed to such a degree that, hand on heart, the storage admins there can claim they sequeeze 100% efficient throughput. I'd make an estimate that things probably run 80% efficient, with the major bottlenecks being port layout, backend array layout and host configuration.

So the theoretical bantering on who is more performant than the other is moot; now, EMC, HDS or IBM, come up with a *self tuning* array then you've got a winner...

Wednesday, 9 May 2007

Simulator Update

I managed to get a copy of the Celerra Simulator last week and I've just managed to get it installed. Although it is simple, it is quite specific on requirements - it runs under VMware ACE as a Linux program. It needs an Intel processor and can't run on a machine with VMware already installed. Fortunately my test server fits the bill (once I uninstalled VMware Server). Once up, you administer through a browser.

At this stage that's as far as I've got - however it looks good. More soon.

Port Oversubscription

Following on from snig’s post, I promised a blog on FC switch oversubscription. It’s been on my list for some time and I have discussed it before, however it has also a subject I’ve discussed with clients from a financial perspective and here’s why; most people look at the cost of a fibre channel switch on a per port basis, regardless of the underlying feature/functionality of that port.

Not that long ago, switches from companies such as McDATA (remember them? :-) ) provided full non-blocking architecture. That is, they allowed full 2Gb/s for any and all ports, point to point. As we moved to 4Gb/s, it was clear that Cisco, McDATA and Brocade couldn’t manage (or didn’t want) to deliver full port speed as the port density of blades increased. I suspect there were be issues with ASIC cost and fitting the hardware onto blades and cooling it (although Brocade have just about managed it).

For example, on Generation 2 Cisco 9513 switches, the bandwidth per port module is an aggregate 48Gb/s. This is regardless of the port count (12, 24 or 48), so, although a 48 port blade can (theoretically) have all ports set to 4Gb/s, the ports on average only have 1Gb/s of bandwidth.

However the configuration is more complex; ports are grouped into port groups, 4 groups per blade of 12Gb/s each, putting even more restriction on the ability to use available bandwidth across all ports. Ports can be dedicated or use shared bandwidth within a port group. In a port group of 12 ports, set three to dedicated bandwidth of 4Gb/s and the rest are (literally) unusable. Whilst I worked at a recent client, we challenged this option with Cisco. As a consequence, 3.1(1) of SAN-OS allows the disabling of all restrictions so you can take the risk and set all ports to 4Gb/s and then you’re on your own.

How much should you pay for these ports? What are they actually worth? Should a 48-port line card port cost the same as a 24-port line card port? Or, should they be rated on bandwidth? Some customers choose to use 24-port line cards for storage connections or even the 4-port 10Gb/s cards for ISLs. I think they are pointless. Cisco 9513’s are big beasts; they eat a lot of power and need a lot of cooling. Why wouldn’t you want to cram as many ports into a chassis as possible?

The answer is to look at a new model for port allocation. Move away from the concepts of core-edge and edge-core-edge and mix storage ports and host ports on the same switch and where possible within the same port group. This would minimise the impact of moving off-blade, off-switch or even out of port group.

How much should you pay for these ports? I’d prefer to work out a price per Gb/s. From the prices I’ve seen, that makes Brocade way cheaper than Cisco.

Tuesday, 1 May 2007

Tuning Manager CLI

I've been working with the HiCommand Tuning Manager CLI over the last few days in order to get more performance information on 9900 arrays. Tuning Manager (5.1 in my case) just doesn't let me present data in a format I find useful, and I suppose that's not really surprising as, unless you're going to add a complete reporting engine into the product, then you'll be wanting to get the data out of the HTnM database and build your own bespoke reports.

So I had high hopes for the HTnM CLI, but I was unfortunately disappointed. Yes, I can drag out port, LDEV, subsystem (cache etc) and array group details, however I can only extract one time period of records at a time. I can display all the LDEVs for a specific hour, or a day (if I've been aggregating the data) but I can't specify a date or time range. This means I've had to script extracting and merging the data and the result is, it is sloooow. Really slow. One other really annoying feature - fields that report byte throughput sometimes report as "4.2 KB" sometimes as "5 MB" - which programmer thought a comma delimited output would want a unit suffix?

I'm expecting delivery of HTnM 5.5 (I think 5.5.3 to be specific) this week and here's what I'm hoping to find; (a) the ability to report over date/time range (b) the database schema to be exposed for me to extract data directly. I'm not asking much - nothing much more than other products offer. Oh, and hopefully something considerably faster than now.

Wednesday, 25 April 2007

What's your favorite fruit? EMC versus HDS

Nigel has posted the age old question, which is best EMC or HDS? For those who watch Harry Hill - there's only one way to sort it out - fiiiiight!

But seriously, I have been working with both EMC and HDS for the last 6 years on large scale deployments and you can bet I have my opinion - Nigel, more opinion than I can put on a comment on your site, so forgive me for hijacking your post.

Firstly, the USP and DMX have fundamentally the same architecture. Front end adaptor ports and processors, centralised and replicated cache and disks on back-end directors. All components are connected to each other providing a "shared everything" configuration. Both arrays use hard disk drives from the same manufacturers which have similar performance characteristics. Both offer multiple drive types, the DMX3 including 500GB drives.

From a scalability perspective the (current) USP scales to more front-end ports but can't scale to the capacity of the DMX3. Personally, I think the DMX3 scaling is irrelevant. Who in their right mind would put 2400 drives into a single array (especially with only 64 FC ports)? The USP offers 4GB FC ports, I'm not sure if DMX3 offers that too. The USP scales to 192 ports, the DMX3 only 64 (or 80 if you lose some back-end directors).

The way DMX3 and USP disks are laid out is different. The USP groups disks into array groups depending on the RAID type - for instance a 6+2 RAID group has 8 drives. It's then up to you how you carve out the LUNs - they're completely customisable to your choice of size. Although a configuration file can be loaded (like an EMC binfile) its usually never used and LUNs are user-created through a web interface to the USP SVP called Storage Navigator. LUN numbering is also user configured, so it's possible to carve all LUNs consecutively from the same RAID group - not desirable if you assign LUNs sequentially and put them on the same host. EMC split physical drives into hypers. Hypers are then recombined to create LUNs - two hypers for a RAID1 LUN, 4 hypers for a RAID5 LUN. The hypers are selected from different (and usually opposing) back-end FC loops to provide resiliency and performance. It is possible for users to create LUNs on EMC arrays (using Solutions Enabler), but usually not done. Customers tend to get EMC to create new LUNs via a binfile change which replaces the mapping of LUNs with a new configuration. This can be a pain as it has to go though EMC validation and the configuration has to be locked for new configurations until EMC implement the binfile.

For me, the main difference is how features such as synchronous replication are managed. With EMC, each LUN has a personality even before it is assigned to a host or storage port. This may be a source LUN for SRDF (an R1) or a target LUN (an R2). Replication is defined from LUN to LUN, irrespective of how the LUNs are then assigned out. HDS on the other hand, only allow replication for LUNs to be established once they are presented on a storage port and the pairing is based on the position of the LUN on the port. This isn't easy to manage and I think prone to error.

Now we come to software. EMC wipe the floor with HDS at this point. Solutions Enabler, the tool used to interact with the DMX is slick, simple to operate and (usually) works with a consistent syntax. The logic to ensure replication and point-in-time commands don't conflict or lose data is very good and it takes a certain amount of effort to screw data up. Solutions Enabler is a CLI and so quick to install and a "lite" application. There's a GUI version (SMC) and then the full blown ECC.

HDS's software still leaves a lot to be desired. Tools such as Tuning Manager and Device Manager are still cumbersome. There is CLIEX, which provides some functionality via the command line, but none of it is as slick as EMC. Anyone who uses CCI (especially earlier versions) will know how fraught with danger using CCI commands can be.

For reliability, I can only comment on my experiences. I've found HDS marginally more reliable than EMC, but that's not to say DMX isn't reliable.

Overall, I'd choose HDS for hardware. I can configure it more easily, it scales better, and - as Hu mentions almost weekly, it supports virtualisation (more on that in a moment). If I was dependent on a complex replication configuration, then I'd choose EMC.

One feature I've not mentioned earlier is virtualisation. HDS USP and NSC55 offer the ability to present externally connected arrays and present them as HDS storage. There are lots of benefits for this - migration, cost saving etc. I don't need to list them all. It's true that virtualisation is a great feature but it is *not* free and you have to look at the cost benefit of using it - or beat your HDS salesman up to give you it for free. Another useful HDS feature is partitioning. An array can be partitioned to look like up to 32 separate arrays. Great if you want to segment cache, ports and array groups to isolate for performance or security.

There are lots of other things I could talk about but I think if I go on much further I will start rambling...

Tuesday, 24 April 2007

Optimisation tools

Large disk arrays can suffer from an imbalance of data across their RAID/parity groups. This is inevitable even if you plan your LUN allocation as data profiles change over time and storage is allocated and de-allocated.

So, tools are available. Think of EMC Optimizer, HDS Cruise Control and Volume Migrator.

I've put a poll up on the blog to see what people think - I have my own views and I'll save them until after the vote closes next week.

Goodbye ASNP

It's all over. ASNP is no more. Not really a surprise as it stood for nothing useful. With 2500 members, it could have been so much more, however I think it won't be missed.

Kryptonite Discovered!

Totally off post but fantastic non the less!! Kryptonite Discovered

Monday, 23 April 2007

Hurrah for EMC

Hurrah! EMC has implemented SMI-S v1.2 in ControlCenter and DMX/Clariion (although the reference on the SNIA website seems to relate to SMI-S v1.1). Actually it seems that you need ECC v6.0 (not out yet and likely to be a mother of an upgrade from the current version) and I'd imagine the array support has been achieved using Solutions Enabler.

So quick poll, how many of you out there are using ECC to manage IBM DSxxx or HDS USP arrays? How many of you are using HSSM to manage ECC arrays? How many of you are using IBM TPC to manage anything other than DSxxx arrays??

Simulator Update

Following a few comments on the previous simulator post, it doesn't look like there are any more simulators out there for general use.

If anyone does know - feel free to comment!

Simulator Update

Following a few comments on the previous simulator post, it doesn't look like there are any more simulators out there for general use.

If anyone does know - feel free to comment!

Wednesday, 18 April 2007

The Power Question

I've seen a lot of discussion (and I think a bit of a theme at SNW) on power consumption in datacentres. Obviously the subjects of global warming and increased energy prices have put the subject at the centre of focus. But I think when datacentres are being built, there isn't an issue. The problem comes along as the datacentre fills up with equipment. Invariably, new equipment (especially in the storage world) is coming in denser and requiring more power per rack or square metre. So, as equipment is swapped out and replaced, the original calculations done on how much power per square metre is needed are no longer accurate and the balance tips from one of "have we got the space" for new equipment to one of "can we power the new kit up".

I don't see how this problem will be solved as datacentre planners will always cater for the power/cooling of today's products, not the mythical power demands of future products. Datacentres will therefore have a finite life, after which you may as well start again.

Here's a practical example; There is a manufacturer of highly dense storage arrays (that don't need to be powered up all the time) who can't deploy into a number of London datacentres I know because product density would cause the array to fall to the floor. The datacentres were never designed to take products of that weight...

Tuesday, 17 April 2007

Another Great Idea

I've another great idea for a software product (I have these ideas from time to time, but converting them into reality always proves difficult).

So, museum environments for backups. They're going to be a major headache going forward, even more than they are today as there are more demands on the timely keeping and retrieval of backups. What's needed is a product which can understand legacy backup products and do two things; (a) extract a copy of the backup catalog into a single database, based on a standard schema for backup data and (b) read the content of backup media directly without the need to use the old backup product.

This may seem like backup software companies giving away their IP but I don't think it is. I was in a discussion recently where EMC would not give support on (admittedly very) old versions of Legato, especially with respect to merging catalogs from multiple platforms. This leads to costly and risky options requiring the retention of legacy hardware (subject to failure), legacy software (no longer supported) and legacy media (prone to failure). The lack of a single catalog precludes the ability to easily identify and manage backups and backup images when multiple backup systems exist.

I wonder if any of the vendors out there would be happy to let me have copies and information on the defunct versions of their backup products?

Footnote; I am aware of the backup management products out there like Bocada; to my knowledge none of them actually merge the catalogs into a view at the file level or offer direct media restore.

Storage as a commodity

I just read a comment over at Zerowait regarding Netapp and proprietary hardware. It reminded me of something I was thinking about recently on the commoditisation of storage.

There's nothing worse to my mind than a storage vendor who has no competition. Inevitably in some organisations that situation can exist when a single supplier is chosen to supply (for example) switches, SAN or NAS. The difficulty though is how to avoid that situation. Most vendors would love to lock you into their proprietary tools and relating back to the above article link, Netapp is one I see who try that more than anyone. They have a bewildering array of interlinked product options; once your hooked (especially where you use a feature to retain long term backups via snapshot/vaults) then you're sucked into a dependency on their products which just isn't healthy.

What's the solution? Well, for me I like to commoditise storage functionality. Pick out those features which all vendors support and only use the proprietary features where absolutely necessary. At least then you can maintain multiple vendors all on the hook for your next piece of business.

Of course implementing commoditised storage is more difficult than just picking a few common product features. However as far as your users are concerned, a LUN is a LUN and NAS storage is NAS storage, with a few caveats on things like driver levels for HBAs and so on.

I've previously posted a modular storage comparison sheet. As an example, here are some of the features that almost all support:

RAID 5 protection
Consistent LUN size
dual pathing
Active/Passive failover
remote replication
Fibre Channel presentation
SNMP alerting
online code upgrades
hot swappable components

Before I get lots of comments saying "hold on, not all modular products are the same"; remember I'm not saying that. What I am saying is having a consistent set of requirements allows you to maintain a shortlist of vendors who can all enjoy the healthy competition of bidding for business. So, time to draw up a NAS spreadsheet....

Friday, 13 April 2007

AoE/FCoE/iSCSI

Robin harris discusses AoE from Coraid. I looked at this last year (reminder) as I saw it as a great way to get a FC/iSCSI solution at a low cost. However, before everyone rips out their FC SANs and runs to put an Ethernet solution in place, take one step back and consider the issues. Fibre Channel is successful because it works; because it is reliable. FC switches have features such as non-blocking architecture, QOS, preferred path and so on which help to remove or eliminate throughput or performance issues. Would ATA over Ethernet (or for that matter as it seems to be a topic of the moment) FC over Ethernet provide for that level of switch point to point bandwidth guarantee?

Consider also your monitoring tools. Both Brocade and Cisco offer features to do traffic redirecting (e.g. SPAN ports) to easily analyse SAN traffic without putting TAPs in place. Will AoE and FCoE offer that?

Consider security. Will AoE provide the same level of security as FC?

Without a doubt, you get what you pay for; however you should only pay for what you *need*. If you are running a mission critical application FC is still the best option - interoperability is more widely tested; diagnostic tools are mature; the technology is reliable. I do think there's a place for AoE, iSCSI and FCoE, but use it in the wrong place and what you save in cost, you may pay for later in downtime.

Wednesday, 11 April 2007

Where are all the simulators

I love the Netapp simulator (well, apart from the annoying issues with creating and deleting disks) and I use it all the time. It is great for testing ideas, testing scripting and generally refreshing knowledge on commands before having to touch real equipment. I use it with VMware (as I have probably mentioned before) and I can knock up a new environment in a few minutes by cloning an existing machine. Netapp have got a huge advantage in offering the tool as it enables customers who can't or won't put in test equipment to do work and protect their production environments.

So, where are all the other simulators? Is it just that I don't know they exist or do most vendors not provide them? For the same reasons as I mentioned above, if there were simulators for EMC DMX, HDS USP, Cisco and Brocade/McDATA switches, then there would be a huge opportunity for people to test and develop scripts, test upgrades and other useful work.

Would anyone else like a simulator? Can the vendors tell me why they don't produce them?

Monday, 9 April 2007

Improving efficiency

Hu posted an interesting view of storage utilisation here. His view is that virtualisation on the USP would improve on the average 30% utilisation. I have to say that I disagree with this. There are lots of reasons why storage remains unused in the enterprise:

inefficient administration by Storage Admins, SAs and DBAs (lost/orphan disks etc)
deliberate overallocation by SAs and DBAs (in an attempt to manage change control versus growth)
in process tasks (frame/host migrations, delayed decommissions)
data distribution for performance management
Storage Admin "buffers" to manage growth.

Many of these issues are process driven rather than due to the inadequacies of technology. Now, "true" virtualisation may address some of the problems listed above, especially those technologies which provide for thin provisioning or other overallocation methods. There are plenty of technologies on the market already offering this style of allocation however there are obviously shortcomings with the technology that could hold it back; most notably are performance and reporting.

Performance is seen as a problem due to the overhead of having to work out where blocks have been virtualised to. In addition, I/O must be written to the virtualisation device and rewritten to the physical storage creating a "dual" write scenario. Data distributed across multiple devices may not provide the same performance profile and lead to uneven response times. In fact this scenario already exists in existing enterprise storage. Data is written to cache and destaged at a later time; arrays may contain more than one type of disk size and type; data is mapped across physical disks in a RAID configuration.

Reporting is an issue as most people like to know where their data resides. There are good reasons for this; if hardware fails or if any disaster scenario is invoked then it is important to know what data is pinned where; is it still left in cache? If a RAID group fails, what data did it contain and from which volumes? In addition, overallocation creates its own problems. Predicting how virtual volumes will increase their storage usage is tricky and you most certainly don't want to get caught out refusing I/O write requests for production disks. This issue was very obvious with the early implementation of thin provisioning on Iceberg (a StorageTek storage subsystem) where reporting failed to cope with volumes that had been copied with snapshot.

Now, if HDS were to add thin provisioning to the USP, how good would that be....

Saturday, 7 April 2007

Distributed Backup

Following on from a previous post on RAID and backup, I've been doing some more thinking on how to back up consumer data from a PC workstation. I reckon I've got about over 200GB of data on my server, which previously was on my main workstation. I dabbled with the Linksys NSLU2 however I hated it; I was really nervous (a) about the fact I would lose access to the device (b) it couldn't cope with the volume of files and seemed to lose track of what I had allocated and (c) how I would recover the data from the USB drives I used if the device eventually packed up. In fact, I got rid of the NSLU2 when it did lose track of my data. I was lucky to find a Windows read-only driver capable of reading the NSLU2 format and I got my data back.

Getting back to the question in hand, how would I back up 200GB? I guess I could fork out a few grand for an LTO drive and tapes, but that's not cost effective. I could do disk to disk copy (which I do) but D2D isn't as portable as tape and much more expensive if I intend to maintain multiple copies. I should mention that I've automatically discounted DVD and HD-DVD/Blu-Ray due to lack of capacity and cost (the same applies to the latest optical drives too).

I could use one of the many network backup services on offer. About 10 years ago, I looked at the feasibility of setting up one of these services for the storage company I worked for at the time. It was almost feasible; Freeserve was doing "free" dial-up internet (you paid for just the cost of the calls) and companies such as Energis were selling virtual dial-up modems on very good terms. However the backup model failed as the cost to the customer just didn't stack up due to the length of time to copy files out to the backup service.

I think network backup services *could* be the best answer to safeguarding your PC/workstation data. However the existing services have issues for me; basically I don't trust someone else with my data, which could include bank details, confidential letters and files. Even if I can encrypt my data as it is transmitted to the network backup service, they still have *all* of my data and with enough compute power could crack my encryption key.

If anyone has examples of services which could provide 100% security, I'd be interested to know.

So, here's my idea. a distributed backup service. We all have plenty of free space on those 500GB drives we've installed. Why not distribute your backups amongst other users in a peer to peer fashion? There are two main drawbacks to my mind. First, how can I guarantee my data will always be available for me to access (PCs may be powered off) and secondly, how can I guarantee security?

Existing P2P services work by finding as many servers/PCs as possible which hold the data you want to download. Many may not be online; many may be online and running slow. By locating multiple copies of the required data, then hopefully one or more will be online and available for download.

The same can be applied to backups; split files up and distribute the fragments to P2P servers and index where they are. The fragments would need to be encrypted in some way to guarantee anonymity but make them common to files on both your machine and others. You then maintain an index to rebuild the backup data; all that then needs to be backed up is the index which could easily fit onto something like a CD-ROM. All data could then be recovered using just a small CD index, which could be recreated from anywhere.

There are a lot of unanswered issues; how would data be encrypted; how would the fragments be created and "de-duplicated", how would fragments be distributed across the P2P members to ensure availability? How would the fragments be created to prevent the actual original files from being discovered?

Still, it's only a concept at this stage. But using the internet and all that unused disk space out there could prove a winner.

Wednesday, 4 April 2007

Giving RAID the thumbs up

Just read Robin Harris' post at his new blog location; http://blogs.zdnet.com/storage/?p=116 and his comment on another blog discussing RAID. He quotes a VAR who has tracked disk failures and thinks RAID is an expensive luxury for desktops.

It's interesting to see the failure rates quoted, anywhere from 1-3%, which on the face of it, seems low. However when its *your* disk that has failed and the data is irretrievable, there's cold comfort to be had in failure rate statistics. I run RAID1 on my server; I have two 500GB SATA drives. Backing up that volume of data on a regular basis is a nightware without investing in a very expensive backup solution like LTO and it is a real disappointment to see tape hasn't kept pace with disk in terms of the capacity/cost ratio.

So, I'm sticking with RAID. I augment it with disk-to-disk backups because, yes, you do have to cater for the d'oh factor of user errors or even dodgy software which corrupts files too, but RAID works for me and that's all I need to worry about.

Thursday, 22 March 2007

Uh Oh Domino

It seems that the world is moving to Exchange for email messaging. Unfortunately there are some of us still using Lotus Notes/Domino.

As a messaging product, it seems to me to be reasonably efficient; our Domino servers can support upwards of a thousand users, perhaps 1-2TB of Notes mailboxes. Domino stores the mailboxes as individual files with the .nsf extension. Each of these is opened and held by the nserver.exe task. When using Netbackup with the Notes/Domino agent, the Netbackup client backs up all nsf files on a full backup and the transaction logs and changed nsf files (i.e. those with a new DBID) for an incremental backup. This creates a significant amount of hassle when it comes to performing restores.

A restore is either the full nsf file from the last full backup, or the nsf file plus transaction logs, which are then applied to the nsf file to bring the mailbox up to date. This process is incredibly inefficient because (a) transaction logs contain data for all users and must be scanned for the records relating to the restoring mailbox (b) the transaction logs need to be restored to a temporary file area, which could be considerable (c) the restored logs are disregarded after the restore has completed and so have to be restored again for the next mailbox restore.

So, I’ve been looking at ways to bin Netbackup and improve the backup/restore process. As servers are being rebuilt on Windows 2003 Server, I’ve been looking at VSS (Volume Shadowcopy Services). This is a Windows feature which permits snapshots of file systems to be taken in co-operation with applications and underlying storage. In this instance there isn’t a Lotus Domino provider, so any snapshots taken are dirty (however I did find the dbcache flush command which flushes updates and releases all nsf files). Netapp used to have a product called SnapManager for Lotus Domino which enabled Netapp snapshots of mailboxes using the Domino Backup API. The product has been phased out, as tests performed by Netapp show that dirty snapshots with the security of logs can be used to restore mailboxes successfully. IBM provide trial versions of Domino, so, I’ve downloaded and installed Domino onto one of my test servers under VMware and run the load simulator while taking snapshots with VSS. I’ve also successfully restored a mailbox from a snapshot so there’s no doubting the process works. However my simple task isn’t one of scale. Typical mailboxes are up to 1GB in size and there could be hundreds of active users on a system at any one time. My concern is whether VSS can manage to take snapshots with this level of activity (and not impact the O/S) but also whether the snapshots will be clean or what level of corruption we can expect.

The only way to test this is to implement on a full scale Domino environment and probably with live users. That’s where things could get interesting!

Friday, 16 March 2007

WWN Decoder

As pointed out by Richard, my WWN decoder stopped working when I redid my website. Here's a new link; http://www.brookend.com/html/resources/wwndecoder.asp.

Developing a Tiering Strategy

Implementing a storage tiering strategy is a big thing these days. Everyone should do it. If you don't then you're not a "proper" storage administrator. Being serious and moving away from the hype for a second, there is a lot of sense in implementing tiering. It comes down to 1 thing - cost. If disk and tape storage was free, we'd place all our data on the fastest media. Unfortunately storage isn't free and therefore matching data value to storage tiers is an effective way of saving money.

Choosing the Metrics

In order to create tiers it's necessary to set the metrics that define different tiers of storage. There are many to choose from:

Response time
Throughput
Availability (e.g. 5 9's)
Disk Geometry (73/146/300/500GB)
Disk interconnection (SATA/FC/SCSI)
Usage profile (Serial/Random)
Access Profile (24x7, infrequent)
Data value
Array Type (modular/enteprise)
Protection (RAID levels)

There are easily more, but these give you a flavour of what could be selected. In reality, to determine the metrics to use, you really need to look at what would act as a differentiator in your environment. For example, would it be really necessary to use 15K speed drives rather than 10K? Is availability important - should RAID6 be considered over RAID5? Is there data in the organisation that would exist happily on SATA drives rather than fibre channel? Choosing the metrics is a difficult call to make as it relies on knowing your environment to a high degree.

There are also a number of other options to consider. Tiers may be used to differentiate functionality, for example tiers could be used to specify whether remote replication or point-in-time copies are permitted.

Is It Worth It?

Once you've outlined the tiers to implement, you have to ask a simple question - will people use the storage tiers you've chosen? Tiering only works if you can retain a high usage percentage of the storage you deploy - it's no use deploying 20TB of one tier of storage and only using 10% of it. This is a key factor. There will be a minimum footprint and capacity which must be purchased for each tier and unless you can guarantee that storage will be used, any saving from tiering may be negated by unused resources. Narrow your tiering choices down to those you think are actually practical to implement.

Making the Move

So, the tiers are set, storage has been evaluated and migration targets have been identified. How do you make it worthwhile for your customers to migrate? Again, things come back to cost. Tiers of storage will attract differing costs for the customer and calculating and identifying the cost savings will provide a justification for investing in the migration. In addition, tiers can be introduced as part of a standard technology refresh - a process that regularly happens anyway.

Gotcha!

There are always going to be pitfalls with implementing tiering:

Don't get left with unusable resources. It may be appealing to identify lots of storage which can be pushed to a lower tier. However, if the existing tier of storage is not end-of-life or unless you have customers for it, you could end up with a lot of high tier unused storage which reflects badly on your efficiency targets. Make sure new storage brought in for tiering doesn't impact your overall storage usage efficiency.
Avoid implementing technology specific tiers which may change over time. One example; it is popular to choose to tier by drive size on the assumption that higher capacity drives offer a lower performance and therefore are matched to a lower tier. But what happens when the predominant drive type changes or you buy a new array in which the larger drives perform equally well compared to an older array? How should those tiers be classified?
Be careful when choosing absolute parameters for tiers. For example, it is tempting to quote response time characteristics in tiers. However, no subsystem can guarantee consistent response times. It may be more appropriate to set confidence limits, such as offering "<>

Iterative Process

Developing a tiering strategy is an iterative process which will constantly be refined over time. There's no doubt, that implemented correctly, it will save money. Just don't implement it and forget about it.

Wednesday, 14 March 2007

Standards for Measuring Performance

I had a bit of banter following a post I made on ITToolbox earlier in the week. The question was posed as to why major disk array vendors (other than Netapp) don't produce performance statistics. There are benchmarks; the Standard Performance Evaluation Corporation and Storage Performance Council for instance. However SPEC only covers NAS and neither HDS and EMC sign up for SPC.

Producing a consistent testing standard is extremely difficult. Each storage array vendor has designed their hardware to have unique selling points and more importantly, each vendor can't make their systems too close to their rivals otherwise there'd be more money made by lawyers than the vendors themselves.

Let's pick a few examples. In terms of architecture, DMX (EMC) and USP (HDS) have a similar design. They both have front-end channel adaptor cards, central cache and back-end loops on which (probably identical) hard disk drives are connected. However the way in which the physical storage is carved up is totally different.

HDS uses the concept of an array or parity group; a 6+2 RAID group has 8 disks of the same size which can be carved up into logical units (LUNs). The address at which these LUNs are mapped is up to the user, but typically LUNs will be dispersed across multiple array groups to ensure that consecutive LUNs are not mapped to the same physical disks. This process ensures that data is hitting as many spindles as possible.

EMC chooses another method. Each physical drive is divided into hypers, or logical slices. These are then recombined to make LUNs. RAID 1 LUNs have 2 hypers, RAID5 LUNs have 4 hypers. Each hyper is taken from a different back-end drive loop to improve performance and resiliency.

Now, the comments earlier in the week referred to Netapp. Their architecture comes from a NAS file serving design, which uses RAID 4 and has all the operations for RAID calculations handled in memory. LUNs are carved out of RAID "volumes" or aggregates.

So what is a fair configuration to use? Should it be the same number of spindles? How much cache? How many backend loops and how many front-end ports? Each vendor can make their equipment run at peak performance but choosing a "standard" configuration which sets a level playing field for all vendors is near impossible. Not only that, some manufacturers may claim their equipment scales better with more servers. How would that be included and tested for?

Perhaps rather than doing direct comparisons, vendors should submit standard style configurations, based on a GB capacity on preset LUN sizes against which testing is performed using common I/O profiles. This, with list price would let us make up our own minds.

Either way, any degree of testing won't and shouldn't stop healthy discussion!

Tuesday, 6 March 2007

Software Shortcomings

I was invited last week to speak at the first (of hopefully) many HDS User Forums in the UK. The subject was Storage Resource Management and I talked on the subject and my thoughts for about 30 minutes. One slide generated the most interest and I've included it here. It shows some of the issues current SRM products have and which are not really being addressed by software vendors.

A case in point is the installation of Device Manager I performed this week on a virtual domain. HiCommand Device Manager 5.1 is supported on VMware (2.5) however I couldn't get the software to install at all. I tried the previous version which worked fine, so I was confident the Windows 2003 build was OK. HDS pointed me at a new feature I'd not seen before, Data Execution Prevention which is intended to prevent certain types of virus attacks based on buffer overflows. Whilst this solved the problem, it didn't fill me with a great deal of confidence to think Windows judged HDS's software as a virus. With DEP enabled, the installation got further but still eventually failed. On HDS' advice, re-running the installation again worked.

At the forum, HDS presented their SRM roadmap. If it all comes to fruition then I'll be able to do my provisioning, monitoring and other storage management tasks using my Blackberry whilst sipping Pina Colada's on a Carribbean beach. Back in the real world, my concern is that if the existing tools don't even install cleanly, how am I expected to trust a tool which is moving my disks around dynamically in the background?

It's easy for me to target HDS in this instance but all vendors are equally culpable. I think there's a need for vendors to "walk before they can run". Personally, I'd have more trust in a software tool that was 100% reliable than one which offered me lots of new whizzy features. Vendors, get the basics sorted. Get confidence in your tools and build on that. That way I might get to do provisioning from the beach before I'm too old and grey to enjoy it.

Monday, 5 March 2007

Backing Up Branches

I've spent the last few weeks being heavily involved in a storage performance issue. Unfortunately I can't discuss the details as they're too sensitive (in fact, I can't even mention the TLA vendor in question), however it did make me think more about how we validate a storage architecture design will actually work.

As part of another piece of work I'm doing, I have been looking at getting branch or remote office data into core datacentre infrastructure, the premise being not to write tapes in remote locations and therefore be at risk of compromising personal information.

I've looked at a number of solutions, including the Symantec/Veritas PureDisk option (the flashy flash demo can be found here). What strikes me with this and other solutions is the lack of focus on data restores and meeting RTO and RPOs for applications that may run in branch locations. In fact, I think most products are aimed at the remote office which runs a small Windows implementation and therefore isn't worried too much about recovery time of their data.

The technologies such as PureDisk work by reducing the data flowing into the core datacentres, either by de-duplication, compression or both technologies. This is fine if (a) your data profile meets their view of compression (b) your applications/data produce a small percentage of changed data on a daily basis. Unfortunately for some applications this just doesn't occur. In addition, if you lose the site (or even easier, corrupt or lose the local metadata database) then you're looking at a total restore from core to recover an entire server. These solutions work on the basis that branch offices have low WAN bandwidth and the backup tools utilise this by making a small pipe appear as a much larger pipe through the aforementioned de-dupe and compression. This doesn't help if a full restore needs to be performed.

Some people may say "but you could manage the restore by sending a tape to the site containing the last backup" - however that defeats the original objective of keeping tapes out of the branch.
I'd like to see some real-world examples from Symantec/Veritas showing how long backup/restores would take with their product. Of course they're not going to provide that as it would immediately highlight the issues involved. Perhaps we need an independent backup testing lab to do that comparison for all backup products on the market.

Friday, 16 February 2007

iSCSI Part 3 (or is it part 4)

I've done the last part of my iSCSI evaluation. Previously I looked at the protocol itself and the ways to secure data. Transmission needs to be secured by either IPsec for a shared network or a dedicated network infrastructure. The last part of the jigsaw I checked out was how to validate the client; basic iSCSI authentication uses simply the name of the iSCSI initiator, which is nothing more than a plain-text field.

The standard iSCSI method is to use CHAP. This requires both the client and the target to provide a username and password for login to each other. I tested it out on my Netapp Simulator/Windows environment and of course it works. What I'm not sure about is how effective this is as a security method. It is necessary to retain password information for both client and target and store it elsewhere; there's no 3rd party authentication authority. Perhaps I'm being a little paranoid.

So there it is. I now know iSCSI. I have to say I like it. It's simple. It works. It is easy to implement. Security could be better, and could certainly be made more easy to manage, but perhaps that is related to the two implementations I've used (i.e. Netapp and Windows).

So what are the iSCSI best practices? I'd say:

Implement CHAP for client/target security.
Implement IPsec to encrypt your iSCSI traffic.
Place your devices on a dedicated network.
Use dedicated network cards where possible.

I hope to do a practical implementation of iSCSI soon. I can really see it as a practical alternative to fibre channel.

Thursday, 15 February 2007

Is ASNP another SNIA?

I received this email this week for the very first ASNP UK Chapter meeting (ASNP is a US based user association for storage people):

<------ START ------>
Dear ASNP Chapter Members of the UK,

Due to low registration numbers, we have decided to cancel the February 23rd ASNP Chapter meeting which was to be sponsored by QLogic.

My apologies for any inconvenience this may cause.

Meanwhile, we are working on a quick survey that we hope you will answer as we’re hoping to find out what type of meetings you’d like to be a part of going forward.

Please keep your eyes open for that survey and let us know!

<----- END----->

Until November I was UK Chair. I found it difficult to work out what ASNP was for; I couldn't see it collectively lobbying storage software/hardware vendors with issues or for new features, in fact all I could see was an organisation which occasionally offered vendor sponsored "training" or information sessions. I even attended the very first ASNP Summit in Long Beach in June 2004; despite registering, when I got there, no-one could find my registration details; everything was extremely US-focused.

If an organisation exists to further the interests of its members, then there should be a clear statement of what those aims are; otherwise why bother to exist at all?

Will ASNP continue? Probably. I notice though that they still don't charge for membership. Unless they start having more to offer, I can't see how they ever can.

Tuesday, 13 February 2007

Long term data retention

I spent some time earlier this week talking to COPAN. They produce high density storage systems, but not the sort of arrays you'd use for traditional data storage. Their product is pitched at the long term persistent storage market.

I'm sure you can read the website if you're interested however, I hadn't really thought through what this kind of technology could deliver. There are some fundamental issues the storage of "persistent" data needs to be sorted. For instance; how do you validate the data on your disk will still be there when you come to read it 12 months later? (disk Aerobics is the answer apparently; regular validation of disk content).

So, the target market for COPAN is long term data archive. They want you to keep your data on disk. Personally, I think if the price is right, then backup data on disk is a sensible proposition. Today's network connectivity and encryption technology means data doesn't need to be physically moved any longer. In fact, I'd suggest that removing the need to physically move data is the way forward. Disk-based data is inherently more reliable and accessible. COPAN (and others) have plenty of features that can make disk-based backup work.

Don't move the media. Just move the data. Sounds like a good strapline.

Thursday, 8 February 2007

Snow!

Totally off-subject but it snowed today in the UK. Most people hate it, I love it. We had this snowman built before breakfast.....

Wednesday, 7 February 2007

Write Acceleration

Reading Sangod's recent post on write acceleration, I couldn't help writing a response, as I've been looking at this whole subject recently.

First of all, I don't disagree with the concept of synchronous. Yes, the I/O must be confirmed (key word there) at the remote and local site before the host is given acknowledgement of the I/O being complete. Typically, enterprise arrays will cache a user I/O, issue a write to the remote array (which will also be cached), acknowledge the I/O to the host and destage at some later stage.

Sangod mentioned two techniques; cranking up buffer credits and acknowledgement spoofing. Buffer credits are the standard way in which the fibre channel protocol manages flow control. As each FC device passes data (for example an HBA to a switch) then the device can keep sending packets of data whilst the receiving device issues the R_RDY signal back. Buffer credits are essential as FC packets take a finite amount of time to travel a fibre optic cable. The longer the distance between devices, then the more packets which must be "on the line" in order to fully utilise the link. The rule of thumb is 1 buffer credit for each km of distance on a 2Gb/s connection. So if you don't have enough buffer credits, then you don't make best use of the bandwidth you have available. Having lots of data in transit does not compromise integrity as nothing has been confirmed to the originating device.

Moving on, there are devices which can perform write acceleration by reducing the SCSI overhead by acknowledgement spoofing. I've taken the liberty of loaning a couple of graphics from Cisco to explain how a SCSI transaction works.

When a SCSI initiator (source device) starts a write command, it issues a FCP_CMND_WRT to the target device. This is confirmed by the target with an FCP_XFER_RDY. The initiator then issues data transfer (FCP_DATA) repeatedly until all the data is transferred. The target confirms successful receipt of all of the data with FCP_RSP "Status Good". The initial preamble as the write request is started can be reduced by the switch connected to the initiator issuing an immediate FCP_XFER_RDY, allowing the initiator to immediately start sending data. The data transfer and the FCP_CMND_WRT operate in parallel, saving the time of this part of the exchange. No integrity is risked as nothing is confirmed by the target until all data is received.

I see no issue with this kind of spoofing as the initiator is not being told about the completion of the I/O before it has actually happened. What I do see as a concern is that the target may not be able to accept the data and therefore if the write fails, the source needs to be able to roll back if that happens.

In terms of getting best performance, multiple replication groups (e.g. multiple RA groups) would make best use of any WA technology. Cisco publish stats which show that. So WA could be good and safe.

Wednesday, 31 January 2007

Storage protocols for VMware

I've been doing more VMware work recently. The deployment I'm working on is using SAN presented disk. The storage started as 50GB LUNs, quickly grew to 100GB and now we're deploying on 200GB LUNs, using VMFS and placing multiple VM guests on each meta volume.

Now, this presents a number of problems. Firstly, it was clear the LUN sizes weren't big enough in the first place. Second, migrating guests to larger LUNs had to be an offline process; present the new LUNs, shutdown the guest, clone the guest, restart the guest, blow the old guest away. A time intensive process, especially if it has to be repeated regularly.

Using FC presented LUNs/metas also presents another problem; if we choose to use remote replication (TrueCopy/SRDF) to provide DR failover then all the VM guests on a meta have to go on that failover too. This may not (almost certainly not!) be practical.

Add in the issue with lack of true active/active multipathing and restrictions on the number of LUNs presentable to an ESX server and FC LUNs don't seem that compelling.

The options are to consider iSCSI or store data on CIFS/NFS. I'm not keen on the CIFS/NFS option, iSCSI seems more attractive. It pushes the storage management away from the ESX Server and onto the VM guest; security is managed at the array level, rather than within ESX. Personally I think this is preferable let ESX (system) administrators do their job etc etc. One last benefit; I can present as many iSCSI LUNs as I like of whatever size. It means I can also stripe multiple LUNs; something I'm unlikely to do on VMFS presented devices.

Therefore I think iSCSI could be a great option. Then I thought of one curve ball; what if I could do thin provisioning on FC? Here's the benefit. Imagine creating 20 VM guests on a server, all running Win2K3. Standard deployment is 10GB for the root/boot disk but I'm only actually using about 5. The remainder is left to allow for maintenance/patching/temporary space (we don't want to have to rebuild servers) - applications and data go on separate volumes. I'll use a 200GB meta. Unfortunately it's 50% wasted. But bring in thin provisioning and I can allocate 10GB drives with impunity. I can allocate 20 or 30 or 40! FC is back on the menu. Incidentally, I'm more than aware that iSCSI devices can already be presented thin provisioned.

Lots of people tell me, why bother with thin provisioning. I think in VMware I've found a perfect usage.

Monday, 29 January 2007

Buy Buy Buy

More acquisition news; Brocade and McDATA are finally just Brocade after the purchase completes. There's a snazzy new logo

which I can't decide is like a pair of red angel wings, or some convoluted join of the M and B from the two companies. Personally, I preferred the old logo depicting a fabric, it worked for me. I even had a lovely "personal" email from Michael Klayko (as I'm sure lots of other people did).

Anyway, moving on, what's more interesting is IBM bought SoftTek. Now they have a great product, TDMF, a piece of software that really made me go "wow!" when I first saw it in action. It literally is a transparent data mover. With the mainframe version (I was a proper storage admin when I started out on the mainframe) you could sync up many volumes in real time and perform an instant swap on them, allowing migration of TB of storage in seconds. I used the product to migrate data from an old storage array to new without any user impact or input at all. I hope IBM uses its new jewel well.

Sunday, 28 January 2007

iSCSI Security Part 2

To tie down my iSCSI test environment I've implemented IPsec between client and server. This allows me to encrypt either the traffic or headers of my IP stream. I chose ESP (Encapsulating Security Payload) as this gives full confidentiality to my data, rather than Authentication Header (AH) which provides integrity as to the source of the data.

Implementation on my Netapp Simulator and Windows client was easy; the ipsec command, a file setting on the filer and configuration of the IPsec settings through MMC on Windows. Once I'd dropped and relogged in the iSCSI targets, I did some testing.

Now before I go further, let me stress that this testing was only slightly scientific. Everything I'm using is virtual and on the same physical machine with one hard disk. The figures I'm quoting are indicative of the performance I received and not directly comparable with real-world systems, although you'll get the idea.

So I tested with HDTach and no IPsec enabled. Response time for a standard test was about 0.2ms and throughput ranged from 2-32MB/s over the course of the test (an average of around 17MB/s). With IPsec enabled, response time doubled to 0.4ms and throughput dropped significantly to 2-7MB/s with an average of 4.2MB/s. I repeated the tests a number of times with IPsec on and off and saw consistent results.

Just to be sure, I checked for other bottlenecks in my system. The virtualising PC was not bottlenecked, neither the filer or client. I can therefore put the performance change down to simply "more work to do".

Unsurprisingly, I couldn't check the IP packets with IPsec enabled. Although this is secure, it presents issues when diagnosing problems. I don't know whether tools exist to enable encrypted streams to be analysed, but Ethereal couldn't do it. My IPsec implementation was also rather simple - just a shared key for encrypting the traffic. If I was implementing this for real, I'd be discussing with the Network guys the best way to implement security.

Friday, 26 January 2007

Is it me?

..or do some people not like comments. Here's another iSCSI post from Marc Farley at Equallogic. I think he's saying iSCSI is good. I agree - good in the right circumstance.

Marc - either tell me how to or enable comments - please!!

iSCSI Security Part 1

I've done some more work on iSCSI. That meant rebuilding one of my iSCSI client environments as I totalled it by trying to add another drive to the VM guest. Anyway, panic over. I've got the client working again and I've been running traces to see how easy it is to locate relevant data in an unprotected iSCSI stream.

I mentioned before that I am using iSCSI on Win2K3 talking to a Netapp filer (simulator) with Ethereal on the host Win2K3 box to monitor I/O activity. I tried today to write to a test file with Ethereal running and tracing I/O activity. The tool is splendid in its interpretation of the data it sees, formatting the packets on the basis of the content. My save of a file containing some test data yielded some interesting results.

Firstly, I found Ethereal detected the logfile entries for NTFS being saved before my actual file. These are RCRD and RSTR records which NTFS uses to recover the filesystem should anything untoward happen before the data is actually committed to disk.

After this, I tracked the MFT entries being written. These are the actual file saves which contain my data. Non-encrypted I can see the content. Using Ethereal and a tool to map NTFS records, I could easily spy on data being stored on iSCSI volumes. Here are some screenshots:

The first shows the MFT entry that NFS stores for the file save. I used Winhex to examine the blocks on disk. Note the content of the file "The Quick Red.......", plus the file name "TestFile1.txt" in unicode, so using double characters. Contrast this to the Ethereal output which was scanning the network. This shows the same data captured by scanning the network. Ethereal is generous enough to

even format the data to allow me to locate the iSCSI data. This is no good for any organisation which must offer data security. Part II will discuss what can be done.

iSCSI Nonsense

I've just read Mark Lewis's latest blog entry. Unfortunately he doesn't seem to permit comments so I'll just have to take issue here.

For those who haven't read it, he talks about the recent talking up of iSCSI.

So, I have done that, so have others. He misses a huge point. IT is driven by technologists. The world is driven by technologists. Let me cite an example. Who knows about the standards relating to television broadcasts? - to be fair, who cares. Well, the people who care are those who want higher and better resolutions, the early adopters, the technologists. They say and so we all want 1080p (well done if you know what that means) because the technologists tell us that it is the best.

So, we all want to consider iSCSI because it will bring benefit to our businesses - the technologists tell us so. Absolutely no-one with an ounce of sense would say that iSCSI is going to kick FC's ass. On the contrary, what's being said is that iSCSI complements FC to a degree that customer choice will be the deciding factor.

iSCSI is maturing as a technology to complement existing offerings. It is a lower level technology play. SOA sits far above the protocol layer and is another discussion entirely. Whatever happens, something has to move the SOA data - I'd bet that either iSCSI or FC is going to be doing that for a long time going forward.

Thursday, 25 January 2007

Brocade/McDATA Merge Approved

The Brocade purchase of McDATA has been approved by both shareholders. The expected completion of the merger is 29 January.

I'm interested to see the merged product lines and how BrocDATA intends to support both product sets (especially the director class devices). There will be a lot of customers out there looking to see what bridging technology the merged company will produce and how the roadmap will look.

Whatever happens, Mc-cade needs to come out with something quickly or Cisco will be in and mercilessly stealing their market share.

Thursday, 18 January 2007

A confession

Nigel commented on my slow provisioning post. I have to agree that Device Manager has its faults. I'm doing some work on Device and Tuning Manager in the coming weeks as I'm installing both on VMware guests. So, I should have some performance feedback.

The comment highlights how software still has a long way to go.

A few (well quite a few) years ago I needed to purchase a storage management product for implementing storage quotas on the mainframe. For those who remember DFSMS, it was just at the time SMS as a product was being implemented on MVS/ESA.

So, I asked my boss if I could purchase the software. He said I needed to a competitive analysis with other products and explain why my choice was the best. Unfortunately there were no other equivalent products out there, so I pondered my options and decided to invent one.

Based on a character at the time, I called the product "Loads-a-space" after a character called "Loadsamoney" - see the picture. Loadsamoney was a crass plasterer who always boasted how much money he made, it was the 80's boom time. Loadsamoney was a creation of Harry Enfield, so I called my ficticious software house Enfield Software.

I added Loads-a-space from Enfield Software to my proposal as the inferior product, got my approval and my purchase. It helped that my boss was Canadian and didn't know much about British Comedy.

OK, now I've got that off my chest, Nigel, I'd be interested to know how big your Device Manager deployment is and what you run it on, just to do some kind of comparison

Tuesday, 16 January 2007

iSCSI Continued (2)

After my previous post on iSCSI testing I promised some more detail. So my test environment is based on the Netapp Simulator version 7.2.1, which you can download if you're a Netapp customer - not sure if it's available to non-customers (it should be as it is a great marketing tool) but I guess if you want to find out you could ask Dave.

Netapp filers export LUNs as iSCSI devices. As far as I can tell, the LUN implementation on ONTAP is effectively a qtree (i.e. a share) based on the way I created it. Anyway, once created, I associated the LUN with an initiator group and the initiator group has the access associated with it (hope you're all following this).

The screenshot here shows the output from the LUN show command and igroup show command which list the LUNs I created and their association. You can see the igroup can be used to provide access to a number of servers based on the iqn, iSCSI Qualified Name, which is used to reference an iSCSI target or initiator device. The iqn seems to be a "gentlemens agreement" format, based on the reverse DNS of the server on which the iSCSI device resides, plus the date and month that domain was registered. In this case I registered the test server I have the iSCSI initator on and I got "iqn.1991-05.com.microsoft" followed by the specific server identifier of "vmware2.vmware.brookend.com", the name of my server itself.

My first inclination (being a hacker of old) was to spoof this, so I configured another server (XP this time) with the iSCSI initiator and changed its iqn. Voila I can access the same disks, albeit from another IP address. Being iSCSI and block data, the filer simulator didn't care about multiple access (which is fine) and I spent some time trying to break the shared LUN writing and reading data from both sources.

Needless to say this example highlighted how simple security does (or doesn't) work. I know CHAP authentication is available and I suspect there are many more security options that I need to investigate, so that's my next thing. Getting standards and security right I think will be more important than making sure the network performs.

More details to come.

Slow Provisioning

Poor provisioning tools annoy me. I've been annoyed today. I've been changing some VMware metas from 100GB to 200GB on a DMX. Unfortunately they were already presented (but not used) and replicated with SRDF. So I had to:

"Not Ready" the R1 and R2 drives
Unmask the LUNs from the FA
Split the SRDF relationship
Break the SRDF relationship
Unmap the LUNs from their FAs
Dissolve the metas
Create the metas
Re-establish and resync SRDF
Map the LUNs to the FA
Mask the LUNs to the hosts

10 steps which take some considerable time to write validate and execute. I don't do this stuff often enough to justify writing scripts to help me out; but I think this should be a vendor thing - a software tool with various configure options that creates the symconfigure and associated commands for you and indicates the steps you will have to perform. ECC is *supposed* to do it but it doesn't. Roll on some good software.

Sunday, 14 January 2007

more about iSCSI

I mentioned as a "Storage Resolution" to look more in-depth at iSCSI. Well I've started doing just that today.

The first thing I thought I needed was a working environment. I'm not keen on investing in an entire storage array (at this stage) to do the testing (unless some *very* generous vendor out there wants to let me "loan" one) so I've build a virtual environment based on a number of free components.

I've a dedicated VMware testing machine recently built which has a dual Core Intel processor, 2GB of RAM and a SATA drive. Nice and simple. It runs Win2K3 with the free VMware server, onto which I've created another Win2K3 R2 partition and a Linux partition running Fedora Core 6. This is where my iSCSI "target" will sit.

For those unfamiliar with SCSI terminology, the source disk or disk system presents LUNs which are referred to as targets. The host accessing those LUNs is the initiator; simply put the host initiates a connection to a target device, hence the names. My iSCSI target in this instance is a copy of the Netapp simulator running on Linux.

Most people are probably aware of the simulator. If not, Dave Hitz talks about it here. I've created a number of disks into an OnTAP volume and out of that created a LUN. LUNs can be presented out as FC or iSCSI, in this instance I've presented it out as iSCSI.

By default the simulator doesn't enable iSCSI so I enabled it with the standard settings. This means my target's iSCSI address is all based on Netapp defaults. I'm going to work on what the best practices should be for these settings over the coming days. Anyway, I've presented 2 LUNs and numbered them LUN 4 and LUN 9.

At the initiator (host) end, I've used my Win2K3 Server and installed the iSCSI initiator software from Microsoft. This gave me a desktop icon to configure the settings. Again, I've ended up with the default names for my iSCSI initiator, but that doesn't matter; all I had to do was specify in the iSCSI initiator settings the IP address of my target, log on and it finds the LUNs (oh, one small point, I had to authorise the initiator on the simulator). Voila, I now have 2 disks configured to my Windows host which can be formatted as standard LUNs.

As a performance test, I ran HdTach against the iSCSI LUNs on Win2k3. I got a respectable 45MB/s throughput, which isn't bad bearing in mind this environment is all virtual on the same physical machine.

All the above sounds a bit complicated, so I'll break it down over the coming days as to what I had to do; I'll also explain the iSCSI settings I needed to make and my experiments with dual pathing and taking the iSCSI devices away from Windows in mid-operation.

Thursday, 11 January 2007

WWN Decoding

The WWN Decoder page on my main site is updated. It's at http://www.brookend.com/html/main/wwndecoder.asp

If I've done it correctly, it now handles EMC up to DMX3 and HDS USP/NSC/AMS and also indicates the model type of the discovered device.

If anyone has samples from their arrays which they are willing to share, let me know and I'll validate them to see if it helps amend the decoder.

Friday, 5 January 2007

Manic Miner and Storage Resource Management

I tried a bit of nostalgia the other day. From a "freebie" CD-ROM I installed a games emulator for the ZX Spectrum, a personal computer that was hugely popular in the '80s. The game I installed was called Manic Miner, one of the original platform games. At the time (1983) it was a classic and (shamefully) I even hacked the copy I had to remove the protection (you had to load a 4 or 6 digit code from a sheet of blue paper, which couldn't be photocopied). When my children saw the game, they fell about laughing, not surprising when you compare it to their latest play, Star Wars Battlefront.

It made me think how things have changed in 20 years; from 32x24 graphics to 1280x1024 with advanced polygon shading etc. What has this to do with storage? Well, I ponder on what will happen to Storage Resource Management in the next 20 years.

I think what we'll see is artificial intelligence-based software managing our data. The software will proactively fix hardware faults, relocate data based on our usage/value policies, provide CDP and CDR, deliver optimum performance and make all storage administrators obsolete.

Er, well all except the last one; yes I do think the worries we have about SRM tools will be resolved, however I think with the growth in capacity, complexity and features of todays storage, that Storage Administrators will be needed for a long time to come.

Tower of Tera

Lots of talk today about the 1 terabyte drive from Hitachi. In fact the drive is more likely to be about 931GB based on the dubious practice of using decimal 1000's rather than binary (whilst we're on that subject, the concept of decimal versus binary does annoy me - what with that and overhead, on some of the AMS's I've installed, a 300GB drive comes out as 267GB).

So, yes, I want a 1TB drive - no idea what I want to put on it, or how I'll back it up - but I want one.

Thursday, 4 January 2007

Hybrid Storage Alliance

Fujitsu, Hitachi, Toshiba, Samsung and Seagate are getting together as the Hybrid Storage Alliance to promote the use of HDDs with large amounts of additional cache. They've set up a (not so snazzy) website at www.hybridstorage.org (check out the builder on the features page - when did you last see a builder using a laptop, never mind understanding what a hard disk is).

It's good news. I love the idea and have mentioned my thoughts before. I also think that onboard cache provides more options to develop the successor to RAID, although I'm still thinking how it could be done.

Sandisk also announced the device shown on the right - a solid state HDD. After 50 years, the hard disk is seeing some exciting changes.

Full details of the Sandisk announcement at http://www.sandisk.com/Corporate/PressRoom/PressReleases/PressRelease.aspx?ID=3654.

Wednesday, 3 January 2007

My Favourite De-Duplication Technology

Here's one of my favourite websites; www.shazam.com. In fact it isn't the website that is the favourite thing, it's what Shazam do. In the UK (apologies for US readers, I don't know your number), dialling 2580 down the middle of your 'phone and holding your mobile up to a music source for 30 seconds will give you a text back with the track title and the artist. Seeing this for the first time is amazing; as long as the track is reasonably clear, any 30 second clip will usually work. I've astounded (and bored) dozens of friends and it only costs me 50p each time.

So, Shazam got me thinking. How can they track the almost millions of music tracks in existence today and match this to a random clip of music I provide over a tinny link from my mobile phone? The most obvious issues are those of quality; I've almost only ever used the service in a bar with a lot of background noise (mostly druken colleagues). That aside, I tried to see how they could have indexed all the tracks and still allowed me to provide a random piece of the track against which they match.

I started thinking about pattern matching and data de-duplication as it exists today. Most de-dupe technology seems to rely on identifying common patterns within data and indexing those against a generated "hash" code which (hopefully) uniquely references that piece of data. With suitable data containing lots of similar content then a lot of duplication can be removed from storage and referenced as pointers. Good examples would be backups of email data (where either the user or a group of users share the same content) and database backups where only a small percentage of the database has changed. The clever de-dupe technology would be able to identify variable length patterns and determine variable start positions (i.e. byte level granularity) when indexing content. This would be extremely important where database compression re-aligns data on non-uniform boundaries.

Now this is where I failed to understand how Shazam could work; OK, so the source content could be de-duped and indexed, but how could they determine where my sample occurred in the music track? A simple internet search located the following presentation from the guy (Avery Wang) who developed the technology. The detail is here http://ismir2003.ismir.net/presentations/Wang.PDF. The de-dupe process actually generates a fingerprint for each track, highlighting specific unique spectogram peaks in the sounds of the music, then uses a number of these to generate hash tokens via a technique called "combinatorial hashing". This uniquely identifies a track, but also provides the answer as to how any clip can be used to identify a track; the relative offsets of each hash token is used to identify the track, so the absolute offset of the sample isn't important.

Anyway, enough of the techie talk, try Shazam - amaze your friends!

Tuesday, 2 January 2007

Storage Resolutions

The new year is here. Everyone loves to make resolutions to say how they are going to improve their lives. Personally I think it is nonsense; if you want to change, then you can do it any time rather than the abitrary time of new year.

Anyway, enough of my humbug. Here's a few storage resolutions I hope to maintain:

iSCSI - I haven't paid enough attention to this. I think iSCSI is due to hit its tipping point this year and get much more widespread adoption.
WAFS - Wide Area File Systems interest me. Any opportunity to reduce the volume of data being moved across networks while centralising the gold copy strikes me as a sensible idea.
CDP - There are some interesting products around providing continuous data protection. They aren't scalable yet, but when they are I can see them being big.
NAS Virtualisation - OK, I know how it works and what the products are; I just need to get into more detail.
CAS - I've always seen CAS as pointless. It's time to give it a second chance.

So that's the technology side covered. What about process?

ILM - I think it is time to harp on about proper ILM - i.e. that which is integrated into the application rather than the poor efforts we've seen to date. I think application development needs to be addressed to cover this.
SRM - how about some proper tools which actually do the job of managing the (large scale) process of storage deployment? More thought required here.
Cost Management - I believe there are lots of options for managing and reducing cost, I should expound on them more.
Technology Refresh - Always a problem and certainly needs more thought for mature datacentres.

Hmm, funny how each year's resolutions end up sounding just like the ones the year before?

Netapp 1 EMC 0

My RSS reader just picked up a lovely report from Netapp countering an EMC report showing that with MS-Exchange workloads, EMC was better than Netapp (CX3-40 v 3050). It just shows how when vendors are challenged to defend their products, they can make them work much better than the "standard" configuration. I can't help thinking it would be better if these products did this without needing the vendor to do a lot of configuration work.

Original report here.... http://www.netapp.com/library/tr/3521.pdf

Thursday, 21 December 2006

Understanding Statistics

I've been reading a few IDC press releases today. The most interesting (if any ever are) was that relating to Q3 2006 revenue figures for the top vendors. It goes like this:

Top 5 Vendors, Worldwide External Disk Storage Systems Factory Revenue 3Q2006 (millions)

EMC: $927 (21.4%)
HP: $760 (17.6%)
IBM: $591 (13.7%)
Dell: $347 (8.0%)
Hitachi: $340 (7.9%)

So EMC comes out on top, followed by the other usual suspects. EMC gained market share from all other vendors including "The Others" (makes me think of Lost - who are those "others"?). However, IDC also quote the following:

Top 5 Vendors, Worldwide Total Disk Storage Systems Factory Revenue 3Q2006 (millions)

HP: $1406 (22.7%)
IBM: $1250 (20.2%)
EMC: $927 (15.0%)
Dell: $507 (8.2%)
Hitachi: $348 (5.6%)

So what does this mean? IDC defines a Disk Storage System as at least 3 disk drives and the associated cables etc, to connect them to a server. This could mean 3 disks with a RAID controller in a server. Clearly EMC don't ship anything other than external disks as their figures are the same in each list. HP make only 50% of their disk revenue from external systems, the rest presumably are disks shipped with servers, IBM even less as a percentage, Dell about $160m. The intruiging one was Hitachi - what Disk Storage Systems to they sell (other than external) which made $8m of revenue? The source of my data can be found here: http://www.idc.com/getdoc.jsp?containerId=prUS20457106

What does this tell me? It says there's a hell of a lot of DAS/JBOD stuff still being shipped out there - about 30% of total revenue.

Now, if EMC were to buy Dell or the other way around, between them they could (just) pip HP to the post. Are EMC and Dell merging? I don't know, but I don't mind starting a rumour...

Oh, another intesting IDC article I found referred to how big storage virtualisation is about to come. I've been saying the word "virtualisation" for about 12 months in meetings just to get people to even listen to the concept, even when the meeting has nothing to do with the subject. I'm starting to feel vindicated.

Wednesday, 20 December 2006

Modular Storage Products

I’ve read a lot of posts recently on various storage related websites asking for comparisons of modular storage products. By that I’m referring to “dual controller architecture” products such as the HDS AMS, EMC Clariion and HP EVA. The questions come up time and time again, usually comparing IBM to EMC or HP and little comparison to HDS, but lots of people recommending HDS.

So, to be more objective, I’ve started compiling a features comparison of the various models from HDS, IBM, EMC and HP. Before anyone starts, I know there are other vendors out there – 3PAR, Pillar and others come to mind. At some stage, I’ll drag in some comparisons to them too, but to begin with this is simply the “big boys”. The spreadsheet attached is my first attempt. It has a few gaps where I couldn’t determine the comparable data, mainly on whether iSCSI or NAS is a supported option and the obvious problem of performance throughput.

So, from a simple physical perspective, these arrays are pretty simple to compare. EMC and HDS give the highest disk capacity options, EMC, HDS and IBM offer the same maximum levels of cache. Only HDS offers RAID6 (at the moment), most vendors offer a range of disk drives and speeds. Most products offer 4Gb/s front-end connections and there are various options for 2/4Gb/s speeds at the back end.

Choosing a vendor on physical specifications alone is simple using the spreadsheet. However there are plenty of other factors not included here. First, there’s performance. Only IBM (from what I can find) offers their arrays to scrutiny by the Storage Performance Council. Without a consistent testing method, any other figures offered by vendors are completely subjective.

Next, there’s the thorny subject of feature sets. All vendors offer variable LUN sizes, some kind of failover (I think most are active/passive), multiple O/S support, replication and remote copy functionality and so on. Comparing these isn’t simple, though as the implementation of what should be common features can vary widely.

Lastly there’s reliability and the bugs and gotchas that all products have and which the manufacturers don’t document. I’ll pick an example or two; do FC front-end ports share a multiprocessor? If so, what impact does load on one port have on the other shared port? What downtime is required to do maintenance, such as code upgrades? What is the level of SNMP or other management/alerting software?

The last set of issues would prove more difficult to track so I’m working on a consistent set of requirements from a product. In the meantime, I hope the spreadsheet is useful and if anyone can fill the gaps or wants to suggest other comparable mid-range/modular products, let me know.

You can download the spreadsheet here: http://www.storagewiki.com/attachments/modular%20products.xls

Tuesday, 19 December 2006

New RAID

Lots of people are talking about how we need a new way to protect our data and that RAID has had it. Agreed, going RAID6 gives some benefits (i.e. puts off the inevitable failure by a factor again), however the single problem to my mind with RAID today is the need to read all the other disks when a real failure occurs. Dave over at Netapp once calculated the risk of re-reading all those disks in terms of the chance of a hard failure.

The problem is, the drive is not involved in the rebuild process - it dumbly responds to the request from the controller to re-read all the data. What we need are more intelligent drives combined with more intelligent controllers; for example; why not have multiple interfaces to a single HDD? Use a hybrid drive with more onboard memory to cache reads while the heads are moving to obtain real data requests. Store that data in memory on the drive to be used for drive rebuilds. Secondly, why do we need to store all the data for all rebuilds across all drives? Why with a disk array of 16 drives can't we run multiple instances of 6+2 RAID across different sections of the drive?

I'd love to be the person who patents the next version of RAID....

Tuesday, 5 December 2006

How low can they go!

I love this picture. Toshiba announced today that they are producing a new 1.8" disk drive using perpendicular recording techniques. This drive has a capacity of 100GB!

It will be used on portable devices such as music players; its only 54 x 71 x 8 mm in size, weighs 59g and can transfer data at 100MB/s using an ATA interface.

I thought I'd compare this to some technology I used to use many years ago - the 3380 disk drive.

The model shown on the right is the 3380 CJ2 with a massive 1.26GB per unit and an access time similar to the Toshiba device. However the transfer rate was only about 3MB/s.

I couldn't find any dimensions for the 3380, but from the picture of the lovely lady, I'd estimate it is 1700 x 850 x 500mm which means 23,500 of the Tosh drives could fit in the same space!

Where will we be in the next 20 years? I suspect we'll see more hybrid drives, with NAND memory used to increase HDD cache then more pure NAND drives (there are already some 32GB drives announced). Exciting times...

Monday, 4 December 2006

Is the Revolution Over?

I noticed over the weekend that http://www.storagerevolution.com's website was down. Well it's back up today, but the forums have disappeared. I wonder, is the revolution over for JWT and pals?

Thursday, 30 November 2006

A rival to iSCSI?

I've said before, somethings just pass you by, and so it has been with me and Coraid. This company uses ATA-over-Ethernet, a lightweight IP storage protocol to connect hosts to their storage products.

It took me a while before I realised how good this protocol could be. Currently iSCSI uses TCP/IP, which encapsulates SCSI commands within TCP packets. This creates an overhead but does allow the protocol to be routed over any distance. AoE doesn't use TCP, it uses its own protocol so doesn't suffer the TCP overhead, but can't be routed. This isn't that much of an issue as there are plenty of storage networks which are locally deployed.

Ethernet hardware is cheap. I found a 24 port GigE switch for less than £1000 (£40 a port). NICs can be had for less than £15. Ethernet switches can be stacked providing plenty of redundancy options. This kills fibre channel on cost. iSCSI can use the same hardware; AoE is more efficient.

AoE is already being bundled with Linux, drivers are available for other platforms. Does this mean the end of iSCSI? I don't think so; mainly because AoE is so different and also more importantly it isn't routable. However what AoE does offer is another alternative which can use standard cheap, readily available products. From what I've read, AoE is simple to configure too; certainly with less effort than iSCSI. I hope it has a chance.

Wednesday, 22 November 2006

Brocade on the up

Brocade shares were up 10.79% today. I know they posted good results but this is a big leap for one day. I wish I hadn't sold my shares now! To be fair, I bought at $4 and sold at $8 so I did OK - and it was some time ago.

So does this bode well for the McDATA merge? I hope so. I've been working on Cisco and the use of VSANs and I'm struggling to see what real benefit I can get out of them. Example: I could use VSANs to segregate by: Line of Business; Storage Tier; Host Type (Prod/UAT/DEV). Doing this immediately gives me 10's of combinations. Now the issue is I can only assign a single storage port to one VSAN - so I have to decide; do I want to segment my resources to the extent I can't use an almost unallocated UAT storage port when I'm desperate for connectivity for production? At the moment I see VSANs as likely to create more fragmentation than anything else. There's still more thinking to do.

I'd like to hear from anyone who has practical standards for VSANs. It would be good to see what best practice is out there.

Tuesday, 14 November 2006

Backup Trawling

I had a thought. As we back up more and more data (which we must be, as the amount of storage being deployed increases at 50-100% per year, depending on who you are) then, there must be more of an issue finding the data which needs to be restored. Are we going to need better tools to trawl the backup index to find what we want?

Virtually There

VMware looks like it is going to be a challenge; I'm starting to migrate standard storage tools into a VMware infrastructure. First of all, there's a standard VMware build for Windows (2003). Where should the swap file go? How should we configure the VMFS that it sits on? What about the performance issues in aligning the Windows view of the (EMC) storage with the physical device so we get best performance? What about Gatekeeper support? and so on.

It certainly will be a challenge. First thing that I need to solve, is if I have a LUN presented as a raw device to my VM guest and that LUN ID is over 128, although the ESX server can see it, when ESX exports it directly to the Windows guest, it can't be seen. I've a theory that it could be the generic device driver on Windows 2003 that VMware uses. I can't prove it (yet) and as yet no-one on the VMware forum has answered the question. Either they don't like me (lots of other questions posted after mine have been answered) or people don't know....

Monday, 13 November 2006

Continuous Protection

I've been looking at Continuous Data Protection products today, specifically those which don't need the deployment of an agent and can use Cisco's SANTap technology. EMC have released RecoverPoint (a rebadged product from Kashya) and Unisys have SafeGuard.

SANTap works by creating a virtual initiator, duplicating all of the writes to a standard LUN (target) within a Cisco fabric. This duplicate data stream is directed at the CDP appliance which stores and forwards it to another appliance located at a remote site where it is applied to a copy of the original data.

As the appliances track changed blocks and apply them in order, they allow recovery to any point in time. This potentially is a great feature; imagine being able to back out a single I/O operation or to replay I/O operations onto a vanilla piece of data.

Whilst this CDP technology is good, it introduces a number of obvious questions on performance, availability etc, but for me the question is more about where this technology is being pitched. EMC already has replication in the array in the form of SRDF which comes in many flavours and forms. Will RecoverPoint complement these or will, in time, RecoverPoint be integrated into SRDF as another feature?

Who can say, but at this stage I see another layer of complication and decision on where to place recovery features. Perhaps CDP will simply offer another tool in the amoury of the Storage Manager.

Friday, 11 May 2007

Wednesday, 9 May 2007

Tuesday, 1 May 2007

Wednesday, 25 April 2007

Tuesday, 24 April 2007

Monday, 23 April 2007

Wednesday, 18 April 2007

Tuesday, 17 April 2007

Friday, 13 April 2007

Wednesday, 11 April 2007

Monday, 9 April 2007

Saturday, 7 April 2007

Wednesday, 4 April 2007

Thursday, 22 March 2007

Friday, 16 March 2007

Wednesday, 14 March 2007

Tuesday, 6 March 2007

Monday, 5 March 2007

Friday, 16 February 2007

Thursday, 15 February 2007

Tuesday, 13 February 2007

Thursday, 8 February 2007

Wednesday, 7 February 2007

Wednesday, 31 January 2007

Monday, 29 January 2007

Sunday, 28 January 2007

Friday, 26 January 2007

Thursday, 25 January 2007

Thursday, 18 January 2007

Tuesday, 16 January 2007

Sunday, 14 January 2007

Thursday, 11 January 2007

Friday, 5 January 2007

Thursday, 4 January 2007

Wednesday, 3 January 2007

Tuesday, 2 January 2007

Thursday, 21 December 2006

Wednesday, 20 December 2006

Tuesday, 19 December 2006

Tuesday, 5 December 2006

Monday, 4 December 2006

Thursday, 30 November 2006

Wednesday, 22 November 2006

Tuesday, 14 November 2006

Monday, 13 November 2006

My Personal Profile

My Company

Subscribe To

What Am I Doing?

Blog Archive

FEEDJIT Live Page Popularity

FEEDJIT Live Traffic Map

FEEDJIT Live Traffic Feed