The Storage Architect: 2007

Monday, 17 December 2007

Taking out the trash

In a recent post, Hu Yoshida references an IDC presentation discussing the rate of growth of structured versus unstructured data. It seems that we can expect unstructured data to grow at a rate of some 63.7% annually. I wonder what actual percentage of this data represents useful information?

Personally I know I'm guilty of data untidiness. I have a business file server on which I heap more data on a regular basis. Some of it is easy to structure; Excel and Word documents usually get named with something meaningful. Other stuff is less tangible. I download and evaluate a lot of software and end up with dozens (if not hundreds) of executables, msi and zip files, most of which are cryptically named by their providers.

Now the (personal) answer is to be more organised. Every time I download something, I could store it in a new structured folder. However life isn't that simple. I'm on the move a lot and may download something at an Internet cafe or elsewhere where I'm offline from my main server. Whilst I use offline folders and synch a lot of data, I don't want to synch my entire server filesystem. The alternative is to create a local image of my server folders and copy data over on a regular basis. The trouble is, that's just too tedious, and when I have oodles of storage space, why should I bother wasting my time? There will of course come a time when I have to act. I will need to upgrade to bigger or more drives and I will have (more) issues with backup.

How much of the unstructured data growth out there occurs for the same reasons? I think most of it. I can't believe we are really creating genuinely useful content at a rate of 63.7% per year. I think we're creating a lot of garbage that people are too scared to delete and can't filter adequately using existing tools.

OK, there are things out there to smooth over the cracks and partially address the issues. We "archive", "dedupe", "tier" but essentially we don't *delete*. I think if many more organisations operated a strict Delete Policy on certain types of data after a fixed non-access time, then we would all go a long way to cutting the 63.7% down to a more manageable figure.
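A non-access policy like this is simple enough to sketch. The Python below is only an illustration, not a tool referred to in this post: the function name `find_stale_files` and the idea of reporting candidates rather than deleting them are assumptions. It also relies on last-access times being recorded, which some file systems switch off for performance reasons.

```python
import os
import time
from pathlib import Path


def find_stale_files(root, max_idle_days, now=None):
    """Return files under root that have not been accessed for max_idle_days."""
    now = time.time() if now is None else now
    cutoff = now - max_idle_days * 86400  # seconds in a day
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                last_access = path.stat().st_atime
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if last_access < cutoff:
                stale.append(path)
    return sorted(stale)
```

Calling `find_stale_files("/some/share", 365)` would list everything untouched for a year (the path and threshold are hypothetical). Keeping the delete itself as a separate, deliberate step seems the safer design.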

Note to self: spend 1 hour a week tidying up my file systems and taking out the trash.....

Wednesday, 12 December 2007

2.5" Enterprise Arrays

I was asked the question today, when will Enterprise arrays support 2.5" drives as standard? It's a good question, as at first glance the savings should be obvious; smaller, lower power drives, more drives in an array and so on.

However things aren't that simple. Doing the comparisons and working out some of the basic calculations, such as Watts per GB or GB per cm3, shows that 2.5" drives don't offer that much of a saving (if any at all). I've posted some sample comparisons here.
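For anyone wanting to run the same kind of comparison, here is a minimal sketch of the two metrics. The capacities and power figures below are placeholders chosen for illustration, not the sample comparisons mentioned above, and the dimensions are approximate form-factor sizes; substitute values from real datasheets before drawing any conclusion.

```python
from dataclasses import dataclass


@dataclass
class Drive:
    name: str
    capacity_gb: float
    watts: float
    width_mm: float
    depth_mm: float
    height_mm: float

    @property
    def volume_cm3(self):
        # mm^3 to cm^3
        return self.width_mm * self.depth_mm * self.height_mm / 1000.0

    @property
    def watts_per_gb(self):
        return self.watts / self.capacity_gb

    @property
    def gb_per_cm3(self):
        return self.capacity_gb / self.volume_cm3


# Placeholder figures only (assumed, not taken from any datasheet).
drives = [
    Drive("3.5-inch example", 300.0, 15.0, 101.6, 146.0, 26.1),
    Drive("2.5-inch example", 73.0, 7.0, 69.85, 100.0, 15.0),
]

for d in drives:
    print(f"{d.name}: {d.watts_per_gb:.3f} W/GB, {d.gb_per_cm3:.2f} GB/cm3")
```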

I'm not aware of any vendors who are planning to offer 2.5" drives in Enterprise arrays. Looking at the mechanics of making the conversion, there would be a few issues. First is the interconnect, SAS versus FCAL, however that should be an easy one to resolve. Second, there's the physical layout of the drives and providing maintenance access to each of them. That might prove more tricky: achieving a high density footprint while still providing access to each drive individually.

If anyone is aware of anyone planning to use 2.5" drives as standard, please let me know.

Tuesday, 27 November 2007

Analysis: Adaptec

This is an ongoing series of posts looking at storage companies and their investment potential.

** DISCLAIMER: This and related blog entries are for fun only and do not represent investment advice. You should make your own opinions on investments or consult a financial adviser **

Background

Adaptec has a well known history for the manufacture of server and PC interface cards to connect SCSI hard drives and tape devices. More recently, the company has produced a range of NAS appliances under the brand name of SnapServer, although it continues to produce SCSI, SAS and SATA adaptors.

Market Details

Adaptec is quoted on the NASDAQ market with the code ADPT. Current price is $3.25.

Shares outstanding: 121,073,000
Market Capitalisation: $393,729,396
Earnings Per Share: -$0.13
P/E Ratio: n/a
Yield: n/a

* figures from http://www.nasdaq.com on 26 November 2007.

Adaptec is running at a loss. Looking at the company financials, net income for the last financial year was just under $31,000,000 on falling sales of $255,000,000. Sales have dropped steadily over the last 3 years. Net income over the last 3 quarters has also run at a loss. However, in the announcement of results for the first quarter 2008, Adaptec signalled their intention to restructure and cut losses, reducing the workforce by 20%.

Competitors

The NAS market is a competitive one; there are a huge number of NAS appliance manufacturers in the current market, from small scale to enterprise class. For HBAs, the parallel SCSI market has gradually reduced, with many manufacturers deploying integrated solutions within their server products.

Outlook

The outlook for Adaptec looks tough. This is clearly spelled out in the 1Q2008 announcement. SCSI and SAS adaptors have become commodity items, reducing the revenue from these products, despite the need for continuous investment. Adaptec lost out on an OEM deal, which significantly affected their business. The NAS market will be tough and require innovative product features.

At this stage, I think Adaptec shares are not worth investing in. I would rate them as a SELL. To change this position, I would expect to see a significant uptake in the SnapServer products before they are a worthwhile purchase.

Monday, 26 November 2007

Using Windows Shrink

I've been messing about recently with the "shrink" option which is now available on Windows Vista under Disk Management. This allows a partition to be reduced in size, subject to there being sufficient free space available.

I wanted to reduce the size of my "C:" drive as the standard installation for my laptop didn't come with any spare disk space or a separate partition for data. I always prefer to have a separate partition for data in case I have to re-install the O/S at any time.

So, my C: drive has plenty of spare space, at around 67GB. Right-clicking the partition in Disk Management and clicking the "shrink volume" option presents the option box as shown below.

Unfortunately, despite the fact that plenty of space is free (and in fact is also free contiguously), the shrink command fails to release the 40GB I'm trying to reclaim.

Vista accepts the command but spends about 2 minutes churning over the hard disk before eventually failing.

Clearly the layout of the disk is a problem, although Vista doesn't give me any clues to what's happening. On another track, I've been looking at HDD fragmentation and I now have PerfectDisk 8.0 installed on my laptop. (**Disclaimer** I paid for this software!).

Performing a defragmentation analysis shows me the layout in the next picture.

There's an unmovable file at the end of the C:\ drive, indicated by the red areas. Unfortunately PerfectDisk doesn't identify what specific file this is, although I know it is likely to be the page/swap file, the hibernation file or other control areas for NTFS.

I performed an offline defrag to tidy the NTFS areas, however this didn't resolve the problem. I then completely removed the swap file and this turned out to be the file holding this space. I could then run shrink and reclaim my 40GB of space, which I've allocated to a swap volume and a data volume.

The shrink command is helpful, but it would be more helpful if the results were consistent and the command didn't simply fail when the free space isn't at the end of the volume. It is also rather annoying that Microsoft has removed the graphic representation of a fragmented volume from their built-in defrag tool. Defrag can be run from the command line; try defrag c: -a -v from a command window. The output is not particularly helpful:

Analysis report for volume C: OS
Volume size = 103 GB
Cluster size = 4 KB
Used space = 85.75 GB
Free space = 17.57 GB
Percent free space = 17 %
File fragmentation
Percent file fragmentation = 1 %
Total movable files = 121,296
Average file size = 633 KB
Total fragmented files = 659
Total excess fragments = 4,104
Average fragments per file = 1.03
Total unmovable files = 21,926
Free space fragmentation
Free space = 17.57 GB
Total free space extent = 86
Average free space per extent = 209 MB
Largest free space extent = 11.10 GB
Folder fragmentation
Total folders = 17,206
Fragmented folders = 38
Excess folder fragments = 772
Master File Table (MFT) fragmentation
Total MFT size = 141 MB
MFT record count = 144,197
Percent MFT in use = 99
Total MFT fragments = 2

...and is certainly not graphical.
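If you do want to pull the few useful numbers out of that report, a small script will do it. This is a rough sketch that assumes the report text has been captured into a string; the helper names `parse_report` and `size_in_gb` are made up for illustration. Note that the largest free extent figure says nothing about where on the volume that extent sits, which is what shrink actually depends on.

```python
import re


def parse_report(text):
    """Collect 'name = value' lines from a defrag analysis report into a dict."""
    fields = {}
    for line in text.splitlines():
        match = re.match(r"\s*(.+?)\s*=\s*(.+?)\s*$", line)
        if match:
            fields[match.group(1)] = match.group(2)
    return fields


def size_in_gb(value):
    """Convert a report value such as '209 MB' or '11.10 GB' to GB."""
    number, unit = value.split()
    factor = {"KB": 1 / 1024 ** 2, "MB": 1 / 1024, "GB": 1.0}[unit]
    return float(number.replace(",", "")) * factor


# A few lines copied from the report above.
sample = """Volume size = 103 GB
Free space fragmentation
Largest free space extent = 11.10 GB
Total unmovable files = 21,926"""

fields = parse_report(sample)
print(size_in_gb(fields["Largest free space extent"]), "GB largest free extent")
print(fields["Total unmovable files"], "unmovable files")
```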

I suggest finding a good defragmentation tool - I don't care which one, but one that can (a) move/reorganise system files, (b) display a graphical representation of the volume, including identifying individual files and (c) consolidate free space.

Wednesday, 21 November 2007

Revenue and Customs in records blunder

The Revenue and Customs (IRS equivalent) in the UK have managed to lose two CDs containing the details of 25 million UK citizens (adults and children) who claim child benefit. For those who don't know, Child Benefit is an allowance paid to all children in the UK irrespective of their parents' income. This means the data lost effectively covers every child in the UK and their parents who are responsible for them.

There have been lots of data loss issues reported in the last year, mostly within large corporations. What amazes me is that systems are designed to allow a single individual to export so much sensitive data in one download and to be able to put this data onto a CD. Even if a "Junior Official" had broken the rules, it's about time systems were designed to prevent data exports like this from happening in the first place.

Systems must be designed to be "closed loops" with data being exported only to other secure systems across secure links, rather than relying on sending CDs in the post. Our own government should be setting the standard for industry. Unfortunately this will just become another reason to mistrust the tax man.

Monday, 19 November 2007

Brocade Update

Since I wrote about Brocade shares on 7th November, the share price has dropped from $9.28 to $7.74, or 16.6%. That changes the P/E ratio from 17.19 to 14.33. Does that make them more investable? Perhaps. In August the shares dropped as low as $6. In the last 52 weeks, the lowest price has been $4.79. I think if the price gets to $5 then I'll invest. That seems to me like a reasonable price. I'm not sure what is driving the negative sentiment against Brocade (other than perhaps the general market downturn).
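The arithmetic behind those figures is easy to check. The sketch below uses the earnings per share of $0.54 quoted in the 7 November post; the function names are just for illustration.

```python
def pct_drop(old_price, new_price):
    """Percentage fall from old_price to new_price."""
    return (old_price - new_price) / old_price * 100


def pe_ratio(price, eps):
    """Price/earnings ratio: share price divided by earnings per share."""
    return price / eps


EPS = 0.54  # from the 7 November Brocade post

print(round(pct_drop(9.28, 7.74), 1))  # 16.6
print(round(pe_ratio(9.28, EPS), 2))   # 17.19
print(round(pe_ratio(7.74, EPS), 2))   # 14.33
```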

Bear in mind, for those of us in the UK, that the $/£ rate is still at 2.05, making US investments even more attractive.

On another investment note, 3Par are back to their IPO price of $14 after their 2nd day of trading....

Saturday, 17 November 2007

3Par float

3Par finally floated on Friday this week. You can read the official announcement here. 7.5 million shares were released at a target price of $14 (about £7 in real money). In total, there are 60,015,700 shares of common stock now outstanding, valuing the company at a shade over $840 million. This isn't bad for a company with just over $66m in revenues for 2007, with a $15.5m loss.

It's interesting that loss has been running at $15-17m per year from the figures quoted on the SEC filing, despite the rise in revenue. At the end of the first day of trading the shares were up by $1.75 or 12.5%. To be honest, I would have expected the shares to rise higher than this however on reflection, perhaps not.

3Par sell themselves as "utility" storage and have majored on the concept of thin provisioning. Unfortunately this isn't a USP for them any more. HDS have this on their enterprise products, even EMC have announced thin provisioning is coming to DMX. So, what is 3Par, other than just another modular storage provider?

Wednesday, 7 November 2007

Analysis: Brocade

This is the next of a series of posts on storage companies.

** DISCLAIMER: This and related blog entries are for fun only and do not represent investment advice. You should make your own opinions on investments or consult a financial adviser **

Background

Brocade Communication Systems Inc is one of the leading manufacturers of storage networking hardware. As one of the original developers of the fibre channel protocol, Brocade is positioned at the centre of storage networking technology. In addition to storage hardware, Brocade also markets and sells FAN technology, including Wide Area File Services and file virtualisation software.

In January 2007 Brocade purchased (merged with) McDATA Corporation reducing the major players in the fibre channel market from three to two.

Market Details

Brocade is quoted on the NASDAQ market with the ticker code BRCD.

Shares Outstanding: 389,774,000
Market Capitalisation: $ 3,617,102,720
Earnings Per Share: $ 0.54
P/E Ratio: 17.19
Yield: None

Figures from http://www.nasdaq.com/ on 6 November 2007

Brocade does not declare dividends, so all future value in the shares needs to be gained from the increase in value of the shares. With a P/E Ratio of over 17, Brocade's shares are priced to expect future value. Clearly as an investment, the question is whether this is likely to occur.

Competitors

As previously mentioned, Brocade merged with one of their major competitors in January 2007. In the fibre channel switch space, this leaves Cisco Systems as the only major competitor for high end fibre channel switches. In the FAN market there are many competitors, including Acopia, Netapp, EMC/Rainfinity and others.

Outlook

The fibre channel market is maturing and as such, growth in this market is incremental. Port prices (the usual way FC hardware is sold) have reduced from over $2000/port to less than $500 in the current market. Brocade are certainly not bringing anything new to the party in terms of their FC business. In fact, FC is likely to be eroded by iSCSI and potentially FCoE (Fibre Channel over Ethernet) which poses the possibility of relegating expensive fibre channel hardware in favour of standard Ethernet technology.

Consequently I see times being tough for Brocade. The McDATA purchase wasn't an easy one. McDATA were on a slide and I believe Brocade acquired them for market share and their customer base. In the FC space, integrating the McDATA technology with that of Brocade initially proved a nightmare as the roadmap for FC wasn't clear. Brocade now faces a significant challenge against Cisco and so needs the other non-FC markets to grow their business. For a company that currently does not declare dividends that makes the stock purchase a simple one; how quickly will revenue and profit grow to increase the asset value of the shares?

I think slowly and so I'd make Brocade a "hold".

On the subject of Seagate, Stephen posted and asked whether I saw it as a sell, hold or buy. At this stage I think I would like to hold a hard disk manufacturer in my virtual portfolio, however whether that is Seagate or WD (Western Digital) remains to be seen. I'll make the decision once I've evaluated WD.

Tuesday, 6 November 2007

Equallogic gets Bought

Dell have announced their intention to purchase Equallogic, an iSCSI hardware vendor.

The purchase price is $1.4 billion in cash but the deal doesn't complete for some time, late 2008 or 2009.

Is this an attempt for Dell to move on their storage business? I've never taken them seriously on the storage front, especially as they sell EMC equipment at an effective loss. I could never see the point of that, even if it did gain them some market share.

I wonder how many organisations which would have purchased Equallogic will now not bother?

Monday, 5 November 2007

USP-V does SATA

So the rumours (here and here) are true. Here's the announcement to prove it. HDS are going to support 750GB SATA II drives in the USP.

This is an interesting position HDS are taking, as thin provisioning will be able to make use of the enhanced drive capacity, making the USP-V even more efficient than the DMX-4 with SATA drives.

However I do wonder whether HDS have been pushed into supporting SATA on the USP-V. I was under the impression that thin provisioning on external drives (the standard HDS line - use External SATA storage rather than configuring it within the USP itself) wasn't going to be available in the initial release. Perhaps HDS had to support SATA in order to get best usage out of the thin provisioning option and to answer customer complaints about using thin provisioning with expensive storage.

What I'd like to see next is how HDS will now position internal versus external storage. At what point do externally connected SATA drives become more cost effective than internal ones? This announcement seems to muddy the waters from that perspective.

I imagine we will get an announcement from Hu explaining how logical it is and how it is all part of the ongoing strategy....

The Case of Decimal v. Binary

We all know that disk drive manufacturers have been conning us for years with their definition of what constitutes a gigabyte. There are two schools of thought; the binary GB, which is 1024x1024x1024 or 1,073,741,824 bytes and the decimal GB which is quoted as 1000x1000x1000 or 1 billion bytes. The difference is significant (7.4%) and something that has annoyed me for years, mainly for having to explain to "management" why the 100TB of storage they bought stores less than 100TB of data (significantly less with some manufacturers).
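To put numbers on that, here is a quick sketch of the conversion. The function names are illustrative only; the arithmetic is simply powers of 1000 versus powers of 1024.

```python
def decimal_to_binary(amount, power):
    """Convert a capacity quoted in decimal units into binary units.

    power is 3 for gigabytes and 4 for terabytes.
    """
    return amount * 1000 ** power / 1024 ** power


def binary_excess_pct(power):
    """How much larger the binary unit is than the decimal one, in percent."""
    return (1024 ** power / 1000 ** power - 1) * 100


print(round(binary_excess_pct(3), 1))          # 7.4 (GB)
print(round(decimal_to_binary(100, 4), 2))     # 90.95 (binary TB in a decimal 100TB)
```

Note that the gap widens with each step up in unit, so at the terabyte level the difference is closer to 10%.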

Seagate have just lost a class action law suit (see here) which means they are forced to give a 5% rebate to customers plus some free backup software.

Unfortunately, those of us who have purchased storage subsystems from vendors will not be able to claim as the case refers to customers who bought stand-alone units:

"You are a member of the settlement class if, between March 22, 2001 and September 26, 2007, you purchased in the United States a new Seagate brand hard disc drive from an authorized Seagate retailer or distributor, separately as a Seagate product that was not pre-installed into and bundled with a personal computer or other electronic device."

So I guess EMC/HDS/IBM can breathe easy at this stage. It does beg the question though, when are our large array vendors going to quote the real figures for the usable storage available on an array? I have various calculators I use to help me work out the usable storage (before and after LUN creation and with varying RAID types) as I need to present real rather than theoretical figures when sizing arrays. I think we need some more transparency here.
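A very simplified version of such a calculator might look like the sketch below. It is not one of the calculators mentioned above: the function name, parameters and example configuration are all assumptions, and it ignores vendor-specific overheads such as formatting, system areas and LUN rounding, which reduce the figure further.

```python
def usable_tb_binary(drive_count, drive_gb_decimal, data_drives, parity_drives, spares=0):
    """Estimate usable capacity in binary TB for a set of identical RAID groups.

    data_drives/parity_drives describe one group, e.g. 7 and 1 for RAID-5 (7+1),
    6 and 2 for RAID-6 (6+2), or 1 and 1 for a mirrored pair.
    """
    group_size = data_drives + parity_drives
    groups = (drive_count - spares) // group_size  # leftover drives stay unused
    usable_bytes = groups * data_drives * drive_gb_decimal * 1000 ** 3
    return usable_bytes / 1024 ** 4


# Hypothetical example: 64 x 300GB drives, 4 spares, RAID-5 (7+1).
print(f"{usable_tb_binary(64, 300, 7, 1, spares=4):.2f} TB usable")
```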

Wednesday, 31 October 2007

CU Free

Does anyone use CU Free? Here's the reason for my question.

I've just implemented a migration from a pair of 9980V and 9970Vs to a single USP in one site and a 9970V and 9980V remaining in the other site. All of the MCU->RCU relationships (4 of them) are being used between the USP and the 99XXV boxes.

If I implement another USP in the site with the 9900's I want to replicate from the existing USP. Will CU Free let me exceed the MCU->RCU restriction or is it just a helpful way of saving me having to enter all the paths for all the CU relationships I want to use? i.e. is the restriction of 4 still there regardless?

Monday, 29 October 2007

Analysis: Seagate

This is the first of a series of posts looking at storage companies and their investment potential.

** DISCLAIMER: This and related blog entries are for fun only and do not represent investment advice. You should make your own opinions on investments or consult a financial adviser **

Background

Seagate's main business is the manufacture and distribution of hard disk drives. The company was founded in 1979 by Al Shugart and Finis Conner. Shugart is regarded as the father of modern hard disks, being involved with the teams that invented both the first hard drive at IBM in the 1950's and with inventing the floppy disk. Today, Seagate offers products for business and consumer markets including the latest portable devices. These range from 8GB 3600RPM Compact Flash drives to the sixth generation of the Cheetah drive - a 15K RPM drive with 450GB capacity.

Market Details

Seagate is quoted on the NYSE with the code STX.

Shares outstanding: 528,788,000
Market Capitalisation: $14.55 billion
Earnings Per Share: $2.28
P/E Ratio: 11.82
Yield: 1.45%

* figures from http://www.nyse.com/ on 29 October 2007.

These are lots of interesting numbers, but what do they mean? I've linked to Wikipedia entries which explain what most of the important terms are.

The P/E Ratio gives a good idea of how fairly valued the shares are. It is a ratio of the price of the shares compared to the earnings of the company, therefore the lower the number, the better. For comparison, here are a few more P/E ratios; Netapp - 28.79, EMC - 39.03, Google - 47.4, Brocade - 17.67. Seagate therefore seems low, but P/E can't be looked at in isolation. Higher P/E ratios may indicate a company with potentially higher future growth prospects. I would say that Seagate's business is purely incremental growth as they are not likely to be bringing a new product class to market or radically changing their business model.

Yield indicates how valuable the last dividend was as a percentage of the share price, so at 1.45%, the return on $100,000 of stock would be only $1450 per year. If dividends were the only reason for investing, it would take 69 years to return your investment! Obviously Seagate is declaring dividends, so value in the shares is being realised in both capital appreciation and dividend earnings.
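The yield arithmetic can be checked in a couple of lines. This is just a sketch of the calculation described above, with illustrative function names; it assumes the dividend and share price stay constant.

```python
def annual_dividend(investment, yield_pct):
    """Dividend income per year on an investment at the given yield."""
    return investment * yield_pct / 100


def payback_years(yield_pct):
    """Years of dividends needed to return the original investment."""
    return 100 / yield_pct


print(round(annual_dividend(100_000, 1.45)))  # 1450
print(round(payback_years(1.45)))             # 69
```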

Competitors

The hard drive industry is small. In fact Seagate recently acquired Maxtor (although the brand is still retained) and made the industry smaller. The major competitors in the hard drive market are Hitachi Global Storage Technologies (which is a combination of the Hitachi and old IBM storage businesses) and Western Digital. HGST and Seagate together own the market for 15K Enterprise (FC) drives.

Outlook

The hard drive market is continually challenged to increase capacity, improve performance and reduce the power and cooling demands of hard drives. All manufacturers are innovating to get ahead of the competition, however most advances seem to be small steps rather than giant leaps and so no one vendor in the market stands out as having a big competitive leap over the others. It is certainly true to say that the hard disk has become a commodity item.

As a technology leader and considering the future demand for storage (which shows no signs of diminishing), Seagate is set to continue to grow their business. At this stage whether Seagate will form part of my virtual portfolio, only time will tell, however I would say that one HDD vendor will be there.

Storage Stocks

Josh's recent posts have reminded me of a little piece of work I started but didn't finish a few weeks ago.

I've been compiling a list of storage companies which are publicly traded and trying to determine which I feel are value for money as an investment.

** DISCLAIMER ** This blog does not provide investment advice and all opinions on the values of stocks are mine entirely. You should not act upon these opinions but make your own judgements on the merits of investing in any company.

Phew, now that's done, let's get on with it. I guess the first question is why am I doing it? Well, of all the industries in which to invest, I would like to hope I understand storage IT companies better than any other. I say hope, as understanding what makes a company good as an investment will be not just their current figures, valuation etc, but rather their future potential for generating revenue and business. That's where things get difficult. There are the easy major players which are guaranteed to make revenue; just think of EMC, Cisco, IBM, Sun, Seagate, Netapp and so on. These companies have established businesses and make money. However not all pay dividends, so money has to be made on capital growth from some of the shares. There are also plenty of blips and gotchas to deal with. For example, Sun and Netapp - how has their current "misunderstanding" over patent rights affected shares?

Then there are the startups which then go IPO. Compellent recently floated. 3Par are planning to. How can these businesses be evaluated (other than by gut instinct) to see whether they are worthwhile?

So, for fun only (which is the main reason for doing this) I will be attempting to review one stock per day, which I will discuss and give my opinion on. I stress - *my opinion*. I could be (and probably will be!) wrong, however those I believe are worth investing in I will add to a "fantasy" investment portfolio just to see how things go!

If you have any opinions or comments, feel free to add them. The first company under review tomorrow will be Seagate.

Thursday, 18 October 2007

Storage Expo

Yesterday and today is Storage Expo in the UK. I haven't bothered to visit for a few years but I decided to go this year and see if things have changed. I even took my camera with the thought of taking a few pictures.

Unfortunately I was less than impressed. Yes it was a good opportunity to catch up with old friends I haven't seen (some since the last time I went to Expo!) but there wasn't anything new to see that I wasn't already aware of. Worst of all, nobody was giving away freebies, which was a huge disappointment.

Anyway within an hour of wandering and chatting we'd hit the pub. Perhaps next year I'll avoid the show and just go directly to the pub first!

Tuesday, 16 October 2007

New Celerra Simulator

Thanks to all those who drew my attention to the new version of the Celerra Simulator. EMC have done the right thing and made this version freely downloadable to anyone with a Powerlink account. This is a good thing as it gives everyone a chance to have a go. Just go into Powerlink and search on Celerra Simulator; the new version is 5.5.29.1.

There have been a few changes made with this version. Firstly there aren't the unwieldy restrictions on the ACE (I think that was the Assured Computing Environment) which meant installing only on an Intel platform and having no other version of VMware installed on the target machine. This was a real pain as I use AMD a lot and I had to install on a machine that was less than desirable. EMC also time expired the previous versions which required a patch to extend the expiration date. Fortunately the latest version has no such restriction.

So I've installed the simulator under VMware on my test machine. After a few niggles with the network settings I managed to get onto the web-based administration tool. So far I've found the simulator slow. That could well be my test machine or the way I've configured the VMware guest the simulator runs in. I need to do some experimenting to get that right.

If, like me, you're not familiar with Celerra, using the simulator can be good and bad; good because you can familiarise yourself with the terminology the product uses; bad because there's a new learning curve to get anything working. In particular, I want to see whether the simulator can actually store any data (which the Netapp simulator can).

Download and have a go - I'd be interested in everyone's feedback.

Monday, 15 October 2007

Disk Sizes Continue to Dazzle

HGST (Hitachi Global Storage Technologies) announced yesterday that they have managed to further miniaturise the drive heads they use in the hard disk drives. I hadn't realised exactly how small these recording heads were; apparently 2000 times smaller than the width of a human hair. Called "current perpendicular-to-the-plane giant magnetoresistive (CPP-GMR) heads" (I copied that from the press release to save me getting it wrong) which is a snappy title for any technology, these new heads will apparently move us closer to 4TB drives.

As I previously posted, hard drive technology these days is just amazing. All I can say, is keep it up guys!

Wednesday, 10 October 2007

Your Data on a Knife Edge

I read this interesting article on the BBC website today. It talks about how two European scientists (Albert Fert and Peter Grunberg) have won the Nobel prize for physics for GMR (giant magnetoresistance). This technology has enabled hard drives to be made smaller and to hold more data. What I liked most was the following comment, used to describe the technology:

"A computer hard-disk reader that uses a GMR sensor is equivalent to a jet flying at a speed of 30,000 kmph, at a height of just one metre above the ground, and yet being able to see and catalogue every single blade of grass it passes over."

What a great description of the (extremely cheap) technology we simply rely on every day to provide us our data. With those kinds of tolerance levels, our information really is on a knife edge.

So next time you moan about a disk failure, just imagine being on that jet...

Monday, 8 October 2007

Storage VMotion

I read the following interesting news release from VMware today. It covers lots of new features for the future release (3.5) of VMware but the one that caught my eye discusses a new feature called VMware Storage VMotion.

This feature will apparently allow the migration of a VMware virtual disk between different storage systems in the same manner that VMotion allows a virtual host to be moved between physical servers. I'm interested in how VMware will have chosen to implement this as there are lots of places in the stack they could choose to do it. For example, will there be any integration with array vendors' technology (like SRDF/TrueCopy) to manage the replication at the lowest level? Will the replication be managed via a virtual target/initiator in the fabric or will the VMware O/S manage the duplexed data writes at the application layer?

The difficulty of using replication outside of the application layer will be managing data integrity and also speed of replication cutover as the VMware guest is shifted to the new physical location. Add to that the complexity that each member of the VMware "cluster" will probably want read/write access to the virtual disks on which the virtual hosts are defined, and technologies like SRDF aren't going to work.

What about the CDP products? These seem to be a good logical fit as they replicate and track each block change independently, but I think the same issues of read/write access will apply and therefore these products will be equally unsuitable.

I think it is likely VMware will implement a "standard" cluster with multiple disks being written to by all members of the cluster and using IP to manage synchronisation. This may be good for a local solution but in reality what does VMotion then buy you? As a tool for managing the location of virtual machines across a farm of servers then VMotion is a fantastic tool. I just love the ability to move a host around to manage performance/workload on physical machines and to provide the ability to take physical servers out of a complex in order to do maintenance work.

But with today's 99.999% available storage subsystems, which can be maintained and expanded online without outage, is there any benefit in being able to move a VMware host to another storage system, unless that storage system is remotely located?

Storage VMotion sounds like a great idea but I'm not sure of the practical use of it - especially bearing in mind there will be a significant cost associated with the new feature.

Wednesday, 3 October 2007

SPC

According to Wikipedia, lightning can travel at a speed of 100,000 MPH, however I think storage vendors are even faster than lightning when it comes to highlighting or dissing the competition.

Mere microseconds after reading Claus Mikkelsen's blog on the USP-V SPC figures, there are posts from BarryW and BarryB, doing the highlighting and dissing respectively (I almost wrote respectfully there; that would have been a funny typo). BarryB must have no real work to do other than to write his blog, looking at the size of the posts he does!

Anyway. I'm not going to comment on the results because the others have done that enough already and I don't think the details are that relevant. I think what's more relevant is the stance EMC are taking in not providing figures for customers on the performance of their equipment. I can't decide whether it's a case of arrogance, and therefore a feeling they don't need to provide details because, as BarryB says, the customer will buy anyway, or whether it is because the DMX will not match up to the performance of its competitors. I think it is a mixture of both.

EMC aren't an array vendor any more and haven't been for a long time. OK, it is the product they're most remembered for historically, but their reach is now so wide and deep I think Symmetrix isn't the focus of a lot of their attentions. If it was, DMX4 would not just scale by the GB, it would have more connectivity, more cache and EMC would have been the *leader* in the implementation of technology like thin provisioning, not the also-ran.

On reflection, I think EMC should provide SPC figures. If DMX is better than the others and is "Simply the Best" prove it; bragging starts to sound hollow after a while.

Friday, 28 September 2007

Storage Standards - Arrays

After a recent posting by Stephen I thought it would be good to discuss standards and the sort of standards I'd look to implement when designing an infrastructure from scratch. Obviously it isn't always possible to start from a clean slate and consequently many sites have what I describe as "partial implementations of multiple standards" where different people have put their stamp on an environment but not necessarily been able or wanted to go back and change the previous configuration. Still, let's assume we have a little flexibility and have a blank canvas, so to speak.

Objectives

I think it's worth having an idea of what you are trying to achieve by laying down standards in the first place. I think it boils down to a number of issues:

Risk: A messy configuration poses a greater risk of data loss. If Storage Admins aren't sure whether disks are in use and by which servers, then devices can easily be re-used inadvertently or perhaps be omitted from replication and cause issues in a DR scenario. For me, reducing risk is my main reason for adherence to a set of rigorous standards (sounds like an airline, where they always tell you as you board that their main priority is your safety; I always thought it was to make money from me).

Efficiency: Storage is easy to overallocate and allocations today are very inefficient. We keep getting told this on the surveys that come out on a weekly basis. In almost no time at all we'll have more wasted data than atoms in the known universe or something like that. I think the figures get quoted as anywhere between 30-70% wastage. Either way, that's a lot of money being wasted on spinny stuff which doesn't need to be. With sensible standards, at least the storage can be accounted for and attributed to an owner (who can then be charged lots for the privilege), even if that owner isn't then efficient themselves. Efficiency at the host level is a whole separate argument to be discussed later.

Manageability: When the storage environment is easy to understand, it is easier to administer. I once worked in a site which had four EMC 8830 arrays (two local, two in a remote site) all connected to each other. The local arrays had SRDF'd gatekeepers! (Warning: That was an EMC-based joke; apologies to all the hardened HDS fans out there who don't get that one). Needless to say, locating storage which could be reused and determining hosts which had or did not have replication was a time consuming job. Each allocation took 3-4 times longer than necessary and half my time was spent attempting to clean the environment up.

So now we know what the reasons are, perhaps we can look at some of the standards to apply. Unfortunately most standards will tend to be very generic, as local restrictions will constrain exactly how arrays are laid out.

Use RAID protection. This may seem a little obvious, however what I mean by this statement is that you should be reviewing your use of RAID and the RAID format used by tier. Lower tiered storage may be more suited to a higher RAID level but high performance data may need RAID 1/10. Either way, you need it and you should have a standard per tier.

Distribute data across as many physical disks as possible. This also may seem like common sense but achieving it isn't always that simple. As disks have become larger, more LUNs can be carved from each RAID group. This can have a negative impact on performance if only a small number of array groups are in use. There are also issues for upgrading arrays; a single 4-drive array group can provide nearly a TB of storage using 300GB drives (much more with new 500GB and 750GB drives as they become the norm), so the physical restrictions of the hardware become more apparent. It can be a real challenge to balance cost versus performance if your organisation insists on only adding small increments of storage as they are needed.

Keep LUN sizes consistent. I don't like having a large number of differing LUN sizes. In fact I prefer to have just one if I can get away with it, however it isn't always that simple. LUN sizes should be defined in blocks (as most arrays use 512 byte blocks) and be the same even across storage from different vendors. This makes any type of migration easier to achieve. One tricky problem is choosing an appropriate LUN size. I think choosing an appropriate size or sizes borrows heavily from historical numbers but you should consider the size of your array groups (or physical disks) when planning LUN sizes. The more LUNs in a RAID group, the higher the risk of contention at the physical level. Consider keeping any LUN sizes as multiples of each other; as systems grow, LUNs can then be consolidated down to larger sizes.

Use a dispersed method for numbering LUNs. Both EMC and HDS (can't speak for IBM) will recommend an initial configuration in which consecutively numbered LUNs are taken from different array groups one at a time and then repeated until all data is allocated. This means whichever group of sequential LUN numbers you choose, they will automatically be balanced across array groups. I have worked in sites that have used both methods and I can say that sequential numbering (where consecutive LUNs come from the same array group) adds a significant amount of extra work to the allocation process.

Don't dedicate array groups to specific LUN sizes. It may be nice to use one array group to create smaller LUNs for say a log device. This is *not* a good idea as you will end up creating an I/O bottleneck on those volumes. If you must have differing LUN sizes, create an even number from each array group.

Distribute physical disks across all back-end directors. This may seem obvious but it is possible to unevenly balance disks across back-end directors, especially if storage is purchased in increments and different drive sizes are used. Keep things consistent, distribute the disks of each size evenly across the available directors.

Distribute hosts across all front-end directors. There are two possible ways to distribute hosts, by capacity and by performance. You should decide which is more important for you and load balance accordingly. Continue to monitor both performance and allocations to ensure you are getting the best out of your FEPs.

Dedicate replication links from each front-end director card. I like to ensure I have plenty of ports for replication (I've seen one issue recently, vendor name withheld, where lack of processor power in terms of FEPs for replication caused a performance problem, which was resolved by having more replication links, but not necessarily more cross site bandwidth), so I reserve at least one port/port pair on each FED card for this purpose.

Dedicate specific FE ports for replicated and non-replicated storage. I prefer if possible to dedicate FEPs to production/replicated and non-replicated hosts in order to ensure that the same performance is available on both the primary and DR sites. If a DR host is also used as a UAT or test box, then place those disks on a separate FEP; that's what SAN is for!

Dedicate array groups for replicated and non-replicated storage. This seems to contradict some of the other statements; however, from a logistical point of view and if enough storage is available, it can be attractive to reserve out certain array groups for replicated and non-replicated storage, ensuring that the same arrays and LDEV/LUN numbers are used on replicated boxes.

Allocate host storage from as many array groups as possible. When performing an allocation, try and spread the load across as many RAID groups as possible.

Make use of your LDEV/LUN ranges. On HDS systems as an example, I like to reserve out CU ranges for tiers, so 00-0F for tier 1, 10-1F for tier 2 and 30-3F for command devices, external devices etc.
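
The dispersed numbering standard above is easier to see with a toy example. This is only a sketch in Python: the array group names and the functions are made up for illustration and don't correspond to any vendor's tooling. It compares which array groups a host lands on when it is given four consecutively numbered LUNs under each scheme.

```python
def dispersed_lun_map(array_groups, luns_per_group):
    """Number LUNs round-robin: consecutive LUN numbers come from different groups."""
    mapping = {}
    lun = 0
    for _ in range(luns_per_group):
        for group in array_groups:
            mapping[lun] = group
            lun += 1
    return mapping


def sequential_lun_map(array_groups, luns_per_group):
    """Number LUNs group by group: consecutive LUN numbers share one group."""
    mapping = {}
    lun = 0
    for group in array_groups:
        for _ in range(luns_per_group):
            mapping[lun] = group
            lun += 1
    return mapping


# Hypothetical array group names, four LUNs carved from each.
groups = ["AG-1", "AG-2", "AG-3", "AG-4"]
dispersed = dispersed_lun_map(groups, 4)
sequential = sequential_lun_map(groups, 4)

# Give a host LUNs 0-3 under each scheme and see where they live.
print([dispersed[n] for n in range(4)])
print([sequential[n] for n in range(4)])
```

With dispersed numbering the four LUNs fall on four different array groups without the administrator having to think about it; with sequential numbering they all sit on the first group and have to be hand-picked to get any balance.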

This is by no means an exhaustive list and I'm sure I will remember more. If anyone has any suggestions on what they do, I'd be interested to hear!

Tuesday, 25 September 2007

Storage Futures or is it Options?

One of the trickiest problems in the storage industry is managing demand. Internal customers seem to think that storage isn't physical and we just have tons of the virtual stuff we can pick out of the air as required. I don't think they expect to talk to the server teams and find they have dozens of servers sitting spinning just waiting for the next big project, but for some reason with storage they do.

So this lack of foresight causes demand problems as arrays have to be managed. Whilst we could simply add a new RAID group or bunch of disks when customer demand requires it, allocating all the new storage on the same RAID group, chances are performance would "suck". Really we want to add storage in bulk and spread the workload.

Similar problems occur when arrays cannot be expanded and a new footprint has to be installed (which can have significant lead time and ongoing costs, for instance fabric ports for connectivity). I can hear the bleating of many a datacentre manager now, asking why yet more storage is needed and where it will go in the already overcrowded datacentre.

The standard charging model is to have a fixed "price guide" which lists all the storage offerings. Internal customers are then charged in arrears for their storage usage. Some companies may have an element of forward planning but they are torturous processes and anyway, someone always comes along with an allegedly unforeseen requirement.

Ideally, the answer is for all storage users to manage their own demand, estimating BAU (Business As Usual) growth and requirements for new products. Unfortunately, the penalties for lack of planning don't usually exist and poor practices perpetuate.

So how about offering futures (or options) on storage? Internal customers can purchase a right to buy (option) or an obligation to buy (future) storage for some time in the future, say 1, 3, 6 or 12 months ahead. In return they receive a discounted price. Storage hardware costs from vendors are always dropping so the idea of charging less in the future in order to gain more of an insight into demand is probably not an unreasonable concept.

Futures/Options could also work well with thin provisioning. Storage is pre-allocated on a virtual basis up front, then provided on the basis of futures contracts by adding more real storage to the array.

Now the question is, to use futures or options? Well, perhaps futures best represent BAU demand as this is more likely to be constant and easily measurable. Options fit project work where projects may or may not be instigated. Naturally futures would attract a higher discount than options would.
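
As a starting point for testing the idea, here is a rough Python sketch of how the discount might be calculated. Every number in it is an assumption for illustration only: the 1% per month discount, the rule that an option earns half the discount of a future, and the list price are all invented, not taken from any real price guide.

```python
def storage_charge(list_price_per_gb, gb, months_ahead, kind,
                   monthly_discount=0.01, option_factor=0.5):
    """Return the internal charge for storage committed to in advance.

    kind is 'future' (an obligation to buy) or 'option' (a right to buy).
    monthly_discount and option_factor are illustrative assumptions.
    """
    if kind not in ("future", "option"):
        raise ValueError("kind must be 'future' or 'option'")
    discount = monthly_discount * months_ahead
    if kind == "option":
        # Options give less certainty about demand, so they earn less discount.
        discount *= option_factor
    return round(list_price_per_gb * gb * (1 - discount), 2)


# 1TB booked 12 months ahead at a hypothetical list price of 10 per GB.
for kind in ("future", "option"):
    print(kind, storage_charge(10.0, 1000, 12, kind))
```

The only design point that comes from the argument above is that futures attract a bigger discount than options; how the discount should scale with lead time is exactly what a spreadsheet (or this function) would be used to experiment with.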

I think I need to make up a little spreadsheet to test this theory out...

Monday, 24 September 2007

PSSST....Green Storage

HDS announced today a few amendments to the AMS/WMS range. The most interesting is the apparent ability to power down drives which are not in use, à la Copan.

According to the press release above, the drives can be powered down by the user as necessary, which presents some interesting questions. Firstly, I guess this is going to be handled through a command device (which presumably is not powered down!) which will allow a specific RAID group to be chosen. Imagine choosing to power down a RAID group someone else is using! Presumably all RAID types can be supported with the power down mode.

One of the cardinal rules about hardware I learned years ago was never to power it off unless absolutely necessary; the power down/up sequence produces power fluctuations which can kill equipment. I'm always nervous about powering down hard drives. I've seen the Copan blurb on all the additional features they have in their product which ensures the minimum risk of drive loss. I'd like to see what HDS are adding to AMS/WMS to ensure power down doesn't cause data loss.

Finally, what happens on the host when an I/O request is issued for a powered down drive? Is the I/O simply failed? It would be good to see this explained as I would like to see how consistency is handled, especially in a RAID configuration.

However, any step forward which makes equipment run cooler is always good.

The announcement also indicated that 750GB SATA drives will be supported. More capacity, less cooling....

NTFS Update

I did some more work on my NTFS issue on Friday. As previously mentioned, I was seeing NTFS filesystems with large levels of fragmentation even after drives were compressed.

The answer turns out to be quite simple; Windows doesn't consolidate the free space blocks which accumulate as files are created and deleted. So, as a test I started with a blank 10GB volume and created a large file on it. Sure enough the allocation occurred in a small number (2 or 3) of extents. I then deleted the large file and created 10,000 small (5K) files and deleted those too. I then re-created the large file, which was immediately allocated in hundreds of small fragments and needed defragmentation straight away. The large file was created using the free space blocks freed up from the small files.

What's not clear from the standard fragmentation tool provided with Windows is that the free space created by the deletion of files is added to a chain of free space blocks. These free space blocks are never consolidated even if they are contiguous (i.e. as in this instance where I deleted all the files on the disk). This means even if you *delete* everything on a volume, the free space is still fragmented and files will be created with instant fragmentation. The other thing to note is that the standard Windows defragmenter doesn't attempt to consolidate those segments when a drive is defragmented; it simply ensures that files are re-allocated contiguously. It doesn't report that fact either.
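
The behaviour described above can be mimicked with a toy model. To be clear, this Python sketch is not how NTFS really allocates space; it is a made-up free-space chain that hands out space from the head of the chain and never merges freed extents unless asked. The cluster counts and the reverse deletion order are assumptions chosen to make the effect visible.

```python
class ToyVolume:
    """Toy free-space chain; NOT a model of the real NTFS allocator."""

    def __init__(self, clusters):
        self.free_chain = [(0, clusters)]  # list of (start, length) extents

    def allocate(self, size):
        """Take space from the head of the chain; return the file's fragments."""
        fragments = []
        needed = size
        while needed > 0:
            if not self.free_chain:
                raise RuntimeError("toy volume is full")
            start, length = self.free_chain.pop(0)
            take = min(length, needed)
            if fragments and fragments[-1][0] + fragments[-1][1] == start:
                # Directly follows the previous piece, so it is the same fragment.
                fragments[-1] = (fragments[-1][0], fragments[-1][1] + take)
            else:
                fragments.append((start, take))
            if take < length:
                self.free_chain.insert(0, (start + take, length - take))
            needed -= take
        return fragments

    def free(self, fragments):
        """Return extents to the end of the chain without merging neighbours."""
        self.free_chain.extend(fragments)

    def consolidate(self):
        """Merge adjacent free extents (what a free-space consolidator would do)."""
        merged = []
        for start, length in sorted(self.free_chain):
            if merged and merged[-1][0] + merged[-1][1] == start:
                merged[-1] = (merged[-1][0], merged[-1][1] + length)
            else:
                merged.append((start, length))
        self.free_chain = merged


vol = ToyVolume(10_000)
first = vol.allocate(8_000)                       # large file on an empty volume
vol.free(first)
small = [vol.allocate(2) for _ in range(5_000)]   # lots of small files
for extents in reversed(small):                   # delete them, newest first
    vol.free(extents)
second = vol.allocate(8_000)                      # re-create the large file
vol.free(second)
vol.consolidate()
third = vol.allocate(8_000)                       # and again after consolidation
print(len(first), len(second), len(third))
```

In this toy run the first copy of the large file is a single fragment, the second is built from thousands of leftover small extents even though the volume is completely empty, and the third is back to one fragment once the free space has been consolidated.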

I'm currently downloading Diskeeper, which allegedly does consolidate free space. I'm going to trial this and see how it affects my fragmentation problem.

Incidentally, I used one of Sysinternals' free tools to look at the map of my test drive. Sysinternals were bought by Microsoft in the summer of 2006, however you can find their free tools here. I used Diskview to give me a map of the drive and understand what was happening as I created and deleted files. What I would like, however is a tool which displays the status of free space fragments. I haven't found one of those yet.

So, now I have an answer, I just have to determine whether I think fragmentation causes any kind of performance issue on SAN-presented disks!

Friday, 21 September 2007

Problems Problems

This week I've been working on two interesting (ish) problems. Well, one more interesting than the other, one a case of the vendor needing to think about requirements more.

Firstly, Tuning Manager (my old software nemesis) strikes again. Within Tuning Manager it is possible to track performance for all LUNs in an array. The gotcha I found this week is that the list of monitored LUNs represents only those allocated to hosts and is a static list which must be refreshed each time an allocation is performed!

This is just lack of thought on the part of the developers not to provide a "track everything" option so it isn't necessary to keep going into the product, selecting the agent, refreshing the LUN list and tagging them all over again. No wonder allocations can take so long and be fraught with mistakes when Storage Admins have to include in their process the requirement to manually update the tuning product. I'm still waiting for confirmation that there isn't a way to automatically report on all LUNs. If there isn't then a product enhancement will be required to meet what I want. In the meantime, I'll have to ensure things are updated manually. So if you configured Tuning Manager and the LUN list when you first installed an array, have a quick look to see if you're monitoring everything or not.

I'm sure some of you out there will point out, with good reason, why HTnM doesn't automatically scan all LUNs, but from my perspective, I'm never asked by senior management to monitor a performance issue *before* it has occurred, so I always prefer to have monitoring enabled for all devices and all subsystems if it doesn't have an adverse effect on performance.

Second was an issue with the way NTFS works. A number of filesystems on our SQL Server machines show high levels of fragmentation, despite there being plenty of freespace on the volumes in question. This fragmentation issue seems to occur even when a volume is cleared and files are reallocated from scratch.

A quick trawl around the web found me various assertions that NTFS deliberately leaves file clusters between files in order to provide an initial bit of expansion. I'm not sure this is true as I can't find a trusted source to indicate this is standard behaviour. In addition I wonder if it is the way in which some products allocate files; for instance if a SQL backup starts to create a backup file it has no real idea how big the file will become. NTFS (I assume) will choose the largest block of freespace available and allocate the file there. If another process allocates a file almost immediately, then it will get allocated just after the first file (which may only be a few clusters in size at this stage). Then the first file gets extended and "leapfrogs" the second file, and so on, producing fragmentation in both files.

I'm not sure if this is what is happening, but if this is the way NTFS is working then it would explain the levels of fragmentation we see (some files have 200,000+ fragments in a 24GB file). In addition, I don't know for definite that the fragmentation is having a detrimental impact on performance (these are SAN connected LUNs). Everything is still speculation. I guess I need to do more investigation...
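
The "leapfrog" theory is easy to illustrate, even if it remains speculation. The Python sketch below assumes the simplest possible allocator, one that always hands out the next free clusters, and two files that grow in turn. The file names and extension size are invented and nothing here is taken from real NTFS behaviour.

```python
def interleaved_growth(extensions, clusters_per_extension=4):
    """Grow two files alternately from a shared 'next free cluster' pointer.

    Returns a dict mapping each file name to its list of (start, length) runs.
    """
    next_free = 0
    files = {"backup_a": [], "backup_b": []}
    for _ in range(extensions):
        for runs in files.values():
            start = next_free
            next_free += clusters_per_extension
            if runs and runs[-1][0] + runs[-1][1] == start:
                # The new space directly follows the file's last run: no new fragment.
                runs[-1] = (runs[-1][0], runs[-1][1] + clusters_per_extension)
            else:
                runs.append((start, clusters_per_extension))
    return files


layout = interleaved_growth(extensions=100)
for name, runs in layout.items():
    print(name, len(runs), "fragments")
```

Because each file's next extension always lands just after the other file's latest one, every extension becomes a new fragment. Under these assumptions a long-running backup that extends itself thousands of times alongside another writer would end up with thousands of fragments.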

Saturday, 15 September 2007

Pause for Thoughtput

I've just read a couple of Gary O's postings over at Thoughtput, the blog from Gear6.

In his article "Feeding the Virtual Machines", he discussed NAS and SAN deployment for a virtual environment and makes the bold claim:

"Most people tend to agree that NAS is easier and more cost effective than SANs for modern data center architectures."

I have to say that I for one don't. Anyone who's had to deploy hardware such as NetApp filers will know there's a minefield of issues around security, DNS and general configuration, which unless you know the products intimately are likely to catch you out. I'm not saying SAN deployments are easier, simply that both SAN and NAS deployments have their pros and cons.

The second post, Shedding Tiers questions the need to tier storage in the first place and Gary makes the comment:

"If money were no object, people would keep buying fast drives"

Well, of course they would. I'd also be driving a Ferrari to work and living in Cannes with a bevy of supermodels on each arm but unfortunately like most people (and businesses) I have champagne tastes and beer money...

Tiering is only done to save money as Gary rightly points out, but putting one great honking cache in front of all the storage seems a bit pointless. After all, that cache isn't free either and what happens if those hosts who are using lower tier storage don't need the performance in the first place?

I almost feel obliged to use BarryB's blogketing keyword.... :0)

Friday, 14 September 2007

SAN Virtual Appliances

LeftHand, FalconStor, Arkeia and Datacore all now offer VMware appliance versions of their products. I'm in the process of downloading them now and I'm hoping to install over the next few days and do some testing. I've previously mentioned some VM NAS products which I've installed but not reported back on. I'll try to summarise all my findings together.

It seems that the market for virtual appliances (certainly in storage) is getting bigger. I think this is a good thing but I'm not sure that the virtualisation technology today provides capabilities to allow all vendors to virtualise their products. I suspect that the iSCSI brigade will get best benefit out of this wave of technology but fibre channel will not, as (from my experience) VM products don't directly pass through fibre channel hardware to the VM guests (I'm aware of how RDM works in a VMware environment but I don't think pass-through of target devices is sufficient).

Will IBM produce an SVC Virtual Appliance? I doubt it, but products such as Invista should be perfect candidates for virtualising as they don't sit in the data path and the controller parts aren't critical to performance. So EMC, show us your commitment to Invista and make 3.0 the virtual version!

Wednesday, 12 September 2007

Green Poll

Here are the results from the green poll (13 votes only :-( )

Q: Is the discussion of green storage just hype?

54% - Yes it is hype, vendors are riding the bandwagon
15% - No, it is an important issue and vendors are solving it
15% - I'm not sure, still deciding
15% - No, it is an important issue and vendors are not solving it

Highly unscientific due to the poll size but I have to agree that a lot of the discussion is smoke and mirrors.

While we're on the subject I had a look at Western Digital's "green" hard drives. They are claiming with a little bit of clever code, they can reduce the power demands of their higher end SATA range. Here's a clip of the specific new features taken from their literature:

IntelliPower™ — A fine-tuned balance of spin speed, transfer rate and cache size designed to deliver both significant power savings and solid performance.
IntelliSeek™ — Calculates optimum seek speeds to lower power consumption, noise, and vibration.
IntelliPark™ — Delivers lower power consumption by automatically unloading the heads during idle to reduce aerodynamic drag.

The use of these techniques is claimed to reduce idle power to 4.0W and average read/write power to 7.5W per drive. I've had a look at other manufacturers and this is a saving of about 4W per drive. WD make plenty of statements as to how much this represents in cost and no doubt it is a good thing that manufacturers are thinking in this way, however it does make me think we should be examining exactly what data we're storing on disk if we are happy with just a large saving in idle power. If this data is not inactive then obviously the power savings are less, but there's no free lunch here and if data is active then a drive is going to use power. SATA drives may be able to compromise on performance but I can't imagine anyone purchasing nice fast 15K drives will want to compromise in any way. (While I think of it, developing a tiered storage strategy should include evaluating the "cost" of accessing the data in power terms.)
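
To put the 4W figure in perspective, here is a back-of-envelope calculation in Python. It assumes the saving applies around the clock, which flatters the result since the saving shrinks when drives are busy, and it ignores any knock-on reduction in cooling. The function name and drive counts are just for illustration.

```python
HOURS_PER_YEAR = 24 * 365


def annual_saving_kwh(drives, watts_saved_per_drive=4.0):
    """Energy saved per year, assuming the saving applies 24x7."""
    return drives * watts_saved_per_drive * HOURS_PER_YEAR / 1000.0


for drives in (1, 100, 1000):
    print(drives, "drives:", annual_saving_kwh(drives), "kWh per year")
```

On those assumptions one drive saves roughly 35 kWh a year, so the numbers only become interesting across hundreds or thousands of spindles. Multiply by your own electricity tariff to turn it into money.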

Tuesday, 11 September 2007

USP-VM

Hitachi has announced (10th September) the availability of a new storage array, the USP-VM. At first glance this appears to be the USP-V equivalent of the NSC55 as it has very similar characteristics in terms of cache cards, FEPs etc. Unfortunately HDS have provided links to specification pages not all of which include the USP-VM references. Bit sloppy that.

I've previously deployed a number of NSC55's and within 6 months wondered whether they were the right decision. They weren't as scalable as I needed and there were a few features (such as BED ports and FED ports sharing the same interface card) which were a bit of a concern (imagine losing a FEP and having to take 1/2 of all your BE paths offline to fix the problem). I'm always reminded of the DMX1000/2000/3000 range when I think of the NSC model as these EMC arrays weren't expandable and of course a DMX1000 quickly filled up....

Hu describes the USP-VM as "Enterprise Modular" in his blog entry. This may be a bit generous as (a) I doubt the USP-VM will be priced as low as modular storage and (b) I don't think it will support the whole range of disks available in a modular array. I say "think" as the link to the capacity page for the USP products doesn't yet include the USP-VM.....

Friday, 7 September 2007

Virtualisation Update

Thanks to everyone who commented on the previous post relating to using virtualisation for DR. I'm looking forward to Barry's more contemporaneous explanation of the way SVC works.

I guess I should have said I understand Invista is stateless - but I didn't - so thanks to 'zilla for pointing it out.

So here's another issue. If SVC and USP cache the data (which I knew they did) then what happens if the virtualisation appliance fails? I'm not just thinking about a total failure but a partial failure or another issue which compromises the data in cache?

I was always worried that a problem with a USP virtualising solution was understanding what would happen if a failure occurred in the data path. Where is the data? What is the consistency position? A datacentre power down could be a good example. What is the data status as the equipment is powered back up?

Using Virtualisation for DR

It's good to see virtualisation and the various products being discussed again at length. Here's an idea I had some time ago for implementing remote replication by using virtualisation. I'd be interested to know whether it is possible (so far no-one from HDS can answer the question on whether USP/UVM can do this, but read on).

The virtualisation products make a virtue out of allowing heterogeneous environments to be presented as a unified storage infrastructure. This even means carving LUNs/LDEVs presented from an array into constituent parts to make logical virtual volumes at the virtualisation level. Whilst this can be done, it isn't a requirement and in fact HDS sell the USP virtualisation on the basis that you can virtualise an existing array through the USP without destroying the data, then use the USP to move the data to another physical LUN. Presumably the 1:1 mapping can be achieved on Invista and SVC (I see no reason why this wouldn't be the case). Now, as the virtualisation layer simply acts as a host (in USP's case a Windows one - not sure what the others emulate) then it is possible (but not usually desirable) to present storage which is being virtualised to both the virtual device and a local host, by using multiple paths from the external array.

If the virtualisation engine is placed in one site and the external storage in another, then the external storage could be configured to be accessed in the remote site by a DR server. See example 1.

Obviously this doesn't gain much over a standard solution using TC/SRDF other than perhaps the ability to asynchronously write to the remote storage, making use of the cache in the virtualisation engine to provide good response times. So, the second picture shows, using a USP as an example, a 3 datacentre configuration where there are two local USPs providing replication between each other but the secondary LUNs in the "local DR site" are actually located on external storage in a remote datacentre. This configuration gives failover between the local site pair and also access to a third copy of data in a remote site (although technically, the third copy doesn't actually exist).

Why do this? Well, if you have two closely sited locations with existing USPs where you want to retain synchronous replication and don't want to pay for a 3rd copy of data then you get a poor man's 3DC solution without paying for that third data copy.

Clearly there are some drawbacks; you are dependent on comms links to access the secondary copy of data and in a DR scenario performance may be poor. In addition, as the DR USP has to cache writes, it may not be able to destage them to the external storage in a timely fashion to prevent cache overrun due to the latency on writing to the remote external copy.

I think there's one technical question which determines whether this solution is technically feasible and that is: how do virtualisation devices destage cached I/O to their external disks? There are two options I see; either they destage using an algorithm which minimises the amount of disk activity, or they destage in order, to ensure integrity of data on the external disk in case of a failure of the virtualisation hardware itself. I would hope the answer would be the latter rather than the former here, as if the virtualisation engine suffered some kind of hardware failure, I would want the data on disk to still have write order integrity. If this is the case then my designs presented here should mean that the remote copy of data would still be valid in case of loss of both local sites, albeit as an async copy slightly out of date.
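
The difference between the two destage options can be shown with a small Python sketch. It is purely illustrative: the two policy names, the block addresses and the failure point are invented, and it says nothing about how SVC, USP or Invista actually behave. It simply asks which cached writes have reached the external disk when the virtualisation engine dies part-way through destaging.

```python
def destage(writes, policy, fail_after):
    """Return the sequence numbers that reached disk before the engine failed."""
    if policy == "write_order":
        ordered = sorted(writes, key=lambda w: w["seq"])    # arrival order
    elif policy == "min_seek":
        ordered = sorted(writes, key=lambda w: w["block"])  # sweep by disk address
    else:
        raise ValueError(policy)
    return [w["seq"] for w in ordered[:fail_after]]


def has_write_order_integrity(destaged_seqs):
    """True if the writes on disk are an unbroken prefix 1..n of the arrival order."""
    return sorted(destaged_seqs) == list(range(1, len(destaged_seqs) + 1))


# Four cached writes in arrival order (seq); block addresses are made up.
cached_writes = [
    {"seq": 1, "block": 900, "data": "journal: begin"},
    {"seq": 2, "block": 10, "data": "table row"},
    {"seq": 3, "block": 901, "data": "journal: commit"},
    {"seq": 4, "block": 500, "data": "index page"},
]

for policy in ("write_order", "min_seek"):
    on_disk = destage(cached_writes, policy, fail_after=2)
    print(policy, on_disk, has_write_order_integrity(on_disk))
```

With in-order destaging the disk holds writes 1 and 2, an older but consistent image. With the seek-minimising policy it holds writes 2 and 4 without write 1, which is exactly the situation that would make the remote copy unusable for DR.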

Can IBM/HDS/EMC answer the question of integrity?

Wednesday, 5 September 2007

Invista

There have been a few references to Invista over the last couple of weeks, notably from Barry discussing the "stealth announcement".

I commented on Barry's blog that I felt Invista had been a failure, due to the number of sales. I'm not quite sure why this is so, as I think that virtualisation in the fabric is ultimately the right place for the technology. Virtualisation can be implemented at each point in the I/O path - the host, fabric and array (I'll exclude application virtualisation as most storage managers don't manage the application stack). We already see this today; hosts use LVMs to virtualise the LUNs they are presented; Invista virtualises in the fabric; SVC from IBM sits in that middle ground between the fabric and the array and HDS and others enable virtualisation at the array level.

But why do I think fabric is best? Well, host-based virtualisation is dependent on the O/S and LVM version. Issues of support will exist as the HBAs and host software will have supported levels to match the connected arrays. It becomes complex to match multiple O/S, vendor, driver, firmware and fabric levels across many hosts and even more complex when multiple arrays are presented to the same host. For this reason and for issues of manageability, host-based virtualisation is not a scalable option. As an example, migration from an existing to a new array would require work to be completed on every server to add, lay out and migrate data.

Array-based virtualisation provides a convenient stop-gap in the marketplace today. Using HDS's USP as an example, storage can be virtualised through the USP, appearing just as internal storage within the array would. This provides a number of benefits. Firstly, driver levels for the external storage are now irrelevant (only requiring USP support, regardless of the connected host); secondly, the USP can be used to improve/smooth performance to the external storage; thirdly, the USP can be used for migration tasks from older hardware; and finally, external storage can be used to store lower tiers of data, such as backups or PIT copies.

Array-based virtualisation does have drawbacks; all externalised storage becomes dependent on the virtualising array. This makes replacement potentially complex. To date, HDS have not provided tools to seamlessly migrate away from one USP to another (as far as I am aware). In addition, there's the problem of "all your eggs in one basket"; any issue with the array (e.g. physical intervention like fire, loss of power, microcode bug etc) could result in loss of access to all of your data. Consider the upgrade scenario of moving to a higher level of code; if all data was virtualised through one array, you would want to be darn sure that both the upgrade process and the new code are going to work seamlessly...

The final option is to use fabric-based virtualisation and at the moment this means Invista and SVC. SVC is an interesting one as it isn't an array and it isn't a fabric switch, but it does effectively provide switching capabilities. Although I think SVC is a good product, there are inevitably going to be some drawbacks, most notably issues similar to those of array-based virtualisation (Barry/Tony, feel free to correct me if SVC has a non-disruptive replacement path).

Invista uses a "split path" architecture to implement virtualisation. This means SCSI read/write requests are handled directly by the fabric switch, which performs the necessary changes to the fibre channel headers in order to redirect I/O to the underlying target physical device. This is achieved by the switch creating virtual initiators (for the storage to connect to) and virtual targets for the host to be mapped to. Because the virtual devices are implemented within the fabric, it should be possible to make the virtual devices accessible from any other fabric-connected switch. This poses the possibility of placing the virtualised storage anywhere within a storage environment and then using the fabric to replicate data (presumably removing the need for SRDF/TrueCopy).

Other SCSI commands which inquire on the status of LUNs are handled by the Invista controller "out of band" by an IP connection from the switch to the controller. Obviously this is a slower access path but not as important in performance terms as the actual read/write activity.

I found a copy of the release notes for Invista 2.0 on Powerlink. Probably about the most significant improvement was that of clustered controllers. Other than that, the 1.0->2.0 upgrade was disappointing.

So why isn't Invista selling? Well, I've never seen any EMC salespeople mention the product, never mind push it. Perhaps customers just don't get the benefits or expect the technology to be too complex, causing issues of support and making DR an absolute minefield.

If EMC are serious about the product you'd have expected them to be shoving it at us all the time. Maybe Barry could do for Invista what he's been doing in his recent posts for DMX-4?

Monday, 3 September 2007

PDAs

Slightly off topic, I know, but I mentioned in the last post (I think) that I'd killed my PDA. I had the iPAQ hx4700 with the VGA screen which could be rotated to work in landscape rather than portrait mode. It was a big machine and slightly cumbersome but the screen made up for everything else as I could watch full motion videos on the train to work!

But it died as it didn't survive a fall onto a concrete floor. Not the first time this had happened but previously I'd been lucky.

So I needed a new PDA and fast. I was hoping HP had improved upon the 4700 model with something more swish in a smaller design and with that same great screen.

It wasn't to be. All the models are either hybrid phone devices or use the dull QVGA format which I had on a PDA over 5 years ago! I don't want a phone hybrid as I think people look like t****rs when they use them as a phone.

I decided to look further afield and see what other machines were out there. I had previously been a fan of Psion, who made the best PDA/organisers ever (Psion 5mx in case you're asking), but their demise effectively forced me down the iPAQ route. There are other PDA vendors out there but I'd never heard of most of them and I didn't see any good reviews.

In the end I went for the rather dull hx2700 which I received today. It is dull, dull, dull, dull.

Is it me or have HP put all their innovation into phone/PDA hybrids? Is this what people expect of a PDA these days? Maybe that's why HP have ignored the non-phone models; the lack of decent competition doesn't help either.

Thank goodness for competition in the storage market or my job would be as boring as my time management!

Thursday, 30 August 2007

Holiday is over...

I'm back from my annual summer break (hence the lack of posts for a couple of weeks). I managed to resist the temptation to go online while away (partly because I only had access to the 'net via a 32K modem and mostly because of the hard stares from my wife every time I went near the computer)....

In addition, I trashed my PDA by dropping it onto a concrete car park floor so I had no wireless access either. Losing my PDA was annoying, not just for the cost of replacement but for the potential loss of access to data. Fortunately, being a good storage guy, I write all my data to an SD card so didn't lose any access.

Any-hoo, before I went away, I solved my Cisco problem. Turns out it was a bug; upgrading to OS 3.1(1) and above then caused routing issues with FCIP links created after the upgrade. A supervisor swap sorted the problem. The resolution was quite simple but the effort to get there was significant, despite the huge amount of diagnostics built into Cisco equipment. Still, I learned a lot and as usual it showed me that however hard I work, I'll never know everything (or probably even 10% of the storage world).

Right, back to catching up with my RSS backlog. These days it's almost impossible to keep up with the daily posts!

Thursday, 16 August 2007

Using that 8Gbps

Symantec/Veritas announced this week the release of Netbackup 6.5. It seems to me like the company has been talking about this version of the product forever and as is mentioned on other blogs, there are lots of new features to play with.

The ones I've been looking forward to are called SAN Media Server and SAN Client. These allow the SAN to be used (via FC/SCSI) as the transport method for backup data between the client and the media server.

For Symantec, implementing this feature is more difficult than it sounds. A standard HBA operates as an initiator with a storage device (disk or tape) acting as the target. I/O operations are instigated by the initiator to the target which then replies with the results. With the SAN Media Server, Symantec have had to re-write the firmware on the HBAs in the media server to act as a SCSI target. This allows it to receive backup data from the client, from where it is then handled as normal by the media server.

This option has a huge number of potential benefits. If you have already provided a SAN connection to a host (and possibly a production IP connection), there's no need now to provide a backup LAN connection too; just use the SAN. On new installations, this could save a fortune in network ports and would use a lot of potentially wasted fibre channel bandwidth. Throughput could be vastly improved; most fabrics run at 2Gbps today compared to 1Gbps for a host's LAN connection (OK, the data has to be read from the disk too, but most hosts have two HBA cards, so 4Gbps of aggregate bandwidth). Fabrics tend to be better structured and therefore suffer less from bottleneck issues.
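As a rough back-of-envelope illustration of that throughput point, here is a minimal Python sketch. The 500GB backup size is a made-up figure, and the calculation ignores protocol overhead and assumes the disks and media server can keep up, so treat the results as best-case numbers rather than predictions.

```python
def transfer_seconds(size_gb, link_gbps):
    """Ideal time in seconds to move size_gb gigabytes over a link_gbps gigabit/s link."""
    return size_gb * 8 / link_gbps  # 8 bits per byte, no protocol overhead

BACKUP_GB = 500  # hypothetical nightly backup size, not a figure from the post

lan_seconds = transfer_seconds(BACKUP_GB, 1)  # single 1Gbps backup LAN port
san_seconds = transfer_seconds(BACKUP_GB, 4)  # two 2Gbps HBAs, 4Gbps aggregate

print(f"LAN: {lan_seconds / 60:.0f} min, SAN: {san_seconds / 60:.0f} min")
```

In practice the gap will be narrower, since the same HBAs are also reading the data from disk.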

The only downside I can see is the need to provide one honkin' great media server to suck up all that backup data!

Monday, 13 August 2007

8-gig fibre channel

The Register reported last week that Emulex and Qlogic have announced the imminent availability of 8Gbps fibre channel products.

I have to ask the question...Is there any point?

Firstly, according to the article, the vendors of switch products aren't clarifying their position (both have 10Gb/s technology already). In addition, if a host can push data out at 8Gb/s levels, it will quickly overrun most storage arrays.

How many people out there are using 10Gb/s for anything other than ISLs? (In fact, I think 10Gb/s HBAs don't exist, please prove me wrong :-) ). How many people are even using them for ISLs? If you look at Cisco's technology you can see that you will still be limited to the speed of the line card (about 48Gb/s) even with 10Gb/s, so other than a few fewer ISLs, what has it actually bought you?
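To put a number on "a few fewer ISLs", here is a minimal Python sketch. It assumes the existing ISLs run at 4Gb/s and that the aim is simply to fill the roughly 48Gb/s a line card can carry; both are simplifications for illustration, and encoding overhead is ignored.

```python
import math

def isls_needed(line_card_gbps, isl_gbps):
    """Smallest number of ISLs whose combined speed covers the line card limit."""
    return math.ceil(line_card_gbps / isl_gbps)

LINE_CARD_GBPS = 48  # approximate line card limit quoted in the post

isls_at_4 = isls_needed(LINE_CARD_GBPS, 4)    # assumed existing ISL speed
isls_at_10 = isls_needed(LINE_CARD_GBPS, 10)  # 10Gb/s ISLs

print(isls_at_4, isls_at_10)
```

Either way the line card caps the total; the faster links only reduce the port and cable count.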

I still think at this stage we need to focus on using the bandwidth we have, through sensible layout and design and by optimising fabric capacity.

One final point, presumably 8Gb/s devices will run hotter than existing 4Gb/s - even if we can't/don't use that extra capacity??

Wednesday, 8 August 2007

Patent Everything

Slightly off-post....

About 6 years ago, in a previous life, I used to sell music online. That project failed, however one of the spin-offs I looked into was the concept of virtual jukeboxes. I guess the data issues were slightly storage related. A virtual jukebox would get its music tracks from a broadband connection and locally cache popular tracks.

Anyway, one idea I had was to use SMS messaging to allow customers to select tracks in a bar/pub/etc. I didn't pursue the idea, although I had been messing with SMS services. It now transpires that this idea has been implemented, some 6 years after I thought of it.

Perhaps someone had thought of it some time ago, but I'd like to think I was first. Shame I didn't have the courage of my convictions to see how patentable the idea was.

Hopefully I'll get a second chance with another idea in the future!

Monday, 6 August 2007

No Datacentres in Leicester?

I read something today that quoted BroadGroup's 2006 Power and Cooling Survey. It states that the average UK datacentre uses more power in one year than the city of Leicester.

Now, surely that means there can't even be one average size datacentre in Leicester, or no-one else in the City uses electricity....

Recyclable Storage

I've just had the details through of our new recycling "rules" at home. I now have *four* bins; one for bottles, one for green/garden waste, one for recyclables (plastic bottles, paper, etc) and one for remaining rubbish. Theoretically you'd think that there would be almost nothing in the main bin, but surprisingly, lots of plastic items aren't recyclable, like yogurt pots and food trays. I don't know a great deal about plastics, but you would think that we would be using recyclable plastics wherever possible, especially for disposable items.

That got me thinking about recyclable storage. There's been a lot of discussion about green storage from a power/cooling perspective but how about the recycling of equipment when it reaches the end of "useful" life?

Now, I know that most vendors will recycle components, especially in order to keep older systems running where customers choose not to replace or upgrade, but for some vendors (Netapp seems to be one), there's an aim to keep older hardware off the market (check out ZeroWait for example). What happens to that kit? Is it reused? What happens to old hard drives? Do they get recycled? It would be useful to see how vendors are reducing the amount of natural resources they consume when manufacturing products.

By the way, the WEEE directive in the EU is worth looking into. Is something like this happening in the US?

Sunday, 5 August 2007

Netapp/Cisco

I've been a little quiet on the blog front over the last week, mainly because I've been away on business and I didn't take my laptop ( :-( ). I travelled "lite", which I'm not normally used to doing and that meant taking only the essentials. In fact, as I didn't have any checked baggage, I forgot about a corkscrew in my washbag, which was summarily extracted from me at the security checks at Heathrow.

Anyway enough of that, I've also had another issue to resolve attempting to link two Cisco fabrics via FCIP. It's a frustrating problem which has taken up more of my time than I would like and I still haven't managed to resolve it. Both fabrics already successfully move data via FCIP links, will connect to each other (and the end devices are visible and logged in) but the initiator HBA can't see any targets in the same zone.

These sorts of problems become annoying to resolve as most vendors take you through the level 1 process of problem determination (which translates to "you are an idiot and have configured it wrong") then level 2 ("Oh, perhaps there is a problem, send us 300GB of diagnostics, traces, configurations, date of birth, number of previous girlfriends etc") who get you to "try this command" - usually things you've already tried to no avail, because you actually know what you are talking about.

I'm almost at level 3 ("we've no idea what's causing the problem, we will have to pass to the manufacturer"). Hopefully at this stage I will start to get some results. Does anyone out there have a way to bypass all this first level diagnostics nonsense?

The other thing that caught my eye this week was the comment on Netapp and their missed targets. There is lots of speculation on what has occurred; here's my (two penn'orth/two cents).

Netapp had a great product for the NAS space, there's no doubting that. They made a great play of expanding into the Enterprise space when NAS-based storage became widely accepted. Some features are great - even something as simple as snapshots, replication and flexclones. However I think they now have some fundamental issues.

1. The Netapp base product is not an Enterprise storage array for NAS/FC/iSCSI. It doesn't scale to the levels of DMX-4 and USP. I think it is a mistake to continue to sell the Netapp appliance against high end arrays. Those of us who deploy USP/DMX technology regularly know what I mean.
2. The original Netapp technology is hitting a ceiling in terms of its useful life. The latest features customers demand, such as multi-node clustering, can't be achieved with the base technology (hence the Spinnaker acquisition).
3. The product feature set is too complicated. There are dozens of product features which overlap each other and make it very difficult to determine, when developing a solution, which is the right one to choose (some have fundamental restrictions in the way they work that I found even Netapp weren't clear about).
4. Netapp developed a culture of the old IBM - that is to say expecting their customers to purchase their products and deriding them if they didn't choose them, attempting to resurrect the old adage "No-one Ever Got Fired for Buying IBM" as "No-one should get fired for buying Netapp".

I think I found point 4 most difficult to deal with; Netapp seemed to think they had a right to be No. 1 selection, almost forcing technical people to have to justify why *not* to buy Netapp.

Perhaps a little humility is long overdue.

Wednesday, 25 July 2007

Getting more from Device Manager

I complained on a previous post about the lack of features in Device Manager. Consequently I've started writing some software to alleviate this situation. Here's the first of hopefully a series of tools to plug some of the gaps.

HDvM Checker will query a host running Device Manager & HDLM agent and return the version. Just enter a host name and click check. This is an early tool and I've done limited testing. I'd be grateful for any feedback. You can download it here.

*Disclaimer* - if you decide to use this software it is entirely at your own risk.

SATA in Enterprise Arrays

In a previous post on DMX-4 I discussed the use of SATA drives in enterprise arrays. A comment from our Storage Anarchist challenged my view of the resilience of SATA drives compared to FC.

Now unless I've been sleeping under a rock, the storage industry has over the last 5 years pummelled us with the warning that SATA drives are not suitable for enterprise arrays, the technology having derived from PC hardware. They were good for 8/5 rather than 24/7 work and not really suitable for large volumes of random access.

Has that message now changed? Were we fooled all along or is this a change of tack to suit the marketers?

What do you think? Are SATA drives (and I mean an *entire* array) in an enterprise array acceptable now?

Tuesday, 24 July 2007

EMC posts higher earnings

EMC posted higher earnings today, some 21% up on the same quarter last year. It's amazing they've been able to manage double-digit growth for some 16 quarters.

Interestingly (as reported by The Register) the shares ended the day down. However the shares have risen by over 100% over the last 12 months and since March have risen steadily, so investors can't complain. Wish I'd bought some last year!

VTL Poll Results

The VTL poll is finished; results are:

We've had it for years - 41%
We've done a limited implementation - 14%
We're evaluating the technology - 27%
VTL doesn't fit within our strategy - 14%
We see no point in VTL technology - 5%

Obviously this poll is highly un-scientific but it seems most people agree that VTL is worth doing and are either doing it or planning to do it.

A new poll is up relating to green storage.

EMC Power - link?

Thanks to Barry/Mark for their posts/comments on power usage. Call me stupid (and people do) but I can't find the EMC Power Calculator for download on Powerlink as 'zilla suggests (although I did find a single reference to it).

Can you post the link guys? Is it EMC internal only? If so, any chance of sharing it with us? If I can get a copy I'll do more detailed calculations on the components too.

Friday, 20 July 2007

DMX-4 Green or not?

After the recent EMC announcements on DMX-4, I promised I would look at the question of whether the new DMX-4 is really as green as it claims to be. I did some research and the results are quite interesting.

Firstly we need to set the boundaries. One of the hardest parts of comparing hardware from different manufacturers is that they are intrinsically different (if they were too similar, the lawyers would be involved) and that makes it difficult to come up with a fair comparison. So, I've divided the comparisons into controller and disk array cabinets. Even this is difficult. The DMX has a central controller cabinet which contains only batteries, power supplies, interface boards and so on. The USP however uses half of the central controller for disks. The DMX has 240 drives per cabinet, however the USP has 256 disks per cabinet. This all needs to be taken into consideration when performing calculations.

Second, I want to explain my sources. I've tried to avoid the marketing figures for two reasons; firstly they usually refer to a fully configured system and secondly they don't provide enough detail in order to break down power usage by cabinet and by component. This level of detail is necessary for a more exact comparison. So, for the USP and USP-V, I'm using HDS's own power calculation spreadsheet. This is quite detailed, and allows each component in a configuration to be specified in the power calculation. For EMC, I'm using the DMX-3 Physical Planning Guide. I can't find a DMX-4 Planning Guide yet, however the figures on the EMC website for DMX-4 are almost identical to those for DMX-3 and it's as close as I can get.

DMX-3/4

The DMX figures are quite simple; the controller cabinet (fully loaded) takes 6.4KVA and a disk cabinet 6.1KVA. A fully configured controller cabinet has 24 controller slots, up to 8 global memory directors and 16 back-end and front-end director (BED/FED) cards. A typical configuration would have eight 8-port FED cards and 8 BED cards connecting to all 4 disk quadrants. EMC quote the disk cabinet figures based on 10K drives. Looking at Seagate's website and standard 10K 300GB FC drives, each requires 18W of power in "normal" operation, so 240 drives require 4.32KVA. The difference between this figure and the EMC value covers drives being driven harder, plus the power supplies and other components which need powering within a disk cabinet. We can therefore work on an assumption of 25.4W per drive on average (6.1KVA divided by 240 drives).
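The DMX disk cabinet arithmetic can be reproduced with a few lines of Python. This is only a sketch of the sums in the paragraph above; it treats KVA and kW as interchangeable (i.e. assumes a power factor of 1), which is the same simplification the post makes.

```python
# Figures quoted in the post for a DMX disk cabinet
DISK_CABINET_KVA = 6.1    # EMC planning guide figure for a full disk cabinet
DRIVES_PER_CABINET = 240
DRIVE_WATTS = 18          # Seagate figure for a 10K 300GB FC drive

# Power drawn by the drives alone, in KVA (assumes power factor of 1)
drives_only_kva = DRIVES_PER_CABINET * DRIVE_WATTS / 1000

# Whatever is left over covers power supplies, harder-driven drives and so on
overhead_kva = DISK_CABINET_KVA - drives_only_kva

# Whole-cabinet draw spread across every drive, in watts
watts_per_drive = DISK_CABINET_KVA * 1000 / DRIVES_PER_CABINET

print(f"drives only: {drives_only_kva:.2f}KVA")
print(f"overhead:    {overhead_kva:.2f}KVA")
print(f"per drive:   {watts_per_drive:.1f}W")
```

So roughly 1.78KVA of each disk cabinet goes on something other than the drives' nominal draw.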

Now the figures for the controller cabinet are interesting. Remember EMC have no drives in the controller cabinet so all the power is for controllers, charging batteries and cache. So all that 6.4KVA is dedicated to keeping the box running.

USP

The HDS power calculator spreadsheet is quite detailed. It allows specific details of cache, BEDs, FEDs and a mix of 73/144/300GB array groups. A full USP1100 configuration has 1152 drives, 6 FEDs, 4 BEDs and 256GB of cache. This full configuration draws 38.93KVA (slightly more than the quoted figure on the HDS website). Dropping off 64 array groups (an array cabinet) reduces the power requirement to 31.50KVA, or 7.43KVA for the whole cabinet. This means the controller cabinet draws 9.21KVA (38.93KVA less four array cabinets at 7.43KVA each) and in fact the spreadsheet shows that a full configuration minus disks draws 5.4KVA. The controller cabinet has up to 128 drives in it, which should translate to about 3.7KVA; this is consistent with the 9.21KVA drawn by a full controller cabinet. The 7.43KVA in a cabinet of 256 drives translates to 29W per drive, making the HDS "per drive" cost more expensive.
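The USP sums can be checked the same way. This sketch uses only the figures quoted above; the split into four array cabinets plus a controller cabinet is inferred from the drive counts (1152 = 4 x 256 + 128), and KVA is again treated as equivalent to kW.

```python
# Figures quoted in the post from the HDS power calculator spreadsheet
FULL_CONFIG_KVA = 38.93          # full USP1100, 1152 drives
MINUS_ONE_CABINET_KVA = 31.50    # same config with one array cabinet removed
DRIVES_PER_ARRAY_CABINET = 256
CONTROLLER_CABINET_DRIVES = 128
CONTROLLER_NO_DISKS_KVA = 5.4    # full configuration minus disks
ARRAY_CABINETS = 4               # inferred: 1152 = 4 * 256 + 128

# One array cabinet is the difference between the two spreadsheet totals
array_cabinet_kva = FULL_CONFIG_KVA - MINUS_ONE_CABINET_KVA
watts_per_drive = array_cabinet_kva * 1000 / DRIVES_PER_ARRAY_CABINET

# The controller cabinet is what remains once all array cabinets are removed
controller_cabinet_kva = FULL_CONFIG_KVA - ARRAY_CABINETS * array_cabinet_kva

# Cross-check: controller electronics plus its 128 drives at the per-drive rate
controller_drives_kva = CONTROLLER_CABINET_DRIVES * watts_per_drive / 1000
cross_check_kva = CONTROLLER_NO_DISKS_KVA + controller_drives_kva

print(f"array cabinet:      {array_cabinet_kva:.2f}KVA")
print(f"per drive:          {watts_per_drive:.0f}W")
print(f"controller cabinet: {controller_cabinet_kva:.2f}KVA")
print(f"cross-check:        {cross_check_kva:.2f}KVA")
```

The cross-check lands within about 0.1KVA of the 9.21KVA figure, which is why the numbers can be called consistent.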

This is a lot of data, probably not well presented but it shows a number of things;

There's an inescapable power draw of about 20W per drive.
The controller frame needs about 6KVA and this varies only slightly depending on the number of controllers and cache.
The HDS controller is slightly more efficient than the EMC.
The HDS disk array is slightly less efficient than the EMC.

Neither vendor can really claim their product to be "green". EMC are playing the green card by using their higher density drives. There's no doubting that this does compute to a better capacity to power ratio, however these green power savings come at a cost; SATA drives are not fibre channel and not designed for 24/7 workloads. Whilst these drives provide increased capacity, they don't provide the same level of performance and DMX systems are priced at a premium so you want to get the best bang for your buck. However, if EMC were to price a SATA-based DMX competitively, then the model is compelling, but surely that would take business away from Clariion. What's more likely to happen is customers choosing to put some SATA drives into an array and therefore see only modest incremental power savings.

So what's the future? Well, 2.5" drives currently offer up to 146GB capacity at 10K and only half the power demands, which also translates into cooling savings. Is anyone using these in building arrays? Hybrid drives with more cache should allow drives to be spun down periodically, also saving power. Either way, these sorts of features shouldn't come at the cost of the levels of performance and availability we see today.

One final note of interest...HDS are quoting figures for the USP-V. These show a 10% saving over the standard USP, despite the performance improvements...

Tuesday, 17 July 2007

DMX-4

I've had a quick look over the specifications of the new DMX-4 compared to the DMX-3. There aren't really a lot of changes. The backend director connectivity has been upped to 4Gb/s and presumably that's where the 30% throughput improvement comes from (with some Enginuity code changes too I guess).

There are a number of references to energy efficiency, however the "old" and "new" DMX cooling figures are the same and power figures are almost identical. The improved energy efficiency I think is being touted due to the availability of 750GB SATA drives for DMX (not now but later) but in reality that's not going to be a significant saving unless you're filling your entire array with SATA drives. One statement I want to validate is the following:

"Symmetrix DMX-4 is the most energy efficient enterprise storage array in the world, using up to 70 percent less power than competitive offerings."

There are some security enhancements - but there would have to be in order to justify the RSA purchase....

On the positive side, having the option of SATA drives is a good thing - I'd use them for Timefinder copies or dump areas. I wouldn't fill an array with them though.

Perhaps the most surprising announcement is (in green for extra emphasis):

In addition, EMC plans to introduce thin provisioning capabilities for Symmetrix DMX in the first quarter of 2008, enabling customers to further improve storage utilization and simplify storage allocation while continuing to improve energy efficiency.

Whoa there, I thought from all the recent posts (especially this) that Virtualisation/Thin Provisioning was something to be used with care. It will be interesting to see how EMC blogkets this one...

Performance Part V

Here's the last of the performance measurements for now.

Logical Disk Performance - monitoring of LDEVs. There are three main groups Tuning Manager can monitor; IOPS, throughput (transfer) and response time. The first two are specific to particular environments and the levels for those should be set to local array performance based on historical measurement over a couple of weeks. Normal "acceptable" throughput could be anything from 1-20MB/s or 100-1000 IOPS. It will be necessary to record average responses over time and use these to set preliminary alert figures. What will be more important is response time. I would expect reads and writes to 15K drives in a USP to perform at 5-10ms maximum (on average) and for 10K drives to perform up to 15ms maximum. Obviously synchronous write response will have a dependency on the latency of writing to the remote array and that overhead should be added to the above figures. Write responses will also be skewed by block size and number of IOPS.
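As a minimal sketch of how those response time figures might be turned into a starting alert threshold, the Python below adds any remote write latency to the per-drive-speed maximum. The function and dictionary names are made up for illustration, the 2.5ms remote latency is a hypothetical value, and none of this is Tuning Manager syntax.

```python
# Maximum average response times suggested in the post, keyed by drive speed (RPM)
BASE_MAX_MS = {15000: 10.0, 10000: 15.0}

def response_alert_threshold_ms(drive_rpm, remote_write_latency_ms=0.0):
    """Starting alert threshold: local maximum plus any synchronous replication latency."""
    return BASE_MAX_MS[drive_rpm] + remote_write_latency_ms

# Local-only 15K LDEV, and a 10K LDEV that is synchronously replicated
local_15k = response_alert_threshold_ms(15000)
replicated_10k = response_alert_threshold_ms(10000, remote_write_latency_ms=2.5)

print(local_15k, replicated_10k)
```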

Reporting every bad LDEV I/O response could generate a serious number of alerts, especially if tens of thousands of IOPS are going through a busy array. It is sensible to set reporting alerts high and reduce them over time until alerts are generated. These can then be investigated (resolved as required) and the thresholds reduced further. LDEV monitoring can also benefit from using Damping. This option on an Alert Definition allows an alert to be generated only if a specific number of occurrences of an alert are received within a number of monitoring intervals. So, for instance, an LDEV alert could be created when 2 alert occurrences are received within 5 intervals. Personally I like the idea of Damping as I've seen plenty of host IOSTAT collections where a single bad I/O (or handful of bad I/Os) is highlighted as a problem when 1000s of good fast IOPS are going through the same host.
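The damping rule described above (2 occurrences within 5 intervals) can be sketched in a few lines of Python. This only illustrates the idea and is not how Tuning Manager implements it; the class name, the 15ms threshold and the sample response times are all made up.

```python
from collections import deque

class DampedAlert:
    """Raise an alert only when enough breaches fall within a sliding window of intervals."""

    def __init__(self, occurrences, intervals):
        self.occurrences = occurrences
        self.window = deque(maxlen=intervals)  # oldest interval drops off automatically

    def record(self, breached):
        """Record one monitoring interval; return True if the damped alert should fire."""
        self.window.append(bool(breached))
        return sum(self.window) >= self.occurrences

THRESHOLD_MS = 15  # hypothetical response time threshold
samples_ms = [4, 30, 5, 6, 4, 5, 7, 25, 6, 28]  # made-up per-interval averages

alert = DampedAlert(occurrences=2, intervals=5)
fired = [alert.record(ms > THRESHOLD_MS) for ms in samples_ms]
print(fired)
```

With these samples the isolated spikes at the second and eighth intervals are ignored, and the alert fires only on the last interval, when two breaches fall inside the same five-interval window.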

This is the last performance post for now. I'm going to do some work looking at the agent commands for Tuning Manager, which as has been pointed out here previously, can provide more granular data and alerting (actually I don't think I should have to run commands on an agent host, I think it should all be part of the server product itself, but that's another story).