Monday 17 December 2007

Taking out the trash

In a recent post, Hu Yoshida references an IDC presentation discussing the rate of growth of structured versus unstructured data. It seems that we can expect unstructured data to grow at a rate of some 63.7% annually. I wonder what actual percentage of this data represents useful information?
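To put that figure in perspective, here is a rough sketch of what 63.7% compound annual growth implies. The function names and the 1 TB starting point are my own, purely for illustration; only the growth rate comes from the IDC figure quoted above. At that rate data doubles in about 1.4 years and grows nearly twelve-fold in five.

```python
import math


def projected_size(initial_tb, annual_growth, years):
    """Size after compounding annual_growth (0.637 means 63.7%) for a number of years."""
    return initial_tb * (1 + annual_growth) ** years


def doubling_time_years(annual_growth):
    """Years needed for the data to double at a constant annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth)


if __name__ == "__main__":
    rate = 0.637  # the 63.7% annual growth quoted above
    start_tb = 1.0  # hypothetical starting size, for illustration only
    for year in range(6):
        print(f"year {year}: {projected_size(start_tb, rate, year):.2f} TB")
    print(f"doubling time: {doubling_time_years(rate):.2f} years")
```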

Personally I know I'm guilty of data untidiness. I have a business file server on which I heap more data on a regular basis. Some of it is easy to structure; Excel and Word documents usually get named with something meaningful. Other stuff is less tangible. I download and evaluate a lot of software and end up with dozens (if not hundreds) of executables, msi and zip files, most of which are cryptically named by their providers.

Now the (personal) answer is to be more organised. Every time I download something, I could store it in a new structured folder. However, life isn't that simple. I'm on the move a lot and may download something at an Internet cafe or elsewhere where I'm offline from my main server. Whilst I use offline folders and synch a lot of data, I don't want to synch my entire server filesystem. The alternative is to create a local image of my server folders and copy data over on a regular basis. The trouble is, that's just too tedious, and when I have oodles of storage space, why should I bother wasting my time? There will of course come a time when I have to act. I will need to upgrade to bigger or more drives and I will have (more) issues with backup.

How much of the unstructured data growth out there occurs for the same reasons? I think most of it. I can't believe we are creating genuinely useful content at a rate of 63.7% per year. I think we're creating a lot of garbage that people are too scared to delete and can't filter adequately using existing tools.

OK, there are things out there to smooth over the cracks and partially address the issues. We "archive", "dedupe", "tier" but essentially we don't *delete*. I think if many more organisations operated a strict Delete Policy on certain types of data after a fixed non-access time, then we would all go a long way to cutting the 63.7% down to a more manageable figure.
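As a minimal sketch of what such a policy might look like on a single file server, the script below lists files that have not been accessed within a given number of days and only deletes them when asked to. The function names, the `dry_run` flag and the example path are my own assumptions, not part of any product. It also assumes the filesystem records access times reliably, which is not true of volumes mounted with options such as `noatime`.

```python
import os
import time
from pathlib import Path

SECONDS_PER_DAY = 86400


def find_stale_files(root, max_idle_days, now=None):
    """Return files under root not accessed within max_idle_days, oldest first."""
    now = time.time() if now is None else now
    cutoff = now - max_idle_days * SECONDS_PER_DAY
    stale = []
    for path in Path(root).rglob("*"):
        if path.is_file():
            atime = path.stat().st_atime  # last access time, if the filesystem tracks it
            if atime < cutoff:
                stale.append((atime, path))
    stale.sort(key=lambda pair: pair[0])
    return [path for _, path in stale]


def apply_delete_policy(root, max_idle_days, dry_run=True):
    """Report stale files; delete them only when dry_run is False."""
    candidates = find_stale_files(root, max_idle_days)
    for path in candidates:
        if dry_run:
            print(f"would delete: {path}")
        else:
            path.unlink()
    return candidates


if __name__ == "__main__":
    # Hypothetical folder and threshold; review the dry-run output before deleting anything.
    apply_delete_policy(os.path.expanduser("~/downloads"), max_idle_days=365)
```

Running it in dry-run mode first is deliberate: the point is to see how much would go before committing to anything.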

Note to self: spend 1 hour a week tidying up my file systems and taking out the trash.....

Wednesday 12 December 2007

2.5" Enterprise Arrays

I was asked the question today, when will Enterprise arrays support 2.5" drives as standard? It's a good question, as at first glance the savings should be obvious; smaller, lower power drives, more drives in an array and so on.

However, things aren't that simple. If you do the comparisons and work out some basic calculations, such as Watts per GB or GB per cm³, 2.5" drives don't offer much of a saving (if any at all). I've posted some sample comparisons here.
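For anyone wanting to run the numbers themselves, the two metrics can be sketched as below. The dimensions are approximate nominal form-factor sizes, and the capacity and power values are placeholders I have made up for illustration, not vendor figures; substitute values from real data sheets before drawing any conclusions.

```python
from dataclasses import dataclass


@dataclass
class Drive:
    name: str
    capacity_gb: float
    power_watts: float
    width_mm: float
    depth_mm: float
    height_mm: float

    def volume_cm3(self):
        # 1 cm^3 = 1000 mm^3
        return self.width_mm * self.depth_mm * self.height_mm / 1000.0

    def watts_per_gb(self):
        return self.power_watts / self.capacity_gb

    def gb_per_cm3(self):
        return self.capacity_gb / self.volume_cm3()


if __name__ == "__main__":
    # Placeholder capacity and power values, for illustration only.
    drives = [
        Drive('3.5" example', capacity_gb=300, power_watts=15.0,
              width_mm=101.6, depth_mm=146.0, height_mm=25.4),
        Drive('2.5" example', capacity_gb=146, power_watts=8.0,
              width_mm=69.85, depth_mm=100.0, height_mm=15.0),
    ]
    for d in drives:
        print(f"{d.name}: {d.watts_per_gb():.4f} W/GB, {d.gb_per_cm3():.3f} GB/cm3")
```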

I'm not aware of any vendors who are planning to offer 2.5" drives in Enterprise arrays. Looking at the mechanics of making the conversion, there would be a few issues. First is the interconnect, SAS versus FCAL; however, that should be an easy one to resolve. Second, there's the physical layout of the drives and providing maintenance access to each of them. That might prove trickier: achieving a high-density footprint while still providing access to each drive individually.

If anyone is aware of anyone planning to use 2.5" drives as standard, please let me know.