Wednesday, 9 July 2008

Distributed Computing Nirvana

Last year I blogged about the concept of storage futures (or options, I'd have to go back and check which), which would allow storage charges to be based on a forward pricing model. The logic was to penalise those "customers" who don't take the time to plan their storage requirements. By charging more as the delivery date approaches, customers are disincentivised from asking for new storage at the last minute.
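As a rough illustration of how a forward pricing model might work, here is a minimal sketch in which the price per GB depends on how far ahead of delivery the storage is ordered. The tier boundaries (90 and 30 days), the multipliers and the base price are purely hypothetical figures chosen for the example, not part of any real tariff.

```python
from datetime import date

# Hypothetical base price per GB per month; illustrative only.
BASE_PRICE_PER_GB = 0.10


def forward_price_per_gb(order_date: date, delivery_date: date,
                         base: float = BASE_PRICE_PER_GB) -> float:
    """Return the price per GB, charging more the shorter the lead time.

    The lead-time tiers and multipliers below are hypothetical assumptions.
    """
    lead_days = (delivery_date - order_date).days
    if lead_days < 0:
        raise ValueError("delivery date precedes order date")
    if lead_days >= 90:      # planned well in advance: base rate
        multiplier = 1.0
    elif lead_days >= 30:    # moderate notice: modest surcharge
        multiplier = 1.25
    else:                    # last-minute request: highest surcharge
        multiplier = 1.5
    return base * multiplier
```

With these assumed tiers, a request placed three months ahead pays the base rate, while one placed a week before delivery pays 50% more, which is exactly the incentive to plan ahead that the model is meant to create.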

The evolution of cloud computing offers an opportunity to deliver on my original idea. I'm sure I don't have to explain cloud computing to anyone, but in case you're not aware, it is effectively distributed computing for the Web age. Amazon Web Services is probably the best-known service. Amazon's services provide the ability to create virtual machines, run databases, manage queues and, of course, store data in S3, their Simple Storage Service. I'll discuss my experiences with cloud storage in more detail in another post.

Many large organisations are struggling to meet the power and cooling demands of their datacentres. This is driven by the increase in computing power and storage density achieved through blade servers and virtualisation. Although more computing is being done in the same physical space, some organisations' business models demand ever more computing to gain a business advantage. Think of pharmaceutical companies that use software to model organic chemical interactions rather than perform the experiments in the lab.

I suspect that if you investigate how computing is used in many of these datacentres, only a small percentage of it will be dedicated to core business operations. There will be many applications providing ancillary services such as reporting, financials, batch processing, reconciliation and inventory. Many of these are not time or location dependent and could easily be moved out of the core datacentres for processing elsewhere.

Obviously, moving this processing requires a different operating model. A key trend in the industry is to consolidate into a small number of large (and expensive) datacentres, but by operating this way companies are artificially constraining their growth to the capacity of these datacentres, and setting a timeline by which new datacentres must be built before expansion can continue.

So what is the answer? Computing could become location independent, except for those critical components that can't tolerate the effects of latency. Take file archiving as an example: if data has been unreferenced for more than a set period (say 3-6 months), move it into the storage cloud. The data can be duplicated automatically in multiple locations to provide redundancy. Note that I'm assuming all the security issues have been investigated, discussed, resolved and implemented.
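The selection step of such an archiving policy can be sketched in a few lines. This is a minimal sketch, assuming a filesystem that records access times; on volumes mounted with `noatime` access times are not updated, so the sketch treats the later of access and modification time as "last referenced". The function name `archive_candidates` and the 90-day default (the low end of the 3-6 month window) are hypothetical choices, and the actual upload to a storage cloud is deliberately left out, since no specific service API is implied here.

```python
import os
import time
from typing import Iterator, Optional

SECONDS_PER_DAY = 86400


def archive_candidates(root: str, max_idle_days: int = 90,
                       now: Optional[float] = None) -> Iterator[str]:
    """Yield paths under root not referenced for more than max_idle_days.

    Hypothetical helper: it only selects files; moving them to cloud
    storage would be a separate step.
    """
    if now is None:
        now = time.time()
    cutoff = now - max_idle_days * SECONDS_PER_DAY
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            # Treat the later of access and modification time as the
            # last reference, since access times may not be maintained.
            last_ref = max(st.st_atime, st.st_mtime)
            if last_ref < cutoff:
                yield path
```

Running it over a file share, for example `list(archive_candidates("/data/share", max_idle_days=180))`, would give the list of files eligible for migration under a six-month policy.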

Redundant data is immediately out of the datacentre, and its storage cost is reduced to a service charge that is likely to be significantly lower than the cost in the primary location.

Some organisations may decide that the cloud is too unsafe for their data. In this case, a more appropriate datacentre strategy needs to be developed. Rather than having a small number of "megacentres", they can build smaller, location-critical sites for primary data, with other sites developed in locations offering the cheapest space and power. In this way, large organisations could effectively operate their own computing (and storage) cloud.

I think the options for truly distributed computing are exciting. They provide the opportunity to "green" the storage environment over and above simply deploying larger disk drives and bigger storage systems.
