In my previous post I started the discussion on how cloud storage could actually be useful to organisations and not be simply for consumer use.
One of the big issues that will arise is the subject of standards. To my knowledge, there is no standard so far which determines how cloud storage should be accessed and how objects should be stored. Looking at the two main infrastructure providers, Amazon and Nirvanix, the following services are offered:
S3 (Simple Storage Service) - storage of data objects up to 5GB in size. These objects are basically files with metadata and can be accessed via HTTP or BitTorrent protocols. The application programming interface (API) uses REST/SOAP (which is standard) but follows Amazon's own standards in terms of functions to store and retrieve data.
Elastic Block Store (EBS) - this feature offers block-level storage to Amazon EC2 instances (elastic compute cloud) to store persistent data outside of the compute instance itself. Data is accessed at the block level, however it is still stored in S3.
Storage Delivery Network (SDN) - provides file-based access to store and retrieve data on Nirvanix's Internet Media File System. Access is via HTTP(S) using standard REST/SOAP protocols but follow Nirvanix's proprietary API. Nirvanix also offer access to files with their CloudNAS and FTP Proxy services.
The protocols from both Amazon and Nirvanix follow standard access methods (i.e. REST/SOAP) but the format of the APIs are proprietary in nature. This means the terminology is different, command structures are different, the method of storing and retrieving objects is different and the metadata format for referencing those objects is different.
Lack of standards is a problem. Without a consistent method for storing and retrieving data, it will become necessary to program to each service provider implementation, effectively causing lock-in to that solution or creating significant overhead for development.
What about availability? Some customers may choose not to use one service provider in isolation, in order to improve the availability of data. Unfortunately this means programming to two (or potentially more) interfaces and investing time to standardise data access to those features available in both products.
What's required is middleware to sit between the service providers and the customer. The middleware would provide a set of standardized services, which would allow data to be stored in either cloud, or both depending on the requirement. This is where RAIC comes in:
RAIC-0 - data is striped across multiple Cloud Storage infrastructure providers. No redundancy is provided, however data can be stored selectively based on cost or performance.
RAIC-1 - data is replicated across multiple Cloud Storage infrastructure providers. Redundancy is provided by multiple copies (as many as required by the customer) and data can be retrieved using the cheapest or fastest service provider.
Now there are already service providers out there offering services that store data on Amazon S3 and Nirvanix SDN; companies like FreeDrive and JungleDisk, however these companies are providing cloud storage as a service rather than offering a tool which integrates the datacentre directly with S3 and SDN.
I'm proposing middleware which sits on the customer's infrastructure and provides the bridge between the internal systems and the infrastructure providers. How this middleware should work, I haven't formulated yet. Perhaps it sits on a server, perhaps it is integrated into a NAS application, or a fabric device. I guess it depends on the data itself.
At this stage there are only two cloud storage infrastructure providers (CSIPs), however barriers to entry in the market are low; just get yourself some kit and an API and off you go. I envisage that we'll see lots of companies entering the CSIP space (EMC have already set their stall out by offering Atmos as a product, they just need to now offer it as a service via Decho) and if that's the case, then competition will be fierce. As the offering count grows, then the ability to differentiate and access multiple suppliers becomes critical. When costs are forced down and access becomes transparent, then we'll truly have usable cloud storage.