Tuesday, 16 September 2008

How much is your hard drive worth?

Have you ever thought of renting out your hard drive?

So here's the problem. Everyone needs backups, but most people are too lazy or inept to take them regularly. There are tools out there to use, including online ones which will help with the problem - but there's a fee for using them (more about that later).

The thing is, we all have plenty of free space on our hard disks and this is set to increase as we deploy larger drives. So why not "rent out" a chunk of your unused storage to store other people's backups and backup/restore using a peer-to-peer model?

Now, there are some drawbacks. If your backups are dispersed across multiple servers, some of those servers may be offline when you want to perform a restore, so your data may not be available. You could store multiple copies on different servers, but that then becomes more costly. It may mean you have to give up more of your free hard disk space than the amount of data you back up, in order to pay for the duplicate backup copies. Why? Well, for a shared backup service to work, everyone needs to provide an equitable amount of storage - you provide as much space for backup as you use.

A solution to this is to de-duplicate. Many people store identical data (think of all your operating system files), and if files are broken up into chunks, each identical chunk only needs to be stored once. That could remove a significant amount of redundant information.
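
To make the chunking idea concrete, here is a minimal sketch in Python. It assumes fixed-size 4 KB chunks and SHA-256 fingerprints; the chunk size, the function names and the sample data are all my own choices for illustration, and real products typically use more sophisticated (often variable-size) chunking.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size in bytes (illustrative only)


def chunk_hashes(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[str]:
    """Split data into fixed-size chunks and fingerprint each one."""
    return [
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


def dedup_stats(files: list[bytes], chunk_size: int = CHUNK_SIZE) -> tuple[int, int]:
    """Return (total chunks seen, unique chunks that actually need storing)."""
    total = 0
    unique: set[str] = set()
    for data in files:
        hashes = chunk_hashes(data, chunk_size)
        total += len(hashes)
        unique.update(hashes)
    return total, len(unique)


# Two hypothetical users who share the same "operating system" content
# (three chunks) and each have one chunk of personal data.
os_files = b"A" * CHUNK_SIZE + b"B" * CHUNK_SIZE + b"C" * CHUNK_SIZE
user1 = os_files + b"x" * CHUNK_SIZE
user2 = os_files + b"y" * CHUNK_SIZE

total, unique = dedup_stats([user1, user2])
print(f"{total} chunks backed up, {unique} chunks stored")
```

In this toy case eight chunks are backed up but only five need storing, because the three shared chunks are kept once.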

However, here comes the killer blow. De-duplication will not work. It will not work because you will want to encrypt your data if it is going to be stored on someone else's server or PC. Even worse, this data is going to be stored on the server of someone you don't know and you can guarantee that means some of those people will be less than honest and looking to gain criminal advantage from accessing your data.

As soon as encryption is brought into the mix, de-duplication can't be used to reduce the volume of data stored, as everyone's encrypted data will essentially look like random data. The peer-to-peer backup and de-duplication model therefore becomes untenable.
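
A rough illustration of why this happens: the same chunk encrypted under two different users' keys produces two different ciphertexts, so their fingerprints no longer match. The cipher below is a toy I made up for this sketch (a SHA-256 keystream XORed with the data) so that it runs with only the standard library. It is not secure and is not what any real backup product uses; the keys and names are hypothetical.

```python
import hashlib


def toy_encrypt(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher for illustration only (NOT secure).

    XORs the data with a keystream built from SHA-256(key + counter).
    Applying it twice with the same key returns the original data.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        pad = hashlib.sha256(key + counter).digest()
        block = data[offset:offset + 32]
        out.extend(b ^ p for b, p in zip(block, pad))
    return bytes(out)


def fingerprint(data: bytes) -> str:
    """Fingerprint a chunk the way a de-duplicating store might."""
    return hashlib.sha256(data).hexdigest()


# The same chunk (say, part of an operating system file) held by two users.
shared_chunk = b"identical operating system file contents" * 100

alice_copy = toy_encrypt(b"alice-key", shared_chunk)  # hypothetical keys
bob_copy = toy_encrypt(b"bob-key", shared_chunk)

# Before encryption the store would see one chunk; afterwards it sees two.
print("ciphertext fingerprints match:", fingerprint(alice_copy) == fingerprint(bob_copy))
```

The store holding the data only ever sees the two ciphertexts, so it has nothing to match on.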

I mentioned earlier using online backup services and this weekend I subscribed to Mozy, which does offer a free 2GB service. Unfortunately free also means restricted, so I couldn't back up any data on network shares. This made my test a little limited, but at least I could try the features Mozy offer by backing up some data from my laptop hard drive. After an hour watching my backup data being encrypted and crawling to the Internet at a snail's pace, I gave up.

To be honest I don't know why I was so surprised with the limitations of online backup. A little over 10 years ago, I was involved in a project at StorageTek where we looked into offering an online backup service. The software was there; however, telecoms was the inhibitor due to the cost of connection. At the time (especially in the UK, but less so the US), there was a lot of "free" Internet access around, which although free as a subscription, relied on local rate phone calls to make money. This is great for infrequent browsing but no good for a permanent connection. The cost to the customer of being online overnight was too high, and so the project was eventually shelved.

Although the cost base has shifted and "all you can eat" broadband is common, services like Mozy are not free. The "Pro" service is $3.95 plus $0.50/GB, per month (I assume this means there's a flat monthly fee of $3.95 on top of the per-GB charge, although it isn't clear). For my almost 1TB of data, I'd be looking at close to $500/month or $6,000 a year!
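
For anyone who wants to plug in their own numbers, the sum is simple enough to sketch. This assumes my reading of the pricing is right (a flat $3.95 per month plus $0.50 per GB per month) and rounds "almost 1TB" up to 1,000GB; the function name is just for illustration.

```python
BASE_FEE = 3.95  # flat monthly fee in dollars (my reading of the price list)
PER_GB = 0.50    # monthly charge per GB in dollars


def monthly_cost(gigabytes: float) -> float:
    """Monthly bill in dollars for the given amount of backed-up data."""
    return BASE_FEE + PER_GB * gigabytes


data_gb = 1000  # "almost 1TB", rounded up for simplicity
monthly = monthly_cost(data_gb)
yearly = 12 * monthly

print(f"${monthly:,.2f} per month, ${yearly:,.2f} per year")
```

The flat fee barely matters at this scale; the per-GB charge is what does the damage.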

When I see a bill like that, two things come to mind:

(a) - what is all this data and do I *really* need it? What can I prune?

(b) - how can I structure my data better so I only back up the stuff I really need to secure online? What other ways are there of protecting my content that doesn't change much?

So, I don't intend to waste my time writing a P2P backup service. Instead I will invest some time structuring my data, cleaning up the garbage, looking at one-off static content backups and other options for that data (like DVD/Blu-ray backups kept in a fire safe) and perhaps use cloud storage backups in a limited fashion.


Mark said...

For DR purposes you'll want a full copy to hand.

If the upload hurts, imagine the restore when you're in trouble.

I use a HD for a full DR copy and offsite *some* material with Mozy.

And yeah, classifying the garbage helps. There's stuff I just skip as it won't matter all that much if it goes away.

Storage said...


On Storage Monkeys we are doing some public testing of a free peer-to-peer service called Wuala. If you choose not to share your disk space then you can pay for space.

James Orlean

BarryWhyte said...

I started writing this thinking of the distinction between end user "copies" and production systems... however maybe there is no distinction.

For example, I have my work laptop (using now) and I have a desktop machine at my desk. I end up emailing files between these machines, as Windows local sharing is a nightmare... but then I have a "desktop" - a la gaming machine at home, and a media DVD sized PC connected to the TV. So keeping files in sync, and backups, not to mention those irreplaceable photos of the kids...

What we need is a global file system that detects duplicate files, updated versions of files, changes in those files... maybe de-dups the common parts so as to enable regressions, but also duplicates them...

That's when I realized this isn't just an end user (local) problem but a global industry issue too...

PPS. Those legal folks are dragging their heels, will be in touch next week :)

Chris M Evans said...

Guys, thanks for the pointers to other online services, I'll check them out.

Barry, sounds like you're asking for the (almost) impossible. I've done the same - emailed myself files back and forth to save the hassle of trying to remember which is the valid copy on which machine. It always gets too complicated...

Oh and no worries about the other stuff - I'm in the US next week anyway.