A quick check on Twitter this morning shows me they're up to message number 1,179,118,180 or just over the 1.1 billion mark. That's a pretty big number - or so it seems, but in the context of data storage devices, it's not that big. Let me explain...
Assume Twitter messages are all the full 140 characters long. That means, assuming all messages are being retained, that the whole of Twitter is approximately, 153GB in size. OK, so there will be data structures needed to store that data, plus space for all the user details, however I doubt whether the whole of Twitter exceeds 400GB. That fits comfortably on my Seagate FreeAgent Go!
If every message ever sent on Twitter can be stored on a single portable hard drive, then what on earth are we storing on the millions of hard drives that get sold each year?
I suspect the answer is simply that we don't know. The focus in data storage is to provide the facility to store more and more data, rather than rationalise what we do have. For example, a quick sweep of my hard drives (which I'm trying to do regularly) showed half a dozen copies of the Winzip installer, the Adobe Acrobat installer plus various other software products that are regularly updated, for example the 2.2.1 update of the iPhone software at 246MB!
What we need is (a) common sense standards for how we store our data (I'm working on those), (b) better search and indexing functionality that can make decisons based on the content of files - like the automated deletion of defunct software installers.
There's also one other angle and that's when network speeds become so fast that storing a download is irrelevant. Then our data can all be cloud-based and data cleansing becomes a value add service and someone else's problem!