The Storage Architect: July 2008

Wednesday 30 July 2008

Drobo Again

As previously discussed, I've been gradually moving data over to my Drobo. I recently invested in a DroboShare, which lets me share the contents of the Drobo as NAS.

Although I run my data through a Windows server, I have been toying with the idea of making most of the content available through a NAS share rather than letting the Windows box do the work. That way, if the server is down, I'm not affected. This is effectively what the Drobo can achieve; however, configuring it wasn't as simple as it seemed.

The first issue I encountered was understanding whether, when plugging in a Drobo with data on it, I would need to reformat the Drobo to work with the DroboShare (it turns out you don't; the data is simply passed through). This "feature" wasn't at all clear in the manual, so I decided not to test the theory with my primary disk storage and data, and instead plugged the DroboShare alone into the network.

At this point, I'd expect the DroboShare to get a DHCP address and be visible in the Drobo Dashboard software, but it wasn't. A quick check of DHCP on the server and a trace from Ethereal and I was sure the DroboShare had requested and successfully picked up an IP address, but it still wouldn't appear on the dashboard.

With fingers, toes and anything else I could find all crossed, I decided to plug the Drobo into the DroboShare. As if by magic, the drive appeared, presented through the DroboShare, with all data intact.

I wouldn't advise anyone to try this kind of cavalier approach. I guess it is part of the underlying Drobo design philosophy of simplicity, but I think I prefer having a bit more control over my storage.

After a few days of use, I started to notice problems. Firstly, the DroboShare wasn't always visible. It wasn't clear in the first place how the Drobo would present itself, and after tweaking the settings I managed to get it named \\DROBO\ on my network, but it wasn't always accessible and it wasn't clear why.

The second problem was to do with security - or more precisely the lack of it. Anyone with children will recognise the desire to prevent little fingers tinkering where they shouldn't. For me, that means making my music, video and picture shares all read-only to everyone except me. I can't achieve that with DroboShare.

Eventually the teething problems caused me to take the DroboShare out and revert to my previous configuration. I'm disappointed, but perhaps I shouldn't have expected anything over and above a basic NAS setup. Since reverting to a plain Drobo, operation has been seamless once again.

Second hand DroboShare anyone?

Tuesday 29 July 2008

The cuts are happening again

The Register has reported that some UK banks are forcing their contractors to take a 10% pay cut or take a hike. It is usual for contractors to take the hit first, as they are an easy target compared with terminating permanent staff. To be fair, as a contractor/consultant myself, if the chop comes then so be it; it's part of the nature of the business.

However, the downside is that the most able contractors will surely be the first to leave. Many of those who remain will have a lethargic attitude to work, and overall the company suffers.

In addition, Barclays are offshoring more jobs, presumably to save further costs. In my experience, moving jobs offshore doesn't work. It simply moves skills away from the business and causes customers to lose control over their IT operations. This isn't necessarily a reflection of poor quality work from the outsourcing suppliers; far from it, I'm sure most are just as competent and skilled as their UK counterparts. The issue is more to do with operating remotely from the customer: not understanding the "local" issues, not understanding the local culture and not being part of the team who see and chat with each other every day. As someone who currently works with multiple clients for 1-2 days a week, I experience this problem even though I know my customers well. Imagine if you'd never met most of the people you were providing complex services for.

One final note: Barclays claim they worked out they were paying "over the market rate". That's a phrase I've never understood. It assumes all contractors' skills are equally good (or bad), so they can be paid, or have their pay cut, to the same degree. It also assumes that all contractors were taken on and overpaid by an equal amount. I wonder if they did the same market comparison for permanent staff and considered cutting their salaries by an equivalent amount? Somehow I think not...

Monday 28 July 2008

Size of the Storage Blogosphere

I use FeedDemon to track all of my RSS subscriptions. The product integrates with NewsGator Online and now with the iPod Touch. Having an online location for synchronising my feeds means I can run copies of FeedDemon on the different laptops I use on a daily basis.

I need this level of integration as I now have over 110 feeds just focused on storage (111 in fact).

It is incredible that the number of blogs just related to storage is increasing on an almost daily basis.

How many feeds do you read?

Friday 25 July 2008

Enterprise Connectivity with iPod Touch

Last year I blogged about my replacement iPaq after my previous one failed. As I said at the time, it was dull, dull and more dull.

Earlier this year I bought an iPod Touch while in the US. It is a wonderful device. The screen is fantastic, the quality of the applications is great and the video is so watchable. Unfortunately, email integration with my Exchange Server was only via IMAP. The integration worked fine; however, it didn't integrate my calendar or contacts. For a while I used the Notes application until I jailbroke the device and installed a to-do list application, although it wasn't that good.

Now I've installed version 2.0 of the 'Touch software (following on from its release on the iPhone). I have full Exchange integration and the ability to install applications (Omnifocus for time management being the first).

Finally, my 'Touch has become indispensable. I can't see anything else on the market to touch it (no pun intended)! Now I have a problem... do I invest in the new iPhone?

Thursday 24 July 2008

A Dying Industry (slightly off topic)

The BBC reported today on the next stage of the government's and record companies' attempts to crack down on illegal file sharing. Persistent offenders will have their internet connections cut off. Unfortunately, the genie is out of the bottle in terms of the ability for music to be copied and shared.

In the late '90s, I founded a music company that distributed music on CD. You chose the tracks (from our catalogue) and we burned them to disc and shipped them to you. We also offered digital downloads, DRM-protected with Microsoft's encryption. This was *before* Apple released their iTunes store. At the time, the record labels would not give us any new material and we were restricted to back catalogue and inferior content. This ultimately led to the demise of the business, as our customers couldn't understand why we didn't have access to every music track in the world.

At the time, the record companies' view was that providing us content that could be distributed on CD was risky and would allow people to copy the music. Well, of course it would, in exactly the same way people were *already* copying CDs purchased from record stores! Their principled approach changed when Apple rocked up with large volumes of cash for access to their catalogue and the digital download era started for real.

Now I see a music industry that for years acted in a protectionist fashion, overcharged for its content and milked the customer with countless re-releases and compilations, and is now trying to target anyone it can in order to protect its revenue stream (a bit like when SCO sued IBM).

Unfortunately the genie I mentioned earlier is out there, alive and well and doesn't need the Internet. Music content can easily be swapped on portable hard drives, CD & DVD-ROMs, memory sticks, flash memory cards and so on. The Internet just provides the opportunity to spread music wider than physical media can.

Ultimately, I think all copyright protection systems, whether electronic or legal, will fail or be circumvented. We already know that artists are changing their revenue models away from recorded content towards concerts and other revenue streams. Most record companies have moved away from DRM-protected digital downloads, and Sony scored the biggest faux pas with the rootkit they deployed on CDs in 2005.

There will always be people who want something for nothing, as the release of Radiohead's recent album "In Rainbows" shows. However, it also shows that people will pay for decent content (one survey put the average price paid for the album at around £4), and perhaps that is the crux of the problem: don't sell stuff people don't actually want.

Wednesday 23 July 2008

Recycling Drives - Update

Last week I posted about wasted hard drives, removed from arrays and crushed to prevent the leak of sensitive data.

I contacted HGST and Seagate to get some additional background. Here are their responses, slightly edited to correct any spelling mistakes but otherwise intact.

Seagate

(a) when will the technology be deployed in Enterprise FC drives? Our OEMs are currently developing with the Cheetah 15K.6 FDE, a drive Seagate already has in production.

(b) is the technology proprietary to Seagate? - No, this will be compliant with the Trusted Computing Group's spec. All hard drive vendors are participating in this Trusted Computing Group and we expect that they will have self-encrypting drives that will be interoperable with ours.

(c) is DriveTrust accepted by the US Government and other similar organisations as secure enough to treat a drive as "wiped" if the encryption keys are removed? Endorsement from the National Security Agency (NSA) has already been received for the first Self-Encrypting Drive model, the Momentus® 5400 FDE hard drive, for protection of information in computers deployed by U.S. government agencies and contractors for national security purposes.

(d) are any of the "big" manufacturers (EMC/HDS/IBM) looking to deploy DriveTrust enabled drives in storage arrays? IBM and LSI have both publicly announced that they will do so. Note that Hitachi has also just announced a self-encrypting drive, the Deskstar E7K1000, a drive designed for business critical storage systems.

(e) Where do the drives go when they're wiped for final disposal? Extra shipping is involved to ship a drive to a special data destruction service facility, where it can be degaussed or shredded, and then the drive must be shipped to [be] environmentally disposed of. Alternatively, a drive may be overwritten, a process that takes hours and hours, using energy and tying up system resources, and then may be re-purposed.

HGST

My name is Masaru Masuda, working on product planning for Hitachi GST. Let me try to answer your question. Like Raj mentioned below, we have already supported a bulk encryption feature for 2.5" and 3.5" drives and will support it in Enterprise products next year. With the bulk encryption feature, user data on the HDD media is automatically and always encrypted by the SoC inside [the] HDD. The security feature has two basic functions: active protection of data (encryption with a secret key) and secure erase of the drive by deleting the encryption key, for repurposing or disposal. As you pointed out, standardization is key for security. Therefore, a non-profit security organization called TCG (Trusted Computing Group) was formed, as described on pages 5 and 6 of the attached package. We have been very actively involved in the activities of TCG and plan to pick up security features based on TCG standards, which will be implemented from next year. The security market is still small, but it has been growing steadily due to data security concerns and also as a fast and cheap solution for repurposing drives in server applications or disposing of failed drives. Also, we have had a recycling process for drives that failed in internal testing and for drives returned from the field.

Thanks to both companies for their responses.

So it seems to me that in the future there will be no excuse for scrapping drives. I think the retirement process for HDDs should form part of the "green measurement" of storage.

Tuesday 22 July 2008

Five Storage Strategies that May Save You Money

InfoWorld have a great article here discussing how to save money on storage in these tight times we are experiencing. Here's a summary, with my opinions of course!

Play Hardball with Vendors. Ah, if only it were that simple. It may be possible to find another vendor selling hardware marginally cheaper, but are you ready for it? There are few companies that have got their storage deployment to a level where interoperability allows them to take storage from any of a range of vendors. Fewer still have calculated the cost of migration or the full operating expense for each vendor's product; it isn't all about the hardware cost alone.

Avoid New Purchases By Reclaiming What You Have. So what tools will you use to achieve this? Do you know how and why you are missing storage which can be reclaimed? Storage reclamation is usually an ad-hoc process run when storage gets tight or when admins have the time. Some places may have written scripts to automate the discovery of wastage but it isn't easy. I use my own software tool and can highlight about 10 separate categories but you need to be careful of the law of diminishing returns.
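As a very simple illustration of what an automated wastage check might look for (this is not the tool mentioned above, and it covers only one of many possible categories), the sketch below flags allocations whose utilisation falls under a threshold. The `Allocation` record, its field names and the 20% threshold are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Allocation:
    name: str            # hypothetical volume/LUN identifier
    allocated_gb: float
    used_gb: float

def reclaim_candidates(allocs, threshold=0.2):
    """Return (name, reclaimable_gb) for allocations whose used/allocated
    ratio is below `threshold`, largest potential saving first."""
    out = []
    for a in allocs:
        if a.allocated_gb <= 0:
            continue  # ignore empty or malformed records
        if a.used_gb / a.allocated_gb < threshold:
            out.append((a.name, a.allocated_gb - a.used_gb))
    return sorted(out, key=lambda item: item[1], reverse=True)

allocs = [
    Allocation("lun1", 500, 50),
    Allocation("lun2", 200, 150),
    Allocation("lun3", 1000, 10),
]
print(reclaim_candidates(allocs))
```

Sorting by reclaimable capacity is one way to respect the law of diminishing returns: work down the list and stop when the remaining savings no longer justify the effort.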

Audit Backup and Replication Configurations To Cut Waste. Right, so throw away some backups. Are you *sure* you can do that? Surely the data owner needs to confirm that the backup copy is no longer needed...

Rethink Storage Network Decisions. I like this one. Basically, find a cheaper piece of hardware - which may include DAS! Again, interoperability and migration costs have a big impact here. Any marginal savings may be wiped out by the cost of moving to another platform.

Use a Tiering Methodology That Delivers Results Simply. Finally a point I agree with. Tiering can be implemented easily by taking a common sense approach to moving data to a cheaper layer of disk. This doesn't have to involve a migration project but can be achieved as users request more storage.

All of these options are great; however, they fail to attack the underlying issue of rising costs: the fact that more data is being generated each day. Simply asking users to question whether new storage is needed, or whether existing allocations can be re-used, is as easy as implementing new technical solutions.

Remember - keep it simple!

Monday 21 July 2008

The Defrag Debate

I was asked again this week whether defragging of hard drives on Windows servers is really necessary. This is quite pertinent as the cost of deploying an enterprise-wide defrag tool can be significant and any opportunity to save money has to be a good one.

I discussed fragmentation last year (here) when looking into a problem which turned out to be a lack of free space consolidation. However, I didn't discuss the potential performance impact.

So, reflecting on whether defragmentation is required or not, I'd say that in most cases the benefits are minimal. Here's why...

Firstly, hard drives have on-board cache; the larger and more "enterprise" the drive, then the larger the cache. We are also likely to see more and more hybrid drives on the market, which will have large amounts of fast memory fronting access to the old moveable components. Cache will mask the impact of fragmented files during writes as data will be written to cache and confirmed as written by the drive. The data can then be written to disk asynchronously afterwards. Obviously if cache becomes overrun, then the benefit will be negated.

Second, operating systems have built-in file system performance techniques, including lazy writing and file prefetch. These features try to minimise the latency of reading from and writing to disk.

Third, if the data being accessed is a database, the database itself will also have lazy writer processes to asynchronously write data to disk.

Now, all of the above applies directly to "traditional" hard drives. In systems that have RAID controllers with onboard cache, the issues will be smaller still. Where storage is taken from a SAN with an enterprise or modular array, all reads and writes go through cache, giving the best options for performance and masking the underlying hard drive mechanics.
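To make the cache argument concrete, here is a toy model of a write-back cache, assuming a fixed number of dirty blocks it can hold and a fixed rate at which the disk drains ("destages") them. The function name and parameters are hypothetical, and real drive or array firmware is far more sophisticated; the point is only that writes are acknowledged from cache until the cache is overrun.

```python
def simulate_write_back(arrivals, cache_size, destage_per_tick):
    """Toy write-back cache. Each tick the disk first drains up to
    `destage_per_tick` dirty blocks, then incoming writes are acknowledged
    from cache while there is room; the rest must wait for the disk.
    Returns (acknowledged_from_cache, stalled)."""
    dirty = 0
    acked = stalled = 0
    for incoming in arrivals:
        dirty = max(0, dirty - destage_per_tick)   # background destage to disk
        fast = min(incoming, cache_size - dirty)   # writes that fit in cache
        acked += fast
        stalled += incoming - fast                 # cache overrun: no benefit
        dirty += fast
    return acked, stalled

# Steady load the disk can keep up with: every write is absorbed by cache.
print(simulate_write_back([10] * 5, cache_size=50, destage_per_tick=10))
# Sustained overload: the cache fills and later writes stall.
print(simulate_write_back([30] * 4, cache_size=50, destage_per_tick=10))
```

The first call prints `(50, 0)` and the second `(80, 40)`: once arrivals persistently exceed the destage rate, the cache fills and the underlying mechanics (including any fragmentation) show through again.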

So, can fragmentation actually help? In fact, I have seen one instance where it did, and it required slightly special circumstances. The file system was made up of a number of concatenated LUNs using a volume manager. As the server had multiple CPUs and the storage was SAN connected, multiple I/Os could be issued to the volume. With more fragmentation, the I/Os were spread across all of the LUNs and performance increased.
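A toy model of that situation, assuming a simple concatenation where volume block `b` lives on LUN `b // lun_size` (the LUN size, block numbers and function name are hypothetical): a contiguous file sits on a single LUN, so concurrent I/Os queue there, while a fragmented file lands on several LUNs that can service requests in parallel.

```python
def luns_touched(extents, lun_size):
    """Return the set of LUNs a file's extents land on in a concatenated
    volume, where volume block b lives on LUN b // lun_size."""
    luns = set()
    for start, length in extents:
        first = start // lun_size
        last = (start + length - 1) // lun_size
        luns.update(range(first, last + 1))
    return luns

LUN_SIZE = 1000  # blocks per LUN (hypothetical); four LUNs = 4000-block volume

contiguous = [(100, 400)]                                       # one extent
fragmented = [(100, 100), (1200, 100), (2300, 100), (3400, 100)]

print(sorted(luns_touched(contiguous, LUN_SIZE)))   # all I/O queues on one LUN
print(sorted(luns_touched(fragmented, LUN_SIZE)))   # I/O spread over four LUNs
```

This prints `[0]` and then `[0, 1, 2, 3]`. A striped volume would spread even a contiguous file, which is why the effect depends on the concatenated layout.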

Thursday 17 July 2008

Why Tape Technology Just Doesn't Cut It

There have been a raft of tape announcements in the last week, most notably from the two 1TB wannabes, IBM and Sun. For a mere $37,000 or so, plus the cost of a cartridge, I can back up 1TB of my most precious data. HP have also announced plans to extend the life of the DAT/DDS tape drive.

If you are a large enterprise customer then the cost of these drives may be justified (although I struggle to see how, when LTO4 drives can be had for about $5000 apiece), and I'm sure actual prices will be much lower than list.

The thing is, hard drives just continue to outpace tape growth. With 1.5TB drives on the way, and 1TB SATA drives available for less than $200, disk-to-disk is much more appealing than tape at this rate. Obviously I'm riding roughshod over the issues of disk power consumption and portability, but my point is that tape just isn't keeping pace in either capacity or throughput.

The whole issue is especially true in the small business area where it is easy to purchase terabytes of primary storage but backup to tape is really time consuming.

Why can't tape produce the equivalent bit density of disk? Is it the more fragile nature of the medium? Clearly tape is more flimsy than a rotating sheet of metal; the T10000 cartridge tape is 6.5 microns thick and the tape itself covers approximately 11.5 square metres, much more than the total surface area of the spinning plates in a hard drive.
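To put rough numbers on that comparison, the sketch below estimates areal density for both media. The tape figures (1TB on roughly 11.5 square metres) come from the paragraph above; the disk geometry is an assumption, not a published spec: a 1.5TB 3.5-inch drive with four platters recorded on both sides and a usable band between about 15mm and 47.5mm radius. Treat the result as an order-of-magnitude estimate only.

```python
import math

# From the post: a 1TB T10000 cartridge holds roughly 11.5 m^2 of tape.
TAPE_TB = 1.0
TAPE_AREA_M2 = 11.5

# Assumptions (not from the post): 1.5TB drive, 4 platters x 2 surfaces,
# recordable annulus from ~15mm to ~47.5mm radius.
DISK_TB = 1.5
SURFACES = 8
R_OUTER_M = 0.0475
R_INNER_M = 0.015

surface_area_m2 = math.pi * (R_OUTER_M ** 2 - R_INNER_M ** 2)  # one annulus
disk_area_m2 = SURFACES * surface_area_m2

tape_density = TAPE_TB / TAPE_AREA_M2      # TB per square metre
disk_density = DISK_TB / disk_area_m2      # TB per square metre
ratio = disk_density / tape_density

print(f"disk recording area: {disk_area_m2:.3f} m^2 vs tape {TAPE_AREA_M2} m^2")
print(f"disk stores roughly {ratio:.0f}x more per square metre than tape")
```

Under these assumptions the disk surfaces add up to about 0.05 square metres, so disk is packing bits a few hundred times more densely than tape.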

I guess we will just have to accept tape capacity will never be good enough. That's just the way it is.

By the way, Sun get a big fat zero in the RSS ratings for not providing their news in RSS feed format....

Wednesday 16 July 2008

Destroying hard drives, what a waste

Many financial and government organisations choose to destroy the hard drives that are declared as failing and removed from their arrays. They use products like this which make the hard drive unusable.

What happens to these hard drives? I presume they just end up in landfill and aren't recycled. Is it beyond the wit of man to find another solution?

First of all, a large number of these drives haven't actually failed. They've been marked by the array as having the potential to fail, and before a hard failure occurs, the data is moved off to a hot spare. Naturally, for parity-protected data (RAID-5/6), it is more efficient to copy the contents of the suspect drive to the spare than to read every drive in the parity group and rebuild the data.
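The difference in work is easy to see with single XOR parity, as in RAID-5 (RAID-6 adds a second, non-XOR syndrome that this sketch ignores). For each stripe, proactively copying a suspect drive reads one member, whereas rebuilding after a hard failure reads every surviving member. The 4+1 stripe and the helper names below are hypothetical.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks (RAID-5 style parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def rebuild_missing(surviving):
    """Rebuild the one lost block of a stripe from all surviving members.
    Returns (rebuilt_block, number_of_blocks_read)."""
    return xor_blocks(surviving), len(surviving)

# A hypothetical 4+1 stripe: four data blocks plus one parity block.
data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
parity = xor_blocks(data)
stripe = data + [parity]

# Proactive sparing: member 2 is suspect but still readable, so just copy it.
copied, copy_reads = stripe[2], 1

# After a hard failure: read the four surviving members and XOR them together.
rebuilt, rebuild_reads = rebuild_missing(stripe[:2] + stripe[3:])

print(copied == rebuilt, copy_reads, rebuild_reads)
```

Both routes recover the same block, but the rebuild reads four times as much here, and the gap grows with the width of the parity group.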

Second, we are embedding encryption directly into the drive itself. Can't we simply create a drive where the keys can be wiped in the event that the drive needs to be recycled? This seems to me to be the simplest and most elegant solution.
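The idea is sometimes called crypto-erase. The toy model below shows the principle only: every sector is stored encrypted under an internal key, and "wiping" the drive means discarding that key. The keystream is a deliberately simplified SHA-256 counter construction chosen because it needs only the standard library; it is not secure (for example, rewriting a sector reuses its keystream). Real self-encrypting drives use a hardware block cipher such as AES. The class and method names are hypothetical.

```python
import hashlib
import secrets

def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream built from SHA-256(key || nonce || counter).
    Illustration only; not a substitute for a real cipher such as AES."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

class ToySelfEncryptingDrive:
    """Hypothetical drive model: every sector is stored encrypted under an
    internal media key, and 'crypto-erase' simply replaces that key."""

    def __init__(self) -> None:
        self._key = secrets.token_bytes(32)
        self._media = {}  # sector number -> bytes as stored on the platter

    def _xor_with_keystream(self, sector: int, data: bytes) -> bytes:
        ks = keystream(self._key, sector.to_bytes(8, "big"), len(data))
        return bytes(d ^ k for d, k in zip(data, ks))

    def write(self, sector: int, data: bytes) -> None:
        self._media[sector] = self._xor_with_keystream(sector, data)

    def read(self, sector: int) -> bytes:
        return self._xor_with_keystream(sector, self._media[sector])

    def crypto_erase(self) -> None:
        # The old ciphertext stays on the media, but the only key that could
        # decrypt it is gone, so reads now return meaningless bytes.
        self._key = secrets.token_bytes(32)

drive = ToySelfEncryptingDrive()
drive.write(0, b"customer records")
before = drive.read(0)
drive.crypto_erase()
after = drive.read(0)
```

Nothing is overwritten or shredded: the erase takes as long as generating a new key, which is exactly what makes the drive reusable afterwards.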

Incidentally, I checked the Hitachi Global Storage Technologies (HGST) website and could only see some nice words about caring about the environment but nothing specific relating to recycling itself.

Tuesday 15 July 2008

Green IT with HP

Last Wednesday I passed a pleasant evening chatting with a number of people from HP on the subject of "Green IT". I happen to think that "Green IT" is an oxymoron, as IT is never going to deliver computing power using 100% renewable energy and fully recyclable components. However, IT can certainly improve its green credentials from the position it occupies now.

The HP representatives included the EMEA VP for Marketing, one of the Sales Managers in HP's Power and Cooling Solutions division, the EMEA Environmental Strategies and Sustainability Manager and the UK and Ireland head of Innovation and Sustainable Computing. As you can imagine, this made for plenty of lively debate.

For me, there were a number of highlights. Firstly, HP acknowledged that almost all organisations are tackling the green issue not out of a sense of altruism but because being green has a direct impact on the bottom line, whether through reducing costs or acquiring new business.

Second, there's the sheer complexity and lack of structure in the whole green debate. Is the aim to reduce the carbon footprint or to recycle precious resources (like metals)? How should all of these initiatives be measured? What's a good or bad measure? I think I need more time to mull it over.

An interesting side issue from the discussions was how HP selects the bloggers with whom it interacts. This is being done in conjunction with external agencies who obviously follow the market. My concern is how HP will determine who is an influencer and who is simply spouting hot air. There's got to be a scientific(ish) basis for this: perhaps readership size, perhaps references to their blog, perhaps the level of comments, or perhaps keyword counts and/or other semantic scanning.

However it is achieved, companies like HP will need to ensure that the tranche of their marketing spend directed at bloggers is appropriately spent. It will be really interesting to see how this develops.

Monday 14 July 2008

Three Degrees of Bill Gates

I heard on the wireless today that Bill Gates has a LinkedIn profile (well, hasn't everyone?). Just out of interest, I thought I'd see how far I am from the man himself, and I was surprised to find I was only 3 steps away, despite Bill only having 5 connections. I was impressed. Much better than the Six Degrees of Kevin Bacon.

On a related note, I've set up a "Storage Bloggers" group on LinkedIn for those of us who run Storage Blogs online. Drop me a note/comment if you want to join the group (I won't publish those comments). I'll also be putting up a page to list and link the blogs too.

Is the Oyster Failure a Red Herring?

This weekend there was a "fault" on the UK London Underground Oyster card system which caused chaos for many travellers and rendered a large number of cards inoperable and in need of replacement.

This follows the recent cloning of the Oyster card by a group of "Dutch Boffins" (The Register's term, not mine) who had previously cracked the encryption used on the Amsterdam metro (GVB).

I found this story interesting as I travel regularly using both services. It also piqued my curiosity because of the implications for data encryption. It shows that, given enough time and ingenuity, it is possible to hack or crack almost any, if not every, encryption method. This has significant implications for the use of Cloud Storage as a location to store what I would term "Data at Risk" (as opposed to Data in Flight or Data at Rest) and reinforces the need for organisations with valuable data (governments etc.) to store their information in secure locations.

Anyway, getting back to this weekend's failure, the conspiracy theorist in me says that TfL decided to change the encryption method used by Oyster to fix the successful Dutch crack. This inevitably rendered a number of cards invalid, as the cards were (and are) fixed in nature and couldn't be changed. TfL decided it was better to take the bad publicity of a number of rejected cards than to compromise the entire system, because you can be sure the workaround for getting cards recharged would eventually find its way into the public domain.

What do you think?

Friday 11 July 2008

Seagate Raises the Bar with 1.5TB Drive

Seagate announced yesterday version 11 of their Barracuda hard drive range, to be released next month (August 2008) with a maximum capacity of 1.5TB. The news link has all the speeds and feeds if you're interested in how they have achieved this remarkable milestone.

I've trawled the 'net to plot the release of previous versions of the drive and their capacities at the time, going back to 2002. Trending the growth (totally unscientifically, of course), we can expect to see 2TB drives by December 2008, 3TB drives by November 2009 and 4TB drives by June 2010. This may be a little optimistic, as the trend is skewed slightly by the recent capacity gains brought by perpendicular recording, but maybe not, as my recent post on Hitachi 5TB drives shows.

Unfortunately, sustained transfer rates for these drives have remained around the 100MB/s mark, so offloading a complete 1.5TB drive sequentially takes around 250 minutes (just over four hours), by my calculations. I'd love to know how long a RAID rebuild would take (Seagate, if you fancy loaning me some drives, I'll find out for you!).
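The offload figure is simple arithmetic: capacity divided by sustained transfer rate. The sketch below assumes decimal units (1TB = 1,000,000MB, as drive vendors count) and a constant 100MB/s; real rates fall towards the inner tracks, and a RAID rebuild competing with production I/O will take considerably longer, so treat this as a best case.

```python
def sequential_read_hours(capacity_tb: float, rate_mb_s: float) -> float:
    """Hours to stream an entire drive at a constant sustained rate."""
    capacity_mb = capacity_tb * 1_000_000   # decimal units: 1 TB = 10^6 MB
    return capacity_mb / rate_mb_s / 3600

for tb in (0.5, 1.0, 1.5):
    minutes = sequential_read_hours(tb, 100) * 60
    print(f"{tb} TB at 100 MB/s: {minutes:.0f} minutes")
```

This prints 83, 167 and 250 minutes respectively, so every doubling of capacity at a flat transfer rate doubles the time the data is exposed during a copy or rebuild.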

As previously discussed, the increased capacities are good as they increase the GB/Watt and GB/cm³ density, but we're going to be increasingly challenged by how we get data on and off them, especially when the drives fail.

Wednesday 9 July 2008

Distributed Computing Nirvana

Last year I blogged about the concept of storage futures (or options; I'd have to go back and check which), which would allow storage charges to be based on a forward pricing model. The logic for this was to penalise those "customers" who don't bother to take the time to plan their storage demand. By charging more the closer a request is to the delivery date, customers are discouraged from asking for new storage at the last minute.
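One possible shape for such a forward-pricing rule is sketched below. Everything about it is an assumption for illustration: a base price per GB for requests made with at least 90 days' notice, and a surcharge that rises linearly to 50% for same-day requests. The function name and parameters are hypothetical.

```python
def forward_price_per_gb(lead_time_days, base_price=1.00,
                         full_notice_days=90, max_surcharge=0.5):
    """Hypothetical forward-pricing rule: requests made with at least
    `full_notice_days` notice pay the base rate; the surcharge rises
    linearly to `max_surcharge` for same-day requests."""
    if lead_time_days < 0:
        raise ValueError("lead time cannot be negative")
    shortfall = max(0, full_notice_days - lead_time_days) / full_notice_days
    return base_price * (1 + max_surcharge * shortfall)

for days in (120, 90, 45, 0):
    print(days, "days notice:", forward_price_per_gb(days))
```

A linear ramp is only one option; stepped bands (for example, 30/60/90 days) would be easier to publish in a service catalogue, at the cost of encouraging requests to cluster just inside each band boundary.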

The evolution of cloud computing has the opportunity to deliver on my original idea. I'm sure I don't have to explain Cloud computing to anyone, but in case you're not aware, it is effectively distributed computing for the Web age. Amazon Web Services is probably the most popular service (in terms of awareness). The Amazon services provide the ability to create a virtual machine, perform database services, manage queues and of course store data in their S3 Simple Storage Service. I'll discuss my experiences on Cloud Storage in more detail in another post.

Many large organisations are facing issues with meeting the power and cooling demands in their datacentres. This is being driven by the increase in computing power and storage density, achieved by the use of blade server technology and virtualisation. Although more computing is being achieved in the same physical space, for some organisations their business model demands more computing to take place in order to gain business advantage. Think of pharmaceutical companies who are using software to model organic chemical interactions rather than perform the experiments in the lab.

I suspect that if you investigate the use of computing in a lot of these datacentres, only a small percentage of the computing power will be dedicated to core business operations. There will be many applications providing ancillary services such as reporting, financials, batch processing, reconciliation, inventory and so on. Many will not be time or location dependent and could easily be moved out of the core datacentres for processing elsewhere.

Obviously this change in processing requires a different operating model. A key trend in the industry is to consolidate into a small number of large (and expensive) datacentres, but by operating in this way, companies are artificially constraining their growth to the size of these datacentres and setting a timeline by which new datacentres must be built before expansion can continue.

So what is the answer? Computing could become location-independent, except for those critical components which can't tolerate the effects of latency. As an example, take file archiving. If data has been unreferenced for more than a specific time (say 3-6 months), move it into the storage cloud. The data can be duplicated in multiple locations automatically to provide redundancy. Note that I'm assuming all the issues of security have been investigated, discussed, resolved and implemented.
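The selection half of that archiving policy can be sketched in a few lines. This hypothetical helper walks a directory tree and yields files whose last-access time is older than a cut-off; it deliberately leaves out the actual upload to a storage cloud. Access times are only meaningful if the file system records them (many are mounted with access-time updates reduced or disabled), so a production policy might prefer other evidence of use. The file names in the demo are made up.

```python
import os
import tempfile
import time
from typing import Iterator, Optional

def archive_candidates(root: str, max_idle_days: float = 90,
                       now: Optional[float] = None) -> Iterator[str]:
    """Yield paths under `root` whose last access time is older than
    `max_idle_days`. Selection only; moving the data is out of scope."""
    now = time.time() if now is None else now
    cutoff = now - max_idle_days * 86400
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.stat(path).st_atime < cutoff:
                    yield path
            except OSError:
                continue  # file vanished or is unreadable; skip it

# Demo in a throwaway directory with two (made-up) files.
with tempfile.TemporaryDirectory() as tmp:
    old = os.path.join(tmp, "q1_report.xls")
    new = os.path.join(tmp, "current.doc")
    for p in (old, new):
        open(p, "w").close()
    stale = time.time() - 200 * 86400
    os.utime(old, (stale, stale))   # pretend it was last read 200 days ago
    found = list(archive_candidates(tmp, max_idle_days=90))
```

Here `found` contains only the stale file. Keeping the threshold as a parameter makes it easy to trial the 3-month and 6-month options against real data before committing to either.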

Immediately, redundant data is out of the datacentre and the cost of storing it is reduced to a service charge, which is likely to be significantly lower than the cost in the primary location.

Some organisations may decide that the cloud is too unsafe for their data. In this instance, a more appropriate datacentre strategy needs to be developed. Rather than having a small number of "megacentres", smaller location-critical sites can be built for primary data, with other sites developed in locations offering the cheapest space/power costs. In this way, large organisations could effectively operate their own computing (and storage) cloud.

I think the options for real distributed computing are really exciting. They provide the opportunity to "green" the storage environment over and above the simple task of deploying larger disk drives and bigger storage systems.

Sunday 6 July 2008

Shock - Escargots n'est pas francais!

The BBC has reported that prices in France for snails are set to rocket. I lived and worked in France for a few years and loved tucking into the garlic covered gastropods. But horror of horrors, apparently most snails served in France are now not French! Is nothing sacred?

Friday 4 July 2008

5TB drives

I just read this on The Register. 5TB drives! Can you imagine it! The HDD manufacturers continue to push the envelope even further.

Now I have a concern about drives getting to this size and that's the ability to get data on/off the drive itself. With 73/146/300GB drives, the capacity to response time ratio is still within a tolerance that means adequate random access throughput can be achieved. But with larger drives the number of different concurrent accesses will increase and if response time doesn't decrease then very large HDDs will start to operate like sequential devices.

I think I need an illustration to make my point. Imagine a 73GB drive is receiving 200 random I/Os per second, each with an average 5ms response time. Scale the capacity up to a 5TB drive and that's about 69 times the capacity. The scaled up drive would have to cope with 13800 I/Os a second and provide an average response time of 0.07ms!
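The same arithmetic as a short calculation, using the figures above. It keeps I/O density (IOPS per GB) constant as capacity grows. Note that 200 I/Os at 5ms each is a full second of busy time, so if the drive services one request at a time, each response must shrink by the same factor the workload grows. The function name is illustrative, and 5TB is taken as 5000GB in decimal units.

```python
# Figures from the example above.
BASE_CAPACITY_GB = 73
BASE_IOPS = 200
BASE_RESPONSE_MS = 5.0

def scaled_requirements(capacity_gb: float):
    """Scale the 73GB drive's workload to a bigger drive at constant
    IOPS per GB. Since 200 IOPS x 5 ms is a fully busy second, serving
    the scaled load serially needs a proportionally shorter response."""
    factor = capacity_gb / BASE_CAPACITY_GB
    iops = BASE_IOPS * factor
    response_ms = BASE_RESPONSE_MS / factor
    return factor, iops, response_ms

factor, iops, response_ms = scaled_requirements(5000)
print(f"{factor:.1f}x capacity -> {iops:,.0f} IOPS at {response_ms:.3f} ms each")
```

This prints `68.5x capacity -> 13,699 IOPS at 0.073 ms each`, matching the rounded figures of about 69x, 13800 I/Os a second and 0.07ms.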

Firstly, it is unlikely 5TB drives will be expected to perform like today's 73GB drives but it serves to illustrate that we can't expect to simply consolidate and shrink the number of drives installed into an array. We need something more.

I think we need a more innovative approach to the design of the drive interface. This may simply be shed loads of cache, to improve the overall average response time, or perhaps multiple virtual interfaces per drive, or independently mobile read/write heads which don't need to read/write a cylinder at the same time. It could even be drives that dynamically reallocate their data to make reads and writes quicker (for example, by putting frequently accessed blocks in the same physical area of the drive).
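The last idea can be sketched as a frequency-ordered layout: count accesses per logical block, then place the hottest blocks next to each other so the heads travel less for the busiest data. This is a hypothetical illustration of the concept, not how any shipping drive works; the function name and the access log are made up.

```python
from collections import Counter

def hot_first_layout(access_log, n_blocks):
    """Return a new physical order for logical blocks 0..n_blocks-1 that puts
    the most frequently accessed blocks first (physically adjacent). Blocks
    with equal counts, including never-accessed ones, keep their original order."""
    counts = Counter(access_log)
    # sorted() is stable, so ties stay in ascending logical-block order.
    return sorted(range(n_blocks), key=lambda block: -counts[block])

access_log = [7, 3, 7, 7, 3, 1]   # hypothetical sequence of block reads/writes
print(hot_first_layout(access_log, 8))
```

This prints `[7, 3, 1, 0, 2, 4, 5, 6]`. A real implementation would also have to decide how often to re-sort, since moving data costs I/O in its own right.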

Who knows what the solution is, but rest assured something needs to happen to make 5TB drives useful devices.

Wednesday 2 July 2008

Monster Mash (Up)

The UK Government is running a competition offering participants up to £20,000 if they can create new uses for existing free sources of government data by combining the data into new and useful information (a mash-up). You can find a link to the data here.

The volume of data is immense. I started trawling the UK census from 2001, burrowing down to the small village I live in. Of the 4700 or so residents, there are three Buddhists but disappointingly no Jedi (there's an urban myth that Jedi is put down by so many people who have no specific religious persuasion). Apparently there are no 1-room properties, so no studio flats, and no basement flats (no-one lives below ground level). 2.4% of houses have no central heating and only 5.5% of the population are not in good health.

I'm not sure what this indicates other than don't try selling central heating or life insurance policies where I live! But being less facetious, the power in this data will be in deriving new value. I can see two immediate uses/methods.

Firstly, most of this data is useful to people looking for new places to live: schools information, services information, crime levels and so on. So, develop a website where you put in a postcode to see how your prospective area rates.

Second, trawl the data and find the ends of the spectrum - the good and bad, best and worst of each metric. For example, which area has the highest population density? Which has the worst crime? This information could be great for business planning; don't set up a locksmiths in an area with the lowest crime rate and so on.
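Finding the ends of the spectrum is, at its simplest, a min/max per metric across areas. The sketch below assumes the data has already been loaded into a dictionary keyed by area; the area names and figures are made up purely for illustration and are not real census or crime data.

```python
def extremes(by_area, metric):
    """Return (lowest_area, highest_area) for `metric`, ignoring areas
    that don't report it."""
    rated = {area: stats[metric] for area, stats in by_area.items() if metric in stats}
    if not rated:
        raise ValueError(f"no area reports {metric!r}")
    return min(rated, key=rated.get), max(rated, key=rated.get)

# Made-up figures for illustration only.
areas = {
    "Area A": {"crime_per_1000": 12.0, "people_per_hectare": 40.0},
    "Area B": {"crime_per_1000": 3.5, "people_per_hectare": 2.1},
    "Area C": {"crime_per_1000": 7.8},
}
safest, worst = extremes(areas, "crime_per_1000")
sparsest, densest = extremes(areas, "people_per_hectare")
print(safest, worst, sparsest, densest)
```

Skipping areas that lack a metric matters in practice, because different datasets rarely cover exactly the same set of areas.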

Of course, the hardest part will not be correlating different data sources but bringing together a consistent view of the information. Some data is accessed via APIs, some as XML, some in Excel format. What "common" point of reference can be used? Postcode? Address?
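If postcode is the common key, the first practical problem is that sources format it differently ("zz1 1zz", "ZZ11ZZ", "ZZ1  1ZZ"). The sketch below normalises UK-style postcodes (upper case, a single space before the three-character inward code) and then merges several datasets on that key. It assumes each source has already been parsed into a `{postcode: {field: value}}` dictionary; the function names, fields and the obviously fictitious postcode are hypothetical.

```python
import re

def normalise_postcode(raw: str) -> str:
    """Upper-case, drop all whitespace, then re-insert a single space
    before the last three characters (the UK 'inward code')."""
    compact = re.sub(r"\s+", "", raw).upper()
    return f"{compact[:-3]} {compact[-3:]}" if len(compact) > 3 else compact

def merge_by_postcode(*sources):
    """Merge several {postcode: {field: value}} datasets on a normalised key."""
    merged = {}
    for source in sources:
        for raw_pc, fields in source.items():
            merged.setdefault(normalise_postcode(raw_pc), {}).update(fields)
    return merged

# Two sources with inconsistent formatting (fictitious postcode and values).
schools = {"zz1 1zz": {"schools": 3}}
crime = {"ZZ11ZZ": {"crime_per_1000": 4.1}}
print(merge_by_postcode(schools, crime))
```

This only solves formatting; data published at different geographic levels (postcode, ward, local authority) would still need a lookup table to line up.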

Whatever happens, the availability of more and more data content will be absolutely invaluable, but for me, I'd like to see more real-time information available to be mashed up. For instance, a live XML feed of airport arrivals/departures that I can read in the taxi when I'm running late (I don't want to have to log onto their website); the same for the train; a feed showing my nearest tube station or restaurant as I travel; a feed of the waiting times at all the Disney rides, so I can pick the shortest queue without having to find a status board. I'm sure there are many, many more.

Rich data is a wonderful thing; bring it on!
