Wednesday 2 July 2008

Monster Mash (Up)

The UK Government is running a competition offering participants up to £20,000 if they can create new uses for existing free sources of government data by combining the data into new and useful information (a mash-up). You can find a link to the data here.

The volume of data is immense. I started trawling the 2001 UK census, burrowing down to the small village I live in. Of the 4700 or so residents, there are three Buddhists but disappointingly no Jedi (there's an urban myth that lots of people with no specific religious persuasion put down Jedi). Apparently there are no 1-room properties, so no studio flats, and no basement flats (no one lives below ground level). 2.4% of houses have no central heating and only 5.5% of the population are not in good health.

I'm not sure what this indicates other than don't try selling central heating or life insurance policies where I live! But being less facetious, the power in this data will be in deriving new value. I can see two immediate uses/methods.

Firstly, most of this data is useful to people looking for new places to live: schools information, services information, crime levels and so on. So, develop a website where you put in a postcode to see how your prospective area rates.

Secondly, trawl the data and find the ends of the spectrum: the good and bad, best and worst of each metric. For example, which area has the highest population density? Which has the worst crime? This information could be great for business planning; don't set up a locksmith's in the area with the lowest crime rate, and so on.
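As a rough sketch of that "ends of the spectrum" idea, here is a few lines of Python that pick out the lowest and highest area for any metric. The area names, metric names and figures are all made up for illustration; real data would come from the government sources, and the `extremes` helper is just a hypothetical name.

```python
# Hypothetical figures for illustration only; not real census or crime data.
areas = {
    "Area A": {"population_density": 1200.0, "crime_rate": 35.0},
    "Area B": {"population_density": 4300.0, "crime_rate": 82.0},
    "Area C": {"population_density": 310.0, "crime_rate": 12.0},
}


def extremes(areas, metric):
    """Return (lowest_area, highest_area) for the given metric."""
    # Sort area names by their value for the chosen metric, ascending.
    ranked = sorted(areas, key=lambda name: areas[name][metric])
    return ranked[0], ranked[-1]


if __name__ == "__main__":
    for metric in ("population_density", "crime_rate"):
        lowest, highest = extremes(areas, metric)
        print(f"{metric}: lowest in {lowest}, highest in {highest}")
```

With these invented numbers, the would-be locksmith would steer clear of Area C.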

Of course, the hardest part will not be correlating the different data sources but bringing them together into a consistent view of the information. Some data is accessed via APIs, some by XML, some in Excel format. What "common" point of reference can be used? Postcode? Address?
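If postcode did turn out to be the common key, a minimal sketch of the joining step might look like the Python below. It assumes each source has already been read into a dictionary keyed by postcode, however it was originally delivered. The postcode "AB1 2CD", the field names and the values are placeholders, and both function names are hypothetical. The normalisation relies on the fact that the second half of a UK postcode is always three characters.

```python
def normalise_postcode(raw):
    """Upper-case a UK postcode and put a single space before the last three characters."""
    compact = "".join(raw.split()).upper()
    if len(compact) < 5:
        raise ValueError(f"too short to be a full postcode: {raw!r}")
    return compact[:-3] + " " + compact[-3:]


def merge_by_postcode(*sources):
    """Combine several {postcode: {field: value}} dictionaries into one view."""
    merged = {}
    for source in sources:
        for raw_postcode, fields in source.items():
            key = normalise_postcode(raw_postcode)
            merged.setdefault(key, {}).update(fields)
    return merged


if __name__ == "__main__":
    # Placeholder records: the same postcode written three different ways.
    schools = {"ab1 2cd": {"primary_schools": 2}}
    crime = {"AB12CD": {"crime_rate": 12.5}}
    heating = {" Ab1  2cD ": {"no_central_heating_pct": 2.4}}
    print(merge_by_postcode(schools, crime, heating))
```

Address would be a much messier key to normalise, which is one argument for postcode.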

Whatever, the availability of more and more data content will be absolutely invaluable, but for me, I'd like to see more real-time information available to mash up. For instance, at the airport, a live XML feed of flight arrivals/departures that I can read in the taxi when I'm running late (I don't want to have to log onto their website); the same for the train; a feed showing my nearest tube station or restaurant as I travel; a feed of the waiting time at all the Disney rides, so I can pick the shortest queue without having to find a status board. I'm sure there are many, many more.
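To give a flavour of how little code a feed like that would need on the reading side, here is a minimal Python sketch using the standard library's XML parser. The feed layout, the attribute names and the flight numbers are all invented for the example; no real airport publishes this exact format as far as I know.

```python
import xml.etree.ElementTree as ET

# Invented sample of what a flight arrivals feed might look like.
SAMPLE_FEED = """<flights>
  <flight number="XX123" status="delayed" scheduled="14:05" expected="14:50"/>
  <flight number="XX456" status="on time" scheduled="14:20" expected="14:20"/>
</flights>"""


def parse_flights(xml_text):
    """Turn the feed into a list of dictionaries, one per <flight> element."""
    root = ET.fromstring(xml_text)
    return [dict(flight.attrib) for flight in root.iter("flight")]


def delayed(flights):
    """Return the flight numbers whose status is 'delayed'."""
    return [f["number"] for f in flights if f["status"] == "delayed"]


if __name__ == "__main__":
    flights = parse_flights(SAMPLE_FEED)
    print("Delayed:", delayed(flights))
```

In practice the text would be fetched from the feed's URL rather than held in a string, but the parsing would be the same.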

Rich data is a wonderful thing; bring it on!
