Data Visualization and Open Data

Last November, I had the great idea to start a blog devoted to excellence in data visualization.  “DataViz” is an increasingly important field as the amount of raw data that we are exposed to, and that we generate daily, grows and grows.  From searches on Google to trends on Twitter to likes on Facebook, more and more of us are becoming involved in the generation and collateral consumption of large sets of data.  We are consciously aware of only superficial manifestations of those activities, such as the accuracy of our search results or the popularity of certain topics of interest, but there is a gaping black hole of awareness between what we as social networkers contribute to these systems and what we harvest from them.  DataViz is one of the tools available to us to try to illuminate that mystery for ourselves.  It allows us to wrap information in (what is typically hoped to be) an aesthetically appealing presentation, in order to derive meaning or to present a position on the basis of empirical data.  To put it simply, it provides a picture of reality based on objective sampling to tell some kind of a story.  Hence, the name of this blog – numbers made to be pretty, or “prettynumbers”.

One of the reasons that it has taken me so long to get this blog going is that I have become professionally involved in a pretty significant data-sharing initiative.  The project deals with Open Data, or the presentation of publicly available sets of data for public consumption.  In principle, it means releasing into the public domain, for the public to do with as it will, the statistics and measurements that organizations track and use to make informed policy decisions.  This data is managed or owned by governments, corporations, not-for-profit groups, schools, individuals – anyone at all with a collection of numbers.  The reason that Open Data is so appealing is that if we can agree on a standard format for expressing sets of data, then we can also develop really useful tools to visualize any of those sets of data creatively, so as to make them comprehensible, appealing and, perhaps most importantly, relevant for consumption.
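To make that last point a little more concrete, here is a minimal sketch in Python of what "one standard format, one reusable tool" could look like.  It assumes, purely for illustration, that the agreed format is plain CSV with a header row; the function names and the sample rows are made up for this example and do not come from any real data set or toolkit.

```python
import csv
import io
from collections import Counter


def count_column(csv_text, column):
    """Count how often each value appears in one column of CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[column] for row in reader)


def text_bar_chart(counts, width=20):
    """Render counts as rows of '#' characters, largest count first."""
    if not counts:
        return ""
    largest = max(counts.values())
    label_width = max(len(str(label)) for label in counts)
    lines = []
    for label, n in counts.most_common():
        # Scale each bar relative to the largest count; always draw at least one '#'.
        bar = "#" * max(1, round(width * n / largest))
        lines.append(f"{str(label).ljust(label_width)} {bar} {n}")
    return "\n".join(lines)


# Made-up sample rows, for illustration only (not real city data).
SAMPLE = """neighbourhood,request_type
Downtown,Pothole
Downtown,Graffiti
Strathcona,Pothole
Oliver,Pothole
Oliver,Snow removal
"""

if __name__ == "__main__":
    print(text_bar_chart(count_column(SAMPLE, "request_type")))
```

Because the two functions only rely on the shared format (a header row and comma-separated values), the same few lines would work on any other data set published the same way, whatever its subject.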

In my mind, it’s the difference between taking all of your clothes from your dressers and closets and throwing them on the floor in a huge messy pile and saying that is your collective “wardrobe and personal sense of style”, versus choosing the demonstrative outfits that best describe your wardrobe and style.  Or better yet, allowing your friends or complete strangers to come into your room and rifle through your clothes themselves and allowing them to draw their own conclusions!  Undeniably, there is an opportunity for bias to be applied in the process in either case, but it makes the overall task of assessing the value of the data far more manageable (and in this example, far more entertaining).  But we’ll deal with the bias issue more in subsequent posts.

Hands down, one of the most exciting partners in the movement to expand Open Data is a company called Socrata.  They have created an unbelievable set of easy-to-use tools designed to simplify the conversion of raw data into useful web applications that make sense to humans, rather than to spreadsheet programs, and the toolkit has had tremendous uptake.  One of my favourite implementations of the Socrata toolkit belongs to my home town’s government website, the City of Edmonton.  data.edmonton.ca offers over a hundred sundry data sets, all coupled with useful (to varying degrees) visualizations that encourage its citizens to explore the data that has been captured, rather than relying on the accounts of news agencies or even the government itself.  Giving people the purest, most raw form of data available, as well as the tools to explore and interact with that data, is the best way of removing bias from understanding reality that I can imagine, short of going out into the field and observing all of the phenomena for oneself.  It is at once empowering and democratizing; it manifests real operational transparency and maximizes opportunities for discourse in a way that sitting in a crowded bar or pub and exchanging misinformed opinions can’t even begin to compete with.

Open Data as a concept has been around forever – since the first sentient being looked under a rock.  However, the technology to make all of a government’s spending patterns available to every citizen is incredibly new.  My smartphone has thousands of times the processing power that yesteryear’s supercomputers had, meaning that as easily as I can update my Facebook status, I can explore the population movements in my country over the past four decades.  So long as that data is available, that is.  Open Data solves that last problem.

I can’t get excited enough about the possibilities of this technology! With all of the conceivable opportunities to misinform and misdirect public opinion in today’s mass media channels, Open Data stands as a force for unequivocal good in the search for truth in an increasingly complicated and confusing age.  I hope to share more of my experiences, insights and examples with you over time.  In the meantime, check out the thousands and thousands of examples available on Socrata sites like https://opendata.socrata.com/ or https://nycopendata.socrata.com/ to get a sense of the breadth of this cool new approach to sharing information.