We recently had the privilege of attending Vivek Kundra’s talk when he spoke at the University of Washington. We found his assessment of federal government information technology issues to be refreshingly frank. After his talk we had a chance to bring up the issue of data quality in historical datasets. His response was encouragingly realistic.
For those who don’t know, Vivek Kundra is Chief Information Officer of the Obama Administration and is the driving force behind the Data.gov initiative described in a previous post: Reinventing the Wheel at Data.gov. In that post we complained that Data.gov was simply another new website providing links to agency data without any discipline-specific evaluation of the data being served up. This is a complaint we still level against that initiative, but Kundra’s recent talk at the UW gave us a sense of what he is up against with a very limited set of resources.
Others have captured ideas from his talk in some detail:
- Notes from Federal CIO Vivek Kundra talk at UW
- Vivek Kundra Outlines Ambitious Government Plans for I.T.
From our perspective, the take-home messages were the following:
- Identifying and getting feedback from real ‘customers’, not just project managers, is key to developing useful tools.
- The government needs to pull the plug on IT projects that are going nowhere.
- Making more data open and available to the public with channels for feedback is the only realistic way to improve data quality.
If you want to get a sense of the man and his ideas you can watch a video of his talk. Our question and his response on the topic of data quality in historical datasets begins at 1:00:03.
We were pleased that his response acknowledged the sorry state of some historical data. Refreshingly, his solution is not to set up an expensive government program to ‘fix’ the bad data — we’ve been down that road before. Instead, he believes the only hope is to daylight government data sets and crowdsource the solution.
Now this is a solution that we can get behind! In practice, we think each department should:
- make public versions of data available as soon as possible
- open a channel for the public, the ultimate ‘customers’, to provide feedback on data quality and usability issues
- have an individual whose main job is to improve the quality, completeness and usability of datasets generated by their department
We know that many agencies are used to going their own way with respect to managing their data. And we believe that the needs of research require special tools and systems. However, we also believe that there is no excuse for delaying access to public versions of agency data, which should be made available as early as possible. And these public versions need not require extensive software engineering: don’t let perfect be the enemy of good. For many, many government datasets, a simple CSV version of the data would go a long way toward making the data more useful.
(You might be surprised to find out how many complicated SQL schemas ultimately boil down to a single table of rows and columns. We’ll save that as the topic for a future post.)
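To give a rough sense of what we mean, here is a minimal Python sketch that flattens a small two-table SQL schema into a single CSV table. The tables, column names and values (`station`, `reading`, `temp_c` and so on) are invented purely for illustration and do not come from any agency dataset, and the helper names `flatten_to_csv` and `build_demo_db` are our own. It uses only the standard library `sqlite3` and `csv` modules.

```python
import csv
import io
import sqlite3


def flatten_to_csv(conn: sqlite3.Connection, query: str) -> str:
    """Run a query and return its result set as CSV text with a header row."""
    cursor = conn.execute(query)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # cursor.description holds one tuple per column; item 0 is the column name.
    writer.writerow(col[0] for col in cursor.description)
    writer.writerows(cursor)
    return buffer.getvalue()


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with made-up example data (hypothetical)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE station (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE reading (station_id INTEGER, day TEXT, temp_c REAL);
        INSERT INTO station VALUES (1, 'Alpha'), (2, 'Beta');
        INSERT INTO reading VALUES
            (1, '2009-06-01', 14.5),
            (2, '2009-06-01', 17.0);
        """
    )
    return conn


# One join is enough to turn the two tables into a single flat table.
FLAT_QUERY = """
    SELECT s.name AS station, r.day, r.temp_c
    FROM reading AS r
    JOIN station AS s ON s.id = r.station_id
    ORDER BY s.name, r.day
"""

if __name__ == "__main__":
    print(flatten_to_csv(build_demo_db(), FLAT_QUERY), end="")
```

The point of the sketch is that the public version is just the result of one query written out as text. Anyone with a spreadsheet can open it, and nobody needs to understand the underlying schema first.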
Best Hopes for common sense data management.