Extroverted Data: Beyond the Basics of Open Data
Here at DevResults, we believe in transparency and openness. It's in our company culture code. It's why we support things like the International Aid Transparency Initiative (IATI), both philosophically and technically. And as part of the international development community, we just think transparency and openness make sense.
That's why we've been following the development and implementation of USAID's Open Data Policy, aka ADS 579, aka the Development Data Library (DDL), with interest. There has been considerable consternation among implementing partners ever since the policy went into effect in October 2014, and as a result, a lot has been written about how to interpret and apply the policy.
We’ve tried to make it easy on our clients to comply with open data requirements by making it simple to download and export both aggregate indicators and the datasets from which they are derived. Whether it's pivoting data to generate a report, dumping data from DevResults to another tool, or generating an IATI file, we try to make your data as convenient to extract as we possibly can. Because, hey—it's yours.
Well...kinda. USAID's Open Data Policy is premised on the fact that while the implementing partner technically owns the data, "the Federal government has the right to obtain and use the data" because it was paid for with public money. And most fellow do-gooders will agree that the citizens of the world have a claim to the data if it is useful for solving development challenges.
But there is a big difference between publishing open data and contributing to the open data ecosystem. The former requires a single click; the latter is much more of a process and a mindset. That's why most of the requirements detailed in ADS 579 focus on documentation: metadata, codebooks, data dictionaries, explanations, notes, methodologies, etc. It's one thing to know what's going on in your own dataset, and quite another for someone outside your organization to figure out for themselves what's going on. And yet there are certainly diminishing marginal returns to perfecting your supporting documentation.
So how do we realize the tremendous benefits of enabling other people to use our data without killing ourselves over the documentation process? We have two suggestions:
First, we should make data as well-structured and standardized as it possibly can be. If you've worked in M&E for more than 5 minutes, you're probably aware that while standard indicators and even universal indicators exist, the ways that we collect, store, and manage that data vary drastically. That creates a lot of problems for data users who are effectively looking over your shoulder and trying to figure out what the heck you were thinking.
Luckily, organizations have much more support in structuring their data than they did even a few years ago. The IATI results standard, though complex, gives us a much-needed common language for talking about indicator data. M&E tools and consultants are also becoming much more rigorous about enforcing guidelines and good data practices upfront that result in more intelligible data coming out the other end. And while ADS 579 only requires that partners publish underlying datasets (not the monitoring data itself), the same habits still apply: enforcing good data validation, keeping tables tidy, and minimizing our reliance on inelegant solutions like Excel-masquerading-as-a-database.
That's why we built DevResults in such a way that it requires—nay, mandates!—that each and every data point be well defined: linked to a results framework, disaggregated consistently, tied to a specific time and place, associated with a particular activity and partner organization. It's a lot more work upfront—especially if you're accustomed to a more lax framework—but it makes things like storing, managing, exporting, and especially reporting a lot easier down the road.
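To make "well defined" a bit more concrete, here is a minimal sketch in Python of what checking a data point against those requirements could look like. To be clear, this is not DevResults' actual schema or code: the DataPoint class, its field names, the problems function, and the two required disaggregations are all hypothetical, chosen only to illustrate the idea.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List

# Hypothetical rule: every data point must be disaggregated by these categories.
REQUIRED_DISAGGREGATIONS = ("sex", "age_group")


@dataclass(frozen=True)
class DataPoint:
    """One reported value plus the context that makes it intelligible."""
    indicator: str       # which indicator the value belongs to
    result: str          # the results-framework node the indicator supports
    activity: str        # the activity that produced the value
    partner: str         # the partner organization that reported it
    location: str        # where the value was measured
    period_start: date   # start of the reporting period
    period_end: date     # end of the reporting period
    value: float
    disaggregation: Dict[str, str] = field(default_factory=dict)


def problems(point: DataPoint) -> List[str]:
    """Return a list of reasons the data point is not well defined (empty if none)."""
    issues = []
    # Every contextual link must actually be filled in.
    for name in ("indicator", "result", "activity", "partner", "location"):
        if not getattr(point, name).strip():
            issues.append(f"missing {name}")
    # The value must be tied to a sensible time period.
    if point.period_end < point.period_start:
        issues.append("reporting period ends before it starts")
    # Disaggregation must be consistent across all data points.
    missing = [k for k in REQUIRED_DISAGGREGATIONS if k not in point.disaggregation]
    if missing:
        issues.append("missing disaggregation: " + ", ".join(missing))
    return issues


# Made-up example records, for illustration only.
good = DataPoint("IND-1", "IR 1.2", "Activity A", "Partner X", "District 9",
                 date(2015, 1, 1), date(2015, 3, 31), 120,
                 {"sex": "female", "age_group": "15-24"})
bad = DataPoint("IND-1", "", "Activity A", "Partner X", "District 9",
                date(2015, 3, 31), date(2015, 1, 1), 120,
                {"sex": "female"})

print(problems(good))
print(problems(bad))
```

The first record passes with no complaints; the second is flagged for a missing results-framework link, a reversed reporting period, and a missing age disaggregation. Rejecting records like that at entry is the "more work upfront" part, and it is also what lets an outsider read an export without guessing.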
Second, we should make data more social. Most of the time, when people close their eyes and think about data (to fall asleep, I guess?), they see dashboards and spreadsheets and SQL statements, the trappings of nerds toiling away in their cubicles. Why not cozy conversations at the coffee shop?
Personally, I find that I learn much more about the quirks of a dataset after talking it over with a friend or colleague who is more familiar with it than I am. Sometimes ringing somebody up is the best way to understand the provenance of some third-party data. It's partly why most open data reporting platforms (IATI, USAID's DDL, etc.) have a required contact listed, so you can get the context you need to start working with some really valuable information that you didn't collect yourself.
At MERL Tech in DC two weeks ago, we heard how several organizations have leveraged social settings rather than trainings to popularize data use in their organizations. I have to admit that I initially chuckled at the idea of "data parties" (credit to IREX) and "building a data culture" (credit to DAI), but more and more I'm convinced that social regimens like these are the cure for "not my job" disorder and "not a data person" syndrome.
But how do more structure and more talk lead to better open data? Simply put, they force us to broaden our perspective. If we really believe that data is a global public good, in the same way that civic space and clean air are public goods, then we have to ensure that our data can engage others in an ongoing, free-flowing, asynchronous conversation. In short, we have to make our data more extroverted.