Proposal: Towards an IATI Indicator Definition Schema

Aid transparency is not an end in itself: It is a means towards an end. Transparency leads to greater accountability, and accountability leads to greater effectiveness. The donors and others who are committing to publish data are, in effect, agreeing to be held accountable: Accountable to each other, and ultimately accountable to the citizens in the recipient countries and to the taxpayers in donor countries.

The International Aid Transparency Initiative (IATI) has, understandably, started out by "following the money." This makes sense: For one thing, money is easy. Everyone understands it. Every donor, NGO, and government has financial systems in place. A dollar is a dollar, and every cent is tracked as it flows from each level to the next. If it turns out that the money trail goes cold, or the numbers don't add up, then citizens and funders are in a much better position to demand explanations.

But money is only part of the equation. You can spend millions of dollars and account for every penny, but still not have anything to show for it. Aid money is intended to do something. So in addition to asking where the money went, citizens and funders need to know what that money bought.

The Accra Agenda for Action, which resulted in the creation of IATI, explicitly calls for accountability for results, not just a description of who transferred money to whom. Unfortunately, so far very little results data has been published in the IATI format.

Bill Anderson, the technical lead for IATI, explores some of the reasons why: In many cases the data doesn't exist, can't be trusted, or isn't easily connected to actual impact. But even the most competent organization with the best intentions faces the problem that we don't have standard ways of defining what we're measuring. Anderson points out:

With few exceptions (such as the WHO’s Global Health Observatory) there are no international standards governing machine-readable indicators. Most providers of development cooperation make up their own indicators. Those using common indicators often rely on free-text descriptions, which are then difficult to compare.

A standard schema for indicator definitions

The solution to the problem Anderson describes is for IATI to define a simple indicator definition schema, as I proposed in a previous post.

To be clear, this is not about getting everyone in the world to agree on which indicators to use. It's about creating a common language for documenting some simple facts about an indicator. If we're counting something, what is the unit of measure? If we're working with a percentage or a ratio, what do the numerator and denominator represent?

Why do we need machine-readable indicator definitions?

Many organizations have already created online repositories that make their indicator definitions public. These are just some of the ones I know of:

So why not just support a reference to these indicators and call it a day?

The argument for a data standard for indicator definitions is the same as the argument for development data standards in general. A common, machine-readable indicator format would help by:

1 Making existing definitions more accessible

Like much development data, many indicator definitions are locked up in PDFs, Word documents, or poorly conceived online databases. The reason why the IATI standard exists is that data, even if openly published online, is much less useful without common standards.

2 Encouraging sharing

A big part of the problem is that new indicators are being generated all the time, even when they're redundant. Often a new program decides what it wants to accomplish and then defines its own idiosyncratic indicators, even when there are perfectly good indicators out there.

Imagine if, instead, a development project could pick and choose from a comprehensive global library that includes all the indicators being used by any development organization anywhere in the world: Take these five from the World Bank, these three from the WHO registry, and these two custom indicators from a small project in another country. These could be automatically added to the program's performance monitoring software, which would then report their data in IATI format. And that data would be comparable across different reporting organizations.

3 Making it easier to draw connections

A universal registry of indicators would also open up the possibility of explicitly drawing connections between indicators:

Two indicators might be effectively identical but described using different words.
One indicator might be a subset or a superset of another. For example, an indicator that counts the number of journalists trained is a subset of an indicator that counts the total number of individuals trained.
Indicators that are percentages or ratios require a numerator and a denominator, which are often well-defined indicators themselves.

Does an indicator schema belong in the IATI standard?

It's legitimate for IATI to worry about "mission creep" — or, in software terms, "feature bloat." The IATI data standard already might seem intimidating at first glance, and added complexity comes at a cost: More moving parts, more tests, more discussions, more work for the technical support team.

Still, I think the benefits outweigh the costs. Not only does this belong in the IATI standard, but IATI is the only conceivable home for it. Here's why:

1 Because IATI is committed to publishing results

Creating standards for development data is at the very core of IATI's mission. If IATI-formatted reports are going to include indicator data, we need a way of knowing what those numbers represent. That definition needs a structure, and in the absence of an existing standard, it's up to IATI to define what that structure is.

2 Because IATI is the natural home for this standard

If IATI doesn't create a standard schema for indicator definitions, someone else would need to do it. Perhaps a for-profit company like DevResults would take the initiative; perhaps a donor agency like USAID, or a consortium of NGOs or contractors.

But coming up with a standard that has universal legitimacy requires a lot of infrastructure, both human and technical. You need a platform for stakeholders to discuss proposals and changes. You need public documentation and a test suite. You need to support users and provide resources for advocates. Most importantly, you need an engaged community.

It would be enormously wasteful for another organization to try to duplicate everything that IATI has created, let alone the goodwill and engagement it has built, just to create a relatively simple standard that has so much overlap with the existing IATI standards.

An outline of a proposed standard

So what might an indicator definition schema look like?

The minimum viable schema

At a bare minimum, an indicator description needs to include:

A unique code (within a given organization or registry)
A name
A narrative definition
A reference to a unit of measure

So we'd have:

<iati-indicator ref="1.3-a" >  
    <title>
        <narrative># hours of training delivered</narrative>
    </title>
    <definition>
        <narrative>
            The total number of person-hours of training delivered 
            during the reporting period.
        </narrative>
    </definition>
    <unit vocabulary="123" ref="1.2">
        <title singular="hour" plural="hours" />
    </unit>
</iati-indicator>

The <unit> element is ideally a reference to a known codelist. However, development programs count a lot of different things that are unlikely to be found in existing vocabularies: fisheries, condoms, latrines, and so on. So the schema needs to accommodate program-specific units.
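As a rough illustration of how a consuming tool might check a definition against this minimum, here is a small Python sketch. It assumes the element and attribute names from the sample above; the helper name `missing_required_parts` is my own invention and not part of any existing IATI tooling.

```python
import xml.etree.ElementTree as ET

# The minimal indicator from the sample above.
MINIMAL_INDICATOR = """
<iati-indicator ref="1.3-a">
    <title>
        <narrative># hours of training delivered</narrative>
    </title>
    <definition>
        <narrative>
            The total number of person-hours of training delivered
            during the reporting period.
        </narrative>
    </definition>
    <unit vocabulary="123" ref="1.2">
        <title singular="hour" plural="hours" />
    </unit>
</iati-indicator>
"""


def missing_required_parts(indicator):
    """Return the minimum-schema parts that are absent or empty."""
    missing = []
    if not indicator.get("ref"):
        missing.append("ref")
    for path in ("title/narrative", "definition/narrative"):
        node = indicator.find(path)
        if node is None or not (node.text or "").strip():
            missing.append(path)
    if indicator.find("unit") is None:
        missing.append("unit")
    return missing


indicator = ET.fromstring(MINIMAL_INDICATOR)
problems = missing_required_parts(indicator)
print(problems)  # an empty list means all four parts are present
```

A real validator would more likely be an XSD shipped with the standard; this only shows that the four required parts are easy to check mechanically.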

Rounding out the schema

Beyond the bare necessities, the schema should also include:

Geographical reporting level (e.g. national, first-level administrative division, locality; using the existing Activity Scope codelist).
Indicator type and aggregation status (currently attributes of the iati-activity/result element), and indicator measure and direction (currently attributes of the iati-activity/result/indicator element). These become attributes of the <iati-indicator> element.

<iati-indicator ref="2.1" type="4" reporting-level="4" measure="2" ascending="false" >

References to identical indicators in parent frameworks or other indicator repositories. This uses the same <reference> element described here, with a reference to an indicator ID or code within a known vocabulary.

<reference vocabulary="3" code="SP.DYN.IMRT.IN" />

References to technical sectors (using the existing Sector Vocabulary codelist).

<sector vocabulary="2" code="111" />

Definitions of the numerator and denominator. These can consist of a text description and/or a reference to another indicator, within this same vocabulary or external to it.

<numerator>  
    <reference vocabulary="500" code="1.2.3"/>
    <title>
        <narrative># infants dying before reaching one year of age</narrative>
    </title>
</numerator>

Preferred display format pattern (percentage, decimal precision, rate per X, etc). To define these, the standard might adopt a commonly used number formatting syntax, such as the one Microsoft Excel uses. Alternatively, the standard could offer more flexibility by allowing the document to indicate which syntax is used by choosing from a new Numeric Display Syntax codelist.

<display-format syntax="1" pattern="n0" rate-per="1000" />
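To make the idea concrete, here is a sketch of how reporting software might apply a display format to a numerator and a denominator. It rests on an assumption of mine: that a pattern of the form "n0" means "a number with thousands separators and zero decimal places." The proposal leaves the actual pattern syntax to the new codelist, and `format_value` is a hypothetical helper.

```python
import re
import xml.etree.ElementTree as ET


def format_value(numerator, denominator, display_format):
    """Compute numerator / denominator * rate-per and format the result.

    Assumption: the pattern looks like "n<digits>", where <digits> is the
    number of decimal places. Any other pattern is rejected.
    """
    if denominator == 0:
        raise ValueError("denominator must be non-zero")
    rate_per = float(display_format.get("rate-per", "1"))
    match = re.fullmatch(r"n(\d+)", display_format.get("pattern", "n0"))
    if match is None:
        raise ValueError("unsupported pattern")
    decimals = int(match.group(1))
    value = numerator / denominator * rate_per
    return f"{value:,.{decimals}f}"


fmt = ET.fromstring('<display-format syntax="1" pattern="n0" rate-per="1000" />')

# Made-up figures: 215 infant deaths out of 5,000 live births.
rate = format_value(215, 5000, fmt)
print(rate)  # 43
```

The point is that the definition carries enough information for any tool to render "43 per 1,000 live births" without a human reading the narrative.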

Disaggregation

Disaggregation is one of the trickiest parts of the development data story. The disaggregation dimensions that we see in use are many and varied. There are some standards out there for demographic dimensions, but here again it will be important for any given indicator repository to define the attributes it uses for disaggregation.

The dimensions used by an indicator repository would be defined once in the document. At the simplest, this would look like this:

<dimensions>  
    <dimension name="Sex">
        <value name="Male" />
        <value name="Female" />
    </dimension>
    <dimension name="Training Topic">
        <value name="Advocacy" />
        <value name="Financial Management" />
       ...
    </dimension>
</dimensions>

Dimensions and their values can also support synonyms in the same language and in other languages, as well as internal or external reference codes:

<dimensions>  
    <dimension name="Sex" vocabulary="1" ref="code#sex" >
        <synonym name="Gender" />
        <synonym xml:lang="fr" name="Sexe" />
        <value name="Male" vocabulary="1" ref="code#sex-M" >
            <synonym name="M" />
            <synonym xml:lang="fr" name="Masculin" />
        </value>
        <value name="Female" vocabulary="1" ref="code#sex-F">
            <synonym name="F" />
            <synonym xml:lang="fr" name="Féminin" />
        </value>
    </dimension>
    <dimension name="Training Topic" vocabulary="999" ref="T-1" >
        <value name="Advocacy" ref="T.1-a"  />
        <value name="Financial Management" ref="T.1-b" />
        <value name="Gender Issues &amp; Child Protection" ref="T.1-c" />
        <value name="Leadership &amp; Organizational Development" ref="T.1-d" />
        <value name="Monitoring &amp; Evaluation" ref="T.1-e" />
        <value name="Project Design" ref="T.1-f" />
    </dimension>
</dimensions>
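One use for the synonyms is normalizing incoming data: a spreadsheet column headed "Gender" or "Sexe" can be mapped back to the canonical "Sex" dimension. Here is a sketch in Python, assuming the `<synonym>` structure above; the function name `build_lookup` and the case-insensitive matching are my own choices.

```python
import xml.etree.ElementTree as ET

DIMENSION = """
<dimension name="Sex" vocabulary="1" ref="code#sex">
    <synonym name="Gender" />
    <synonym xml:lang="fr" name="Sexe" />
    <value name="Male" vocabulary="1" ref="code#sex-M">
        <synonym name="M" />
        <synonym xml:lang="fr" name="Masculin" />
    </value>
    <value name="Female" vocabulary="1" ref="code#sex-F">
        <synonym name="F" />
        <synonym xml:lang="fr" name="Féminin" />
    </value>
</dimension>
"""


def build_lookup(element):
    """Map an element's name and its direct synonyms to the canonical name."""
    canonical = element.get("name")
    lookup = {canonical.casefold(): canonical}
    for synonym in element.findall("synonym"):
        lookup[synonym.get("name").casefold()] = canonical
    return lookup


dimension = ET.fromstring(DIMENSION)
dimension_names = build_lookup(dimension)

# Merge the lookups for every value of the dimension.
value_names = {}
for value in dimension.findall("value"):
    value_names.update(build_lookup(value))

print(dimension_names["gender"])   # Sex
print(value_names["masculin"])     # Male
```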

Each indicator would then reference the dimensions it needs by name:

<iati-indicator ref="1.3-a" >  
    <title>
        <narrative># hours of training delivered</narrative>
    </title>
    ...
    <dimension name="Sex"/>
    <dimension name="Training Topic"/>
</iati-indicator>
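Because indicators refer to dimensions by name, a tool reading the document should confirm that every name resolves. Here is a sketch of that check; the "Age Group" reference is deliberately undefined and purely hypothetical, as is the helper name `unresolved_dimensions`.

```python
import xml.etree.ElementTree as ET

DOCUMENT = """
<iati-indicator-vocabulary>
    <iati-indicator ref="1.3-a">
        <dimension name="Sex" />
        <dimension name="Training Topic" />
        <dimension name="Age Group" />
    </iati-indicator>
    <dimensions>
        <dimension name="Sex">
            <value name="Male" />
            <value name="Female" />
        </dimension>
        <dimension name="Training Topic">
            <value name="Advocacy" />
            <value name="Financial Management" />
        </dimension>
    </dimensions>
</iati-indicator-vocabulary>
"""


def unresolved_dimensions(root):
    """Return {indicator ref: [dimension names with no definition]}."""
    defined = {d.get("name") for d in root.findall("dimensions/dimension")}
    unresolved = {}
    for indicator in root.findall("iati-indicator"):
        missing = [
            d.get("name")
            for d in indicator.findall("dimension")
            if d.get("name") not in defined
        ]
        if missing:
            unresolved[indicator.get("ref")] = missing
    return unresolved


root = ET.fromstring(DOCUMENT)
dangling = unresolved_dimensions(root)
print(dangling)  # {'1.3-a': ['Age Group']}
```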

Narrative

Nearly all indicator definitions I've seen include at least a few additional paragraph-length narrative fields. For example, DevResults includes space for the following (roughly based on the narrative fields included in USAID's Performance Indicator Reference Sheets).

Justification: How is this indicator relevant to the program's objectives?
Collection method: How will the primary data be collected?
Sources: Who will provide the data to us?
Acquisition frequency: When and how often will the data be collected?
Data quality: What are the known limitations and significance of this data? What data quality assessments have been conducted?
Analysis: How will the data be analyzed, and by whom?
Review: How will the data be reviewed, and by whom?
Reporting: How and in what contexts will the data be reported?

Other indicator repositories might have other sector-specific narrative fields, such as "Epidemiological Significance." Rather than bake these fields into the schema, I'd suggest using a single <description> element and using a codelist to designate the type of narrative (similar to the DescriptionType codelist used for narrative descriptions in the Activity schema).

<description type="1">  
    <narrative>
        This indicator is a measure of the total volume of training delivered and 
        will serve as a reference point for analysing trainee effectiveness outcomes 
        in the future.
    </narrative>
</description>  
<description type="2">  
    <narrative>
        Data will be drawn directly from training logs collected at training site.
    </narrative>
</description>  
<description type="3">  
    etc.
</description>
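A consumer could then look up each type code in the codelist to label the narratives. The sketch below assumes a numbering that this proposal does not actually define: I've mapped codes 1 to 3 onto the first three entries of the Indicator Description Type list given later in this post, purely for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical code assignments; the proposal names the types but not their codes.
DESCRIPTION_TYPES = {
    "1": "Utility",
    "2": "Data Collection",
    "3": "Data Sources",
}

INDICATOR = """
<iati-indicator ref="1.3-a">
    <description type="1">
        <narrative>
            This indicator is a measure of the total volume of training delivered.
        </narrative>
    </description>
    <description type="2">
        <narrative>
            Data will be drawn directly from training logs collected at training site.
        </narrative>
    </description>
</iati-indicator>
"""


def descriptions_by_type(indicator):
    """Return {type label: narrative text} for an indicator's descriptions."""
    result = {}
    for description in indicator.findall("description"):
        code = description.get("type")
        label = DESCRIPTION_TYPES.get(code, f"Unknown type {code}")
        result[label] = description.findtext("narrative", default="").strip()
    return result


narratives = descriptions_by_type(ET.fromstring(INDICATOR))
for label, text in narratives.items():
    print(f"{label}: {text}")
```

Keeping the types in a codelist means a repository can add "Epidemiological Significance" without any change to the schema itself.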

Putting it all together: The indicator repository

The root element of a list of indicators is the <iati-indicator-vocabulary> element, which contains one or more indicators as well as a list of dimensions referenced.

<iati-indicator-vocabulary  
        generated-datetime="2015-03-05T07:15:37Z" 
        version="2.01" 
        ref="AA-AAA-12345" 
        url="http://global-happiness.org/indicators">
    <iati-indicator>...</iati-indicator>
    <iati-indicator>...</iati-indicator>
    <iati-indicator>...</iati-indicator>
    <dimensions>
        <dimension>...</dimension>
        <dimension>...</dimension>
        <dimension>...</dimension>
    </dimensions>
</iati-indicator-vocabulary>
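With a root element like this, building the kind of shared library described earlier is mostly a matter of indexing. Here is a sketch that keys each indicator by the vocabulary's `ref` plus the indicator's own `ref`. The composite key is my assumption about how uniqueness across repositories might work, not something the proposal specifies.

```python
import xml.etree.ElementTree as ET

VOCABULARY = """
<iati-indicator-vocabulary ref="AA-AAA-12345" url="http://global-happiness.org/indicators">
    <iati-indicator ref="1.3-a">
        <title><narrative># hours of training delivered</narrative></title>
    </iati-indicator>
    <iati-indicator ref="2.1">
        <title><narrative>Infant mortality rate</narrative></title>
    </iati-indicator>
</iati-indicator-vocabulary>
"""


def index_indicators(root):
    """Return {(vocabulary ref, indicator ref): indicator title}."""
    vocabulary_ref = root.get("ref")
    index = {}
    for indicator in root.findall("iati-indicator"):
        title = indicator.findtext("title/narrative", default="").strip()
        index[(vocabulary_ref, indicator.get("ref"))] = title
    return index


library = index_indicators(ET.fromstring(VOCABULARY))
print(library[("AA-AAA-12345", "2.1")])  # Infant mortality rate
```

Indexes from several repositories could be merged into one dictionary, since the vocabulary `ref` keeps their codes from colliding.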

Sample XML

Here is a mostly worked-out sample of what an indicator vocabulary document might look like (download here).

<iati-indicator-vocabulary generated-datetime="2015-03-05T07:15:37Z" version="2.01" ref="AA-AAA-12345" url="http://global-happiness.org/indicators">

    <maintaining-org ref="AA-AAA-123456789">
        <title>
            <narrative>Global Happiness Organization</narrative>
        </title>
    </maintaining-org>

    <iati-indicator ref="1.3-a" >
        <title>
            <narrative># hours of training delivered</narrative>
        </title>
        <unit vocabulary="123" ref="1.2">
            <title singular="hour" plural="hours" />
            <title xml:lang="fr" singular="heure" plural="heures" />
        </unit>
        <definition>
            <narrative>The total number of person-hours of training delivered during the reporting period.</narrative>
            <narrative xml:lang="fr">Le nombre total d'heures-personnes de formation livrées au cours de la période considérée.</narrative>
        </definition>
        <!-- Sectors -->
        <sector vocabulary="2" code="111" />
        <sector vocabulary="2" code="112" />
        <!-- Disaggregation -->
        <dimension name="Sex"/>
        <dimension name="Training Topic"/>
        <!-- Description fields -->
        <description type="1">
            <narrative>This indicator is a measure of the total volume of training delivered and will serve as a reference point for analysing trainee effectiveness outcomes in the future.</narrative>
        </description>
        <description type="2">
            <narrative>Data will be drawn directly from training logs collected at training site.</narrative>
        </description>
        <description type="3">
            etc.
        </description>
    </iati-indicator>

    <iati-indicator ref="2.1" type="4" reporting-level="4" measure="2" ascending="false" >
        <reference vocabulary="3" code="SP.DYN.IMRT.IN" />
        <title>
            <narrative>Infant mortality rate</narrative>
        </title>
        <definition>
            <narrative>The number of infants dying before reaching one year of age, per 1,000 live births in a given year.</narrative>
        </definition>
        <numerator>
            <reference vocabulary="500" code="1.2.3"/>
            <title>
                <narrative># infants dying before reaching one year of age</narrative>
            </title>
        </numerator>
        <denominator>
            <reference vocabulary="500" code="2.2"/>
            <title>
                <narrative># live births</narrative>
            </title>
            <definition>
                <narrative>
                    Live birth refers to the complete expulsion or extraction from its mother of a product of conception, irrespective of the duration of the pregnancy, which, after such separation, breathes or shows any other evidence of life. 
                </narrative>
            </definition>
        </denominator>
        <display-format syntax="1" pattern="n0" rate-per="1000" />
    </iati-indicator>

    <iati-indicator ref="3.0">
        <reference vocabulary="3" code="SP.DYN.CONU.ZS" />
    </iati-indicator>

    <dimensions>
        <dimension name="Sex" vocabulary="1" ref="code#sex" >
            <synonym name="Gender" />
            <synonym xml:lang="fr" name="Sexe" />
            <value name="Male" vocabulary="1" ref="code#sex-M" >
                <synonym name="M" />
                <synonym xml:lang="fr" name="Masculin" />
            </value>
            <value name="Female" vocabulary="1" ref="code#sex-F">
                <synonym name="F" />
                <synonym xml:lang="fr" name="Féminin" />
            </value>
        </dimension>
        <dimension name="Training Topic" vocabulary="999" ref="T-1" >
            <value name="Advocacy" ref="T.1-a"  />
            <value name="Financial Management" ref="T.1-b" />
            <value name="Gender Issues &amp; Child Protection" ref="T.1-c" />
            <value name="Leadership &amp; Organizational Development" ref="T.1-d" />
            <value name="Monitoring &amp; Evaluation" ref="T.1-e" />
            <value name="Project Design" ref="T.1-f" />
        </dimension>
    </dimensions>

</iati-indicator-vocabulary>

References to existing codelists

The indicator schema would reuse the following existing IATI codelists:

iati-indicator/type: Result Type
iati-indicator/measure: Indicator Measure
iati-indicator/reporting-level: Activity Scope

New codelists needed

The schema would require four new codelists. The first three are lists of existing standards or vocabularies. The last one serves to allow some flexibility with the kinds of narrative fields that might be needed.

Numeric Display Format Syntax

Common syntaxes for number formatting patterns.

iati-indicator-vocabulary/iati-indicator/display-format/@syntax

Unit Vocabulary

Known sources of standard units of measurement.

iati-indicator-vocabulary/iati-indicator/unit/@vocabulary

Dimension Vocabulary

Known sources of standard disaggregation dimensions, such as demographic characteristics.

iati-indicator-vocabulary/dimensions/dimension/@vocabulary

SDMX-RDF
etc.

Indicator Description Type

Types of unstructured narrative fields for describing an indicator's rationale and usage.

iati-indicator-vocabulary/iati-indicator/description/@type

Utility (The purpose of collecting data for this indicator, how the data might be used)
Data Collection (How the primary data will be collected)
Data Sources (Who will provide the data)
Data Acquisition Frequency (When and how the data will be collected)
Limitations (Known limitations and significance (if any) of this data)
Quality (Details regarding quality assessments that have been or will be conducted)
Analysis (How the data will be analysed, and by whom)
Review (How the data will be reviewed, and by whom)
Reporting (How and in what contexts the data will be reported)
etc.

I hope this proposal is useful as a starting-off point for conversation. Please let me know what you think.