Grading Agencies' High-Value Data Sets

By Jim Harper

I wrote here a few weeks ago about the “high-value data sets” — three per agency — that the federal government would soon be releasing at Data.gov. They were released on January 22nd, and we’ve been poring over them ever since. More on that below.

Tomorrow, agencies are supposed to have their “open government” sites put up — sites where they make their data feeds available and easily findable for the public. A couple of different sites are monitoring when those pages go up.

Data, data, data — that means more direct oversight of the government by more people. We talked about all this at our December 2008 policy forum, Just Give Us the Data!

When I wrote recently about the release of agencies’ high-value data sets, though, I worried:

Rather than substantive insight into government management, deliberations, and results, we might get a lot of data-oriented play-toys… [P]ublic choice economics predicts that the agencies will choose the data feeds with the greatest likelihood of increasing their discretionary budgets or the least likelihood of shrinking them.

So I decided to grade them:

To help focus agencies on releasing the data that is high-value for genuine government transparency, I plan to examine the three data-streams each agency releases and grade the agencies on whether their releases provide insight into agency management, deliberations, or results.

With the help of Cato interns Solomon Stein and Sasha Davydenko, I assigned three points to each feed that had to do with management, deliberation, or results. The resulting numerical scores — 9, 6, 3, or 0 — translate into grades: A, B, C, or D respectively. F was reserved for agencies that didn’t produce feeds.
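The scoring rule is simple enough to express in a few lines of arithmetic. What follows is a minimal Python sketch, not anything we actually ran. The function name `grade` and the idea of recording each feed as a true/false judgment are assumptions for illustration. Deciding whether a feed is about management, deliberations, or results stays a human judgment call; the code only turns those calls into a letter. It counts at most three qualifying feeds, since agencies that listed more than three got full credit if any three qualified. Plus grades such as C+ and D+ reflected partial credit for near misses and are not modeled here.

```python
from typing import Optional, Sequence

# Hypothetical mapping from numerical score to letter grade.
GRADES = {9: "A", 6: "B", 3: "C", 0: "D"}


def grade(feeds: Optional[Sequence[bool]]) -> str:
    """Return a letter grade from per-feed judgments (illustrative sketch).

    Each entry is True if a reviewer judged that feed to be about
    management, deliberations, or results. None or an empty sequence
    means the agency released no feeds at all, which earns an "F".
    """
    if not feeds:
        return "F"
    # Only three feeds count toward the score, even if more qualify.
    qualifying = min(sum(feeds), 3)
    # Three points per qualifying feed: 0, 3, 6, or 9.
    score = 3 * qualifying
    return GRADES[score]
```

For example, a department with two qualifying feeds out of three, like Homeland Security, comes out as `grade([True, True, False])`, which returns `"B"`; an agency that released nothing, like the CIA, is `grade(None)`, which returns `"F"`.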

The results follow these few comments:

There’s no science to determining what is management, deliberations, or results. We made some judgment calls. But on the whole we feel pretty good about getting it right.
We graded the data sets that were posted on Data.gov by the deadline. Several agencies have added data sets late. They get no credit. We’re not the “nice” teacher. We’re the mean one that you learn from.
Some agencies produced more than three of what they think are “high-value data sets.” If any three were actually about management, deliberations, or results, we gave them full credit.
Almost uniformly, the agencies came up with interesting data — but “interesting” is in the eye of the beholder. And interesting data collected by an agency doesn’t necessarily give the insight into government we were looking for. It’s data about the agency that matters.
Time and again, we found that “money is management.” If the data has to do with in-flow and out-flow of funds, chances are good that it’s high-value data. Hey, there’s nothing wrong with paying attention to the money. Accountants need love too.
The documentation of data sets probably had an effect on the grading. There were some data sets that were almost perfectly inscrutable. Agencies, imagine that you’re giving the data set to your Aunt Zelda. Have you described what it is well enough for her to understand?
Our grading did not take into account the quality of the data or its true utility for the purposes to which it might be put. That’s for people digging deeper to find out. We’re just going on what agencies said the data was about and how well that reveals their inner workings: their management, deliberations, and results. That’s our idea of high-value.

Without further ado, the results!

Department of Agriculture — D

The Ag Department produced data feeds about the race, ethnicity, and gender of farm operators; feed grains, “foreign coarse grains,” hay, and related items; and the nutrients in over 7,500 food items. That’s plenty to chew on, but none of it fits our definition of high-value.

Department of Commerce — C+

Commerce produced four feeds that it calls “high-value,” but we could only fully agree on one: “Patent Grant Maintenance Fee Events.” That’s a record of money coming in for patents granted from September 1, 1981 to the present. And money is management.

Another — applications received by a couple of broadband programs (presumably for money) — is close enough to management that we plussed up Commerce’s grade. But the other two — collaboratively collected precipitation data, and information about billions of dollars’ worth of research stretching back to 1964 — might be high-precipitation or high-dollar, but they’re not what we think of as high-value.

Department of Defense — D

It was a good news-bad news story from the Department of Defense. On the good side, it produced an impressive six data sets! But the bad overwhelms the good: not a one was high-value in our estimation. Each was just a collection of survey data about absentee voting and election administration overseas.

DoD! Drop and give us 20 push-ups! Then go find us some better data feeds.

Department of Education — D

Ed didn’t make the grade. Its three feeds — really two feeds and a metadata file — contain 2007 data about fourth- and eighth-graders’ educational achievement, which doesn’t teach us anything at all about the agency’s management, deliberations, or results.

Department of Energy — D

One might expect to be invigorated by data feeds from the Energy Department, but these feeds provide shockingly little insight into the agency’s operations. All six of its alleged “high-value” data feeds are bibliographic data about research studies and reports from scientific conferences. This information may give researchers somewhere a jolt, but it’s an information black-out on data about the agency itself.

Department of Health and Human Services — C

By our reckoning, HHS needed a bit of a helping hand to make its “C” grade. One feed wasn’t what the doctor ordered: a directory of all animal drug products that have been listed electronically since June 1, 2009. That data would tranquilize an elephant! (It may be helpful to Dr. Dolittle, of course, just not to the government transparency effort.)

Edging close to high-value, but not enough for a full grade, was data on claims filed with HHS’s Office of Minority Health (OMH). More data and better documentation of the data might have won this feed a full letter.

But along with the OMH claims data, what breathed life into HHS’s data effort was the “Part B National Summary Data File.” For all of Medicare Part B, the data describes allowed services, allowed charges, and payment amounts. That’s important enough, and clear enough, that we classified it as “management” and scored one for HHS.

Department of Homeland Security — B

We agreed that two of DHS’s three data sets were high value. Under the simple rule that money is management, we credited the data about supplementary FEMA grants for public structure repair and the data about FEMA grants for mitigation of future disasters because they included dollar figures. The list of all federally declared disasters is no doubt interesting, but not high-value for our purposes — a transparent agency.

Department of Housing and Urban Development — D

HUD was another agency producing interesting and relevant data — but not high-value for transparency purposes. Its data feeds include information about housing authorities and properties, as well as data about lawsuits, but not information about the management, deliberations, or results of the agency itself.

Department of Justice — D

Interesting data is not necessarily high-value, and the Department of Justice also falls into that trap. The data it provides about prison populations and prison employment might be good for researchers, but it’s not so helpful to the people who want to understand how the Justice Department does its work.

Department of Labor — A

The Department of Labor really got to work and put out some helpful data! DoL’s “Research and Evaluation Inventory” includes costs for various projects along with their current status. The Workforce Investment Act Net Impact Evaluation Dataset shows how programs created under this law affected employment — results! And the Project GATE (Growing America Through Entrepreneurship) Final Evaluation Dataset does the same.

This is great work by the Department of Labor to produce some truly revealing information.

Department of State — D

The State Department may have come up with some of the least helpful information of all agencies. Data about attendance at international exchange and training programs in East Asia and Eurasia is only slightly more helpful at exposing the workings of the agency than the dreary “Bibliographical Metadata of the Foreign Relations of the United States Series.” That’s right — a list of books. (Alas, we’re being unkind. Someone loves these books, so we do too. But you’ll forgive us if we don’t read them just now, or look at the data about them ever again.)

Department of the Interior — D

Interior has mastered the “interesting, not high-value” category. Its four data-releases reveal: counts of wild horses and burros in various areas; a list of government-designated recreation areas; data about wildland fires and acres burned from 1960 through 2008 (updated annually); and a list of ways to work for the government for free. Interesting … but no thanks.

Department of the Treasury — C

Tax Year 2007 County Income Data is not high-value. The latest quarterly report on bank derivatives activities is not high-value either.

But data on the Treasury Department’s purchases, trades, or other dispositions of troubled assets in TARP’s Targeted Investment Program — that’s high-value! With one true high-value data set, Treasury earns a “C.”

Department of Transportation — D

Steering the public away from its own functions, the Transportation Department released feeds about tire safety, vehicle safety ratings, and child safety seats. This is good data to have, of course, but it doesn’t drive openness about the department itself.

Department of Veterans Affairs — C

Data about Veterans Compensation and Pension by County doesn’t make the cut as far as high-value, but we were willing to salute the release of data about “the factors that impact veterans’ employability resulting from participation in the VR&E Program.” This is in the “results” category — data that can reveal how, and how well, a program works. For producing this information, the Veterans Department gets a respectable “C.”

Central Intelligence Agency — F

Secrecy has its place, but not when it comes to data about the operation of the CIA. With no data released, the CIA gets an “F.”

Consumer Product Safety Commission — F

Our frustration at the lack of data from the CPSC had us gnashing our teeth — on lead-painted toys!

Environmental Protection Agency — D+

The EPA’s data sets about new ways to test chemicals’ toxicity levels and water quality measures for the Chesapeake Bay are great. But they’re not our idea of high-value, which is data that goes to agencies’ management, deliberations, and results. We give the agency a little credit for TRI-CHIP, the Toxics Release Inventory Chemical Hazard Information Profile dataset, just because it’s so important. But overall we didn’t get a look into the agency from its data.

Executive Office of the President — A

With the president’s focus on transparency, the EOP had better get this right! And it did. We agree that five out of six of its data sets are high-value.

A crosscut of data about budget authority for global change research activities reveals what is going on across the government on this issue. A similar set of data goes to nanotechnology work across agencies. Same with research and development budgets in the area of networking and information technology across agencies for FY09 and FY10.

We even like the “improper payments” data. Hate the sin, love the sinner; don’t blame it on the messenger — pick your trite saying: We like to see this data, even if the underlying substance is regrettable.

The stinker in the bunch is no stinker at all, by the way. It’s historical data on economic forecasts. But that doesn’t provide insight into today’s EOP.

Overall, a clear “A” for the Executive Office of the President. More than three data feeds that are indeed high-value.

Export-Import Bank of the United States — D+

The user ratings that others gave to the Ex-Im Bank’s data raised its grade above a flat D. At least someone likes this data. But it’s so poorly documented that we couldn’t tell whether the data sets are high-value or not. You read this and tell us what it means:

This file contains the small business authorizations recorded during FY 2010 up to the last month closed in the Bank’s financial and administrative systems.

OK. Authorizing who…? To do what…? Give us something we can work with, Ex-Im Bank! Your data isn’t going to sell itself!

Federal Bureau of Investigation — F

Our investigation turned up no data. We’re throwing the FBI in the transparency slammer.

Federal Communications Commission — F

Failing to release data is pretty uncommunicative, don’t you think?

Federal Deposit Insurance Corporation — F

No data? No credit. Do not pass go. Do not insure $200.

Federal Election Commission — F

Does it take an election campaign to get data out of you?

Federal Reserve Board — D

The Fed issued two feeds: 2008 home mortgage loan application register data, and 2008 data on small business, small farm, and community development lending. Sorry — no and no.

Federal Trade Commission — F

We think that running an agency without data transparency is an unfair and deceptive trade practice.

General Services Administration — A

The agency that serves as central processing for the management of government should get this right, and GSA did. We agree that five of GSA’s seven feeds are high-value.

The cash and payment management data gets credit, especially under the general rule that money is management. We were interested to see three different data feeds dealing with federal advisory committees. The members of these committees help guide agencies’ policymaking, so we credit these as being data about “deliberation.” The dataset that “represents time taken to hire a GSA employee,” well, we’re not sure about that — maybe management.

The catalog of federal domestic assistance is good to have out there, and the list of federal government contractors too. But those don’t get right at management, deliberation, or results. Those duds notwithstanding, GSA has more than three high-value data feeds and gets an “A.”

Merit Systems Protection Board — B

Credit goes to the MSPB for its data feeds on petitions for review received, decided, and pending by month at MSPB headquarters. Same for data on initial appeals received, decided, and pending by month for its regional and field offices. Both are management data if we ever saw any.

But we found meritless (for our purposes, anyway) the data store of 2007 survey responses summarizing the existence of positive performance management practices and employee engagement scores. Whatever that is doesn’t seem like the stuff we want to learn from the MSPB.

NASA — D

You know you can count on NASA for cool data, but cool does not necessarily mean high-value. Its contributions — data about nighttime surface temperatures on earth, images of earth, and estimates of the horizontal near-surface currents of the tropical Pacific Ocean — don’t open up the true final frontier: NASA’s inner workings.

National Archives and Records Administration — D

NARA produced some important — if mind-numbing — data, none of which, unfortunately, deserves credit as high-value. XML versions of the Code of Federal Regulations are neat, but they offer no insight into NARA. The Archival Research Catalog data set is an important record of what NARA has produced, but it’s not about the workings of the agency.

The Organization Authority Files data set contains a highly detailed presentation of the evolution of names and administrative histories of Federal and non-Federal organizations. That’s catnip for a researcher into federal administrative history. It’s Valium for those of us seeking after open government.

National Science Foundation — C

The NSF — funder of so much data collection — comes up pretty anemic when the question is data about itself. Generously, we’ve given it credit for data about FOIA requests: received, processed, response times, and so on.

Its data feeds naming fellowship award recipients and revealing grant funding rates don’t do enough to show the public how the agency works. For the one decent feed, though, NSF garners itself a “C.”

National Transportation Safety Board — D

The NTSB is really good at collecting and disseminating transportation safety information, but what about NTSB-focused information? Not so good. Perhaps it’s an excess of modesty, but the NTSB’s 12 self-identified high-value data sets don’t meet our criteria even once.

NTSB’s data is all about the accident statistics, which is no surprise because the agency has so much of that data near at hand. We want to see what’s in its head and its heart, though, with data reflecting the agency’s management, deliberation, and results. From that perspective, this data was a wreck.

Nuclear Regulatory Commission — C+

The NRC’s one data feed is a thing of beauty: a list of contracts for greater than $100,000, their purposes, suppliers, dollar amounts, effective dates, NRC identifying number, and award types. That’s just the kind of information that can help one see how the agency is run.

Now if they could just find two more like that…

Overseas Private Investment Corporation — C

OPIC provided two data sets about greenhouse gas emissions attributable to projects the agency is committed to. That’s neither here nor there when it comes to core transparency, though it might be all there for someone researching the environmental impacts of OPIC.

We did credit OPIC’s data about the net impact of OPIC projects on the economic and social development of their host countries.

Pension Benefit Guaranty Corporation — B

The PBGC produced some pretty good data. One spreadsheet contains a list of multiemployer plans receiving financial assistance payments from the PBGC for the period 2005 through 2009. That’s management. Key financial data from PBGC’s financial statements for the periods ending September 30, 1992 through September 30, 2009? Remember the rule: money is management.

The one that we couldn’t see clear to credit was the list of all single-employer defined-benefit pension plans trusteed by the PBGC since its creation in 1974. That’s data about the agency that could be useful for oversight, but it reaches too far back into history rather than showing the present-day functioning of the agency. All in all, though, a respectable “B” for the PBGC.

Railroad Retirement Board — D

The RRB produced statistical data about railroad workers, retirees, and annuitants, but nothing that gives us insights about the agency, so nothing we would call high-value.

Securities and Exchange Commission — F

Shares of this agency’s stock are falling.

Small Business Administration — D

Useful data, maybe. But the SBA didn’t manage to produce any data about itself. One data set is a collection of federal, state and local licenses, permits and registrations small businesses need to operate. (Can we say we’d like that list to be shorter?) Another data set is a collection of links to federal, state, and local financial assistance programs for small businesses. That’s fine data, but not high-value. Finally, there’s a “mashup” of URLs for city and county web sites and city and county location data. Cool data, but not valuable in terms of what we’re looking for.

Social Security Administration — A

SSA declared a whopping 14 of its data sets to be high-value. Among them, there had to be three high-value data sets, and there were.

Like SSA’s disability claim acceptance rates, for example. That’s good management data. Same with data on hearings before administrative law judges and their dispositions. Workload indicators for each hearing office in the Office of Disability Adjudication and Review (i.e., pending, receipts, dispositions and average processing time) — it seems mind-numbingly boring, but it’s also management data that we’ll treat here as high-value.

Kudos to SSA for getting the data out.

U.S. Agency for International Development — B

We really liked USAID’s database containing funding levels of U.S. Trade Capacity Building (TCB) activities designed to promote economic growth through international trade. With that data set, USAID is “TCB” in a different sense — takin’ care of business. We also credited statistics about U.S. official development assistance detailing it by country and implementing agency.

We couldn’t credit the data set containing U.S. economic and military assistance by country from 1946 to present. Historical data, good. But unless it shows how an agency or program produced results, it’s not our idea of “high-value.”

Solid job by USAID, though, and a “B.”

U.S. Equal Employment Opportunity Commission — D

The EEOC might want to put an ad in the paper looking for a database administrator. Its one feed doesn’t cut it as high-value for our purposes. Statistics on employment by race, occupation, gender, state and job category are good to have around, but they don’t let us see how the EEOC does its work.

Cato at Liberty