A bedrock principle of scientific inquiry is the independent reproducibility of scientific results. Publication of irreproducible results threatens the reliability of scientific publications and the integrity of scientific inquiry, and can lead to questions about the merits of science-based regulations to reduce risks. Yet recent examinations of empirical research published in prominent academic journals have found a disturbingly high number of irreproducible results. There is now growing agreement that the entire scientific community must help remedy this serious problem, and many journals now require authors to provide access to data, computer code, and lab specimens.

One stark exception to this important trend is the many influential journals published by U.S. government agencies. Although federal agencies have adopted policies to promote reproducibility, e.g., through data transparency, we find that nine of the top 10 federal journals (as measured by the h‑index, a measure of productivity and citation impact) lack policies to promote data access and sharing.

This needs to change. The White House’s Office of Science and Technology Policy (OSTP), in conjunction with the Office of Management and Budget (OMB), should direct federal journals to adopt policies for public data access that are at least as strong as those of the private journals Science and Nature.

Data access / Those two journals, and others, have adopted policies requiring data access. Science, for example, requires that

all data necessary to understand, assess, and extend the conclusions of the manuscript must be available to any reader of Science. All computer codes involved in the creation or analysis of data must also be available to any reader of Science.… Large data sets with no appropriate approved repository must be housed as supplementary materials at Science, or, only when this is not possible, on an archived institutional Web site.

Public access to data and code has improved reproducibility in economics research. In the 1980s, William Dewald, Jerry Thursby, and Richard Anderson tried to replicate results of empirical research that had been published in a respected economics journal, the Journal of Money, Credit and Banking. Writing in the American Economic Review (AER) in 1986, they reported that “inadvertent errors in published empirical articles are commonplace rather than a rare occurrence.” In response, the AER implemented various data access policies and, importantly, investigated the reproducibility of papers published under its current data access rules. After analyzing the data and code that published papers had deposited in repositories as required, the AER’s replication researchers concluded that “all but two of the articles (95 percent) could be replicated with little or no help from the author(s).”

These findings may well hold in other fields. Economists’ analytic methods involve complicated statistical analyses of large non-experimental datasets, techniques broadly similar to those used in epidemiological and some medical research. In psychological research, a major cooperative effort recently found that while 97 percent of original studies reported statistically significant results, the researchers endeavoring to replicate those studies found statistically significant results in only 36 percent of replications. The researchers did not address whether or how greater data access might improve reproducibility.

Federal journals / The federal government has already taken important steps to promote reproducibility. In 2002, the OMB directed agencies disseminating “influential scientific, financial, or statistical information” to “include a high degree of transparency about data and methods to facilitate the reproducibility of such information by qualified third parties.”

However, as noted above, the federal government has not acted to promote data access in the scientific journals that it manages. We have examined the editorial policies of the peer-reviewed federal journals that accept submissions from non-government authors. In particular, we searched the U.S. Government Printing Office, the websites of the cabinet-level departments and agencies, and the SCImago Journal and Country Rank portal to identify federal journals, and we used their rankings and h‑indices to identify the most prominent. Information on the 10 federal journals with the highest h‑indices appears in Table 1. With one exception, these journals have posted no policies to promote, let alone guarantee, access to the data or code needed to replicate the research that they publish.

The exception, the Social Security Bulletin, has a “requirement” for researchers: “If your paper is accepted for publication, you will be asked to make your data available to others at a reasonable cost for a period of three years (starting six months after actual publication).” This modest step is inadequate. Bryan Drew, Romina Gazis, Patricia Cabezas, et al. reported in a 2013 PLoS Biology article that less than 3 percent of researchers voluntarily share data and code sufficient to allow for reproducibility, even though many had said they would share data and code upon request. Similarly, requirements that authors merely make data available upon request have been shown to be ineffective.

Some federal journals are quite prominent. Four (Environmental Health Perspectives, Emerging Infectious Diseases, Morbidity and Mortality Weekly Report, and the Journal of Rehabilitation Research and Development) are ranked among the top 10 journals in their areas of specialization based on the 2014 data in the SCImago Journal and Country Rank portal. All of these journals enjoy taxpayer support, and thus their editors-in-chief have a special duty to ensure sound management.

The general lack of policies regarding data access for federal journals contrasts sharply with the data access policies of the highest-ranked non-federal journals publishing in the same subject areas. The Journal of Geophysical Research requires authors to post their data. Immunity, the Journal of Clinical Microbiology, the Canadian Journal of Fisheries and Aquatic Sciences, and Health Technology Assessment require authors to commit to making all data available upon request. Immunity, Clinical Infectious Diseases, the Journal of Infectious Diseases, Marine Ecology—Progress Series, and the Journal of Experimental Biology require authors to deposit sequence and microarray data in publicly accessible databases. The Journal of Geophysical Research and the Journal of Clinical Microbiology require that computer code necessary to reproduce the results be made available, with the Journal of Geophysical Research requiring that such computer code be posted for download.

The adoption of data access policies by these higher-ranked journals shows that such policies are both feasible and valuable. Indeed, the head of the National Institutes of Health commended Science and the other journals of the American Association for the Advancement of Science for requiring data and code access for their publications.

[Table 1: The 10 federal journals with the highest h‑indices]

Data access policies are evidently low cost; otherwise, so many non-federal journals could not have adopted them. Further, one federal journal, the Journal of Fish and Wildlife Management, published by the Fish and Wildlife Service, already has a strong data access policy, requiring “as a condition for publication, that data … be provided either directly in the paper, in the associated supplemental materials, … or archived in an appropriate public archive.”

Journals of international agencies also fail to promote data access. For example, the World Bank Research Observer, the World Bank Economic Review, and the Bulletin of the World Health Organization are all published without any announced policies on data access.

The policies of federal journals have been managed on a decentralized basis, with little formal interagency coordination. An initiative to promote reproducibility through mandatory public access to data and code could nevertheless be managed centrally, either by the OMB (which oversees the journals’ budget requests) or by the OSTP. Such an initiative would not only promote the reproducibility of articles published in federal journals, but could also spur the adoption of similar best practices among non-federal journals that now lack strong data access policies.

Readings

  • “Estimating the Reproducibility of Psychological Science,” by the Open Science Collaboration. Science, Vol. 349 (2015).
  • “Irreproducible Experimental Results: Causes, (Mis)interpretations, and Consequences,” by Joseph Loscalzo. Circulation, Vol. 125 (2012).
  • “Lost Branches on the Tree of Life,” by Bryan T. Drew, Romina Gazis, Patricia Cabezas, et al. PLoS Biology, Vol. 11 (2013).
  • “NIH Plans to Enhance Reproducibility,” by Francis S. Collins and Lawrence A. Tabak. Nature, Vol. 505 (2014).
  • “Public Availability of Published Research Data in High-Impact Journals,” by Alawi A. Alsheikh-Ali, Waqas Qureshi, Mouaz H. Al-Mallah, et al. PLoS One, Vol. 6 (2011).
  • “Replication in Empirical Economics: The Journal of Money, Credit and Banking Project,” by William G. Dewald, Jerry G. Thursby, and Richard G. Anderson. American Economic Review, Vol. 76 (1986).
  • “Appendix to the Report of the Editor: American Economic Review,” by Philip J. Glandon. American Economic Review, Vol. 101, No. 3 (2011).
  • “Reproducibility,” by Marcia McNutt. Science, Vol. 343 (2014).