The DataRefuge Project at the University of Oregon
By Rachel Rochester and Heidi Kaufman
Last Saturday morning, people from across the University of Oregon campus and the Eugene area crowded into a computer lab in McKenzie Hall for a crowdsourcing public humanities meetup. The dozens of people filling the lab shared an attitude of subdued excitement. We were there on a common mission: to join the rush to preserve government data on the environment. The event united friends and colleagues interested in Computer Science, DH, PH, the Environment, IT services, and the UO libraries, as well as people from a number of other departments and members of the wider Eugene community.
Since Donald Trump assumed the presidency, scientists and their allies have grown deeply anxious about the scientific data housed on government servers. Although it is illegal to delete government data, it is not illegal to take it offline, which would make it virtually inaccessible to climate experts and scholars. Extensive cuts to government agencies, particularly those concerned with environmental research and welfare like the National Oceanic and Atmospheric Administration (NOAA) and the Environmental Protection Agency (EPA), threaten access to that data. The cuts, when viewed alongside some alarming changes to federal science agency websites, have some people nervous about the future of critical climate research, among other lines of inquiry. As a recent New York Times article notes, changes to available data so far “appear only to reflect the publicly stated priorities of the new administration and there have been few signs as yet that federal databases are being systematically manipulated or restricted.” Nevertheless, for many activists, the changes are enough to drive them to attempt to back up vulnerable data.
The event at UO began when Stephanie LeMenager, the Barbara and Carlisle Moore Distinguished Professor in English and Environmental Studies, brought the DataRefuge Project to the attention of several environmental humanists on campus. A group effort ensued, though the bulk of the organizational credit is due to LeMenager, Taylor McHolm, a PhD candidate in the interdisciplinary Environmental Science, Studies and Policy Program, and DH’s own Heidi Kaufman. McHolm notes that the event wouldn’t have been possible without the immense efforts of the Environmental Data and Governance Initiative (EDGI) and the University of Pennsylvania Program in the Environmental Humanities (PPEH Lab), who are working together to spearhead the initiative on a national level.
McHolm became invested in the project for both practical and political reasons. In e-mail correspondence, he writes: “On the practical level, the data refuge process is an opportunity to make data meaningful and secure it. As an environmental humanist, I rarely work with raw data. But my work depends on those who do. Moreover, so does my livelihood and well-being outside of my academic life — and that’s true for quite a lot of other people, too. As the climate continues to change, climate and environmental data give us access to trends and models that can help us grow food, organize cities, figure out appropriate forms of transportation, and help ensure justice for vulnerable and disproportionately impacted communities. If the data were to go missing, all of that would be far more difficult to achieve. Canvasing federal sites for important data also gives us a sense of all the work that’s being done. A number of participants noted that they didn’t realize how many programs were out there. So this is a way to show that this data influences their lives on some level. That alone is a reason to engage in the process.”
McHolm, like many of the participants at the event, was also driven by frustration at the attitudes expressed toward science during the most recent political cycle. He writes: “With every change in administration, there are going to be shifts in policy and messaging. With the election of Donald Trump, this was certainly no different. What was different, however, was the open hostility to social and environmental justice measures … Many [members of Trump’s administration] refuse the reality of climate change. Seeing these people come to power in a position that grants them the authority to make their views actionable was motivating, to say the least. It would be very easy (though illegal) for this data to simply be erased. It would be easier still to make the data hard to find or access. The DataRefuge process, and our work in it, is just one small way of helping prevent those scenarios.”
Anxieties about how the Trump administration will handle government data are shining a light on the broader issue of information storage in the digital age. Much of the research funded by the government exists only on government servers that may not be backed up or easily searchable. In a recent New York Times article, Laurie Allen, a digital librarian at the University of Pennsylvania who helped launch the DataRefuge Project, notes that the process of storing climate data digitally has always been deeply flawed, even prior to the current administration’s changes. “No one would advocate for a system where the government stores all scientific data and we just trust them to give it to us,” she said. “We didn’t used to have that system, yet that is the system we have landed with.”
Unfortunately, backing up the data is not as easy as cutting and pasting. Much of the data can only be extracted with custom code, which is being written as it becomes necessary. The information must also be preserved and archived in such a way that future researchers can authenticate it. At UO’s DataRefuge event, Taylor McHolm gave a brief overview of the history of the DataRefuge Project before helping participants divide into groups according to skill. Some participants used a web browser extension to alert a web crawler to federal websites it could copy automatically, while flagging more complex materials that need human attention. A second group began developing ways of mining those more complicated data sets.
Stephen Fickas, a professor in the University of Oregon Computer Science Department who participated in the event, notes that the logistical challenges of a data migration project of this magnitude make it appealing from an intellectual standpoint as well as an altruistic one. In an e-mail he wrote that his interest was partially inspired by “the technical challenges involved in snapshotting major pieces of the Internet,” as well as by the way the event’s structure, which divides people into distinct but collaborative team roles, could work for other crowdsourced research projects.
McHolm is realistic about what he believes the DataRefuge Project can accomplish. He writes, “I think it’s important that we realize and be open about the fact that this process … isn’t saving the world. There’s credible reason to believe that data may not be as vulnerable as some folks have said that it is. Even if that’s the case, though, having back-ups is always helpful, and civic engagement is more necessary now than it was a few months ago. This is a way to engage civically, learn about federal agencies and their necessity, and become more familiar with the importance of environmental data and regulations.” In an era when it’s easy for environmental activists and concerned citizens to succumb to helplessness, this is a concrete action that may, at least, stimulate intelligent inquiry into climate data and the politics surrounding it.
If you’d like to get involved, we’d love to hear from you in the comments. The event on Saturday was Phase 1, and if there is enough interest UO may host a more public and large-scale event. Potential organizers and participants alike are encouraged to reach out. Oregon State University is also holding a DataRefuge event on March 17-18 that is open to the public.