Katherine Harrison

Behind the Science: the invisible work of data management in Big Science

Some of the largest quantities of data produced in today’s data-dominated world are generated by experiments taking place at Big Science facilities. These are facilities like CERN, where cutting-edge scientific discoveries take place, changing our understanding of the universe and promising answers to some of society’s most pressing problems. The production of high-quality, reliable data is fundamental to this advancement of scientific knowledge. However, both historically and today, Big Science has tended to ignore how the collection, processing and storage of experimental data at these facilities shape this new knowledge.

This book tells the story of a unique research journey following the people responsible for designing and implementing data management at a new Big Science facility, the European Spallation Source (ESS) in Lund, Sweden. It addresses a gap in the scholarly literature by highlighting the role of data management within the broader landscape of changing scientific experimentation. It draws on insights from Critical Data Studies and Science & Technology Studies to investigate how this data is framed by the context in which it is conceived of and generated.

This volume is contracted to Bristol University Press, and is scheduled for publication late 2023/early 2024, coinciding with the formal opening of the ESS to visitors.
Final report
The aim of this six-month project was to complete a book manuscript titled “Behind the Science: the invisible work of data management in Big Science”, contracted to Bristol University Press. I spent the period February-July 2023 as a visiting researcher at the Computer Architecture and VLSI (CARV) Systems Laboratory of the Institute of Computer Science (ICS), at the Foundation for Research and Technology (FORTH) in Heraklion, Crete, Greece. CARV was established in 1988 and is a leading European laboratory with a core focus on High Performance Computing and expertise in Big Data analytics. During this time, I worked on the manuscript and met colleagues working in this field. I also took the opportunity to make contact with other researchers at FORTH whose interests complement my other ongoing research projects on smart cities and AI.

The manuscript was submitted to Bristol University Press in October 2023, and a date of 1 April 2024 has been provisionally set by the publisher for submission of the final, print-ready version. Publication is expected in Fall 2024 and the book will be Open Access. Below I provide a summary of the book’s results, followed by details of the additional activities achieved during the sabbatical period.

Book results
This book was based on material collected during an earlier (separately funded) project that set out to explore the idea that Big Science itself was changing in significant ways, so much so that we might need to start talking about “New” Big Science. That project focused on the European Spallation Source (ESS), a new Big Science facility under construction outside Lund, Sweden.

My contribution to the conversation was to suggest that one of the distinguishing characteristics of such a New Big Science was a change in data – the volumes of data, the complexity of the data, the changes in user support around data, all of which combined to make data management at facilities such as the ESS more visible than ever before. When I completed that research, the material I had gathered during fieldwork at the ESS far surpassed the earlier project’s publication requirements in terms of quantity and richness. A book was the obvious way to do justice to this material, to the participants and to the central question that emerged: why – when data is so foundational to experimental results – has the management of that data received so little attention in Big Science to date?

Thanks to the extended period of focused time afforded by this sabbatical, "Behind the Science" shows how the specific context surrounding development of the ESS shaped the design and development of the data management system. The chapters at the heart of the book present extracts from empirical work that show the varied impact and contributions of the technologies, people and organizational structure. In this book, I want not only to make clear the importance of understanding how data management is “situated” by these different actors but also to make the expertise involved more visible. To this end I deployed two analytical approaches from Science & Technology Studies: the “black box” and “invisible work”. Using these approaches to make visible the technologies, practices and expertise related to data management led me to think about experiments at the ESS as having a “front stage” and a “back stage”, a metaphor that eventually inspired the title of this book. In the closing chapters, I draw on the empirical findings to engage with theoretical debates taking place in Critical Data Studies about the role of “raw data” in knowledge production.

Contacts and collaborations
Whilst the most significant result of this project is submission of the book manuscript itself, the sabbatical period was enormously productive in terms of wider conversations and contacts. From researchers at FORTH with experience conducting the kinds of experiments I was writing about, I learned much about working practices at other Big Science facilities and benefited from their feedback on the text. From others at the institute whose expertise lies in the field of AI and machine learning, I embraced the opportunity to gain knowledge about different learning models. These are conversations that I am taking forward with me into a new WASP-HS project called “Operationalising ethics for AI: translation, implementation and accountability challenges” which will give me the chance to collaborate further with these new contacts. Finally, thanks to the opportunity to give a guest lecture as part of a Smart Cities course at the University of Crete during my stay, I was able to bring experiences from another earlier project in Sweden to share with students there, and to form a collaboration with researchers working on sensing technologies.

Dissemination
As I prepare the book for printing, I am also preparing to disseminate the results and promote the book itself. Plans for this include presentations at relevant major international conferences, such as Data Power, the joint meeting of the European Association for the Study of Science and Technology (EASST) and the Society for Social Studies of Science (4S), and the Association of Internet Researchers annual meeting. In addition to these, I plan to participate in more local events, such as presentations at the Swedish National STS days, and events on “raw data” at Linköping University TEMA’s own Data Lab and the Swedish “Digital STS” group. The theoretical work that forms part of this book has allowed me to draw together threads that had appeared across different empirical research projects in order to formulate a clear theoretical contribution to the field of Critical Data Studies, a contribution that stretches far beyond the book itself to inform research and teaching going forwards.
Grant administrator: Linköpings universitet
Reference number: SAB22-0063
Amount: SEK 775,600
Funding: RJ Sabbatical
Subject: Media Studies
Year: 2022