Reflections on “Privacy Implications of Research Data: A NISO Symposium”

I had the opportunity to attend “Privacy Implications of Research Data: A NISO Symposium” (sponsored by the NISO-RDA Joint Interest Group) in Denver this past weekend as a member of the RDA/NISO Privacy Implications of Research Data Sets Working Group. I’m grateful to Todd Carpenter, NISO Executive Director, for including me in this project, which is a follow-on to the project group that I was a member of that produced the NISO Consensus Principles on Users’ Digital Privacy in Library, Publisher, and Software-Provider Systems [PDF].

As the Case Statement for the Working Group states, the goal is to “develop a framework for how researchers and repositories should appropriately manage human-subject datasets, to develop a metadata set to describe the privacy-related aspects of research datasets, compile a bibliography of related resources, and to build awareness of the privacy implications of research-data sharing.”

The speakers were all thoughtful, and each provided a focused talk on some aspect of this multi-faceted topic, which continues to shift while we grapple with it. In fact, that was one of the themes that emerged in the talks: the lack of clear definitions of what we mean by the terms privacy, research, and data. These are terms we all use regularly, but they seem to defy easy operational definition in the context of this project.

All of the presentations and the follow-on discussions were recorded and are accessible from the symposium website, so I won’t recap them here. Instead, I’d like to offer a few reflections.

  • In the context of the symposium, health/biomedical, social media, and (to a degree) sociology/psychology data were the focus of the discussion, with an emphasis on quantitative data. In future conversations, considering qualitative data and privacy will also be important. Interviews, focus groups, oral histories, etc. all produce data that raise privacy questions and concerns.
  • At times the conversation seemed to conflate the question of whether data was “research data” with the question of whether the person who had collected the data and/or wanted to access and use it was a “bona fide researcher.” I think we find more clarity in separating the question of whether data is research data from the question of who is allowed to access and use it. This is particularly useful if we want to affirm the tenet that an individual whose data is in the data set should have a right to access (and possibly review, correct, and/or delete) his or her own data. How to think about citizen science is also an open question here.
  • I also left thinking that, while this topic is vast, one way to develop a focus for the coming year would be to think carefully about capitalizing on NISO’s leadership/participation in this NISO-RDA project. There are many facets to privacy in research data. Is there a way to best use NISO’s areas of expertise, recognizing that the RDA community at large may have additional interests as well?

As a reminder, anyone is welcome to contribute to the group by joining the forum on the RDA/NISO Privacy Implications of Research Data Sets Interest Group website to receive notifications of meetings and other events as well as drafts of the framework as it emerges.

I previously blogged about the meeting for this project held at FORCE11 in April 2016.

Privacy in Research Data at FORCE11 #force2016

I had the pleasure of attending the FORCE11 2016 Conference pre-conference in Portland today as a member of the RDA/NISO Privacy Implications of Research Data Sets Working Group. I’m grateful to Todd Carpenter, NISO Executive Director, for including me in this project, which is a follow-on to the project group that I was a member of that produced the NISO Consensus Principles on Users’ Digital Privacy in Library, Publisher, and Software-Provider Systems [PDF].

We had our first discussion at the Research Data Alliance Seventh Plenary Meeting in Tokyo in March, which introduced the project and examined in detail the question of whether the project is best suited to be an RDA Interest Group or Working Group. The discussion at FORCE11 reviewed these issues as well but quickly focused on some questions of substance about the content of the framework that is being created and what will be most useful for the community.

As the Case Statement for the Working Group states, the goal is to “develop a framework for how researchers and repositories should appropriately manage human-subject datasets, to develop a metadata set to describe the privacy-related aspects of research datasets, compile a bibliography of related resources, and to build awareness of the privacy implications of research-data sharing.”

The Case Statement also presents a Work Plan for the group: “focus on world-wide legal frameworks and the impacts these frameworks have on data sharing, especially with human-subject data. After gathering these legal strictures and comparing the differences and similarities, the group will begin crafting a set of principles that will provide guidance to the researcher and repository communities on how to manage these data when they are received. Building on these, the group will craft a set of use cases on how the principles will be applied. After these elements are completed, an effort to advance the principles through promotion and community outreach will be developed and executed.”

Today’s discussion was, as expected (since we are at the beginning of the work and thus in brainstorming mode), wide-ranging. Nonetheless, as I listened to the comments and questions, a few themes emerged from my perspective:

  • Principles and Practices – Though there is a need to identify the what and the why, the framework will provide value to the community if it also includes indicators of the how. Specifically, the discussion revealed a need for best practices in governance of privacy in data sets and best practices in technology and metadata infrastructures. How can the framework respond to known use cases while also anticipating future ones?
  • Stakeholders – The stakeholders for this topic are diverse and multiple. Though the document might be useful to all, using a smaller set of identified stakeholders as a focus might prove a useful way to scope the framework. Possibilities discussed included chief information officers, vice-presidents for research, and repository managers. What are the advantages and disadvantages of choosing one or more stakeholder groups as the focus and likewise of not doing so?
  • Unique Contribution – Privacy in research data sets is a topic that could also include IRB, legal compliance, etc. The framework may be most useful if it makes a unique contribution, acknowledging but not duplicating other work that focuses on human subjects ethics, institutional legal compliance/risk, etc. What is the unique contribution of the proposed framework?
  • Level of Abstraction – The framework is necessarily abstracted from the particulars of an individual researcher, institution, or discipline, and local decisions can benefit from a general framework. How can the framework find the right level of abstraction so that it is generalizable but also usable in practice?
  • International – The international nature of the framework adds another layer of complexity to the question of the level of abstraction. How can the framework account for but not be subsumed within any particular set of national and/or multinational policy and legal guidelines?

The RDA/NISO Privacy Implications of Research Data Sets Working Group will be holding a number of conference calls in the coming months to discuss these issues as well as a public symposium on September 11, 2016 in Denver. Anyone is welcome to contribute to the group by joining the forum on the RDA/NISO Privacy Implications of Research Data Sets Working Group website to receive notifications of meetings and other events as well as drafts of the framework as it emerges.