Wednesday, January 23


p(id)-2-p(id): bridging PIDs to the p2p, Content-Addressed Web?
At the risk of causing some respected members of the Pidapalooza community to stab themselves in the eyeball (see https://doi.org/10.6084/m9.figshare.5914312.v1), this session will explore content-addressing of research outputs through the Interplanetary FileSystem (IPFS) and how this approach relates to existing persistent identifier infrastructures.
Research data is driving some new PID requirements. Linking versioned DOIs has been explored in various working groups and implemented on various popular platforms (Figshare, Zenodo, F1000). There is also renewed interest in the idea of data packaging (https://rd-alliance.org/approaches-research-data-packaging-rda-11th-plenary-bof-meeting) and research objects (http://www.researchobject.org/ro2018/) in RDA and related groups as a practical means of bundling data with its metadata in a way that can be easily cited and transmitted as a single payload. Finally, several groups are exploring how best to directly reference content in PID metadata using cryptographic hashing. For example, the Freya project is looking at how best to allow “direct access to content associated with a DOI” (see https://github.com/datacite/freya/issues/2) and RDA is tackling similar issues in the PID Kernel Information Working Group (https://www.rd-alliance.org/groups/pid-kernel-information-wg). This all suggests an appetite for greater consensus around linking PIDs to versioned (and therefore immutable), self-describing, directly accessible content.
The tl;dr of IPFS is that all content on the web/network can be referenced not by where it is located (a particular server or server farm, found via a DNS/domain lookup), but by cryptographic identifiers derived from the content itself. This allows the protocol to retrieve the desired information from any node on the network, and removes some of the issues with content moving and drifting on the web. As with BitTorrent, Git and many others before it, cryptographic hashing plays a key role here, but IPFS aims to make this a ubiquitous, general-purpose network protocol, comparable to HTTP. Hashing digital content as a means of ensuring fixity/integrity will be familiar to Pidapalooza participants. Moreover, people working in the digital repository and cloud storage space may well work with content-addressed storage of one form or another. However, the prospect of (relatively) widespread use of peer-to-peer content-addressing at web/network level, as exemplified in particular by the Interplanetary File System (IPFS; ipfs.io), raises some interesting possibilities for how we manage and cite research data.
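As a rough illustration of the content-addressing idea described above, the following Python sketch derives an identifier from the bytes of a file and retrieves it from whichever peer happens to hold a valid copy. It is a toy model under stated assumptions, not how IPFS itself is implemented: the names content_id, Node and fetch are hypothetical, and the identifier is a plain SHA-256 hex digest rather than a real IPFS content identifier.

```python
import hashlib
from typing import Dict, List, Optional


def content_id(data: bytes) -> str:
    """Derive an identifier from the content itself.

    Assumption: a plain SHA-256 hex digest, not a real IPFS identifier.
    """
    return hashlib.sha256(data).hexdigest()


class Node:
    """Hypothetical peer that stores content keyed by its identifier."""

    def __init__(self) -> None:
        self._blocks: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        cid = content_id(data)
        self._blocks[cid] = data
        return cid

    def get(self, cid: str) -> Optional[bytes]:
        return self._blocks.get(cid)


def fetch(cid: str, nodes: List[Node]) -> bytes:
    """Ask each node in turn and accept the first copy whose hash matches.

    The location of the copy is irrelevant; only the hash check matters.
    """
    for node in nodes:
        data = node.get(cid)
        if data is not None and content_id(data) == cid:
            return data
    raise KeyError(cid)


if __name__ == "__main__":
    first, second = Node(), Node()
    cid = second.put(b"dataset, version 1")
    # The requester does not need to know which node holds the content.
    print(fetch(cid, [first, second]))
```

Because the identifier is recomputed on retrieval, a copy that has been altered no longer matches and is skipped, which is the same fixity check that repositories already perform with checksums.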

On the face of it, IPFS bakes *some* of the core PID use-cases, in particular handling location change and/or multi-resolution, right into the fabric of the network. However, by itself, IPFS is not a panacea. For example, although IPFS has built-in ways to update hashing algorithms to deal with future hash collisions, this isn’t much good if you’ve cited something using a now-broken hash. IPFS also has emerging specifications for a metadata registry and for a mutable namespace to cater for updated content but again, from a PID perspective, these re-introduce some of the fragility of the HTTP web and also some of the familiar requirements for transparent, multi-stakeholder, and relatively centralised governance models that make PID schemes so trusted today.
This session will introduce the peer-to-peer content-addressed approach, exploring its benefits and weaknesses in terms of persistence and, using the Bilder-Fenner framework (https://doi.org/10.6084/m9.figshare.5914312.v1) as a basis, discuss how IPFS and related approaches can both leverage PID infrastructure and in turn be leveraged by PID infrastructures to address particular content distribution and referencing requirements.


Eoghan Ó Carragáin

I work on research data and open science at University College Cork Library, and chair the Infrastructure Working Group of Ireland’s National Open Research Forum. I previously worked at the National Library of Ireland as a software developer building a digitisation workflow and...

Wednesday January 23, 2019 11:30am - 11:55am
Stage 1


To the Rescue of Scholarly Orphans
The Scholarly Orphans project, funded by the Andrew W. Mellon Foundation, explores technical approaches aimed at capturing and archiving scholarly artifacts that researchers deposit in web productivity portals as a means to collaborate and communicate with their peers. These artifacts are not collected by other frameworks aimed at archiving the scholarly record (e.g., LOCKSS, Portico, Institutional Repositories) and are only incidentally captured by web archives. The project explores an institution-driven approach inspired by web archiving. To demonstrate the ongoing thinking, the project has devised an experimental automated pipeline that continuously discovers, captures, and archives artifacts. These are created by actual researchers who, for the purpose of the experiment, were virtually enlisted in a fictive research institution. A portal at myresearch.institute provides an overview of the artifacts that were discovered and provides access to archived versions stored in both an institutional and a cross-institutional archive. The set-up leverages a range of technologies that share a flavor of persistence: Memento, Memento Tracer, Robust Links, Signposting.


Herbert Van de Sompel

Herbert Van de Sompel started his career as head of library automation at Ghent University. After leaving Ghent, he was Visiting Professor in Computer Science at Cornell University, Director of e-Strategy and Programmes at the British Library, and information scientist at the Los...

Wednesday January 23, 2019 12:00pm - 12:25pm
Stage 1


5 Steps Towards a Dream State: Persistent & Open Data Metrics
To standardize the counting and expose the reach of research data, Make Data Count has focused on building open infrastructure for research data usage and citation counts in scholarly communications. Our intention has always been to build open infrastructure that enables the analysis and building of proper data-level metrics, to truly value each data PID. While implementing this infrastructure we have exposed barriers and feats that the community needs to address in implementing metrics that are valuable and feasible for research data. Join us for a discussion of the five steps we believe the community must take towards building, implementing, and advocating for data usage and data citation before we can really have "data-level metrics" and move towards a value system for research data.

John Chodacki

Director, University of California Curation Center, California Digital Library - CDL
John Chodacki is Director of the University of California Curation Center (UC3) at California Digital Library (CDL)

Martin Fenner

Technical Director, DataCite
Daniella Lowenberg

California Digital Library - CDL

Wednesday January 23, 2019 2:00pm - 2:25pm
Stage 2


Thursday, January 24


Architecting Attribution
Open science, team science, and a drive to understand meaningful outcomes have transformed research at all levels. It is not sufficient to consider scholarship simply from the perspective of papers written, citations garnered, and grant dollars awarded. We need a more nuanced characterization and contextualization of contributions of varying types and intensities that are critical to power research. Unfortunately, little infrastructure exists to identify, aggregate, present, and understand the impact of these contributions. Moreover, these challenges are technical as well as social and require an approach that assimilates cultural perspectives for investigators and organizations alike. Here we will present ongoing work through the US National Center for Data to Health (CD2H) to address these challenges, including a discussion of the role of PIDs, contributor roles and assertions, and stakeholders and systems to recognize and credit a diverse complement of work.


Kristi Holmes

Northwestern University

Thursday January 24, 2019 2:15pm - 2:40pm
Stage 1