PIDapalooza 2019 has ended
Wednesday, January 23 • 11:30am - 11:55am
p(id)-2-p(id): bridging PIDs to the p2p, Content-Addressed Web?


At the risk of causing some respected members of the PIDapalooza community to stab themselves in the eyeball (see https://doi.org/10.6084/m9.figshare.5914312.v1), this session will explore content-addressing of research outputs through the InterPlanetary File System (IPFS) and how this approach relates to existing persistent identifier infrastructures.
Research data is driving some new PID requirements. Linking versioned DOIs has been explored in various working groups and implemented on various popular platforms (Figshare, Zenodo, F1000). There is also renewed interest in the idea of data packaging (https://rd-alliance.org/approaches-research-data-packaging-rda-11th-plenary-bof-meeting) and research objects (http://www.researchobject.org/ro2018/) in RDA and related groups as a practical means of bundling data with its metadata so that it can be easily cited and transmitted as a single payload. Finally, several groups are exploring how best to directly reference content in PID metadata using cryptographic hashing. For example, the FREYA project is looking at how best to allow “direct access to content associated with a DOI” (see https://github.com/datacite/freya/issues/2) and RDA is tackling similar issues in the PID Kernel Information Working Group (https://www.rd-alliance.org/groups/pid-kernel-information-wg). This all suggests an appetite for greater consensus around linking PIDs to versioned (and therefore immutable), self-describing, directly accessible content.
The tl;dr of IPFS is that all content on the web/network can be referenced not by where it is located (a particular server or server farm, found via a DNS/domain lookup), but by cryptographic identifiers derived from the content itself. This allows the protocol to retrieve the desired information from any node on the network and removes some of the problems caused by content moving and drifting on the web. As with BitTorrent, Git and many others before it, cryptographic hashing plays a key role here, but IPFS aims to make this a ubiquitous, general-purpose network protocol, comparable to HTTP. Hashing digital content as a means of ensuring fixity/integrity of content will be familiar to PIDapalooza participants. Moreover, people working in the digital repository and cloud storage space may well work with content-addressed storage of one form or another. However, the prospect of (relatively) widespread use of peer-to-peer content-addressing at web/network level, as exemplified in particular by IPFS (ipfs.io), raises some interesting possibilities for how we manage and cite research data.
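To make the idea of content-addressing concrete, here is a minimal sketch in Python. It is not the IPFS API: the `ContentStore` class, its `put`/`get` methods, and the sample data are hypothetical names invented for illustration, and a plain SHA-256 hex digest stands in for a real IPFS content identifier. It only shows that the address is derived from the bytes, so any node holding the same bytes can serve them and the recipient can check what it received.

```python
import hashlib


class ContentStore:
    """Toy in-memory content-addressed store (hypothetical, not the IPFS API)."""

    def __init__(self):
        self._blocks = {}

    def put(self, data: bytes) -> str:
        # The address is computed from the content, not from a location.
        address = hashlib.sha256(data).hexdigest()
        self._blocks[address] = data
        return address

    def get(self, address: str) -> bytes:
        data = self._blocks[address]
        # Re-hash on retrieval: the address doubles as a fixity check.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("content does not match its address")
        return data


# Two independent "nodes" holding the same dataset version.
store_a = ContentStore()
store_b = ContentStore()

dataset_v1 = b"temperature,12.5\n"
dataset_v2 = b"temperature,12.6\n"  # a corrected version of the same dataset

addr_v1 = store_a.put(dataset_v1)
addr_v1_copy = store_b.put(dataset_v1)
addr_v2 = store_a.put(dataset_v2)

print(addr_v1 == addr_v1_copy)  # same bytes give the same address on any node
print(addr_v1 == addr_v2)       # changed bytes give a different address
```

Note that a new version of the dataset necessarily gets a new address, which is why content addresses are immutable and why a separate mechanism is still needed to say "this is the latest version".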

On the face of it, IPFS bakes *some* of the core PID use-cases, in particular handling location change and/or multi-resolution, right into the fabric of the network. However, by itself, IPFS is not a panacea. For example, although IPFS has built-in ways to update hashing algorithms to deal with future hash collisions, this isn’t much good if you’ve cited something using a now-broken hash. IPFS also has emerging specifications for a metadata registry and for a mutable namespace to cater for updated content but, again, from a PID perspective, these re-introduce some of the fragility of the HTTP web and also some of the familiar requirements for transparent, multi-stakeholder, and relatively centralised governance models that make PID schemes so trusted today.
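The point about updating hashing algorithms can be illustrated with a self-describing digest in the style of the multihash format used by IPFS. The sketch below is a simplification under stated assumptions: the function names `encode_multihash` and `verify_multihash` are hypothetical, only two algorithms are registered, and the prefix is written as two single bytes (real multihash uses unsigned varints, which coincide with single bytes for values this small). The codes 0x12 for sha2-256 and 0x13 for sha2-512 are taken from the multihash table.

```python
import hashlib

# Function codes from the multihash table: sha2-256 is 0x12, sha2-512 is 0x13.
HASH_FUNCTIONS = {
    "sha2-256": (0x12, hashlib.sha256),
    "sha2-512": (0x13, hashlib.sha512),
}


def encode_multihash(name: str, data: bytes) -> bytes:
    """Return <function code><digest length><digest> for the given data."""
    code, hash_fn = HASH_FUNCTIONS[name]
    digest = hash_fn(data).digest()
    return bytes([code, len(digest)]) + digest


def verify_multihash(multihash: bytes, data: bytes) -> bool:
    """Check data against a multihash, using whichever algorithm it names."""
    code, length, digest = multihash[0], multihash[1], multihash[2:]
    for known_code, hash_fn in HASH_FUNCTIONS.values():
        if known_code == code:
            return len(digest) == length and hash_fn(data).digest() == digest
    raise ValueError("unknown hash function code")


payload = b"research output"
old_style = encode_multihash("sha2-256", payload)
new_style = encode_multihash("sha2-512", payload)

print(old_style[:2].hex())  # prefix naming sha2-256 and a 32-byte digest
print(verify_multihash(old_style, payload))
print(verify_multihash(new_style, payload))
```

The prefix lets new identifiers adopt a stronger algorithm without ambiguity, but an identifier that has already been cited remains bound to the algorithm named in its prefix; if that algorithm is later broken, the cited identifier is only as trustworthy as the broken hash.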
This session will introduce the peer-to-peer content-addressed approach, explore its benefits and weaknesses in terms of persistence and, using the Bilder-Fenner framework (https://doi.org/10.6084/m9.figshare.5914312.v1) as a basis, discuss how IPFS and related approaches can both leverage PID infrastructure and in turn be leveraged by PID infrastructures to address particular content distribution and referencing requirements.


Eoghan Ó Carragáin

I work on research data and open science at University College Cork Library, and chair the Infrastructure Working Group of Ireland’s National Open Research Forum. I previously worked at the National Library of Ireland as a software developer building a digitisation workflow and...

Stage 1
