
datacrate/ at new_draft/0.2 · UTS-eResearch/datacrate · GitHub


This document specifies DataCrate, a method of organising file-based data with associated metadata in both human- and machine-readable formats. It is based on a linked-data vocabulary, supplemented with terms from the SPAR ontologies and [PCDM] where that vocabulary does not have coverage. The motivation for this work comes from the research domain.

A DataCrate is a dataset: a set of files contained in a single directory. There are two ways of organizing a DataCrate.

For working data, or data that does not need to be distributed with checksums, a Working DataCrate is a plain directory containing payload data files, with two metadata files at the root: one for humans and one for machines.
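As a rough illustration of the Working DataCrate layout, the following Python sketch checks that a directory has both metadata files at its root. The file names `CATALOG.html` and `CATALOG.json` and the function `looks_like_working_datacrate` are assumptions made for this example; the text above only states that there is one metadata file for humans and one for machines.

```python
from pathlib import Path

# Assumed file names for illustration only: the description above says
# there are two root-level metadata files but does not name them here.
HUMAN_METADATA = "CATALOG.html"
MACHINE_METADATA = "CATALOG.json"


def looks_like_working_datacrate(root):
    """Return True if `root` is a directory with both metadata files at its top level."""
    root = Path(root)
    return (
        root.is_dir()
        and (root / HUMAN_METADATA).is_file()
        and (root / MACHINE_METADATA).is_file()
    )
```

This only tests for the presence of the two files. It does not validate their contents or inspect the payload files.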

For distribution or archiving, where integrity is important, a Bagged DataCrate is a BagIt bag conforming to the DataCrate BagIt profile, with the payload files in the /data directory. A Bagged DataCrate has a clear separation between metadata and payload, and can be integrity-checked using the checksums in the BagIt manifest.
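The integrity check can be sketched in Python as follows. This is a minimal example, assuming a payload manifest named `manifest-sha256.txt` at the bag root in the usual BagIt format (one `checksum path` pair per line); the function name `verify_bag_manifest` is hypothetical. It does not check conformance to the DataCrate BagIt profile, tag manifests, or percent-encoded path names.

```python
import hashlib
from pathlib import Path


def verify_bag_manifest(bag_root, algorithm="sha256"):
    """Return the manifest paths whose file is missing or whose checksum differs."""
    bag_root = Path(bag_root)
    # Assumption: the bag carries a payload manifest for the given algorithm.
    manifest = bag_root / f"manifest-{algorithm}.txt"
    failures = []
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Each entry is "<checksum> <path relative to the bag root>".
        expected, rel_path = line.split(None, 1)
        rel_path = rel_path.strip()
        target = bag_root / rel_path
        if not target.is_file():
            failures.append(rel_path)
            continue
        digest = hashlib.new(algorithm, target.read_bytes()).hexdigest()
        if digest != expected.lower():
            failures.append(rel_path)
    return failures
```

An empty list means every file listed in the manifest was found with a matching checksum. Files present under /data but absent from the manifest are not reported by this sketch.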




  • @diglezakis
