This document specifies DataCrate, a method of organising file-based data with associated metadata in both human- and machine-readable formats. It is based on the schema.org linked-data vocabulary, supplemented with terms from the SPAR ontologies and [PCDM] where schema.org does not have coverage. The motivation for this work comes from the research domain.
A DataCrate is a dataset: a set of files contained in a single directory. There are two ways of organising a DataCrate.
For working data, or data that does not need to be distributed with checksums, a Working DataCrate is a plain directory containing payload data files, with two metadata files at the root: one for humans and one for machines.
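As an illustration of the Working DataCrate layout, the following minimal sketch checks that a directory has both metadata files at its root and lists everything else as payload. The file names `CATALOG.html` and `CATALOG.json` and the helper names `is_working_datacrate` and `payload_files` are assumptions made for this example; the text above does not name the two metadata files.

```python
from pathlib import Path

# Assumed file names: the text above only says there is one metadata
# file for humans and one for machines at the root of the directory.
HUMAN_METADATA = "CATALOG.html"
MACHINE_METADATA = "CATALOG.json"


def is_working_datacrate(root, human=HUMAN_METADATA, machine=MACHINE_METADATA):
    """Return True if root is a directory with both metadata files at its top level."""
    root = Path(root)
    return (
        root.is_dir()
        and (root / human).is_file()
        and (root / machine).is_file()
    )


def payload_files(root, human=HUMAN_METADATA, machine=MACHINE_METADATA):
    """Return sorted relative paths of all files except the two root metadata files."""
    root = Path(root)
    metadata = {human, machine}
    paths = []
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        relative = path.relative_to(root).as_posix()
        # Only the copies at the root count as metadata; a file with the
        # same name in a subdirectory is treated as payload.
        if relative in metadata:
            continue
        paths.append(relative)
    return sorted(paths)
```

This sketch only checks that the files are present. It does not validate the content of either metadata file.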
For distribution or archiving, where integrity is important, a Bagged DataCrate is a BagIt bag conforming to the DataCrate BagIt profile, with the payload files in the /data directory. A Bagged DataCrate has a clear separation between metadata and payload, and can be integrity-checked using the checksums in the BagIt manifest.
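The integrity check described above can be sketched as follows, assuming a payload manifest named `manifest-<algorithm>.txt` at the bag root, with one `checksum path` entry per line as in the BagIt format. The function name `verify_bag_manifest` is hypothetical. The sketch does not decode percent-encoded paths, does not detect payload files missing from the manifest, and does not check conformance to the DataCrate BagIt profile.

```python
import hashlib
from pathlib import Path


def verify_bag_manifest(bag_dir, algorithm="sha256"):
    """Return the manifest paths whose file is missing or whose checksum differs."""
    bag_dir = Path(bag_dir)
    manifest = bag_dir / f"manifest-{algorithm}.txt"
    failures = []
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Each entry is "<checksum><whitespace><path relative to the bag root>".
        expected, rel_path = line.split(maxsplit=1)
        target = bag_dir / rel_path
        if not target.is_file():
            failures.append(rel_path)
            continue
        actual = hashlib.new(algorithm, target.read_bytes()).hexdigest()
        if actual != expected.lower():
            failures.append(rel_path)
    return failures
```

An empty list means every file listed in the manifest is present and matches its recorded checksum. For large payload files, reading in chunks rather than with `read_bytes()` would keep memory use bounded.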