Data Life Cycle
Research data passes through many stages from creation to (voluntary or accidental) destruction. Although data is generated by our submitters and contributors, nmrXiv aims to support all subsequent stages of the data life cycle, starting with data deposition. Drawing on various sources and on community involvement and incentives, we will endeavour to procure and disseminate raw NMR data that currently does not have to be shared (e.g., because neither journals nor funding agencies require it).
Deposition: Users can submit their datasets to the repository through our easy-to-use interface or through an electronic lab notebook such as Chemotion ELN. Using smart lab notebooks enables easy integration with commonly used software. We ensure that submitted datasets comply with minimum information standards and with additional requirements set by domain experts and funders. Submitters can then enrich their datasets with additional information (taxonomic and genomic information, related publications) according to standards defined by the community, with some metadata and acquisition parameters pre-populated.
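As an illustration of what a minimum-information check at deposition could look like, here is a minimal sketch. The field names in `REQUIRED_FIELDS` and the function `missing_fields` are hypothetical and are not taken from nmrXiv or from any published standard; the actual requirements are defined by the community and the domain experts.

```python
# Hypothetical list of mandatory metadata fields; the real minimum
# information standard is defined by the community, not by this sketch.
REQUIRED_FIELDS = ("title", "nucleus", "solvent", "field_strength_mhz")


def missing_fields(metadata: dict) -> list:
    """Return the required fields that are absent or empty in metadata."""
    missing = []
    for name in REQUIRED_FIELDS:
        value = metadata.get(name)
        # Treat None and blank strings as "not provided".
        if value is None or str(value).strip() == "":
            missing.append(name)
    return missing


submission = {"title": "Example 1H spectrum", "nucleus": "1H", "solvent": ""}
print(missing_fields(submission))
```

A deposition front end could use such a check to tell the submitter which fields still need to be filled in before the dataset is accepted.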
Processing/Archiving/Distribution: The nmrXiv infrastructure will convert the data into standard formats using workflows running on distributed infrastructure. Data and associated metadata will be processed, stored, and archived following research data management (RDM) best practices such as BagIt.
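To make the BagIt reference concrete: a bag is a directory that holds the payload under `data/`, a `bagit.txt` declaration, and a manifest of payload checksums. The following is a simplified sketch using only the Python standard library. It is not nmrXiv's archiving code, the function names are made up for illustration, and it omits optional parts of the specification such as `bag-info.txt` and tag manifests.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_bag(source_dir: Path, bag_dir: Path) -> Path:
    """Copy source_dir into bag_dir/data and write the basic BagIt tag files."""
    payload_dir = bag_dir / "data"
    shutil.copytree(source_dir, payload_dir)

    # Bag declaration required by the BagIt specification.
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )

    # Payload manifest: one "<checksum>  <relative path>" line per file.
    lines = []
    for file_path in sorted(p for p in payload_dir.rglob("*") if p.is_file()):
        relative = file_path.relative_to(bag_dir).as_posix()
        lines.append(f"{sha256_of(file_path)}  {relative}\n")
    (bag_dir / "manifest-sha256.txt").write_text("".join(lines), encoding="utf-8")
    return bag_dir


def verify_bag(bag_dir: Path) -> bool:
    """Recompute each listed payload checksum and compare it with the manifest."""
    manifest = (bag_dir / "manifest-sha256.txt").read_text(encoding="utf-8")
    for line in manifest.splitlines():
        expected, relative = line.split(maxsplit=1)
        if sha256_of(bag_dir / relative) != expected:
            return False
    return True
```

Calling `make_bag(Path("raw_experiment"), Path("experiment_bag"))` on a hypothetical folder of raw files would produce a bag whose integrity can later be checked with `verify_bag`. This sketch only checks files listed in the manifest; a full validator would also flag payload files that are missing from it.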
Archiving: Once data is deposited, users receive a stable identifier, typically a DOI hyperlink that remains unchanged for the resource's lifetime. Because nmrXiv will issue DOIs, any deposited data will immediately have a stable identifier that points to the dataset and allows its re-use or repurposing. This fully accommodates the needs of manuscript submission, peer review, and the post-acceptance stage. nmrXiv will also develop restricted-access models that allow only authorized reviewers to access the authors' original NMR data. Upon acceptance, the DOI-labeled datasets will become publicly accessible.
nmrXiv will also support DOI versioning, allowing users to update their data while maintaining a record of modifications. With DOI versioning, the DOI assigned to a dataset always points to the latest version, while each version also receives its own specific DOI. This will allow nmrXiv to manage changes to the data, e.g., structural revisions, without breaking the DOI links in publications.
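The resolution rule described above (the dataset DOI follows the latest version, version DOIs never move) can be sketched as a small data structure. This is an illustration only: the class `VersionedDataset`, the `.vN` suffix scheme, and the DOI `10.12345/example-dataset` are made up and do not reflect how nmrXiv or its DOI provider actually mint identifiers.

```python
from dataclasses import dataclass, field


@dataclass
class VersionedDataset:
    """A dataset with one dataset-level DOI and one DOI per version."""

    dataset_doi: str
    version_dois: list = field(default_factory=list)

    def publish_version(self) -> str:
        """Register a new version and return its version-specific DOI."""
        number = len(self.version_dois) + 1
        doi = f"{self.dataset_doi}.v{number}"  # hypothetical suffix scheme
        self.version_dois.append(doi)
        return doi

    def resolve(self, doi: str) -> str:
        """Return the version-specific DOI that the given DOI points to."""
        if doi == self.dataset_doi:
            if not self.version_dois:
                raise LookupError("no version has been published yet")
            return self.version_dois[-1]  # dataset DOI follows the latest version
        if doi in self.version_dois:
            return doi  # version DOIs always point to the same version
        raise LookupError(f"unknown DOI: {doi}")


dataset = VersionedDataset("10.12345/example-dataset")  # made-up DOI
first = dataset.publish_version()
second = dataset.publish_version()  # e.g. after a structural revision
print(dataset.resolve(dataset.dataset_doi))
print(dataset.resolve(first))
```

A publication that cites the dataset DOI keeps resolving after a revision, while a citation of a version DOI stays pinned to exactly the data that was reviewed.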
Other Sources of Data
nmrXiv will not be limited to publication-related applications. We also intend to characterize a set of well-known "flagship" natural products (NPs) and to deposit these data in nmrXiv.
Harvard's Dataverse
An existing comprehensive set of NMR data for more than 40 of the most well-studied bioactive NPs, already publicly available via Harvard's Dataverse, will be integrated into nmrXiv.
NMRShiftDB
nmrXiv will strive to make data machine-accessible (making use of ontologies) while ensuring that it remains human-readable. The data can be used by external tools through the APIs and through nmrXiv processing tools. Any data publicly available on the platform will be re-usable under FAIR and open licenses. This data can thus be re-used by any user for incremental or completely new research.