Good data management routines underpin the FAIR principles for scientific data management and stewardship. For many projects, a natural starting point is to set up a logical folder structure in which to store all files associated with the data collected.
Within such a folder structure, the different versions of the data (field notes, raw data, mapped data, and the scripts used for data transformation and mapping) deserve designated folders. In addition, data documentation (i.e. data management plans and metadata) must be included in the “data package”. Not all projects rely on the same underlying data flow model, but in our experience most field-based ecology projects overlap enough in their data flow to make it worthwhile to suggest a common folder structure for field data projects.
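To make the idea concrete, here is a minimal sketch of what such a project layout could look like, created from the command line. The specific folder names (`raw_data`, `mapped_data`, and so on) are our illustration of the categories mentioned above, not a fixed standard:

```shell
#!/bin/sh
# Sketch of a possible folder layout for a field-data project.
# Folder names are illustrative; adapt them to your own project.
PROJECT="my_field_project"

mkdir -p "$PROJECT/raw_data"        # unmodified data as collected
mkdir -p "$PROJECT/field_notes"     # scanned or transcribed field notes
mkdir -p "$PROJECT/mapped_data"     # data mapped to a standard, e.g. Darwin Core
mkdir -p "$PROJECT/scripts"         # scripts for data transformation and mapping
mkdir -p "$PROJECT/documentation"   # data management plan, metadata
```

Keeping raw data, mapped data, and scripts in separate folders makes it straightforward to later extract just the parts of the “data package” that a given repository expects.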
Beyond making it easier for individual researchers or data management units to keep their data well organized, an important goal is to facilitate publication of the “data package” at the appropriate stage in the project life cycle. The folder structure should, for example, make it easy to publish the mapped data as a Darwin Core Archive (and preferably register the data set with GBIF), and to extract the raw data for archiving in a generalist repository.
As part of our work in Living Norway, we have drafted a function for the R software environment (https://www.r-project.org/) that helps set up such a folder structure. You can read more about the proposed functionality here.
If you have input on the folder structure and workflow model we are proposing, you are welcome to contribute by posting an issue on our GitHub repo for this project.