Publishing complex ecological data using the Darwin Core standard

Wouldn’t it be nice if we could share our ecological data using a common format, in a common place, freely available for everyone? In his blog post on Living Norway’s technical blog site, researcher Jens Åström from the Norwegian Institute for Nature Research (NINA) discusses how you can use the Darwin Core standard to publish complex ecological data.

Making your data publicly available is quickly becoming a standard task for researchers. It is increasingly demanded by journals when you publish your research findings, and even by funding agencies when you apply for grants. Journals have traditionally accepted data as files, which can be reached through their websites along with the paper. Wouldn’t it be nice if we could store our ecological data using a common format, in a common place, freely available for everyone?

Photo: Jens Åström

In his blog post, researcher Jens Åström from the Norwegian Institute for Nature Research (NINA) discusses how he formatted a multi-year observation data set of ~80 species with a hierarchical survey scheme and published it in GBIF, incorporating all collected environmental covariates and metadata. The data set is similar in structure to many other data sets that typically arise from ecological monitoring and research programs. Read the blog post here.
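As a rough illustration of what a hierarchical survey scheme can look like in Darwin Core terms, the sketch below links nested sampling events through `eventID` and `parentEventID` and attaches an occurrence to the lowest level. It is written in Python purely for illustration and is not taken from the blog post: the site, visit and trap identifiers, the species record and the helper `ancestors` are all invented assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Event:
    eventID: str
    parentEventID: Optional[str]  # None marks the top level of the hierarchy
    eventDate: str


@dataclass
class Occurrence:
    occurrenceID: str
    eventID: str  # the sampling event this observation belongs to
    scientificName: str
    individualCount: int


def ancestors(event_id: str, events: List[Event]) -> List[str]:
    """Return the eventIDs above event_id, nearest parent first."""
    by_id = {e.eventID: e for e in events}
    chain = []
    current = by_id[event_id]
    while current.parentEventID is not None:
        chain.append(current.parentEventID)
        current = by_id[current.parentEventID]
    return chain


# Invented example data: one site, one visit to it, one trap within the visit.
events = [
    Event("site-A", None, "2019"),
    Event("site-A:visit-1", "site-A", "2019-06-01"),
    Event("site-A:visit-1:trap-3", "site-A:visit-1", "2019-06-01"),
]
occurrences = [
    Occurrence("occ-001", "site-A:visit-1:trap-3", "Bombus pascuorum", 4),
]

print(ancestors(occurrences[0].eventID, events))
```

Keeping the hierarchy in the event records means that covariates measured at site or visit level only need to be stored once, and each occurrence can still be traced back to them.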

Publishing complex ecological data

Miniworkshop: Facilitating FAIR management of ecological data using R

Setting up data infrastructure that simplifies the management of (ecological) data and facilitates publishing using open community standards (foremost as a Darwin Core Archive) took centre stage in the discussion at our miniworkshop this week.

One of the activities we are currently engaged in is developing R functionality that helps researchers in ecology and related fields to manage their data (see e.g. this blog post for more about the topic). To this end, we have started to collect functions in an add-on library for the statistical software R, called LivingNorwayR. The work has only just begun, so the package is very much in development. However, the idea is that we will develop a set of functions that will facilitate:

  • Metadata creation, by reading from the core data files
  • Quality control of data
  • Matching taxonomy to selected databases
  • Mapping to and reading from Darwin Core
  • Reading and writing Darwin Core Archives
  • Visualizing the geographic extent of data
  • … and many other features
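To make two of the items above more concrete (mapping to Darwin Core and quality control), here is a minimal sketch. It is written in Python for illustration only and does not reflect the LivingNorwayR functions, which are still in development. The local column names, the mapping table, the helper names and the two sample rows are all assumptions; only the Darwin Core term names on the right-hand side of the mapping are real.

```python
import csv
import io

# Hypothetical mapping from local column names to Darwin Core terms.
COLUMN_TO_DWC = {
    "species": "scientificName",
    "date": "eventDate",
    "lat": "decimalLatitude",
    "lon": "decimalLongitude",
    "count": "individualCount",
}


def map_to_darwin_core(rows):
    """Rename known columns to Darwin Core terms; drop unmapped columns."""
    return [
        {COLUMN_TO_DWC[key]: value for key, value in row.items() if key in COLUMN_TO_DWC}
        for row in rows
    ]


def quality_issues(record):
    """Return a list of simple problems found in one mapped record."""
    issues = []
    try:
        lat = float(record["decimalLatitude"])
        lon = float(record["decimalLongitude"])
        if not -90 <= lat <= 90:
            issues.append("latitude out of range")
        if not -180 <= lon <= 180:
            issues.append("longitude out of range")
    except (KeyError, ValueError):
        issues.append("missing or non-numeric coordinates")
    if not record.get("scientificName", "").strip():
        issues.append("missing scientificName")
    return issues


# Invented sample data: the second row has no species and an impossible latitude.
raw = io.StringIO(
    "species,date,lat,lon,count\n"
    "Lagopus lagopus,2019-08-10,63.4,10.4,3\n"
    ",2019-08-11,95.0,10.4,1\n"
)
records = map_to_darwin_core(csv.DictReader(raw))
for record in records:
    print(record["eventDate"], quality_issues(record))
```

Keeping the mapping in a plain table separate from the checks means the same quality control can be reused for data sets with different local column names.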

On Wednesday this week, we met to discuss the state of the project, including a brainstorming session to voice ideas about additional functionality to include as we move forward with the development. There are still some major decisions to be made regarding the general programming approach, and the functions have not yet found their final form. Please keep watching out for updates on the project!

Want to learn more or contribute – visit our GitHub site for this project:

How to initiate and organize a data project in ecology?

Good data management routines underpin the FAIR principles for scientific data management and stewardship. For many projects, a starting point will be to set up a logical folder structure in which to store all files associated with the data collected.

Within such a folder structure, different versions of the data (field notes, raw data, mapped data, and scripts used for data transformation and mapping) deserve designated folders. In addition, data documentation (i.e. data management plans and metadata) must be included in the “data package”. Not all projects will rely on the same underlying data flow model, but in our experience most field-based ecology projects still have sufficient overlap in terms of data flow to make it worthwhile to suggest a common folder structure for field data projects.
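A folder structure along these lines could be set up with a few lines of code. The sketch below is written in Python for illustration and is not the draft R function from the Living Norway project. The folder names and the function `init_data_project` are hypothetical, chosen only to mirror the data versions and documentation mentioned above.

```python
from pathlib import Path
import tempfile

# Hypothetical folder names, loosely following the data versions named above.
FOLDERS = [
    "data/field_notes",
    "data/raw",
    "data/mapped",
    "scripts",
    "documentation/data_management_plan",
    "documentation/metadata",
]


def init_data_project(root):
    """Create the project folders under root and return their paths."""
    root = Path(root)
    created = []
    for name in FOLDERS:
        folder = root / name
        # exist_ok=True makes the function safe to re-run on an existing project.
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created


# Demonstration in a temporary directory, so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    for folder in init_data_project(Path(tmp) / "my_project"):
        print(folder.relative_to(tmp))
```

Because existing folders are left untouched, the same call can be used both to start a new project and to bring an older one in line with the agreed layout.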

Beyond making it easier for individual researchers or data management units to keep their data well organized, an important endpoint is to facilitate publication of the “data package” at the appropriate stage in the project life cycle. Thus, the folder structure should facilitate e.g. publishing the mapped data as a Darwin Core Archive (and preferably registering the data set with GBIF). The raw data could also be easily extracted and archived in a generalist repository.

As part of our work in Living Norway, we have made a draft function for the R software (https://www.r-project.org/) to facilitate setting up such a folder structure. You can read more about the proposed functionality here.

If you have input to the folder structure and workflow model we are proposing, you are welcome to contribute by posting an issue on our GitHub repo for this project.

2nd International Living Norway Colloquium

Towards openness and transparency in Applied Ecology

Ecology is a discipline that tries to understand nature and how changes affect it. How is the diversity of organisms distributed around the world? How do extreme climate changes influence populations of animals and vegetation? These are all questions that are regularly asked, and topics that must be addressed to provide evidence for policymaking and management.

Unfortunately, much of the ecological data needed to answer these challenges are currently not available to the research community. Even though there is a rapidly increasing awareness of the importance of ecological data, it is vital not only to collect the data but also to make them Findable, Accessible, Interoperable and Reusable (FAIR). While an open data culture is clearly emerging in ecology, many steps still need to be taken before most data are FAIR.

These are the challenges Erlend Birkeland Nilsen, project manager for the Living Norway Ecological Data Network and senior scientist at NINA, invites you to explore at a two-day colloquium held in Trondheim this fall. “You will get the opportunity to discuss solutions to these thrilling challenges in collaboration with Nordic and international partners within the field of biodiversity informatics,” says Nilsen.

Many scientific journals now require scientists to share their data openly. Nature, for example, recently endorsed the FAIR Data initiative, which entails authors putting their data in public repositories, where available. However, the real benefit of FAIR, Nilsen argues, is that it helps you demonstrate the impact of your research when people re-use and cite your dataset, and that it leads to new collaborations and hence benefits the researchers to come.

“I look forward to two days filled with fruitful discussions with some of the best researchers in the field, with policymakers, research publishers and young scientists,” he says.

No parallel sessions – only plenaries!

On day 1, the program will include a range of plenary presentations covering topics related to open science and FAIR data management in applied and basic ecology. Each session will conclude with a panel discussion.

On day 2, we will arrange two workshops that will extend the discussions from day 1. Before lunch, we will arrange a workshop on education and training in open science and FAIR data management. This workshop will be organized in collaboration with SFU bioCEED (https://bioceed.w.uib.no/). After lunch, we will arrange a workshop on statistical modelling of new open data sources. We will, in particular, discuss models that integrate information from a range of different data sources simultaneously. This workshop will be arranged in collaboration with SFF Centre for Biodiversity Dynamics (CBD; https://www.ntnu.edu/cbd).

The conference has an unusual arrangement: all plenaries. All of them are worth attending, and together they fill your schedule. This means that you do not have to choose; just sit down and enjoy!


Join the colloquium either in-person or virtually! During these challenging times of Covid-19, we understand the apprehension about making plans to attend a conference. This is why we are developing plans to ensure that everyone will have an opportunity to participate in the Living Norway Colloquium, be it in person or virtually. Whatever you choose, remember to register for the event.

TENTATIVE PROGRAM

Venue: The colloquium and workshops will be located at NINA-huset in Trondheim, or you can participate virtually.

Date: October 12th – 13th 2020.

Registration: You can register for the colloquium, the lunches and dinner here.

The colloquium is supported by funding from the Research Council of Norway.

The plan on day 1 is to have a series of lectures covering three main topics. A panel discussion will follow each session. The lecturers will be from various sectors, both from Norway and abroad. The tentative program for day 1 is as follows: 

A fruitful two-day seminar

Written by Benjamin Cretois, Lasse F Eriksen and Wouter Koch – PhD students at The Norwegian University of Science and Technology.

There is a rapidly increasing awareness of the importance of ecological data. With the climate crisis and global biodiversity loss as a background for much of today’s ecological research, we cannot afford to waste any resources in the search for more knowledge. Thus, as scientists we have to be aware not only of the need to collect data, but also of how we make the most of the data we collect. A part of this is making data Findable, Accessible, Interoperable and Reusable (FAIR). This was highlighted during the two-day Living Norway seminar at the Norwegian Institute for Nature Research (NINA) last week, where a group of ca. 70 participants met to discuss the challenges, opportunities and infrastructure needed to improve the ways data are shared within the scientific community.

What FAIR is, and how to develop this paradigm within the scientific community, was a recurrent theme of this seminar

The first day was dedicated to introducing the participants to the Open Science philosophy, emphasizing that FAIR management of data is a step towards this scientific ideal. Finding and formatting data is a difficult and time-consuming step in research, as it can take up as much as 79% of a researcher’s time (Data Science report 2016, CrowdFlower). It is critical that scientists agree on standards for data and metadata, as this would facilitate the traceability and use of the data.

Finding data can be difficult and time consuming for a scientist

This way of managing data is also intended to render data more relevant and readable for environmental policy and management structures, as Ingunn Limstrand from the Norwegian Environment Agency highlighted in particular. This would facilitate policy decisions and trust between policy makers and scientists. Even though implementing the Open Science ideal is challenging, big data platforms such as the Global Biodiversity Information Facility (GBIF) are facilitating the process, as they take care to make their data FAIR.

The second day was rich in ideas and discussion, but also in disagreements, as two workshops took place. The first workshop stressed the importance of data that are not digitized but are of great value because they carry valuable temporal information. These are called legacy data and can take the form of museum collection samples, paper notes or old undocumented spreadsheets. The afternoon workshop aimed to settle an important question: How could and should Open Science be implemented in biodiversity education? Participants agreed that Open Science practice, and more generally good data management, should be taught as early as possible in the university curriculum.

Availability of data will enable scientists to dedicate more resources to seeking results and less time to data collection, thus increasing efficiency in all fields, not least climate and biodiversity research. In that sense, as Einar Hjorthol from the Norwegian Biodiversity Information Centre reminded us in the introductory talk, this seminar is a part of the work of saving the planet.

Detailed program

FAIR data management and open science in ecology, wildlife management and conservation

June 11 – 12, 2019

The seminar invitation can be found here.

June 11th:

Open seminar at NINA-huset

Time   Program

09:30 – 10:00   Registration and coffee

Session chair: Bob O’Hara, NTNU

10:00 – 10:10   Welcome (Einar Hjorthol, Director of Norwegian Biodiversity Information Centre)
10:10 – 10:30   Towards FAIR data management of ecological data? (Erlend B. Nilsen, NINA)
10:30 – 10:50   Challenges in dissemination of biodiversity knowledge (Arild Lindgaard, Norwegian Biodiversity Information Centre)
10:50 – 11:10   Why, when and how do we need ecological data for environmental policy and management (Ingunn Limstrand, Norwegian Environment Agency)
11:10 – 11:30   Norwegian participation in international biodiversity research infrastructures (Frank Hanssen, NINA)
11:30 – 12:30   Lunch

Session chair: Eveliina Päivikki Kallioniemi, NBIC

12:30 – 13:00   Open Science in the Nordics and the EOSC-Nordic project (Andreas Jaunsen, Special advisor, NeIC)
13:00 – 13:30   The new joint Swedish Biodiversity Data Infrastructure (SBDI) (Debora Arlt, Swedish University of Agricultural Sciences)
13:30 – 14:00   Data integration on the global stage and the forward direction of GBIF (Tim Robertson, Head of informatics – GBIF)
14:00 – 14:30   Coffee break

Session chair: Katrine Eldegard, NMBU

14:30 – 15:00   New statistical methods for data integration in ecology and beyond (Nick Isaac, Centre for Ecology and Hydrology)
15:00 – 15:30   Education and training resources for FAIR data management and scientific reuse of data (Vigdis Vandvik, University of Bergen)
15:30 – 16:00   Summary and discussion (Anders G. Finstad, NTNU)

______________________________________________________

June 12th:

Workshop 1: NeIC and Nordic collaboration

Workshop-leader: Frank Hanssen, NINA

CANCELLED

——————-

Workshop 2: Legacy data: prioritization, data types, sources and tools

Workshop leader: Anders G Finstad, NTNU.

What kind of ecological data should be prioritized for archiving and prepared for retrieval and re-use? We will have a special focus on legacy data. Legacy data represent a large backlog of data, collected mainly throughout the last century, that still need to be digitized and documented. In particular, as a consequence of the increased focus on ecological research during the 1970s, a large cohort of ecologists has now passed or is closing in on retirement. Consequently, irreplaceable observations from the last decades are being lost due to inappropriate storage formats or lack of documentation. In a field such as ecology, which relies on unique, non-replicable observations from nature, data rescue missions to secure legacy data and make them available to the current generation of scientists are a most pressing task. However, the task is daunting, due to the sheer volume of information, the huge variety of storage formats, the common lack of associated documentation (metadata), and legal issues. Here, we will ask questions and facilitate discussions around:

  • Practical solutions for data discovery, data rescue and FAIRification of legacy data
  • How do we prioritize between data-sets and data-types?
  • Licensing and legal issues?

Time   Program

09:00 – 09:10   Welcome and intro (Anders G. Finstad, NTNU)
09:10 – 09:20   Example of a successful data-rescue mission: the case of ptarmigan monitoring in Norway (Erlend B. Nilsen, NINA)
09:30 – 09:40   Digitizing legacy data through crowdsourcing: “Dugnadsportalen” as case study (Rukaya Sarah Johaadien, GBIF Norway)
09:40 – 10:05   Why does legacy data matter? New tools and approaches for the integration of unstructured and historical data in ecological research (Joseph Chipperfield, NINA)
10:05 – 10:20   Discussion and comments
10:20 – 10:35   Coffee break
10:35 – 11:10   Legal issues and licensing (Wouter Koch)
11:10 – 12:00   Data paper discussion (Bob O’Hara)
12:00 – 13:00   Lunch

—————————

Workshop 3: How could and should Open Science be implemented in biodiversity education?

Workshop leaders: Vigdis Vandvik and John-Arvid Grytnes, UiB and Dag Endresen, GBIF-Norway

The Open Science movement represents a paradigm shift in science – it is currently transforming not only the standards and payment schemes for scientific publication, but how we think and act around every aspect of science, from our daily research practices, data, scientific publications, to teaching, learning, and communication of science. Future researchers will need to know and master new tools and practices, but also think about data and science in new ways, to succeed in the new landscape.

The current science curricula taught in the biology programmes at our universities do not reflect this rapid transformation: we are not making use of the opportunities represented by Open Science, nor are we preparing students for the challenges.

There are new opportunities rapidly emerging to integrate open data in education – students can reuse available data in their studies, and we can teach them to make their own data openly available as part of their course or thesis work. In this way, students can be part of the ‘real’ scientific enterprise during their studies, using and contributing to ‘real’ science.

The aim of this workshop is to discuss these issues and to share experiences and ideas. The intended outcome is a discussion piece in a journal (e.g., a commentary in Methods in Ecology and Evolution).

Time   Program

12:00 – 13:00   Lunch
13:00 – 13:30   What is Open Science, and why is it important for students? (Dag Endresen)
13:30 – 14:00   Why, how, and when could and should Open Science be implemented in our education? (Vigdis Vandvik)
14:00 – 14:15   Coffee break
14:15 – 16:00   How are we dealing with data in biology education today? (Case studies – examples from participants)
                Way forward: Group discussions; summary in plenary

In a FAIR open science world…

Group members / email:

…students will need new skills

For discussion:

  • What are critical FAIR open science skills students should acquire?
  • Which of these do we already cover, and which are missing in the education we offer today?
  • How can these skills be acquired? Existing/new methods, examples, cases…
  • When during the studies should this happen?
  • What are the main opportunities? Challenges?

…students should learn using real data

For discussion:

  • Examples of real data used in education today?
  • When, how, and for what learning outcome can students use real data?
  • Pros and cons of using real data in education

…students’ data should be shared

For discussion:

  • Examples of student contributions of real data to science? Examples from different levels, types, learning situations, learning outcomes (beyond thesis work…). Published examples, if you have any 🙂
  • Additional opportunities, ideas of how students could contribute data?
  • For what learning outcome?

…we should share educational resources

For discussion:

  • Which educational resources should we develop now, and for what?

Living Norway Seminar 2019

FAIR DATA MANAGEMENT AND OPEN SCIENCE IN ECOLOGY, WILDLIFE MANAGEMENT AND CONSERVATION

Dates: June 11 – 12, 2019

Venue: The seminar and workshops will be located at NINA (June 11th) and NTNU (June 12th) in Trondheim.

Registration: You can register for the seminar, the lunches and the conference dinner here. The registration deadline is May 31st 2019.

Detailed program: Click here for a detailed program.

Livestream: https://www.ustream.tv/channel/gk5T3FAzwvd

________________________

We hereby invite you to a 2-day seminar (June 11th – 12th 2019) focusing on FAIR data management and open science in ecology, wildlife management and conservation. The seminar is organised by the Living Norway Ecological Data Network (www.livingnorway.no) and the Nordic eInfrastructure Collaboration DeepDive project (https://neic.no/deepdive/).

Human welfare is inherently linked to the goods and services delivered by nature, and the current degradation of the biosphere is among the most pressing and severe societal challenges. Nevertheless, much of the ecological data needed to address these challenges are currently not available to the research community. Currently, most ecological data are stored locally at institutions or with individual researchers, they are not well documented, and they are not standardized. Consequently, the data management is often not in agreement with the FAIR principles (Findable, Accessible, Interoperable, Reusable). This situation not only hinders the researcher at the level of the individual research project, but also affects the possibility of efficiently conducting the large-scale, data-driven syntheses that are important for management and policy development. In addition, better documentation and management of data will promote transparency and reproducibility in the research process. The ambition for this combined seminar and workshop is both to increase awareness of the importance of improved management of Norwegian ecological datasets in accordance with the FAIR principles, and to discuss solutions to the challenge in collaboration with Nordic and international partners within the field of biodiversity informatics.

What will we discuss in this seminar?

In this seminar we will cover questions such as:

  • What is the current situation in Norway with respect to management of ecological data sets?
  • Who are the main national and international players in this field?
  • What are the leading global and Nordic e-infrastructures, and what will they look like in the future?
  • What new research questions could be answered by combining data from disparate sources?
  • What are the main fields and opportunities for increased Nordic collaboration?
  • How could we mobilize for data rescue?

Scientific committee: Living Norway Steering Board, and Frank Hanssen, NINA (representing NeIC DeepDive).

Organization: The seminar is co-hosted by the Norwegian Institute for Nature Research (NINA), the Norwegian University of Science and Technology (NTNU), the Norwegian Biodiversity Information Centre (NBIC), the Norwegian GBIF-node and the NeIC DeepDive project.

Program outline:

[A detailed program will be available soon]

Day one will be dedicated to public lectures from national and international invited speakers. On day two, Living Norway and NeIC will co-host workshops, focusing on increased Nordic collaboration (see suggested topics below), FAIR data management and increased data mobilization.

June 11th:

10:00 – 11:30   Presentations from Norwegian management, research and data management institutions
11:30 – 12:30   Lunch at NINA-huset
12:30 – 14:00   Presentations from key Nordic and international research infrastructures and institutions
14:30 – 16:00   Researchers’ perspectives, etc.

June 12th:

On this day, we will arrange three parallel workshops. Participants can only take part in one workshop at a time, but because workshop 1 is a full-day event, those who join the other workshops can still join the last sessions of workshop 1.

Workshop

Content

1: NeIC and Nordic collaboration (Workshop leader: Frank Hanssen, NINA) 09:00 – 16:00

CANCELLED

Evolving from the current NeIC DeepDive project into an extended Nordic-Baltic collaboration in a NeIC DeepDive 2 project. This workshop contains 4 sessions covering topics such as Nordic-Baltic cloud services, Linked Open Data, trait data and identification of future collaboration areas within the field of biodiversity data infrastructures. The workshop is open to all.

2: Prioritization – data types and sources, data rescue task force (Workshop leader: Anders G. Finstad, NTNU) 09:00 – 12:30

What kind of ecological data should be prioritized for archiving and prepared for retrieval and re-use? We will focus on legacy data: data left behind by researchers and research projects. These provide an irreplaceable source of past observations that are currently being lost due to inappropriate storage formats or lack of documentation. The questions addressed relate to prioritization, practical solutions and legal issues.

3: Curriculum in FAIR data management (Workshop leaders: Dag Endresen, UiO & Vigdis Vandvik, UiB) 13:00 – 16:00

Rapid developments in societal expectations and technological possibilities, together with the launch of the FAIR data management principles, are affecting all scientific fields. We see a clear need to establish a curriculum and training program that prepares current and future ecologists for this new era. Such a curriculum should include both technological skills and ethical and legal reflections. In this workshop, we will present and discuss ideas for moving forward with such a curriculum.

Living Norway Ecological Data Network has been launched!

Living Norway Ecological Data Network is a new initiative to promote FAIR data management and reuse of Norwegian ecological data. The network currently involves seven Norwegian institutions that are involved with collection, curation, management, publication and/or use of ecological data in scientific studies. To promote management of ecological data from Norwegian research institutions in agreement with the FAIR principles, we will focus both on contributing to technological development and on increasing human know-how and general competence in biodiversity informatics. Once fully established, Living Norway will contribute to and be part of the Living Atlas community. Living Norway is embedded with the Norwegian node of the Global Biodiversity Information Facility (GBIF).

Because human welfare is inherently linked to the goods and services delivered by nature, the current degradation of the biosphere is among the most pressing and severe societal challenges. Unfortunately, much of the ecological data needed to solve this challenge are not available to the research community, because they are stored locally at institutions or with individual researchers, they are not well documented, and they are not standardized. Given the complexity of the task, it is absolutely pivotal to unlock these data sources, making it possible to integrate data from several sources in the quest for deeper insight into the natural world and the human pressure on biodiversity. Living Norway is a direct answer to this challenge, and will therefore be in high demand by the Norwegian research community.

We are currently at the stage where we are actively working to secure funding for the planned activities. The network partners submitted a joint application to the INFRA-program of the Research Council of Norway in fall 2018. Now that the network is established, we have already started some of the activities, although the volume of activity will depend on the funding situation.

The network currently involves seven Norwegian institutions, but our ambition is that the number of involved institutions will continue to grow now that the network is established. Should you be interested in learning more about Living Norway Ecological Data Network, please do not hesitate to contact us. You can use the contact page above, or follow the link to the members of the steering committee that you find on these pages.