The Living Norway Ecological Data Network will host a workshop December 7th – 8th, focusing on the management and publishing of ecological data using the statistical software R combined with other tools.
We hereby invite you to participate in our workshop on data management and data publishing using R and other tools. In the workshop, we will present how you can use our newly developed package LivingNorwayR (for the statistical software R) to manage ecological data, map data to the Darwin Core standard, document your data with rich metadata, and zip it all together into a Darwin Core Archive that can be archived or published with e.g. Living Norway and GBIF.
During the workshop, we will first present a general introduction to data management, data standardisation, data documentation, data archiving and data publishing. Then we will present the main functionality of the LivingNorwayR package. It will be possible to follow these lectures online. Next we will have a hands-on workshop session, where participants will work on a set of pre-defined exercises. Finally, participants are invited to bring their own data sets to work on, with the goal of producing a complete Darwin Core Archive. The workshop organisers will be available to provide supervision during this session. This data package can then be stored locally on your computer, or it can be published and registered with Living Norway and GBIF using a specific tool provided by GBIF. The exercises will be made available to all participants, but we will only be able to provide practical help to in-person participants.
You do not need to be an expert in data standardisation or data management to participate in the workshop, but familiarity with R is preferable if you want to take part in the practical exercises.
10:30 – 12:00
Introductory lectures (available for online participants)
12:00 – 12:45
12:45 – 13:15
Introduction to exercises (available for online participants)
13:15 – 14:00
14:00 – 15:00
Start working on participants' data
9:00 – 11:00
Work on participants' data
11:00 – 11:30
To register for the workshop, follow this link. The deadline for registration is December 1st.
We are getting ready for the 3rd Living Norway Colloquium – and hope you are also getting ready to join us! The theme for this year's conference is “The ethics and technical know-how of open science in ecology and evolution” – a theme we are sure will be relevant across the ecological research community.
October 25th – 26th, we invite fellow ecologists and others who are interested in open science, open data and research ethics to join our annual colloquium. Following up on last year's event, we have decided that this event will also be a hybrid event. Thus – you can either join the event “live” at NINA-huset in Trondheim, or online from anywhere in the world. Last year, our colloquium attracted participants from >20 different countries, and we hope this year's event will attract a comparable number of participants.
There is no fee to attend the event – and should you decide to join us live in Trondheim, you even get free lunch! This has been made possible through event support from the Research Council of Norway and in-kind contribution from NINA – and we take the opportunity to thank both for their kind support.
More information about the colloquium, including the detailed program and a link to the registration form, can be found here.
Elizabeth Law (NINA), Vigdis Vandvik (UiB), Matt Grainger (NINA), Erlend B. Nilsen (NINA)
As open science and reproducible research practices are becoming mainstream across the scientific community, we are becoming increasingly aware that this ‘FAIR open revolution’ in how science is planned, conducted, reported, communicated, and assessed, must also transform the way we teach and learn science. At the Living Norway 2020 ‘FAIR open education’ workshop, we shared experiences and plans, were presented with some interesting and inspiring case studies, and discussed opportunities and ways forward. We are working towards a publication of the workshop outcomes, but in the meantime, here are some of the main take-home messages.
A wordcloud representing initial thoughts on open science and education, developed from the first discussion groups using the R package InteractiveWordcloud (available on GitHub).
Open Science is rapidly and dynamically transforming how science is done
Open science is transforming how we think about, do, and communicate science. No longer limited to free and open access to read research articles (open access) and to download and use data (open data, e.g. Living Norway and GBIF), the emerging open science landscape includes openness at all stages of research: from research planning (e.g., pre-registered reports), via methods (open protocols), data (open data, FAIR data practices), and analyses (open code, e.g. in R and on GitHub), to research outputs (open access, open peer review, open research synthesis). Associated with this development are new platforms for sharing and participating in all these different aspects of open science (e.g. OSF). As this landscape has evolved, the classical view of ‘access’ as the main benefit of open science has broadened into a realisation that openness in science is key to promoting quality, reproducibility, efficiency, and broad sharing of research both within and beyond the scientific community.
The Living Norway Open Educational Workshop emerged from a realisation that this ongoing transformation of how science operates has profound implications for how science should be taught and learnt, as Vigdis Vandvik emphasised in the opening of the workshop. Students need to learn these Open Science skills – and therefore we have a responsibility to teach them the principles and practice. Not only are these skills becoming required for best-practice and ethical research, but they are also increasingly essential for gaining funding and publications, building meaningful networks, answering emerging large-scale and integrative questions, and developing careers both inside and outside scientific research (these skills are also highly transferable to professional work environments). We need to do our best to transform science education to meet the needs of today’s students and tomorrow’s science.
Open science offers learning opportunities beyond classical educational settings. Students can learn open science by integrating their classes into real science workflows. For example, Aud Halbritter (UiB) and Tanya Strydom (Université de Montréal) gave a coordinator's and a student's perspective, respectively, on how Open Science is integrated into the UiB Plant Functional Traits course. This started as a necessity – improving workflows to enhance the quality of the data being collected (via standardised measurement protocols) – but quickly evolved to also include students in best-practice data processing, management and publication. Over the last five years, the course has blossomed from being ‘just another field ecology course’ into one where students collect and manage real data that are shared openly through real scientific data publications, and later used in real science (e.g. the papers listed here). Through their participation, students learn both the principles and practice of open and reproducible research (e.g. standardised protocols, best-practice data management, use of and contribution to open data, use and development of open code with R and GitHub). Collaborative and cross-cultural communication skills emerged as important added learning outcomes from the real research participation in the courses.
These Open Science skills, technologies, resources, and practices are dynamic, however, leaving us with a moving target. How do educators keep up with the times, but not overload ourselves (or our students), and what will be relevant in the future?
An example of this question came up in the discussions, highlighting how R is often both the solution and the problem. While we commonly document code and workflows (aiming for transparency and repeatability) through R scripts and packages, the general lexicon evolves and branches, from base R to the tidyverse. What to teach first? Both dialects are arguably required, and while some prefer base R because it is fundamental and used by most other packages, others find the tidyverse a more intuitive introduction. Moreover, many packages (particularly with the ongoing development of the tidyverse) are periodically updated, and these updates can lead to code breaking or becoming “buggy” over time. Luckily, more open-source software comes to the rescue, such as packrat, renv, or conda. But a balance needs to be found: participants agreed that Docker is possibly ‘best practice’, but it is currently more challenging to use (although see a friendlier introduction here) and possibly overkill in many contexts – as one commenter put it, “using a sledgehammer to hammer a nail”.
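To make the renv option concrete, here is a minimal sketch of the typical renv workflow for pinning package versions in a course or analysis project. The function calls (`renv::init()`, `renv::snapshot()`, `renv::restore()`) are renv's documented core functions; the rest is illustrative and assumes renv is installed from CRAN.

```r
# One-time setup for a new project:
# install.packages("renv")

renv::init()      # create a project-local library and a lockfile (renv.lock)

# ... develop the analysis, installing packages as usual ...

renv::snapshot()  # record the exact package versions in use into renv.lock

# Later, or on a student's machine, reinstall exactly those versions:
renv::restore()
```

The lockfile can be committed to GitHub alongside the course code, so that exercises keep running even as the tidyverse moves on.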
Did we lose you there with all that discussion of R tools many of us have never heard of? This is something we need to address: the acknowledgement of the risks of Open Science simply preaching to the converted, alienating those who are not, and expanding the existing equity gap of student experience, rather than narrowing it. While Open Science has the core values of reproducibility, accountability, and FAIRness, it also “inherits many systematic barriers that already exist in mainstream science”. This elephant-in-the-room might need to be a question for future LivingNorway colloquiums. For now, let’s return to the Open Science values of reproducibility and accountability.
Cherry-picking: the lowest-hanging fruit
The first step in change is recognising there is a problem. For example, the UiB Plant Functional Traits course recognised they had an issue with poor-quality data that was preventing them from doing quality science. But these problems are far from unique.
Hannah Fraser (University of Melbourne) (slides from her talk) revealed how common questionable research practices are in ecology and evolution: at levels similar to those causing alarm in education and psychology (also here). Of these, she focused on cherry-picking, which is often unintentional; a result of the culture of field ecology of going out and measuring a bunch of different variables, analysing them in many different ways, and selectively reporting only the ‘best’ results. Hannah also highlighted the reproducibility crisis (very few studies are repeated, and when they are, different answers are often found). Hannah discussed two emerging solutions – pre-registration to address cherry-picking, and repeat studies to address the issues and implications of reproducibility.
In terms of cherry-picking, pre-registration – a statement of the analysis intention and hypotheses prior to collecting or collating the data – is arguably the lowest-hanging fruit. We already do this in the form of project proposals, but there is a lack of impetus and culture to be stringent about revisiting these when writing up for publication, or to make them publicly available. In the case of student theses and dissertations, Hannah points out, there is even the cultural expectation that projects will change. And this is reasonable, Hannah notes: plans can be updated, we just need to change the culture to make it clear when deviations occur, and thereby distinguish exploratory from confirmatory research. Several participants in the workshop agreed that there are many benefits to improved pre-registration, particularly for graduate students, including clarifying research questions and hypotheses and spreading the workload of writing across the project. Going one step further, developing analysis scripts prior to getting the data can be really useful in focusing the analysis on getting the methods right, without getting lost trying to get the ‘right’ results (and is ‘best practice’ science). All of this can help keep students on task, and on track.
The infrastructure for pre-registration is there, so what is stopping us? Do we fear that commitment to a predefined hypothesis may preclude explorations of interesting unexpected results? This fear is unfounded, as pre-registration does not preclude exploratory analyses (see above). If we fear being “scooped”, this fear is also unreasonable, as pre-registration actually does the opposite – it provides precedence, a foot in the door even before our results are in and ready. Do we fear the possibility of negative results impacting publishability? Again, pre-registration may actually provide a solution: many journals now accept or encourage pre-registrations, typically with a commitment to publish from the journal’s side (Hannah mentions Ecology and Evolution, and Conservation Biology as examples). Also, pre-registration gives us valuable access to peer-reviewers’ comments at a stage in the research when changes can still be made to plans and protocols. Or do we fear being wrong, and fear this could damage our reputation as a scientist? We need to rise above this: a negative or contradictory result is rarely ‘wrong’, indeed, a study suggests that, in science, “admitting wrongness … is less harmful to one’s reputation than not admitting”.
A glimpse into the workshop, collated from workshop outputs, the LivingNorway Twitter stream, and photos by Vigdis Vandvik.
Overall, within the workshop there was general agreement that starting small, simple, and salient is helpful, because often the most challenging and intimidating part is simply to start. Pre-registration and repeat studies are two really great examples of how we can potentially effect big and meaningful change through relatively small changes in our perceptions and practices, particularly in the context of teaching. But they also highlight how it is typically our (mostly unfounded, but still felt) fears that may hold us back.
For the student, the teaching environment gives us a fantastic opportunity to help quell those fears. For example, Tanya Strydom suggested that while committing code to GitHub can feel scary, doing so in a safe environment, with support from her teachers, made it more approachable. This is all well and good for the proficient student, but what about those with less coding experience? And what about the teacher? Integrating Open Science can leave many of us feeling out of our depth. Here Aud Halbritter emphasised starting incrementally and focussing on the changes that will give the most benefit (and indeed, focussing on the benefits). Recognising that not all students are comfortable with coding and technology, the course coordinators chose to make many of the more ‘advanced’ Open Science aspects of the course optional extras. This allows all students to at least be aware of the possibilities, and it is a great option for both teachers and students to learn together (though perhaps requiring some adaptability and humility of the teacher).
Luis Verde Arregoitia presented another approach to starting small, in safe, supportive environments: blogging. Blogging, he argues, is a really approachable way to start interacting with Open Science. This is true for both learners and teachers. Luis started blogging as a way to help fellow students, inspired by this advice, but his blog is now an access point for many opportunities in teaching, learning, and collaborating. Blogs can include several elements that make them effective for teaching: being self-contained interactive exercises, well structured, and engaging. But blogs can take many forms, from formal tutorials to a more mutual learning experience (e.g. ‘today I learned’), and it is the informal tone that makes them so approachable when learning. Another participant agreed: “blogs can teach you that you are not alone in your problems”. And you don’t have to get them perfect on the first go: being editable, they are a ‘low commitment’ way of starting a journey to open science.
So no excuses! Go forth and (teach and learn) open science!
In a couple of days, the Living Norway Colloquium 2020 will take place as a hybrid event. With close to 150 participants from more than 15 different countries spread across several time zones, we are a bit overwhelmed by the interest in the conference. But that is surely a luxury we should be able to handle!
The main reason for the great interest is obviously the many excellent presenters that we have managed to fit into the program on both days. The updated colloquium program can be found here. On day 1 (October 12th), we will have a series of lectures on key topics related to open and reproducible science for the 21st century. On day 2 (October 13th), we will organize two workshops that both will include group discussions and other group tasks. We hope all participants are ready to join in on the discussions!
Although the conference is just a few days ahead of us, it is still possible to sign up to attend the conference online (unfortunately, we cannot host more people in person in Trondheim). You can find the link on the conference web page here. If you just want to view the presentations without attending the conference and joining the discussions, you can do so via the livestream embedded in the conference web page (here).
We are looking forward to meeting you all early next week – be it in person in Trondheim (Norway) or online from anywhere in the world!
Wouldn’t it be nice if we could share our ecological data using a common format, in a common place, freely available for everyone? In a post on Living Norway's technical blog, researcher Jens Åström from the Norwegian Institute for Nature Research (NINA) discusses how you can use the Darwin Core standard to publish complex ecological data.
Making your data publicly available is quickly becoming a standard task for researchers. It is increasingly demanded by journals when publishing your research findings, or even by funding agencies when applying for grants. Journals have traditionally accepted data as supplementary files, available through their websites along with the paper. Wouldn’t it be nice if we could store our ecological data using a common format, in a common place, freely available for everyone?
In his blog post, researcher Jens Åström from the Norwegian Institute for Nature Research (NINA) discusses how he formatted a multi-year observation data set of ~80 species with a hierarchical survey scheme – incorporating all collected environmental covariates and metadata – and published it to GBIF. The data set is similar in structure to many other data sets that typically arise from ecological monitoring and research programs. Read the blog post here.
Setting up data infrastructure to simplify the management of (ecological) data, and to facilitate publishing using open community standards (foremost as a Darwin Core Archive), took centre stage at our mini-workshop this week.
One of the activities we are currently engaged in is developing R functionality that helps researchers in ecology and related fields manage their data (see e.g. this blog post for more about the topic). To this end, we have started to collect functions in an R package called LivingNorwayR. The work has only just begun, so the package is very much in development. However, the idea is to develop a set of functionality that will facilitate:
Metadata creation, by reading from the core data files
Quality control of data
Matching taxonomy to selected databases
Mapping to and reading from Darwin Core
Reading and writing Darwin Core Archives
Visualizing the geographic extent of data
… and many other features
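As a flavour of what the “mapping to Darwin Core” step involves, here is a minimal base-R sketch. Since LivingNorwayR is still in development, this does the mapping by hand rather than through the package's eventual API; the field-sheet column names on the left are hypothetical, while the names on the right are standard Darwin Core terms.

```r
# Hypothetical field data, as it might come off a field sheet
field_data <- data.frame(
  species  = c("Lagopus lagopus", "Lagopus muta"),
  n_seen   = c(4L, 2L),
  obs_date = c("2020-08-15", "2020-08-16"),
  site_lat = c(63.43, 63.44),
  site_lon = c(10.40, 10.41)
)

# Map field-sheet headers to the corresponding Darwin Core terms
dwc_map <- c(
  species  = "scientificName",
  n_seen   = "individualCount",
  obs_date = "eventDate",
  site_lat = "decimalLatitude",
  site_lon = "decimalLongitude"
)
names(field_data) <- dwc_map[names(field_data)]

# Add required Darwin Core fields that take constant or generated values
field_data$basisOfRecord <- "HumanObservation"
field_data$occurrenceID  <- paste0("urn:example:occ:", seq_len(nrow(field_data)))
```

A table in this shape can serve as the occurrence core of a Darwin Core Archive, with metadata and any extension tables zipped alongside it.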
On Wednesday this week, we met to discuss the state of the project, including a brainstorming session to voice ideas about additional functionality to include as we move forward with the development. Some major decisions remain to be taken regarding the general programming approach, and the functions have not yet found their final form. Please watch for updates on the project!
Want to learn more or contribute? Visit our GitHub site for this project:
Good data management routines underpin the FAIR principles for scientific data management and stewardship. For many projects, a starting point will be to set up a logical folder structure in which to store all files associated with the data collected.
Within such a folder structure, different versions of the data (field notes, raw data, mapped data, and the scripts used for data transformation and mapping) deserve designated folders. In addition, data documentation (i.e. data management plans and metadata) must be included in the “data package”. Not all projects will rely on the same underlying data flow model, but in our experience most field-based ecology projects have sufficient overlap in terms of data flow to make it worthwhile to suggest a common folder structure for field data projects.
Beyond making it easier for individual researchers or data management units to keep their data well organised, an important endpoint is to facilitate publication of the “data package” at the appropriate stage in the project life cycle. Thus, the folder structure should facilitate e.g. publishing the mapped data as a Darwin Core Archive (and preferably registering the data set with GBIF). The raw data could also be easily extracted and archived in a generalist repository.
As part of our work in Living Norway, we have made a draft function for software R (https://www.r-project.org/) to facilitate setting up such a folder structure. You can read more about the proposed functionality here.
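To illustrate the idea, here is a small base-R sketch of what such a setup function might look like. The folder names are illustrative, not the draft function's actual layout; only `dir.create()` and `file.path()` from base R are used.

```r
# Illustrative helper: create a suggested folder structure for a
# field-data project. Folder names here are examples only.
create_project_folders <- function(root = ".") {
  folders <- c(
    "raw_data",     # untouched data as recorded in the field
    "mapped_data",  # data mapped to the Darwin Core standard
    "scripts",      # data transformation and mapping scripts
    "metadata",     # data management plan and metadata files
    "dwc_archive"   # the zipped Darwin Core Archive for publication
  )
  paths <- file.path(root, folders)
  for (p in paths) {
    dir.create(p, recursive = TRUE, showWarnings = FALSE)
  }
  invisible(paths)
}

# Example: set up the structure under a temporary directory
paths <- create_project_folders(file.path(tempdir(), "my_field_project"))
```

Keeping raw, mapped, and archive-ready data in separate folders makes it straightforward to publish the mapped data as a Darwin Core Archive, or to extract and deposit the raw data elsewhere, without touching the field records.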
If you have input on the folder structure and workflow model we are proposing, you are welcome to contribute by posting an issue on our GitHub repo for this project.
Towards openness and transparency in Applied Ecology
Ecology is a discipline that tries to understand nature and how changes affect it. How is the diversity of organisms distributed around the world? How do extreme climate events influence populations of animals and plants? These are questions regularly asked, and topics that must be addressed to provide evidence for policymaking and management.
Unfortunately, much of the ecological data needed to address these challenges is currently not available to the research community. Even though awareness of the importance of ecological data is rapidly increasing, it is vital not only to collect data but also to make it Findable, Accessible, Interoperable and Reusable (FAIR). While an open data culture is clearly emerging in ecology, many steps still need to be taken to make most data FAIR.
These are the challenges that Erlend Birkeland Nilsen, project manager for the Living Norway Ecological Data Network and senior scientist at NINA, invites you to explore at a two-day colloquium held in Trondheim this fall. “You will get the opportunity to discuss solutions to these thrilling challenges in collaboration with Nordic and international partners within the field of biodiversity informatics,” says Nilsen.
Many scientific journals now require scientists to share their data openly. Nature, for example, recently endorsed the FAIR Data initiative, which asks authors to deposit their data in public repositories where available. However, the real benefit of FAIR, Nilsen argues, is that it helps you demonstrate the impact of your research when people re-use and cite your dataset, and it leads to new collaborations – benefiting the researchers to come.
“I look forward to two days filled with fruitful discussions with some of the best researchers in the field, with policymakers, research publishers and young scientists,” he says.
No parallel sessions – only plenaries!
On day 1, the program will include a range of plenary presentations covering topics related to open science and FAIR data management in applied and basic ecology. Each session will conclude with a panel discussion.
On day 2, we will arrange two workshops that will extend the discussions from day 1. Before lunch, we will arrange a workshop on education and training in open science and FAIR data management. This workshop will be organized in collaboration with SFU bioCEED (https://bioceed.w.uib.no/). After lunch, we will arrange a workshop on statistical modelling of new open data sources. We will, in particular, discuss models that integrate information from a range of different data sources simultaneously. This workshop will be arranged in collaboration with SFF Centre for Biodiversity Dynamics (CBD; https://www.ntnu.edu/cbd).
The conference has an unusual format: plenaries only, all of them worth attending, filling up your whole schedule. This means that you do not have to choose; just sit down and enjoy!
Join the colloquium either in-person or virtually! During these challenging times of Covid-19, we understand the apprehension about making plans to attend a conference. This is why we are developing plans to ensure that everyone will have an opportunity to participate in the Living Norway Colloquium, be it in person or virtually. Whatever you choose, remember to register for the event.
Venue: The colloquium and workshops will take place at NINA-huset in Trondheim, or you can participate virtually.
Date: October 12th – 13th 2020.
Registration: You can register for the colloquium, the lunches and the dinner here.
The colloquium is supported by funding from the Research Council of Norway.
The plan on day 1 is to have a series of lectures covering three main topics. A panel discussion will follow each session. The lecturers will be from various sectors, both from Norway and abroad. The tentative program for day 1 is as follows:
Written by Benjamin Cretois, Lasse F Eriksen and Wouter Koch – PhD students at The Norwegian University of Science and Technology.
There is a rapidly increasing awareness of the importance of ecological data. With the climate crisis and global biodiversity loss as a background for much of today’s ecological research, we cannot afford to waste any resources in the search for more knowledge. Thus, as scientists, we have to be aware not only of the need for collecting data, but also of how we make the most of the data we collect. Part of this is making data Findable, Accessible, Interoperable and Reusable (FAIR). This was highlighted during the two-day Living Norway seminar at the Norwegian Institute for Nature Research (NINA) last week, where about 70 participants met to discuss the challenges, opportunities and infrastructure needed to improve the way data are shared within the scientific community.
What FAIR is, and how to develop this paradigm within the scientific community, was a recurrent theme of this seminar
The first day was dedicated to introducing the participants to the Open Science philosophy, emphasising that FAIR management of data is a step towards this scientific ideal. Finding and formatting data is a difficult and time-consuming step in research, consuming up to 79% of a researcher's time (Data Science report 2016, CrowdFlower). It is critical that scientists agree on standards for data and metadata, as this would facilitate their traceability and use.
Finding data can be difficult and time consuming for a scientist
This way of managing data is also intended to make data more relevant and readable for environmental policy and management bodies, as Ingunn Limstrand from the Norwegian Environment Agency highlighted in particular. This would facilitate policy decisions and trust between policymakers and scientists. Even though implementing the Open Science ideal is challenging, big data platforms such as the Global Biodiversity Information Facility (GBIF) are facilitating the process, as they take care to make their data FAIR.
The second day, with its two workshops, was rich in ideas and discussion, but also in disagreements. The first workshop stressed the importance of data that are not digitized but bear great value because they carry valuable temporal information. These are called legacy data and can take the form of museum collection samples, paper notes or old, undocumented spreadsheets. The afternoon workshop aimed to settle an important question: How could and should Open Science be implemented in biodiversity education? Participants agreed that Open Science practices, and good data management more generally, should be taught as early as possible in the university curriculum.
Availability of data will enable scientists to dedicate more resources to seeking results and less time to data collection, thus increasing efficiency in all fields, not least climate and biodiversity research. In that sense, as Einar Hjorthol from the Norwegian Biodiversity Information Centre reminded us in the introductory talk, this seminar is part of the work of saving the planet.
Workshop 2: Legacy data: prioritization, data types, sources and tools
Workshop leader: Anders G Finstad, NTNU.
What kind of ecological data should be prioritized for archiving and preparation for re-use? We will have a special focus on legacy data. Legacy data comprise a large backlog of digitization and documentation of data collected throughout (mainly) the last century. In particular, as a consequence of the increased focus on ecological research during the 1970s, a large cohort of ecologists has now retired or is approaching retirement. Consequently, irreplaceable observations from the last decades are being lost due to inappropriate storage formats or a lack of documentation. In a field such as ecology, which relies on unique, non-replicable observations from nature, data rescue missions to secure legacy data and make them available to the current generation of scientists are therefore a most pressing task. However, the task is daunting, due to the sheer volume of information, the huge variety of storage formats, the common lack of associated documentation (metadata), and also legal issues. Here, we will ask questions and facilitate discussions around:
Practical solutions for data discovery, data rescue and FAIRification of legacy data
How do we prioritize between data-sets and data-types?
Licencing and legal issues?
09:00 – 09:10
Welcome and intro (Anders G. Finstad, NTNU)
09:10 – 09:20
Example of a successful data-rescue mission: the case of ptarmigan monitoring in Norway (Erlend B. Nilsen, NINA)
09:30 – 09:40
Digitizing legacy data through crowdsourcing: “Dugnadsportalen” as case study (Rukaya Sarah Johaadien, GBIF Norway)
09:40 – 10:05
Why does legacy data matter? New tools and approaches for the integration of unstructured and historical data in ecological research (Joseph Chipperfield, NINA)
10:05 – 10:20
Discussion and comments
10:20 – 10:35
10:35 – 11:10
Legal issues and licencing (Wouter Koch)
11:10 – 12:00
Data paper discussion (Bob O’Hara)
12:00 – 13:00
Workshop 3: How could and should Open Science be implemented in biodiversity education?
Workshop leaders: Vigdis Vandvik and John-Arvid Grytnes, UiB and Dag Endresen, GBIF-Norway
The Open Science movement represents a paradigm shift in science – it is currently transforming not only the standards and payment schemes for scientific publication, but how we think and act around every aspect of science, from our daily research practices, data, scientific publications, to teaching, learning, and communication of science. Future researchers will need to know and master new tools and practices, but also think about data and science in new ways, to succeed in the new landscape.
The current science curricula taught in the biology programmes at our universities do not reflect this rapid transformation – we are not making use of the opportunities presented by Open Science, nor are we preparing students for its challenges.
There are new opportunities rapidly emerging to integrate open data in education – students can reuse available data in their studies, and we can teach them to make their own data openly available as part of their course or thesis work. In this way, students can be part of the ‘real’ scientific enterprise during their studies, using and contributing to ‘real’ science.
The aim of this workshop is to discuss these issues and share experiences and ideas. The intended outcome is a discussion piece in a journal (e.g., a commentary in Methods in Ecology and Evolution).
12:00 – 13:00
13:00 – 13:30
What is Open Science, and why is it important for students? (Dag Endresen)
13:30 – 14:00
Why, how, and when could and should Open Science be implemented in our educations? (Vigdis Vandvik)
14:00 – 14:15
14:15 – 16:00
How are we dealing with data in biology educations today? (Case studies – examples from participants)
Way forward: Group discussions; summary in plenary