Thursday, November 19, 2015
November Newsletter
Thursday, November 12, 2015
Open Science Framework (OSF): A useful free tool for data and workflow management for scientific reproducibility
On October 13, 2015, DataOne hosted a webinar led by Courtney Soderberg from the Center for Open Science.
The webinar had two goals: (1) To outline the issues with existing scientific workflows that can lead to bias and results that are not reproducible, and (2) To introduce the Open Science Framework (OSF) as a tool to overcome these biases and increase the reproducibility of science.
Regarding issues of reproducibility, most scientists are probably aware of the narrow issue of computational reproducibility, i.e., the ability to take the data collected by a team of researchers, perform the same analyses, and reach the same conclusions. Ms. Soderberg described this issue in her talk, but she also described more subtle biases and issues with reproducibility. One issue is publication bias: analyses often change through the course of a project, and only the final (successful) analyses and results are documented, while negative results or dead-end analyses are never captured. Related to publication bias is Hypothesizing After Results are Known, or HARKing. To present a succinct story, publications often present hypotheses as a priori, whereas hypotheses may in fact have been generated after researchers spent significant time poring over the data. In what is known as researcher degrees of freedom, data processing and analytical decisions are often made after seeing and interacting with data, severely increasing the potential for false positive outcomes, often outside of the awareness of a researcher. (For further discussion of reproducibility problems, I suggest the enlightening recent special issue of Nature on this topic.)
In response to these various potential sources of bias, the OSF, a free web-based resource for data and workflow management, builds in mechanisms to reduce (or at least document) potential sources of research bias. The OSF is meant to be used through the whole research life cycle, from project conception to final paper and data publication, and all actions taken, wiki entries written, and files uploaded on the OSF are timestamped and version controlled. For example, it is possible to document a timestamped hypothesis prior to data collection and analysis to avoid HARKing. More details on the OSF can be learned by viewing Ms. Soderberg's excellent presentation in full; below, I provide a few highlights:
- OSF pages can be public or private, and there is granular control over access to individual pages and sections for collaborators or the general public. Public projects are fully searchable.
- Built-in tools smooth the collaboration process. One can create templates for common file types, and projects can be "forked" to create copies of files/folders with original content intact.
- Third-party software such as GitHub, Google Drive, and FigShare can be seamlessly integrated through add-ons. This is especially useful for large files that exceed the current 128 MB limit for individual files stored with OSF (no total storage limit across all files). The one catch is that, while all file versions uploaded directly to OSF are stored permanently, linked third-party content remains stored with third parties subject to their version control/storage policies. Nonetheless, OSF does keep track of all version changes (even if it does not keep the original files).
- Permanent identifiers (GUIDs) are assigned to projects created on OSF. Other unique identifiers (e.g., DOIs, ORCID, LinkedIn) can be assigned to projects and/or researchers.
- Versions of a project can be "registered" at a fixed point in time, such as when submitting an article for publication. Registered versions become read-only and fully include all linked (third-party) content, so a registered project can provide a stable data/workflow accompaniment to a published journal article. Registered versions can remain private for an embargo period of up to four years. Once public, registered projects can be assigned a DOI.
- Data sustainability is extremely important to OSF. In case the Center for Open Science disappears, a "sustainability fund" has been established to maintain existing data in a read-only format indefinitely.
I strongly encourage all scientists to investigate OSF as an option for workflow and data management. The advantage of OSF is that it provides a flexible, robust architecture for many data management challenges. The disadvantage is that it may not fulfill discipline-specific needs of sediment experimentalists. As we continue to develop the SEN Knowledge Base, we will closely follow developments of OSF and other data management platforms.
Raleigh L. Martin
UCLA Dept. of Atmospheric and Oceanic Sciences
---
Click here to view the webinar on the DataOne website.
Wednesday, September 30, 2015
SEN Fall Newsletter
Dear Experimentalists,
We hope everyone had a productive summer and is gearing up for an exciting fall with the Sediment Experimentalist Network.
This issue contains the following:
- Graduate Student/Early Career AGU Travel Grant Contest
- Binghamton Geomorphology Symposium Update
- EC3 Field Trip Report
- New Features and Updates on sedexp.net
Graduate Student/Early Career AGU Travel Grant Contest
The Sediment Experimentalist Network (SEN) is sponsoring a data-sharing contest for graduate students and early career scientists who feel passionate about making their data public. The top three winners will be awarded travel grants in the amount of $1000 for use towards the 2015 American Geophysical Union (AGU) Fall Meeting. The deadline for entries has been extended to October 15, 2015.
Binghamton Geomorphology Symposium Update
This September, SEN attended the 46th annual Binghamton Geomorphology Symposium, hosted by the University at Buffalo. This year’s theme was Experiments in Geomorphology, and the symposium featured tours of various lab facilities and talks covering a wide range of experiments (photos). SEN’s own Brandon McElroy presented a talk on our recent Geomorphology paper. Wonsuck Kim, Raleigh Martin, and Kim Miller presented posters, which can be viewed here.
EC3 Field Trip Report
SEN team member Raleigh Martin recently attended a field trip hosted by the EarthCube Building Block EC3 (Earth-Centered Communication for Cyberinfrastructure), which focuses on the challenges of field data collection, management, and integration. Check out the blog post to read about what we as experimentalists can learn from the field about data sharing.
New Features and Updates on sedexp.net
The Knowledge Base/Wiki at www.sedexp.net now has an entry category for “Lab Facility”, which can be linked to equipment entries. Use these entries to promote your lab or find other lab facilities for collaborations.
Also, there have been several new entries over the last month including: Erosional landscape topography by Kristin Sweeney, Field saltation observations by Raleigh Martin, and Data for experiments in high-intensity bedload transport by Ricardo Hernandez.
For up-to-date information about SEN, please check out our blog at http://sedimentexperiments.blogspot.com/ and follow us on Twitter (@sedimentexp).
Happy experimenting,
The Sediment Experimentalist Network
http://workspace.earthcube.org/sen
Tuesday, September 29, 2015
Raleigh Martin at EC3 Workshop 2015
Recently I participated in the EarthCube-funded EC3
(Earth-Centered Communication for Cyberinfrastructure) workshop at Yosemite
National Park and Owens Valley, California.
The workshop brought together a mix of geoscientists and computer
scientists to address challenges in field data collection and to brainstorm
cyberinfrastructure solutions to make field data collection easier, more
efficient, and more likely to result in useful long-term data preservation.
My own work encompasses both laboratory experiments and
fieldwork on active sediment transport processes. Through my engagement with SEN (Sediment
Experimentalist Network), I have already thought substantially about
laboratory issues, so participation in the EC3 trip gave me a chance to think
more about field data. Somewhat to my
surprise, the idea of “fieldwork” varies vastly among domains. Whereas fieldwork for me primarily
involves collecting instrumental time series records, the EC3 trip
focused on mapping geological structures and stratigraphy.
Despite my somewhat outsider status, I learned several
lessons from the EC3 field trip, which I hope to share with the SEN community:
1) The most effective development of geoscience
cyberinfrastructure occurs when software developers and geoscientists are tied
together at every step of the development process. Otherwise, there is a danger that computer
tools will not be compatible with the way that scientists actually do their
work. For example, tablet-based apps
might one day replace the field notebook, but only if they accommodate the
free-form sketches that don’t fit neatly into metadata categories.
2) Research progresses in an unpredictable, heterogeneous,
iterative, and “messy” way that makes the adoption of uniform, comprehensive
cyberinfrastructure and database tools impossible. I could see this in how much my concept of
“fieldwork” differed from other workshop participants. Rather than seeking a grand solution to all
of our data problems, we’re better off building smaller-scale solutions for
specific applications, then linking these applications through semantics, i.e.,
clear, machine-readable assignments of meaning that allow computers to link
together heterogeneous databases into shared resources.
3) Computer scientists actually enjoy our data
problems and view them as research challenges!
They are not simply contractors for hire to build specific pieces of
software. As geoscientists, we can view
work with computer scientists as research collaboration, which includes
applying for grants together and writing papers together. This will also make the development of
cyberinfrastructure feel more like fun and less like a chore. The EARTHTIME project is one great example of
the synergies to be found between geoscientists and computer scientists.
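To make the second lesson a bit more concrete, here is a minimal Python sketch of what "linking through semantics" could look like. Everything in it is invented for illustration: the two small databases, their field names, and the shared vocabulary are assumptions, not part of any EC3 or SEN tool. The idea is only that each database keeps its own structure, and a small machine-readable mapping says what each local field means.

```python
# Two hypothetical databases with their own local field names.
flume_db = [{"run": "A1", "Q_w": 0.5, "d50_mm": 0.3}]
field_db = [{"site": "Site 1", "grain_size_median": 0.4}]

# Assumed semantic mappings: local field name -> shared concept name.
FLUME_TERMS = {
    "run": "record_label",
    "Q_w": "water_discharge_m3_per_s",
    "d50_mm": "median_grain_diameter_mm",
}
FIELD_TERMS = {
    "site": "record_label",
    "grain_size_median": "median_grain_diameter_mm",
}


def to_shared(records, terms, source):
    """Translate local records into the shared vocabulary, tagging their origin."""
    translated = []
    for record in records:
        shared = {terms[key]: value for key, value in record.items() if key in terms}
        shared["source"] = source
        translated.append(shared)
    return translated


# The databases stay separate; only the mappings let us query them together.
combined = (
    to_shared(flume_db, FLUME_TERMS, "flume")
    + to_shared(field_db, FIELD_TERMS, "field")
)
median_sizes = [r["median_grain_diameter_mm"] for r in combined]
print(median_sizes)
```

Neither database had to change its own layout; adding a third one would only require writing one more mapping.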
These lessons are my own personal opinions, and I’m open to
debate with those who might disagree! I
encourage comments on these ideas and perhaps even further blog posts by
members of the Sediment Experimentalist Network on this topic of development of
cyberinfrastructure for the geosciences.
Wednesday, September 2, 2015
DEADLINE EXTENDED: SEN AGU Graduate Student/Early Career Travel Grants
The graduate student and early career travel grant contest deadline has been extended
to October 15th! We are sponsoring a data-sharing contest for
those who feel passionate about making their data public (more details below).
The top three winners will be awarded travel grants in the amount of $1000 for
use towards the 2015 AGU Fall Meeting. Post your entries to sedexp.net and submit your application!
TRAVEL GRANT CONTEST DETAILS
The Sediment Experimentalist Network (SEN) is sponsoring a
data-sharing contest for graduate students and early career scientists who feel
passionate about making their data public. The top three winners will be
awarded travel grants in the amount of $1000 for use towards the 2015 American Geophysical
Union (AGU) Fall Meeting.
The Sediment Experimentalist Network (SEN) is funded by the U.S.
National Science Foundation (NSF) EarthCube program as a Research Coordination
Network (RCN). SEN integrates the efforts of sediment experimentalists to build
a knowledge base for data collection and management. The network facilitates
cross-institutional collaborative experiments and communicates with the
research community about data and metadata guidelines for sediment-based
experiments. This effort aims to improve the efficiency and transparency of
sedimentary research for field geologists and modelers as well as
experimentalists. More information is available here: http://earthcube.org/group/sen
The contest will be judged on the quantity and quality of
participation in the SEN Knowledge Base (www.sedexp.net),
which contains data catalog entries and descriptions of experimental setups,
methods, and equipment. To begin, create an account on the website and then start
creating entries for your experiments. The more entries, the more likely you
are to win!
Eligibility
This contest is open to current graduate students and early career
scientists (within 5 years of graduating) who are interested in helping make
data more accessible.
Requirements
1. Sign up for the SEN Newsletter: http://goo.gl/s7dLjb
2. Create a Knowledge Base account at www.sedexp.net
3. Start posting entries of your experimental data, set-ups, methods, and equipment.
4. Send a one-page document to sedimentexp@gmail.com as described below
Contest Entry
To enter the contest, please send a one-page document containing
contact information, short professional biography, and a list of your SEN
Knowledge Base entries to sedimentexp@gmail.com.
Selection of Winners
Winners will be selected on the quality (completeness of entry)
and quantity (total number of entries) of entries to the SEN Knowledge Base.
Winners will be notified via email and will be given instructions on the
funding process. Names of winners will also be featured in the upcoming SEN
Newsletter. Winners should acknowledge funding from NSF SEN when presenting
their work at AGU.
REVISED Timeline
June 15th: Contest opens
October 15th: All entries must be received
November 1st: Notification of winners
Questions? Please contact SEN at sedimentexp@gmail.com.
Thursday, August 20, 2015
SEN Travel Grant and Upcoming Events
Dear Experimentalists,
We hope you have all had a productive summer. We just want to send a quick
reminder about two upcoming SEN activities. A full newsletter will be sent out
at the end of September.
SEN AGU Travel Grants
The August 31st deadline for the graduate student
and early career travel grant contest is fast approaching.
We are sponsoring a data-sharing contest for those who feel passionate about
making their data public. The top three winners will be awarded travel grants
in the amount of $1000 for use towards the 2015 AGU Fall Meeting. Post
your entries to sedexp.net and submit
your application!
Questions? Get more information at http://sedimentexperiments.blogspot.com/2015/06/sen-graduate-student-and-early-career.html
or contact SEN at sedimentexp@gmail.com.
Binghamton Geomorphology Symposium
Members of the SEN
leadership team will be presenting a talk and posters at this year’s
Binghamton Geomorphology Symposium, taking place from Friday, September 18 to
Sunday, September 20, 2015, in Buffalo, NY, with the theme of Laboratory
Experiments in Geomorphology.
See the second
circular and more information at https://www.ubevents.org/event/bgs46.
Can’t make the
symposium? Not to worry; SEN presentation slides and posters will be made
available in the next newsletter.
For up-to-date
information about SEN, please check out our blog at http://sedimentexperiments.blogspot.com/ and
follow us on Twitter (@sedimentexp).
Happy experimenting,
The Sediment
Experimentalist Network
http://workspace.earthcube.org/sen