The EarthCube Science Committee is excited to announce the next webinar in its "EarthCube Tools" series, which covers the GeoDeepDive project. GeoDeepDive should be useful for exploring the nooks and crannies of the scientific literature for research data and information. See below for more information:
Friday, June 3, at 2 PM EDT
(1 PM CDT, 12 PM MDT, 11 AM PDT/MST, 8 AM HST)
GeoDeepDive: A digital library and infrastructure to support text and data mining
with Shanan Peters, John Czaplewski, Miron Livny, and Ian Ross
University of Wisconsin
Call-in and event details are available here
This webinar will describe GeoDeepDive, a project that comprehensively scours the scientific literature for data and information that can be reused. Join us for a description and walk-through of this NSF EarthCube-sponsored project and learn how you can start using GeoDeepDive to advance your research.
The published scientific literature contains a large amount of data and information that has utility beyond the scope of the original investigation. For example, fossil occurrences are commonly described in the literature as part of local and regional field work, but literature-based syntheses of millions of fossil occurrences from around the world are required to generate an accurate history of life on Earth. Here we describe GeoDeepDive (GDD), a high-throughput cyberinfrastructure that supports three steps: the reliable, scalable, and automated fetching of documents from content providers; the preprocessing of those documents by software tools that provide annotations for machine reading; and the indexing of those documents based on known vocabularies of scientific terms.
The GeoDeepDive infrastructure (https://geodeepdive.org) currently contains more than 1.2 million documents from six different content providers and grows at a rate of ~30K documents per week. Scientists can now write software applications to extract data and information from these documents using the GeoDeepDive application template and testing datasets. The GDD infrastructure can run supported applications against the whole of the relevant document set, and the result set is updated continually as new relevant documents are acquired and processed. The high-throughput computing capabilities of HTCondor are critical both to the processing of documents and to the deployment of new tools against the entire library as those tools are developed and as the digital library grows.
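For readers who find a concrete picture helpful, the following is a minimal Python sketch of the two ideas described above: indexing documents against a known vocabulary, and re-running an application so that only newly acquired documents are processed. It is an illustration under stated assumptions, not GeoDeepDive's actual code or API. The vocabulary, the document identifiers, and the function names `index_document` and `run_application` are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical vocabulary of scientific terms (illustrative only).
VOCABULARY = {"trilobite", "stromatolite", "ordovician"}


def index_document(doc_id, text, index):
    """Record which vocabulary terms occur in a document."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    for term in VOCABULARY & tokens:
        index[term].add(doc_id)


def run_application(term, index, already_processed):
    """Return documents mentioning `term` that have not been processed yet."""
    new_docs = index[term] - already_processed
    already_processed |= new_docs  # remember them for the next run
    return sorted(new_docs)


index = defaultdict(set)  # term -> set of document ids
processed = set()         # documents the application has already seen

# Initial library: one document is fetched and indexed.
index_document("doc-1", "Ordovician trilobite occurrences in Wisconsin.", index)
first_run = run_application("trilobite", index, processed)

# The library grows; the next run only touches the new relevant document.
index_document("doc-2", "A new trilobite from Morocco.", index)
index_document("doc-3", "Stromatolite morphology.", index)
second_run = run_application("trilobite", index, processed)
```

In this sketch the second run skips `doc-1`, which was already processed, and ignores `doc-3`, which does not mention the term of interest. A real deployment would distribute this work across many machines, which is where a system such as HTCondor comes in.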
About the webinar series:
The EarthCube Tools webinar series, organized by the Science Committee of the NSF-sponsored EarthCube program, provides practical demonstrations of how EarthCube projects can help you to collect, access, share, and visualize geoscience data. Each webinar begins with a showcase of an EarthCube-funded project, followed by ample time for questions and conversation. Wary of EarthCube jargon? Presenters will describe their projects in plain English for scientists in all disciplines who may be unfamiliar with EarthCube. Here's a chance for you (and your colleagues, team members, and students) to learn about EarthCube and how it can help to advance your scientific work. More information on the webinar series is available here. Archived video will be available on the website about one week after the webinar.
Friday, August 5, 2 PM EDT (1 PM CDT, 12 PM MDT, 11 AM PDT/MST, 8 AM HST)