@tomjwebb I see tons of spreadsheets that i don't understand anything (or the stduent), making it really hard to share.
— Erika Berenguer (@Erika_Berenguer) January 16, 2015
@tomjwebb @ScientificData "Document. Everything." Data without documentation has no value.
— Sven Kochmann (@indianalytics) January 16, 2015
@tomjwebb Annotate, annotate, annotate!
— CanJFishAquaticSci (@cjfas) January 16, 2015
Document all the metadata (including protocols).@tomjwebb
— Ward Appeltans (@WrdAppltns) January 16, 2015
You download a zip file of #OpenData. Apart from your data file(s), what else should it contain?
— Leigh Dodds (@ldodds) February 6, 2017
"Information that describes, explains, locates, or in some way makes it easier to find, access, and use a resource (in this case, data)."
Backbone of digital curation
Without it, a digital resource may be irretrievable, unidentifiable or unusable
General: Dublin Core Metadata Initiative Specification
NERC Data Centers: Check with individual data centers for their metadata specification.
Re3data.org: Registry of Research Data Repositories.
Most university libraries have assistants dedicated to Research Data Management:
@tomjwebb @ScientificData Talk to their librarian for data management strategies #datainfolit
— Yasmeen Shorish (@yasmeen_azadi) January 16, 2015
Make sure to record units!
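One lightweight way to keep units with the data is a small attributes table (a data dictionary) saved next to the data file. The Python sketch below is only an illustration: the variables, descriptions and units are hypothetical, and the column names `variableName`, `description` and `unitText` are an assumed layout rather than a required standard.

```python
import csv
import io

# Hypothetical variables; the descriptions and units are illustrative assumptions.
ATTRIBUTES = [
    {"variableName": "plot_id", "description": "Unique plot identifier", "unitText": ""},
    {"variableName": "stem_diameter", "description": "Stem diameter at breast height", "unitText": "centimeter"},
    {"variableName": "height", "description": "Total height of the individual", "unitText": "meter"},
]

FIELDS = ["variableName", "description", "unitText"]


def attributes_to_csv(rows):
    """Serialize the attribute table to CSV text, one row per variable."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def missing_units(rows, unitless=("plot_id",)):
    """Return the names of variables with no unit that are not listed as unitless."""
    return [
        row["variableName"]
        for row in rows
        if not row["unitText"] and row["variableName"] not in unitless
    ]


attributes_csv = attributes_to_csv(ATTRIBUTES)
print(attributes_csv)
print("Variables still missing units:", missing_units(ATTRIBUTES))
```

A check like `missing_units` can be run before sharing the data, so that a forgotten unit is caught while the person who knows it is still around.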
Methods document
Keep a dynamic document used to plan, record and write up methods.
@tomjwebb record every detail about how/where/why it is collected
— Sal Keith (@Sal_Keith) January 16, 2015
Is there any additional information other users would need to combine your data with theirs? Record it.
Teaching this course has always felt challenging in terms of practical exercises
Defining Metadata & explaining importance: ✅
Advising on domain specific Controlled Vocabularies & structure ❌
How can we practice creating metadata?
bringing together scientists, developers, and open data enthusiasts from academia, industry, government, and non-profits to get together for a few days and hack on various projects.
Luckily, a whole bunch of other awesome folks were also thinking about these topics and interested in working on them! 🤩
(in alphabetical order):
dataspice package
dataspice makes it easier for researchers to create basic, lightweight and concise metadata files for their datasets.
Metadata collected in csv files
Metadata fields are based on schema.org
Helper functions and shinyapps to extract and edit metadata files.
Ability to produce:
https://toolbox.google.com/datasetsearch
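To give a feel for what schema.org-based metadata looks like, here is a minimal Python sketch that assembles a schema.org `Dataset` record as JSON-LD. It does not use dataspice and is not dataspice output. The title, description, dates and variable entries are placeholders, the helper name `build_dataset_jsonld` is made up for this example, and only a handful of schema.org properties (`name`, `description`, `temporalCoverage`, `variableMeasured`) are shown.

```python
import json


def build_dataset_jsonld(name, description, temporal_coverage, variables):
    """Assemble a minimal schema.org Dataset description as a plain dict."""
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        # ISO 8601 interval, written as start/end
        "temporalCoverage": temporal_coverage,
        "variableMeasured": [
            {
                "@type": "PropertyValue",
                "name": variable["name"],
                "description": variable["description"],
                "unitText": variable["unitText"],
            }
            for variable in variables
        ],
    }


# All values below are placeholders for illustration.
record = build_dataset_jsonld(
    name="Example vegetation survey",
    description="Placeholder description of an example field-collected dataset.",
    temporal_coverage="2015-06-01/2015-11-30",
    variables=[
        {"name": "height", "description": "Total height of the individual", "unitText": "meter"},
    ],
)

print(json.dumps(record, indent=2))
```

Because the record is plain JSON, it can be stored next to the data or embedded in a web page, where search tools that read schema.org markup can pick it up.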
dataspice tutorial
The goal of dataspice-tutorial is a practical exercise in creating metadata for an example field-collected data product using the dataspice package.
Understand basic metadata and why it is important
Understand where and how to store them
Understand how they can feed into more complex metadata objects.
dataspice workflow
This data product contains the quality-controlled, native sampling resolution data from in-situ measurements of live and standing dead woody individuals and shrub groups, from all terrestrial NEON sites with qualifying woody vegetation.
Structure and mapping data are reported per individual per plot
Sampling metadata, such as per growth form sampling area, are reported per plot.
dataspice workshop data
The data are a trimmed subset of data downloaded from the NEON data portal after filtering for:
time periods between 2015-06 and 2016-06
locations within NEON Domain area D01: Northeast
The filter returned data from 2 sites, from 2015-06 to 2015-11.
National Ecological Observatory Network. 2018. Data Products: DP1.10098.001. Provisional data downloaded from http://data.neonscience.org on 2018-05-04. Battelle, Boulder, CO, USA
Plot level data
## # A tibble: 165 x 14
##    date       siteID plotID plotType nlcdClass decimalLatitude
##    <date>     <chr>  <chr>  <chr>    <chr>               <dbl>
##  1 2015-06-06 BART   BART_… tower    deciduou…            44.1
##  2 2015-07-16 BART   BART_… tower    deciduou…            44.1
##  3 2015-07-21 BART   BART_… tower    deciduou…            44.1
##  4 2015-07-22 BART   BART_… tower    mixedFor…            44.1
##  5 2015-07-22 BART   BART_… tower    deciduou…            44.1
##  6 2015-07-22 BART   BART_… tower    deciduou…            44.1
##  7 2015-07-22 BART   BART_… tower    deciduou…            44.1
##  8 2015-07-23 BART   BART_… tower    mixedFor…            44.1
##  9 2015-07-23 BART   BART_… tower    deciduou…            44.1
## 10 2015-07-28 BART   BART_… tower    mixedFor…            44.1
## # … with 155 more rows, and 8 more variables: decimalLongitude <dbl>,
## #   treesPresent <lgl>, shrubsPresent <lgl>, lianasPresent <lgl>,
## #   totalSampledAreaTrees <dbl>, totalSampledAreaShrubSapling <dbl>,
## #   totalSampledAreaLiana <dbl>, recordedBy <chr>
Individual level data
## # A tibble: 1,799 x 7
##    date       siteID plotID  individualID taxonID scientificName recordedBy
##    <date>     <chr>  <chr>   <chr>        <chr>   <chr>          <chr>
##  1 2015-08-04 BART   BART_0… NEON.PLA.D0… TSCA    Tsuga canaden… 6HzkzFDdL…
##  2 2015-08-04 BART   BART_0… NEON.PLA.D0… TSCA    Tsuga canaden… 6HzkzFDdL…
##  3 2015-07-22 BART   BART_0… NEON.PLA.D0… FAGR    Fagus grandif… zODC+zTh3…
##  4 2015-08-26 BART   BART_0… NEON.PLA.D0… FAGR    Fagus grandif… zODC+zTh3…
##  5 2015-08-04 BART   BART_0… NEON.PLA.D0… PICEA   Picea sp.      6HzkzFDdL…
##  6 2015-08-04 BART   BART_0… NEON.PLA.D0… TSCA    Tsuga canaden… 0uwWHUCkG…
##  7 2015-07-22 BART   BART_0… NEON.PLA.D0… FAGR    Fagus grandif… zODC+zTh3…
##  8 2015-08-12 BART   BART_0… NEON.PLA.D0… FAGR    Fagus grandif… jRr6tAEXv…
##  9 2015-07-22 BART   BART_0… NEON.PLA.D0… FAGR    Fagus grandif… jRr6tAEXv…
## 10 2015-08-25 BART   BART_0… NEON.PLA.D0… FAGR    Fagus grandif… 0uwWHUCkG…
## # … with 1,789 more rows
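Some metadata can be derived from the data themselves rather than typed by hand. As a rough Python sketch (independent of dataspice), the snippet below computes a temporal coverage string from the `date` column, using only the ten plot-level rows printed above rather than the full table. The function name `temporal_coverage` is made up for this example.

```python
from datetime import date

# The `date` values of the ten plot-level rows shown in the printed preview.
PLOT_DATES = [
    "2015-06-06", "2015-07-16", "2015-07-21", "2015-07-22", "2015-07-22",
    "2015-07-22", "2015-07-22", "2015-07-23", "2015-07-23", "2015-07-28",
]


def temporal_coverage(iso_dates):
    """Return the earliest and latest date as an ISO 8601 interval (start/end)."""
    parsed = [date.fromisoformat(value) for value in iso_dates]
    return f"{min(parsed).isoformat()}/{max(parsed).isoformat()}"


coverage = temporal_coverage(PLOT_DATES)
print(coverage)  # 2015-06-06/2015-07-28
```

The same idea extends to spatial coverage: the minimum and maximum of the latitude and longitude columns give a bounding box for the dataset.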
or head to the tutorial if working through this on your own
e.g. create Ecological Metadata Language (EML) metadata using the R package EML.
reposit your data at KNB
allows richer search and presentation of metadata