
ACCE Research Data and Project Management


File naming

10-11 April 2019, University of Sheffield

Dr Anna Krystalli @annakrystalli


Background

Let's face it...

  • There are going to be files

  • LOTS of files

  • The files will change over time

  • The files will have relationships to each other

It'll probably get complicated


Strategy against chaos

File organization and naming is a mighty weapon against chaos

  • Make a file's name and location VERY INFORMATIVE about:

    • what it is,
    • why it exists,
    • how it relates to other things
  • The more things are self-explanatory, the better.


File naming


Names matter


What works, what doesn't?

NO

myabstract.docx
Joe’s Filenames Use Spaces and Punctuation.xlsx
figure 1.png
fig 2.png
JW7d^(2sl@deletethisandyourcareerisoverWx2*.txt

YES

2014-06-08_abstract-for-sla.docx
joes-filenames-are-getting-better.xlsx
fig01_scatterplot-talk-length-vs-interest.png
fig02_histogram-talk-attendance.png
1986-01-28_raw-data-from-challenger-o-rings.txt

Three principles for good (file) names

  1. Machine readable

  2. Human readable

  3. Play well with default ordering


Machine readable

  • Regular expression and globbing friendly

    • Avoid spaces, punctuation, accented characters, and names that differ only in letter case
  • Easy to compute on

    • Deliberate use of delimiters
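A minimal sketch of these rules as a check in R. The helper name is_machine_readable() and its pattern (lowercase ASCII letters, digits, "-" and "_", plus an optional lowercase extension) are assumptions chosen for illustration, not a standard; adjust the character class to your own conventions.

```r
# Hypothetical helper: TRUE if a name uses only lowercase ASCII letters,
# digits, "-" and "_", optionally followed by a lowercase ".ext" extension.
# perl = TRUE makes [a-z] a plain code-point range, independent of locale.
is_machine_readable <- function(x) {
  grepl("^[a-z0-9_-]+(\\.[a-z0-9]+)?$", x, perl = TRUE)
}

files <- c("figure 1.png",
           "JW7d^(2sl@deletethisandyourcareerisoverWx2*.txt",
           "2014-06-08_abstract-for-sla.docx",
           "fig01_scatterplot-talk-length-vs-interest.png")
is_machine_readable(files)
# FALSE FALSE TRUE TRUE
```

Note that a name like myabstract.docx also passes: being machine readable is necessary but not sufficient, which is why the other two principles exist.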

Filtering and searching with globbing

Excerpt of complete file listing:


Example of globbing to filter file listing:
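The same idea in R, as a small sketch on a hypothetical listing built from the example file names earlier in these slides. glob2rx() (from base R's utils package) converts a shell-style glob into a regular expression; Sys.glob() and list.files() apply a pattern to files actually on disk.

```r
# Hypothetical file listing, reusing example names from these slides
files <- c("2014-06-08_abstract-for-sla.docx",
           "fig01_scatterplot-talk-length-vs-interest.png",
           "fig02_histogram-talk-attendance.png",
           "1986-01-28_raw-data-from-challenger-o-rings.txt")

# Convert the glob "fig*.png" to a regex and use it to filter the listing
pattern <- glob2rx("fig*.png")   # "^fig.*\\.png$"
files[grepl(pattern, files)]

# Against a real directory, either of these would do the same:
# Sys.glob("fig*.png")
# list.files(pattern = glob2rx("fig*.png"))
```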


Search using the macOS Finder


Search using regex in R
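A sketch of regex-based search in R on a hypothetical listing: grepl() finds the names that start with an ISO 8601 date, and regmatches() with regexpr() pulls that date out. The pattern is an assumption that matches the YYYY-MM-DD_ prefix convention used in these slides.

```r
files <- c("2014-06-08_abstract-for-sla.docx",
           "fig01_scatterplot-talk-length-vs-interest.png",
           "fig02_histogram-talk-attendance.png",
           "1986-01-28_raw-data-from-challenger-o-rings.txt")

# A YYYY-MM-DD date at the start of the name, followed by "_"
# (the lookahead checks for "_" without including it in the match)
date_prefix <- "^[0-9]{4}-[0-9]{2}-[0-9]{2}(?=_)"
dated <- files[grepl(date_prefix, files, perl = TRUE)]
dated

# Extract just the date part and convert it to Date objects
dates <- regmatches(dated, regexpr(date_prefix, dated, perl = TRUE))
as.Date(dates)
```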


Delimit information with punctuation

Deliberate use of "-" and "_" allows recovery of metadata from the filenames:

  • "_" underscore used to delimit units of metadata I want to access later
  • "-" hyphen used to delimit words so our eyes don't bleed


Splitting filenames by delimiters

This happens to be R, but the same is possible in the shell, Python, etc.
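A minimal sketch of such a split in R, using two of the example figure names from earlier. It assumes every name has the same number of "_"-delimited units (here two: a figure id and a description); the column names figure and description are chosen for illustration.

```r
files <- c("fig01_scatterplot-talk-length-vs-interest.png",
           "fig02_histogram-talk-attendance.png")

# Drop the extension, then split each name on "_" into metadata units
stems <- tools::file_path_sans_ext(files)
parts <- strsplit(stems, "_", fixed = TRUE)

# One row per file, one column per unit (requires equal unit counts)
meta <- as.data.frame(do.call(rbind, parts), stringsAsFactors = FALSE)
names(meta) <- c("figure", "description")
meta

# Within a unit, "-" only separates words, but it can be split too if needed
strsplit(meta$description, "-", fixed = TRUE)
```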


Include important metadata

E.g. I'm saving several files of temperature data extracted at different resolutions (res) and for several months (month). Including these parameters in the file name lets me use them to target the files I want to read in.

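fname <- paste0(paste("variable", res, month, sep = "_"), ".csv")  # e.g. "variable_<res>_<month>.csv"
write.csv(df, fname, row.names = FALSE)  # no row-name column, so it reads back cleanly
df <- read.csv(fname)                    # read back by rebuilding the same name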
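Extending that example into a small, self-contained sketch: write one file per combination of resolution and month, then use the metadata in the names to select only the files needed. The resolutions, months, random placeholder data and the use of tempdir() are assumptions for illustration; "variable" stands in for the real variable name, as above.

```r
out_dir <- tempdir()
res_values <- c("1km", "5km")          # hypothetical resolutions
month_values <- sprintf("%02d", 1:3)   # "01", "02", "03" (left-padded)

for (res in res_values) {
  for (month in month_values) {
    df <- data.frame(value = runif(3))   # placeholder data, not real temperatures
    fname <- paste0(paste("variable", res, month, sep = "_"), ".csv")
    write.csv(df, file.path(out_dir, fname), row.names = FALSE)
  }
}

# Target only the 5km files, using the metadata encoded in their names
targets <- list.files(out_dir, pattern = "^variable_5km_[0-9]{2}\\.csv$")
targets
dfs <- lapply(file.path(out_dir, targets), read.csv)
```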

Recap: machine readable

  • Easy to search for files later
  • Easy to filter file lists based on names
  • Easy to extract info from file names, e.g. by splitting

New to regular expressions and globbing? Be kind to yourself and avoid:

  • Spaces in file names
  • Punctuation
  • Accented characters

Human readable

  • Name contains info on content

  • Connects to the concept of a slug from semantic URLs
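A slug is a short, lowercase, hyphen-separated version of a title, as used in many web page URLs. A minimal sketch of a slug maker in R; the name slugify() and its rules (lowercase, replace every run of non-alphanumeric characters with a single "-", trim the ends) are assumptions, and accented letters are replaced rather than transliterated.

```r
# Hypothetical helper: turn a free-text description into a slug
slugify <- function(x) {
  x <- tolower(x)
  x <- gsub("[^a-z0-9]+", "-", x, perl = TRUE)  # runs of other chars -> "-"
  gsub("^-+|-+$", "", x, perl = TRUE)           # trim leading/trailing "-"
}

slug <- slugify("Scatterplot: talk length vs. interest")
slug                             # "scatterplot-talk-length-vs-interest"
paste0("fig01_", slug, ".png")   # "fig01_scatterplot-talk-length-vs-interest.png"
```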


Example

Which set of file(name)s do you want at 3 a.m. before a deadline?


Embrace the slug


The R scripts:

01_marshal-data.r
02_pre-dea-filtering.r
03_dea-with-limma-voom.r
04_explore-dea-results.r
90_limma-model-term-name-fiasco.r

The figures left behind:

02_pre-dea-filtering-preDE-filtering.png
03_dea-with-limma-voom-voom-plot.png
04_explore-dea-results-focus-term-adjusted-p-values1.png
04_explore-dea-results-focus-term-adjusted-p-values2.png
...
90_limma-model-term-name-fiasco-first-voom.png
90_limma-model-term-name-fiasco-second-voom.png
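One way to get this pattern consistently is to build each figure name from the script's own file name. A sketch in R; fig_name() is a hypothetical helper, and the script name is passed in explicitly because R offers no single reliable way for a script to discover its own path.

```r
# Hypothetical helper: prefix a figure's name with the slug of its script,
# so figures sort next to, and point back to, the code that made them
fig_name <- function(script, what) {
  stem <- tools::file_path_sans_ext(basename(script))
  paste0(stem, "-", what, ".png")
}

fig_name("02_pre-dea-filtering.r", "preDE-filtering")
# "02_pre-dea-filtering-preDE-filtering.png"
fig_name("90_limma-model-term-name-fiasco.r", "first-voom")
# "90_limma-model-term-name-fiasco-first-voom.png"
```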

Recap: Human readable

  • Easy to figure out what the heck something is, based on its name

Play well with default ordering

  • Put something numeric first
  • Use the ISO 8601 standard for dates
  • Left pad other numbers with zeros

Examples

Chronological order:


Dates

Use the ISO 8601 standard for dates: YYYY-MM-DD
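In R, format() with "%Y-%m-%d" produces ISO 8601 dates. The sketch below (the first file name is hypothetical) also shows why the format matters: ISO-dated names sort chronologically as plain text, while DD-MM-YYYY names do not.

```r
# ISO 8601 date as a file name prefix
d <- as.Date("2019-04-10")
iso_name <- paste0(format(d, "%Y-%m-%d"), "_file-naming-notes.txt")  # hypothetical
iso_name   # "2019-04-10_file-naming-notes.txt"

# ISO prefixes: text order == chronological order (1986 before 2014)
sort(c("2014-06-08_abstract-for-sla.docx",
       "1986-01-28_raw-data-from-challenger-o-rings.txt"))

# DD-MM-YYYY prefixes: text order follows the day, so 2014 sorts before 1986
sort(c("28-01-1986_raw-data-from-challenger-o-rings.txt",
       "08-06-2014_abstract-for-sla.docx"))
```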


[Image: ISO 8601 public service announcement]


Logical order: Put something numeric first
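A small R sketch of why a numeric prefix helps, using the analysis scripts from the "Embrace the slug" example listed in a scrambled order: a plain sort() recovers the intended run order, and the prefix can be parsed back into a step number.

```r
scripts <- c("04_explore-dea-results.r", "01_marshal-data.r",
             "90_limma-model-term-name-fiasco.r", "03_dea-with-limma-voom.r",
             "02_pre-dea-filtering.r")

run_order <- sort(scripts)                      # numeric prefix => logical order
step <- as.integer(sub("_.*$", "", run_order))  # "01_marshal-data.r" -> 1
run_order
step   # 1 2 3 4 90

# e.g. to run the pipeline in order (not run here):
# for (f in run_order) source(f)
```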


Left pad other numbers with zeros

If you don’t left pad, you get this:

10_final-figs-for-publication.R
1_data-cleaning.R
2_fit-model.R

which is just sad :(
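In R, sprintf("%02d", ...) left-pads integers with zeros; a sketch using the three scripts above. Pad to as many digits as your largest expected number needs (e.g. "%03d" for up to 999); formatC() with flag = "0" is an equivalent alternative.

```r
steps <- c(1L, 2L, 10L)
slugs <- c("data-cleaning.R", "fit-model.R", "final-figs-for-publication.R")

# Zero-pad the step number to two digits before joining it to the slug
padded <- paste0(sprintf("%02d", steps), "_", slugs)
sort(padded)
# "01_data-cleaning.R" "02_fit-model.R" "10_final-figs-for-publication.R"

# Equivalent padding with formatC()
formatC(steps, width = 2, flag = "0")   # "01" "02" "10"
```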


Recap: Play well with default ordering

  • Put something numeric first

  • Use the ISO 8601 standard for dates

  • Left pad other numbers with zeros


Recap: Three principles for (file) names

  1. Machine readable

  2. Human readable

  3. Play well with default ordering

Go forth and use awesome file names :)


