
Creating and Sharing Code for Reproducible Research and Scalable Impact

Making your code and research make a difference

Robin Lovelace, 10DS Fellow, Turing Fellow

Institute for Transport Studies, University of Leeds

ADR Fellows Code Sharing Workshop, 2021-01-21 (updated: 2022-01-19)
Reproducible source code: github.com/Robinlovelace/presentations

1 / 11

The problem

Source: 'Inside the black box' report

  • Black boxes obfuscate methods, reduce trust in research, stifle innovation and reduce the ability of future work to build on (and cite) your research

  • Academic research practices developed before computers, let alone open source software and reproducible code, were available: black boxes are the norm

2 / 11

The solution: get your code out there

Source: Lovelace, Robin 2021: Open Source Tools for Geographic Analysis in Transport Planning. Journal of Geographical Systems. https://doi.org/10.1007/s10109-020-00342-2, accessed January 17, 2021.

3 / 11

Preparing code for publication

At a minimum

  • Code hosting website (e.g. GitHub)
  • Good README with instructions to run code (+optional badges)
  • Minimal example input dataset (synthetic if needed)
  • Clear directory structure
  • Select and follow a style guide
  • Small readable 'chunks' (functions/scripts)

Advanced

  • Active issue tracker
  • Packaging + documentation
  • Code review
  • Continuous integration
  • Community chat (e.g. Discord)
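The minimum checklist above can be illustrated with a hedged R sketch: one small, commented function using snake_case names (as recommended by the tidyverse style guide), plus a tiny synthetic dataset that lets others run the code without access to real data. The function `calculate_pcycle()`, its column names and the data are hypothetical, made up for illustration; they are not taken from stplanr or any other package.

```r
# Hypothetical example: one small, documented function plus a tiny
# synthetic input dataset that can be committed alongside the code.

# Proportion of trips made by bicycle in each zone.
# trips: data frame with columns `zone` and `mode`.
calculate_pcycle <- function(trips) {
  # Fail early with an informative error if the input is malformed
  stopifnot(is.data.frame(trips), all(c("zone", "mode") %in% names(trips)))
  # The mean of a logical vector is the proportion of TRUE values, per zone
  pcycle <- tapply(trips$mode == "bicycle", trips$zone, mean)
  data.frame(zone = names(pcycle), pcycle = as.numeric(pcycle))
}

# Minimal synthetic input: small enough to ship in the repository
trips_synthetic <- data.frame(
  zone = rep(c("A", "B"), each = 5),
  mode = c("bicycle", "car", "car", "walk", "bicycle",
           "car", "car", "car", "bicycle", "walk")
)

calculate_pcycle(trips_synthetic)
```

The last line should report a cycling proportion of 0.4 for zone A and 0.2 for zone B; a README could show exactly this call so that newcomers can check their setup works before using their own data.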

Example: stplanr

Badges: RStudio mirror downloads, CRAN status, lifecycle, R-CMD-check

Demo of packaged code: stplanr (Lovelace and Ellison 2018), https://docs.ropensci.org/stplanr/

Example of code for a paper: https://github.com/Robinlovelace/odjitter

4 / 11

Beyond code 1: dissemination

Source: https://twitter.com/robinlovelace/status/1351477455203299328

5 / 11

Beyond code 2: open access tools

  • Outlines the 'network effects' of open research and of putting things 'out there'
  • Conclusion: open access is particularly important for policy-relevant research

6 / 11

Case study of publishing code

Code processing confidential data hosted online: https://github.com/npct/pct-scripts/blob/master/03.2_school_prepare_OD_file.R

Source: Goodman et al. 2019. Scenarios of Cycling to School in England, and Associated Health and Carbon Impacts: Application of the ‘Propensity to Cycle Tool.’ Journal of Transport & Health 12: 263–278.

7 / 11

Aggregate outputs published

  • We commissioned a sensitive dataset from the Department for Education (DfE) and processed it securely
  • Open code and, more importantly, aggregated derived data increased the impact of the research
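A minimal sketch of the 'aggregate before publishing' idea in base R, under stated assumptions: the records below are synthetic (not the DfE data), the column names are hypothetical, and the minimum cell count of 3 is an arbitrary illustrative threshold, not the rule used for the published school data. Individual-level rows are aggregated to home-zone-to-school counts, and small cells are blanked so that individuals are harder to single out.

```r
# Synthetic stand-in for individual-level records: NOT real pupil data.
pupils <- data.frame(
  home_zone = c("Z1", "Z1", "Z1", "Z2", "Z2", "Z1", "Z2", "Z3"),
  school_id = c("S1", "S1", "S1", "S1", "S2", "S2", "S2", "S1"),
  cycles    = c(TRUE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE)
)

# Aggregate records to origin-destination (home zone to school) counts and
# blank out cells below a hypothetical minimum count before publishing.
aggregate_od <- function(records, min_count = 3) {
  records$all <- 1
  records$bicycle <- as.integer(records$cycles)
  od <- aggregate(
    records[c("all", "bicycle")],
    by = records[c("home_zone", "school_id")],
    FUN = sum
  )
  # Suppress small cells, which could otherwise identify individuals
  small <- od$all < min_count
  od[small, c("all", "bicycle")] <- NA
  od
}

od_public <- aggregate_od(pupils)
od_public
```

Only the aggregated `od_public` table would be shared; the individual-level input stays in the secure environment, while the code that produced the table can still be published openly.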

8 / 11

Skills and lessons learned along the way

Skills

General

  • Communication (beyond the usual people)
  • Forward planning
  • Agile workflow

Technical

  • Version control (Git/GitHub)
  • Kanban boards
  • Receiving feedback on code
  • Integrating manuscript prose and code with R Markdown and Quarto

Lessons

  • Get collaborators on board
  • Don't be afraid to publish 'unfinished' code
  • Publishing code and contributing online can lead to surprising benefits
    • E.g. a link with an Italian PhD student, which led to a joint paper
  • Ask questions of the community

9 / 11

Beyond code 3: Community engagement

Sharing code happens in the context of open source communities

They are usually friendly communities

Getting involved can lead to collaborations

Source: https://github.com/r-spatial/sf/issues/966

# Reproducible example:
milan_car_crashes <- data.frame(
  ID = 1:5,
  X = c(1513037, 1513008, 1515473, 1514039, 1515748),
  Y = c(5034945, 5034750, 5036177, 5036820, 5037396)
)

Resulting publication: Gilardi, A., Mateu, J., Borgoni, R., Lovelace, R., 2022. Multivariate hierarchical analysis of car crashes.

10 / 11

Thanks for listening!

Further information: robinlovelace.net + @robinlovelace on Twitter

See the slides from the 'infoRming policy' talk for an R-focussed take.

11 / 11
