class: center, middle, inverse, title-slide

# Creating and Sharing Code for Reproducible Research and Scalable Impact

## Making your code and research make a difference

### Robin Lovelace, 10DS Fellow, Turing Fellow

### Leeds’ Institute for Transport Studies

### ADR Fellows Code sharing Workshop, 2021-01-21 (updated: 2022-01-19)

Reproducible source code: github.com/Robinlovelace/presentations

---

<!-- Talk in context: -->
<!-- From 11:05 to 11:10 -->

## The problem

![](https://user-images.githubusercontent.com/1825120/103755051-1b6b2f80-5005-11eb-988b-f7c8d3108e53.png)

Source: ['Inside the black box' report](https://www.co.pierce.wa.us/DocumentCenter/View/755/A-GuideToModeling?bidId=)

--

- Black boxes obfuscate methods, reduce trust in research, stifle innovation and reduce the ability of future work to build on (and cite) your research

--

- Academic research developed at a time when computers, let alone open source software and reproducible code, were not available: black boxes are the norm

---

### The solution: get your code out there

![](https://user-images.githubusercontent.com/1825120/150163421-45f446c7-2f8d-407d-a24b-8ab8911c2870.png)

Source: Lovelace, Robin 2021: Open Source Tools for Geographic Analysis in Transport Planning. Journal of Geographical Systems. https://doi.org/10.1007/s10109-020-00342-2, accessed January 17, 2021.

---

## Preparing code for publication

.pull-left[

#### At a minimum

- Code hosting website (e.g. GitHub)
- Good README with instructions to run code (+ optional badges)
- Minimal example (synthetic?) input dataset
- Clear directory structure
- Select and follow a style guide
<!-- - Identify a style guide you like and stick to it -->
- Small readable 'chunks' (functions/scripts)

#### Advanced

- Active issue tracker
- Packaging + documentation
- Code review
- Continuous integration
- Community chat (e.g.
Discord)

]

.pull-right[

### Example: stplanr

<!-- ![](https://user-images.githubusercontent.com/1825120/150215724-e57e9969-dad8-4af7-8a6d-98edf99e5819.png) -->

[![rstudio mirror downloads](https://cranlogs.r-pkg.org/badges/stplanr)](https://github.com/r-hub/cranlogs.app)
[![](https://cranlogs.r-pkg.org/badges/grand-total/stplanr)](https://cran.r-project.org/package=stplanr)
[![CRAN\_Status\_Badge](https://www.r-pkg.org/badges/version/stplanr)](https://cran.r-project.org/package=stplanr)
[![lifecycle](https://img.shields.io/badge/lifecycle-maturing-blue.svg)](https://lifecycle.r-lib.org/articles/stages.html)
[![](https://badges.ropensci.org/10_status.svg)](https://github.com/ropensci/software-review/issues/10)
[![R-CMD-check](https://github.com/ropensci/stplanr/workflows/R-CMD-check/badge.svg)](https://github.com/ropensci/stplanr/actions)

<a href='https://docs.ropensci.org/stplanr/'><img src='https://docs.ropensci.org/stplanr/reference/figures/stplanr.png' align="right" height=215/></a>

Demo of packaged code: **stplanr** (Lovelace and Ellison 2018) https://docs.ropensci.org/stplanr/

Example of code for a paper: https://github.com/Robinlovelace/odjitter

]

---

## Beyond code 1: dissemination

![](https://user-images.githubusercontent.com/1825120/150164001-5992ea14-efa5-4287-9983-b8ced943ab98.png)

Source: https://twitter.com/robinlovelace/status/1351477455203299328

---

## Beyond code 2: open access tools

![](https://www.robinlovelace.net/publication/lovelace-open-2020/featured.png)

- Outlines 'network effects' of open research and putting things 'out there'
- Conclusion: open access is particularly important for policy-relevant research

---

## Case study of publishing code

![](https://user-images.githubusercontent.com/1825120/150165206-d7f4e45b-eebb-4f10-a458-2fe8b146ba36.png)

Code that processes confidential data, hosted online: https://github.com/npct/pct-scripts/blob/master/03.2_school_prepare_OD_file.R

Source: Goodman et al. 2019.
Scenarios of Cycling to School in England, and Associated Health and Carbon Impacts: Application of the ‘Propensity to Cycle Tool.’ Journal of Transport & Health 12: 263–278.

---

## Aggregate outputs published

- We commissioned a sensitive dataset from DfE and processed it securely
- Open code and (more importantly) aggregated derived data increased impact

![](https://user-images.githubusercontent.com/1825120/150166166-511aba5b-1ac6-4aa1-a623-d98953d34d7a.png)

---

### Skills and lessons learned along the way

.pull-left[

### Skills

General

- Communication (beyond the usual people)
- Forward planning
- Agile workflow

Technical

- Version control (Git/GitHub)
- Kanban boards
- Receiving feedback on code
- Integrating manuscript prose + code, with RMarkdown and Quarto

]

.pull-right[

### Lessons

- Get collaborators on board
<!-- Talk about working with many people -->
- Don't be afraid to publish 'unfinished' code
- Publishing code and contributing online can lead to surprising benefits
  - E.g. link with Italian PhD student and paper
- Ask questions of the *community*

]

---

.left-column[

#### Beyond code 3: Community engagement

Sharing code happens in the context of open source communities

They are usually friendly communities

Getting involved can lead to collaborations

Source: https://github.com/r-spatial/sf/issues/966

]

.right-column[

![](https://user-images.githubusercontent.com/1825120/150217296-39c4d45c-943f-40d4-9b8e-8dce1fd252e9.png)

```r
milan_car_crashes <- data.frame(
  ID = 1:5, # Reproducible example:
  X = c(1513037, 1513008, 1515473, 1514039, 1515748),
  Y = c(5034945, 5034750, 5036177, 5036820, 5037396)
)
```

![](https://user-images.githubusercontent.com/1825120/150217702-075354b0-82bf-42da-8321-60002dff795f.png)

Resulting publication: Gilardi, A., Mateu, J., Borgoni, R., Lovelace, R., 2022. Multivariate hierarchical analysis of car crashes.

]

---

# Thanks for listening!

## Further information: robinlovelace.net + @robinlovelace on Twitter

See the slides for the infoRming policy talk for an R-focussed take.

[![](https://user-images.githubusercontent.com/1825120/150219247-7e0ea4d8-7d03-4e7d-8c3d-e7d070e33955.png)](https://www.robinlovelace.net/presentations/c4p-slides.html#1)