Efficient WoRkflows

🚀
getting more done with R

Robin Lovelace

University of Leeds

2019-12-05 (updated: 2019-12-11)

# to reproduce these slides do:
pkgs = c("rgdal", "sf", "geojsonsf")
install.packages(pkgs)
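
A minimal sketch of a gentler variant, which only reports the packages that are not yet available instead of reinstalling everything. The names `installed` and `missing_pkgs` are illustrative, not part of the original slides. Note that rgdal has since been retired from CRAN, so installing it may fail on a current set-up.

```r
pkgs = c("rgdal", "sf", "geojsonsf")

# TRUE for each package whose namespace can be loaded on this machine
installed = vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs = pkgs[!installed]

if (length(missing_pkgs) > 0) {
  message("Not installed: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install; needs network access
}
```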

Source goodhousekeeping.com

The Best Self-Help Books to Read in 2019, According to Amazon and Goodreads

There are loads of self-help books out there

Originally published in 2001, it has gained huge popularity

2008 book by Scott Berkun aimed at software architects

Too techy for me, about managing 100s of people!

2015 book by Daniel Levitin

Too techy for me, about managing 100s of people!

Open source book

Published early 2016

All the things I wish I knew

Handy chapters:

Efficient set-up

Efficient workflow

Efficient collaboration

From code to impact

Source: https://github.com/npct, which hosts the national web tool PCT (www.pct.bike)

2017 book by Mark Reed

Types of efficiency

Programmer efficiency

This is the amount of useful work a person (not a computer) can do per unit time

Computational efficiency

How quickly the computer can undertake a piece of work

One essential object is to choose that arrangement which shall tend to reduce to a minimum the time necessary for completing the calculation. (Ada Lovelace, 1842)

An efficient light shines bright with a minimum of energy consumption. In this final sense, efficiency has a formal definition as the ratio of work done per unit effort:

$$\eta = \frac{W}{Q}$$

Efficient set-up

Based on Efficient R programming Chapter 2

R and package versions

Keep your software up-to-date

R.version
##                _
## platform       x86_64-pc-linux-gnu
## arch           x86_64
## os             linux-gnu
## system         x86_64, linux-gnu
## status
## major          3
## minor          6.1
## year           2019
## month          07
## day            05
## svn rev        76782
## language       R
## version.string R version 3.6.1 (2019-07-05)
## nickname       Action of the Toes
packageVersion("dplyr")
## [1] '0.8.3'
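
The version checks above can also be done programmatically. The following is a sketch under stated assumptions: the minimum versions are arbitrary illustrations, and `pkg_at_least()` is a hypothetical helper, not a function from any package.

```r
# Hypothetical minimum R version, chosen only for illustration
min_r = "3.6.0"
r_ok = getRversion() >= min_r

# Is an installed package at least a given version? FALSE if it is not installed.
pkg_at_least = function(pkg, version) {
  requireNamespace(pkg, quietly = TRUE) && packageVersion(pkg) >= version
}

r_ok
pkg_at_least("stats", "3.0.0")

# Listing or upgrading outdated packages needs a CRAN mirror and network access,
# so these calls are left commented out:
# old.packages()
# update.packages(ask = FALSE)
```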

Efficient workflow

Package selection

  • Is it mature?

  • Is it actively developed?

  • Is it well documented?

  • Is it well used?

Interest in popularity can go too far

See Efficient R Programming 4.4 for details
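
Some of these questions can be partly answered from a package's DESCRIPTION file. The sketch below assumes the package is already installed. `pkg_summary()` is a hypothetical helper and the fields it reports are rough indicators only, not a measure of quality.

```r
# Summarise a few indicators for an installed package from its DESCRIPTION file
pkg_summary = function(pkg) {
  d = suppressWarnings(utils::packageDescription(pkg))
  if (!is.list(d)) {
    return(NULL)  # packageDescription() returns NA when the package is not installed
  }
  # Return NA rather than NULL when a field is absent
  get_field = function(x) if (is.null(d[[x]])) NA_character_ else d[[x]]
  data.frame(
    package = pkg,
    version = get_field("Version"),
    published = get_field("Date/Publication"),  # absent for base packages
    has_url = !is.null(d[["URL"]]),
    has_bug_tracker = !is.null(d[["BugReports"]]),
    stringsAsFactors = FALSE
  )
}

pkg_summary("stats")
```

A version well above 1.0, a recent publication date and a public bug tracker hint at maturity and active development; documentation and usage still need checking by hand.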

Benchmarking your code

# 4 ways to read in files representing parks in West Yorkshire
rds = function() readRDS("parks-west-yorkshire.Rds")
rgd = function() rgdal::readOGR(dsn = "parks-west-yorkshire.geojson", verbose = FALSE)
sfr = function() sf::read_sf("parks-west-yorkshire.geojson")
gsf = function() geojsonsf::geojson_sf("parks-west-yorkshire.geojson")
# check = FALSE: the readers return different object types, so results are not compared
res = bench::mark(check = FALSE, rds(), rgd(), sfr(), gsf(), time_unit = "ms")
res[1:5]
## # A tibble: 4 x 5
##   expression    min median `itr/sec` mem_alloc
##   <bch:expr>  <dbl>  <dbl>     <dbl> <bch:byt>
## 1 rds()        4.75   5.10    188.    356.18KB
## 2 rgd()      642.   642.        1.56   19.9MB
## 3 sfr()       47.5   49.5      19.9     7.89MB
## 4 gsf()        7.52   7.80    125.      1.14MB

Which to use?

Package/data format choice -> more than 100-fold speed-up (median 642 ms vs 5.1 ms)!

Rds: fast and memory efficient; geojsonsf ('gsf'): fast but uses more memory than Rds; sf: slower but ubiquitous; rgdal ...

For reproducible code, see https://github.com/Robinlovelace/EfficientWorkflows
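
The same idea can be tried without the parks data or the bench package. This is a base R sketch under stated assumptions: the data frame is made up, the files are written to a temporary directory, and `time_median()` is a hypothetical helper that is far cruder than `bench::mark()`.

```r
# Hypothetical data written to two formats in a temporary directory
set.seed(1)
d = data.frame(id = 1:1e5, x = rnorm(1e5), g = sample(letters, 1e5, replace = TRUE))
f_rds = tempfile(fileext = ".Rds")
f_csv = tempfile(fileext = ".csv")
saveRDS(d, f_rds)
write.csv(d, f_csv, row.names = FALSE)

# Run a reader n times and keep the median elapsed time in seconds
time_median = function(f, n = 5) {
  median(replicate(n, system.time(f())[["elapsed"]]))
}

read_rds = function() readRDS(f_rds)
read_csv = function() read.csv(f_csv)

timings = c(rds = time_median(read_rds), csv = time_median(read_csv))
timings
```

Timings depend on the machine and the data, so no expected values are shown here.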

Benchmark + check results

plot(res)

Always check your results

# see https://geocompr.robinlovelace.net/adv-map.html to learn how to make more maps
mapview::mapview(gsf()) # the parks of West Yorkshire
(Interactive Leaflet map of the parks of West Yorkshire; base map © OpenStreetMap contributors)
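
Results can also be checked programmatically as well as visually. The sketch below is not from the original slides: it compares two hypothetical implementations of the same calculation before anyone worries about which is faster.

```r
# Two hypothetical implementations of the same calculation
mean_loop = function(x) {
  total = 0
  for (xi in x) total = total + xi
  total / length(x)
}
mean_vec = function(x) sum(x) / length(x)

x = c(2.5, 3.5, 6)
res_loop = mean_loop(x)
res_vec = mean_vec(x)

# all.equal() tolerates tiny floating point differences; identical() does not
isTRUE(all.equal(res_loop, res_vec))

# Stop with an error if the result disagrees with a trusted reference
stopifnot(isTRUE(all.equal(res_loop, mean(x))))
```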

Other things R can do

In addition to unbeatable statistical capabilities

R is also good at...

(again it can go too far)

Example: citr

Going too far

Source: https://xkcd.com/1205/

Reproducibility

  • Not just vital for science: it can also save time by reducing how often you 'reinvent the wheel'
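
Two small habits that help here, shown as a sketch rather than a complete recipe: fix the random seed so that results can be regenerated, and record the software versions used. The seed value is arbitrary.

```r
# Fixing the seed makes 'random' results repeatable
set.seed(2019)
x = rnorm(5)
set.seed(2019)
y = rnorm(5)
identical(x, y)

# Record R and package versions alongside the results
si = sessionInfo()
si$R.version$version.string
```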

Efficient collaboration

Dozens of technologies to collaborate

But it's all about people

Community

Community tips (Geocomputation with R)

Magic Touch: typing

Source - Efficient R Programming 1.5.1

The evidence suggests that good diet, physical activity, plenty of sleep and a healthy work-life balance can all boost your speed and effectiveness at work (Jensen 2011; Pereira et al. 2015; Grant, Wallace, and Spurgeon 2013).

While we recommend the reader to reflect on this evidence and their own well-being, this is not a self help book. It is about programming. However, there is one non-programming skill that can have a huge impact on productivity: touch typing.

Thanks!

Contact me at r. lovelace at leeds ac dot uk (email), @robinlovelace (twitter + github)

Check out my repos at https://github.com/robinlovelace/

For more information on efficient workflows, see our book Efficient R Programming

Thanks to all the R developers who made this possible, including (for this presentation):

remark.js, knitr, and R Markdown.

Slides created via the R package xaringan.

Thanks to everyone for building open and collaborative communities!
