- Introductory comments
- An example: the propensity to cycle tool
- dplyr and tibbles (if time allows)
- Discussion: can GDS save the world?
2017-05-02
Many ways of saying the same thing:
My definition: building an evidence-base for sustainable systems.
I urge everyone to read this series of tweets: it is clear that the govt is incompetent, the prime minister is deluded, & we are damned. https://t.co/aaNYvSpxK9
— Michael Cashman (mcashmanCBE) April 30, 2017
Rapid Ice Melt In East Antarctica Could Lead to 11-Foot Rise In Global Sea Level, Scientists Say https://t.co/3jYOQhyeUL pic.twitter.com/QdTOQNC6m7
— Weather Underground December 20, 2016
Machine learning "is finding commercial applications that range from self-driving cars to websites that recommend products on the basis of a user's browsing history" (Castelvecchi 2016).
Where does the (data) scientist fit in this context?
Before scientists can trust AI, they first need to understand how machines learn: https://t.co/DekQuyP8YE pic.twitter.com/Phqtj3mRc1
— Nature News&Comment (NatureNews) May 1, 2017
Code example:
```r
library(tibble)
# tribble() builds a tibble row by row; frame_data() is its older, deprecated name
d = tribble(
  ~Attribute, ~GIS, ~GDS,
  "Home disciplines", "Geography", "Geography, Computing, Statistics",
  "Software focus", "Graphical User Interface", "Code",
  "Reproducibility", "Minimal", "Maximal"
)
knitr::kable(d)
```

Attribute | GIS | GDS |
---|---|---|
Home disciplines | Geography | Geography, Computing, Statistics |
Software focus | Graphical User Interface | Code |
Reproducibility | Minimal | Maximal |
Reasoning:
“Their very spirit undergoes a pervasive transformation,” and they finally end up as “experts at exchanging smiles, handshakes, and favors.” (Reclus 2013, original: 1898)
Tool | Scale | Coverage | Public access | Format of output | Levels of analysis | Software licence |
---|---|---|---|---|---|---|
Propensity to Cycle Tool | National | England | Yes | Online map | A, OD, R, RN | Open source |
Prioritization Index | City | Montreal | No | GIS-based | P, A, R | Proprietary |
PAT | Local | Parts of Dublin | No | GIS-based | A, OD, R | Proprietary |
Usage intensity index | City | Belo Horizonte | No | GIS-based | A, OD, R, I | Proprietary |
Bicycle share model | National | England, Wales | No | Static | A, R | Unknown |
Cycling Potential Tool | City | London | No | Static | A, I | Unknown |
Santa Monica model | City | Santa Monica | No | Static | P, OD, A | Unknown |
Robin Lovelace (Lead Developer, University of Leeds)
Included in the Cycling and Walking Infrastructure Strategy (CWIS) and the Local Cycling and Walking Infrastructure Plan (LCWIP)
Shows on a map where cycling potential is high, under four scenarios of change
Data analysts and 'scientists': don't wrangle, munge or 'hack' your valuable datasets. Use #datacarpentry: https://t.co/gXrlIJH91R pic.twitter.com/GSWS7O7zBz
— Robin Lovelace (robinlovelace) February 20, 2017
The humble data frame is at the heart of most analysis projects:
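```r
# stringsAsFactors = TRUE reproduces the pre-R 4.0 default used for the output shown here
d = data.frame(x = 1:3, y = c("A", "B", "C"), stringsAsFactors = TRUE)
d
##   x y
## 1 1 A
## 2 2 B
## 3 3 C
```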
Under the hood, a data frame is a list of columns, so functions such as `summary()` work on each column:
```r
summary(d)
##        x         y
##  Min.   :1.0   A:1
##  1st Qu.:1.5   B:1
##  Median :2.0   C:1
##  Mean   :2.0
##  3rd Qu.:2.5
##  Max.   :3.0
plot(d)
```
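The list structure can be checked directly. The sketch below uses base R only; `d2` is a hypothetical copy of the example data frame, and `col_classes` and `col_summaries` are illustrative names. `lapply()` and `sapply()` visit each column in turn, much as `summary()` does.

```r
# Recreate the example data frame (factor column, as in the output above)
d2 = data.frame(x = 1:3, y = c("A", "B", "C"), stringsAsFactors = TRUE)

# A data frame is stored as a list whose elements are its columns
is.list(d2)   # TRUE
length(d2)    # 2: one element per column

# Functions applied over that list therefore run column by column
col_classes = sapply(d2, class)     # named vector: x = "integer", y = "factor"
col_summaries = lapply(d2, summary) # one summary per column
col_classes
```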
In base R, there are many ways to subset:
```r
d[1, ]  # the first row
##   x y
## 1 1 A
d[, 1]  # the first column, as a vector
## [1] 1 2 3
d$x     # the first column, by name
## [1] 1 2 3
d[1]    # the first column, as a data frame
##   x
## 1 1
## 2 2
## 3 3
```
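Why `d[, 1]` returns a vector while `d[1]` returns a data frame comes down to the `drop` argument of `[`: matrix-style indexing of a single column simplifies the result unless `drop = FALSE` is given. A minimal base-R sketch, where `d3`, `v`, `df_one` and `v2` are illustrative names:

```r
d3 = data.frame(x = 1:3, y = c("A", "B", "C"))

v      = d3[, 1]                # single column, simplified to a vector (drop = TRUE)
df_one = d3[, 1, drop = FALSE]  # single column, kept as a data frame
v2     = d3[["x"]]              # [[ ]] always extracts the column itself

class(v)          # "integer"
class(df_one)     # "data.frame"
identical(v, v2)  # TRUE
```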
Recently the data frame has been extended by the tibble package:
```r
library("tibble")
dt = tibble(x = 1:3, y = c("A", "B", "C"))
dt
## # A tibble: 3 × 2
##       x     y
##   <int> <chr>
## 1     1     A
## 2     2     B
## 3     3     C
```
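A few concrete differences between tibbles and data frames are sketched below; the object names (`base_df`, `tb`, `df2`, `tb2`) are illustrative. Tibbles do not drop a single column to a vector with `[ , ]`, keep strings as character vectors, and do not partially match column names with `$`.

```r
library(tibble)

base_df = data.frame(x = 1:3, y = c("A", "B", "C"))
tb      = tibble(x = 1:3, y = c("A", "B", "C"))

# Single-column subsetting: data frames drop to a vector, tibbles do not
class(base_df[, 1])  # "integer"
class(tb[, 1])       # "tbl_df" "tbl" "data.frame"

# Strings stay as character vectors in a tibble
class(tb$y)          # "character"

# Partial matching of names with $
df2 = data.frame(value = 1:3)
tb2 = tibble(value = 1:3)
df2$val              # returns the 'value' column via partial matching
tb2$val              # NULL, with a warning about an unknown column
```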
It comes down to efficiency and usability
Like tibbles, dplyr has advantages over older ways of doing things
Do everything in one pipeline:

```r
ghg_ems %>%
  filter(!grepl("World|Europe", Country)) %>%
  group_by(Country) %>%
  summarise(Mean = mean(Transportation), Growth = diff(range(Transportation))) %>%
  top_n(3, Growth) %>%
  arrange(desc(Growth))
```
```r
library(dplyr) # dplyr must be loaded before its functions can be used
wb_ineq %>%
  filter(grepl("g", Country)) %>%
  group_by(Year) %>%
  summarise(gini = mean(gini, na.rm = TRUE)) %>%
  arrange(desc(gini)) %>%
  top_n(n = 5)
```
vs
```r
top_n(
  arrange(
    summarise(
      group_by(
        filter(wb_ineq, grepl("g", Country)),
        Year),
      gini = mean(gini, na.rm = TRUE)),
    desc(gini)),
  n = 5)
```
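The piped and nested forms call the same functions in the same order, so they should give identical results. Because `wb_ineq` is not defined here, the sketch below uses a small made-up dataset, `toy`, with hypothetical area names and values (and omits `top_n()` for brevity):

```r
library(dplyr) # also re-exports tibble() and %>%

# Hypothetical stand-in for wb_ineq: made-up areas and values, not real data
toy = tibble(
  Country = c("Area g1", "Area g1", "Area g2", "Area g2", "Area x"),
  Year    = c(2000, 2001, 2000, 2001, 2000),
  gini    = c(30, 34, NA, 30, 40)
)

# Piped version: read top to bottom
piped = toy %>%
  filter(grepl("g", Country)) %>%
  group_by(Year) %>%
  summarise(gini = mean(gini, na.rm = TRUE)) %>%
  arrange(desc(gini))

# Nested version: read inside out
nested = arrange(
  summarise(
    group_by(filter(toy, grepl("g", Country)), Year),
    gini = mean(gini, na.rm = TRUE)),
  desc(gini))

identical(piped, nested) # TRUE
```

The pipe does not change what is computed; it only changes the reading order from inside-out to top-to-bottom.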
Only one way to do each task, which makes life simpler:
```r
select(dt, x) # select columns
## # A tibble: 3 × 1
##       x
##   <int>
## 1     1
## 2     2
## 3     3
slice(dt, 2) # 'slice' rows
## # A tibble: 1 × 2
##       x     y
##   <int> <chr>
## 1     2     B
```
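For comparison, the base-R equivalents of `select()` and `slice()` both go through `[`. A minimal sketch (variable names are illustrative) that checks the two approaches return the same values:

```r
library(dplyr)

dt = tibble(x = 1:3, y = c("A", "B", "C"))

# dplyr: one verb per task
sel_dplyr   = select(dt, x)  # columns
slice_dplyr = slice(dt, 2)   # rows

# Base R: the same results via [ ]
sel_base   = dt[, "x"]       # a tibble does not drop to a vector here
slice_base = dt[2, ]

identical(sel_dplyr$x, sel_base$x)     # TRUE
identical(slice_dplyr$y, slice_base$y) # TRUE
```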
```r
u_pct = "https://github.com/npct/pct-data/raw/master/west-yorkshire/l.Rds"
# mode = "wb" keeps the binary .Rds file intact when downloading on Windows
if (!file.exists("l.Rds")) download.file(u_pct, "l.Rds", mode = "wb")
library(stplanr)
## Loading required package: sp
```
```r
l = readRDS("l.Rds")
plot(l)
```
```r
sel_walk = l$foot > 9  # logical vector: TRUE where the foot column exceeds 9
l_walk = l[sel_walk, ] # keep only those lines
plot(l)
plot(l_walk, add = TRUE, col = "red")
```
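The `l[sel_walk, ]` step is ordinary logical indexing: the comparison yields one `TRUE` or `FALSE` per row, and `[` keeps the `TRUE` rows. A sketch on a small made-up data frame (the `id` and `foot` values are hypothetical, not PCT data):

```r
# Hypothetical stand-in for the attribute table of the desire lines
lines_df = data.frame(id = c("a", "b", "c", "d"), foot = c(3, 12, 9, 25),
                      stringsAsFactors = FALSE)

sel_walk = lines_df$foot > 9   # FALSE TRUE FALSE TRUE (9 is not > 9)
l_walk   = lines_df[sel_walk, ]
l_walk$id                      # "b" "d"
sum(sel_walk)                  # 2 lines selected
```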
```r
library(dplyr) # for the next slide...
l_walk1 = l %>% filter(All > 10) # fails: dplyr verbs have no method for sp objects
```
```r
library(sf)
## Linking to GEOS 3.5.1, GDAL 2.1.3, proj.4 4.9.2, lwgeom 2.3.2 r15302
l_sf = st_as_sf(l) # convert the sp object to sf
plot(l_sf[6])
l_walk2 = l_sf %>% filter(foot > 9) # dplyr verbs work on sf objects
plot(l_sf[6])
plot(l_walk2, add = TRUE)
```
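```r
# st_length() returns lengths with units (metres); as.numeric() drops the units
l_sf$distsf = as.numeric(st_length(l_sf))
# Short trips (under 1 km) where more people drive than walk
l_drive_short2 = l_sf %>%
  filter(distsf < 1000) %>%
  filter(car_driver > foot)
```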
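The logic of this step can be sketched without sf. The example below assumes straight lines in planar coordinates measured in metres, where length is the sum of Euclidean segment lengths; `st_length()` also handles geographic (lon/lat) data, which this sketch does not. The helper `line_length()` and all values in `trips` are hypothetical.

```r
# Hypothetical helper: length of a polyline stored as a two-column matrix of
# planar x/y coordinates in metres (not valid for lon/lat coordinates)
line_length = function(coords) {
  sum(sqrt(diff(coords[, 1])^2 + diff(coords[, 2])^2))
}

# Hypothetical desire lines, one matrix per line: column 1 = x, column 2 = y
lines_list = list(
  matrix(c(0, 300, 0, 400), ncol = 2), # 500 m
  matrix(c(0, 1200, 0, 0), ncol = 2),  # 1200 m
  matrix(c(0, 600, 0, 800), ncol = 2)  # 1000 m
)
trips = data.frame(car_driver = c(20, 50, 40), foot = c(10, 5, 60))
trips$distsf = sapply(lines_list, line_length)

# Short trips (under 1 km) where more people drive than walk
short_drive = trips[trips$distsf < 1000 & trips$car_driver > trips$foot, ]
short_drive
```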
```r
library(tmap)
tmap_mode("view")
## tmap mode set to interactive viewing
qtm(l_drive_short2)
```
It is clear that geographical research can have large policy impacts.
But many questions remain:
r.lovelace@leeds.ac.uk
or @robinlovelace
Castelvecchi, Davide. 2016. “Can We Open the Black Box of AI?” Nature News 538 (7623): 20. doi:10.1038/538020a.
Lovelace, Robin, Anna Goodman, Rachel Aldred, Nikolai Berkoff, Ali Abbas, and James Woodcock. 2017. “The Propensity to Cycle Tool: An Open Source Online System for Sustainable Transport Planning.” Journal of Transport and Land Use 10 (1). doi:10.5198/jtlu.2016.862.
Reclus, Elisée. 2013. Anarchy, Geography, Modernity: Selected Writings of Elisée Reclus. Edited by John Clark and Camille Martin. Oakland, CA: PM Press.