stats19 v4.0.0: 45 Years of UK Road Crash Data, Unified
A release that unifies 45 years of UK road crash data into a single, consistent interface
The stats19 R package has been updated to version 4.0.0. The main change is a unified column schema that lets you work with 45 years of UK road crash data (1979 to 2024) without running into mismatched column names.
Unified schema
Older data files have columns like carriageway_hazards_historic while newer ones use carriageway_hazards. v4.0.0 detects these variants, merges them into the modern names, and drops the redundant columns.
library(stats19)
crashes = get_stats19(year = 1979:2024, type = "crashes")
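To make the merge concrete, here is a minimal base R sketch of the idea. It is not the package's actual implementation: the helper name merge_historic_columns() and the toy data are hypothetical, and only the two carriageway_hazards column names come from the description above.

```r
# Hypothetical helper: fold every `*_historic` column into its modern
# counterpart, then drop the redundant historic column.
merge_historic_columns <- function(df) {
  historic <- grep("_historic$", names(df), value = TRUE)
  for (old in historic) {
    new <- sub("_historic$", "", old)
    if (new %in% names(df)) {
      # Keep the modern value where present, otherwise use the historic one
      df[[new]] <- ifelse(is.na(df[[new]]), df[[old]], df[[new]])
    } else {
      # No modern column yet: the historic column simply takes the modern name
      df[[new]] <- df[[old]]
    }
    df[[old]] <- NULL
  }
  df
}

# Toy data (made up): one row from an older file, one from a newer file
toy <- data.frame(
  collision_year = c(1985, 2023),
  carriageway_hazards = c(NA, "None"),
  carriageway_hazards_historic = c("Other object on road", NA),
  stringsAsFactors = FALSE
)

merged <- merge_historic_columns(toy)
names(merged)
```

After the call, the toy data frame has a single carriageway_hazards column holding a value for both rows.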
Parsing fixes
read_stats19() now builds a custom parser from the CSV header, which removes the warnings about unmatched columns that appeared in previous versions. We also fixed a bug where 2024 latitude and longitude values were truncated to integers.
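The header-driven approach can be illustrated with a short base R sketch. This is an assumption-laden illustration rather than the code inside read_stats19(): the function read_with_header_spec(), the known_types lookup, and the toy CSV are all hypothetical. The idea is to read the header first, then declare a type only for the columns that are actually present, with coordinates explicitly read as doubles.

```r
# Hypothetical lookup of known column types; anything else is read as text
known_types <- c(
  collision_index = "character",
  collision_year  = "integer",
  latitude        = "numeric",
  longitude       = "numeric"
)

read_with_header_spec <- function(path) {
  # Read only the first line to learn which columns this file contains
  header <- scan(path, what = "", sep = ",", nlines = 1, quiet = TRUE)
  # Build one type per column present, so no declared column goes unmatched
  col_classes <- ifelse(header %in% names(known_types),
                        known_types[header], "character")
  names(col_classes) <- header
  read.csv(path, colClasses = col_classes, check.names = FALSE)
}

# Toy file (made up) standing in for a real STATS19 CSV
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "collision_index,collision_year,latitude,longitude,extra_column",
  "2024010000001,2024,53.80076,-1.54908,foo"
), tmp)

parsed <- read_with_header_spec(tmp)
str(parsed)
```

Because the specification is derived from the file itself, an unexpected column such as extra_column is read as text instead of triggering a mismatch.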
Missing values
Codes like -1, “Code deprecated”, and “Data missing or out of range” are now standardised to NA during formatting, so is.na() works consistently.
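As a rough illustration of what this standardisation involves, the base R sketch below replaces the three placeholder values named above with NA. The helper standardise_missing() and the toy data are hypothetical, and the package's own formatting code may work differently.

```r
# Placeholder values named in the text above, matched exactly
missing_codes <- c("-1", "Code deprecated", "Data missing or out of range")

# Hypothetical helper: replace placeholder codes with NA in every column
standardise_missing <- function(df) {
  df[] <- lapply(df, function(col) {
    col[as.character(col) %in% missing_codes] <- NA
    col
  })
  df
}

# Toy data (made up) with a numeric and a text column
toy <- data.frame(
  speed_limit = c(30, -1, 60),
  weather_conditions = c("Fine no high winds",
                         "Data missing or out of range",
                         "Code deprecated"),
  stringsAsFactors = FALSE
)

cleaned <- standardise_missing(toy)
is.na(cleaned$speed_limit)
is.na(cleaned$weather_conditions)
```

A real implementation would need to limit the replacement to coded columns, so that a genuine numeric value of -1 is not turned into NA by mistake.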
Performance
The package now uses readr Edition 2 by default, which supports multi-threaded parsing. Loading large files is noticeably faster.
New functions
match_tag() joins government TAG cost estimates (RAS4001) to collision data.
clean_make(), clean_model(), and clean_make_model() standardise the 2,400+ raw strings in the vehicle dataset.
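To give a feel for what standardising raw make strings involves, here is a small base R sketch. It does not use or reproduce the package's clean_make() function: the helper standardise_make() and the tiny make_lookup table are hypothetical and exist only for illustration.

```r
# Hypothetical lookup from normalised raw strings to a canonical make
make_lookup <- c(
  "VW"            = "Volkswagen",
  "VOLKSWAGEN"    = "Volkswagen",
  "MERCEDES BENZ" = "Mercedes-Benz",
  "FORD"          = "Ford"
)

# Hypothetical helper: normalise case, punctuation and spacing, then look up
standardise_make <- function(x) {
  key <- toupper(trimws(x))
  key <- gsub("[^A-Z0-9]+", " ", key)  # punctuation and repeated spaces become one space
  key <- trimws(key)
  unname(make_lookup[key])             # unrecognised strings become NA
}

standardise_make(c(" vw ", "Mercedes-Benz", "mercedes  benz", "ford"))
```

Variants that differ only in case, spacing, or punctuation map to the same canonical value, which is the property that makes grouping by make reliable.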
Multi-year downloads
Year ranges now download bulk historic files once and filter efficiently. The 1979 file is also handled correctly (it used to be returned as a catch-all for any older year).
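The "download once, then filter" pattern can be sketched in base R as follows. Everything here is a stand-in: read_bulk_file() fakes a bulk historic file with made-up rows, get_years() is a hypothetical wrapper, and the collision_year column name is an assumption.

```r
# Counter to show how often the (pretend) bulk file is read
read_count <- 0

# Hypothetical stand-in for downloading and reading one bulk historic file
read_bulk_file <- function() {
  read_count <<- read_count + 1
  data.frame(
    collision_year = c(1979, 1979, 1980, 1995, 2003),
    collision_index = c("a", "b", "c", "d", "e"),
    stringsAsFactors = FALSE
  )
}

# Read the bulk file once, then keep only the requested years
get_years <- function(years) {
  bulk <- read_bulk_file()
  bulk[bulk$collision_year %in% years, , drop = FALSE]
}

range_result <- get_years(1979:1995)  # several years, one read
only_1979    <- get_years(1979)       # 1979 returns 1979 rows only
read_count
```

Filtering on the year column after a single read means a request for 1979 returns only 1979 records, and a multi-year request does not trigger one read per year.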
Feedback wanted
We plan to submit to CRAN soon. Please install, test, and report any issues:
pak::pak("ropensci/stats19")
Issues: github.com/ropensci/stats19/issues
Acknowledgements
Contributions from David Ranzolin and Adam Sparks (rOpenSci review), Malcolm Morgan, Layik Hama, and Blaise Kelly. Funding from the RAC Foundation.