8 min read

Aggregating lines, part II

The previous post demonstrated a new method to aggregate overlapping lines. It showed how to combine 2 lines that have an area of overlap. More excitingly, it led to the creation of a new function in stplanr, overline_sf(), that lives in the development version of the package. The purpose of this post is to provide an update on the status of the work to refactor the overline() function, in a human friendly alternative to discussion in the relevant GitHub issue: https://github.com/ropensci/stplanr/issues/273

Set-up

To re-cap, these are the package versions we’ll be using in this post (run this line to get the development version of stplanr if you want to reproduce the results):

remotes::install_github("ropensci/stplanr", ref = "refactor-overline")

The library is loaded and the input data from the previous post was loaded as follows:

library(stplanr)
routes_fast_sf$value = 1
sl = routes_fast_sf[2:3, ]
sl$value = c(2, 5)
overline_sf2 = function(sl, attrib) {
  attrib = "value"
  attrib1 = paste0(attrib, ".1")
  sl_intersection = sf::st_intersection(sl[1, attrib], sl[2, attrib])
  sl_intersection[[attrib]] = sl_intersection[[attrib]] + sl_intersection[[attrib1]]
  sl_intersection[[attrib1]] = NULL
  sl_seg = sf::st_difference(sl[attrib], sf::st_geometry(sl_intersection))
  rnet = rbind(sl_intersection, sl_seg)
  return(rnet)
}

This was used as the basis of a demonstration of how overline works, demonstrated in the following code chunk and resulting plot:

rnet = overline_sf(sl, "value")
plot(rnet["value"], lwd = rnet$value)

plot(sl["value"], lwd = sl$value, col = sf::sf.colors(2, alpha = 0.5))

Dealing with many lines

The above method worked with 2 lines but how can it be used to process many lines? Clearly the same function could be implemented on another line, but it would need to work from the 3 lines of the newly created rnet object rather than the original 2 routes. Let’s introduce a 3rd route into the equation, that does not intersect with this newly created rnet object:

sl3 = routes_fast_sf[4, ]
rnet = overline_sf2(rnet, sl)
plot(rnet$geometry, lwd = rnet$value)
plot(sl3, col = "red", add = TRUE)

In this case the method of adding to rnet is simple: just add the entirety of the line to the rnet object:

attrib = "value"
rnet3 = rbind(rnet, sl3[attrib])
plot(rnet3$geometry, lwd = rnet3$value)

This works fine. In fact it works better than the original overline function because it does not add the value of the existing thickest line in the previous figure onto the new line, a problem associated with overline.sp() that is illustrated in the following code chunk and resulting figure:

sl1_3 = as(routes_fast_sf[2:4, ], "Spatial")
rnet3_sp = overline(sl1_3, attrib = "value")
plot(rnet3_sp, lwd = rnet3_sp$value)

A question that arises from the previous example is this: what if the next line intersects with the route network? It is no longer possible to simply add together two values. This can be illustrated by introducing 2 more lines:

sl4_5 = routes_fast_sf[5:6, ]
plot(rnet3$geometry, lwd = rnet3$value)
plot(sl4_5$geometry, col = "red", add = TRUE)

Both the new lines intersect with the newest part of the route network. This means that we cannot simply rbind() them to it as we did for sl3. They need to be dealt with separately.

Before we deal with them, it’s worth taking some time to consider what we mean by ‘intersect’. Intersection is actuall a specific type of geometric relation between 2 sets of features. We can see the type of relation by using the function st_relate():

relations = sf::st_relate(sl4_5, rnet3)
relations
#>      [,1]        [,2]        [,3]        [,4]        [,5]       
#> [1,] "FF1FF00F2" "FF1FF0102" "FF1FF0102" "FF1F00102" "1F1F00102"
#> [2,] "FF1FF00F2" "FF1FF0102" "FF1FF0102" "FF1F00102" "1F1F00102"
unique(as.vector(relations))
#> [1] "FF1FF00F2" "FF1FF0102" "FF1F00102" "1F1F00102"

This shows us something important: although 2 elements (1 and 4) of rnet relate in some way to the new lines, only the 4th feature has a linear, overlapping relation with it. That relation is 1F1F00102 which, as far as I can tell, is not a named spatial predicate (FF1F00102 means ‘intersects and touches’ but does not have a linear overlap). This relation is what we need to decide whether or not to simply bind a new feature to the growing rnet, whether we need to break it up (or at least part of it) into smaller lines before doing so (it also raises the wider question of which order should we do things).

In the simple case of whether to simply bind the next line (4) onto rnet3 the answer is simple now we know the string code associated with linear overlaps. First we’ll test it on the previous example of sl3 and the original rnet composed of 3 features:

relate_rnet_3 = sf::st_relate(rnet, sl3, pattern = "1F1F00102")
relate_rnet_3
#> Sparse geometry binary predicate list of length 4, where the predicate was `relate_pattern'
#>  1: (empty)
#>  2: (empty)
#>  3: (empty)
#>  4: (empty)
any(lengths(relate_rnet_3))
#> [1] FALSE

The FALSE meant there was no linear overlaps. So we simply used rbind(). When we ask the same question of rnet3 and sl4, however, the answer is TRUE:

sl4 = sl4_5[1, ]
relate_rnet_4 = sf::st_relate(rnet3, sl4, pattern = "1F1F00102")
relate_rnet_4
#> Sparse geometry binary predicate list of length 5, where the predicate was `relate_pattern'
#>  1: (empty)
#>  2: (empty)
#>  3: (empty)
#>  4: (empty)
#>  5: 1
any(lengths(relate_rnet_4))
#> [1] TRUE

How to proceed? We need to avoid rnet objects containing any overlapping lines. Because sl4 overlaps with part of rnet3 we will need to remove the overlapping line, run the overline_sf2() function, and then re-combine the result with the pre-existing route network object. We can split-up the rnet3 object into overlapping and non-overlapping features as follows:

sel_overlaps = lengths(relate_rnet_4) > 0
rnet_overlaps = rnet3[sel_overlaps, ]
rnet3_tmp = rnet3[!sel_overlaps, ]

We can check that there is only one overlapping feature as follows:

nrow(rnet_overlaps)
#> [1] 1

And we can proceed to join the two features together using our new function:

rnet_overlaps4 = overline_sf2(rbind(rnet_overlaps, sl4[attrib]))

Adding this back to the rnet3 object results in an larger rnet object incorporating all the value and geometry column data we have so far:

rnet = rbind(rnet3_tmp, rnet_overlaps4)
plot(rnet$geometry, lwd = rnet$value)

The information provided so far informed the creation of the following function:

stplanr::overline_sf
#> function (sl, attrib, fun = sum) 
#> {
#>     rnet = sl[1, attrib]
#>     for (i in 2:nrow(sl)) {
#>         sel_overlaps = overlaps(rnet, sl[i, ])
#>         if (!any(sel_overlaps)) {
#>             rnet = rbind(rnet, sl[i, attrib])
#>         }
#>         else {
#>             rnet_no_overlap = rnet[!sel_overlaps, ]
#>             rnet_overlap = rnet[sel_overlaps, ]
#>             if (sum(sel_overlaps) == 1) {
#>                 rnet_new = overline_sf2(rnet_overlap, sl[i, ], 
#>                   attrib = attrib, fun = fun)
#>             }
#>             else {
#>                 message("Multiple intersections for route ", 
#>                   i)
#>                 rnet_new = overline_sf2(rnet_overlap[1, ], sl[i, 
#>                   ], attrib = attrib, fun = fun)
#>                 rnet_new_ls = rnet_new[is_linestring(rnet_new), 
#>                   ]
#>                 sl$geometry[i] = rnet_new$geometry[!is_linestring(rnet_new)]
#>                 for (j in 2:nrow(rnet_overlap)) {
#>                   rnet_new_over = overline_sf2(rnet_overlap[j, 
#>                     ], sl[i, ], attrib = attrib, fun = fun)
#>                   sel_overlaps2 = overlaps(rnet_new_over, sl[i, 
#>                     ])
#>                   rnet_new2 = rnet_new_over[sel_overlaps2, ]
#>                   rnet_new = rbind(rnet_new, rnet_new2)
#>                   sl$geometry[i] = sf::st_difference(sl$geometry[i], 
#>                     rnet_new$geometry)
#>                 }
#>             }
#>             rnet = rbind(rnet_no_overlap, rnet_new)
#>         }
#>     }
#>     rnet_overlaps = sf::st_relate(rnet)
#>     if (length(unique(as.vector(rnet_overlaps))) > 3) 
#>         message("Unwanted relation")
#>     return(rnet)
#> }
#> <bytecode: 0x55ce1ea11208>
#> <environment: namespace:stplanr>

This function made use of the slightly simpler individual route-joining function, and associated helper functions:

stplanr::overline_sf2
#> function (sl, sl2, attrib = "value", fun = sum, return_linestring = FALSE) 
#> {
#>     attrib1 = paste0(attrib, ".1")
#>     suppressMessages({
#>         suppressWarnings({
#>             sl_intersection = sf::st_intersection(sl[attrib], 
#>                 sl2[attrib])
#>         })
#>     })
#>     for (i in seq_len(length(attrib))) {
#>         sl_intersection[[attrib[i]]] = fun(sl_intersection[[attrib[i]]], 
#>             sl_intersection[[attrib1[i]]])
#>         sl_intersection[[attrib1[i]]] = NULL
#>     }
#>     suppressMessages({
#>         suppressWarnings({
#>             sl_intersection = sf::st_line_merge(sl_intersection)
#>             sl_seg1 = sf::st_difference(sl[attrib], sf::st_geometry(sl_intersection))
#>             sl_seg2 = sf::st_difference(sl2[attrib], sf::st_geometry(sl_intersection))
#>         })
#>     })
#>     rnet = rbind(sl_seg1, sl_seg2, sl_intersection)
#>     if (return_linestring && any(!is_linestring(rnet))) {
#>         rnet = to_linestring(rnet)
#>     }
#>     return(rnet)
#> }
#> <bytecode: 0x55ce1ef0bc28>
#> <environment: namespace:stplanr>

stplanr::overlaps
#> function (sl, sl2, pattern = "1F1F00102|1F10F0102|1FF0FF102|1FF00F102") 
#> {
#>     suppressMessages({
#>         relate_rnet = sf::st_relate(sl, sl2)
#>     })
#>     grepl(pattern = pattern, relate_rnet)
#> }
#> <bytecode: 0x55ce1ebe8ca8>
#> <environment: namespace:stplanr>
stplanr::to_linestring
#> function (sl) 
#> {
#>     sel_linestring = is_linestring(sl)
#>     sl_linestring = sl[sel_linestring, ]
#>     sl_multilinestring = sl[!sel_linestring, ]
#>     sl_mlines = sf::st_cast(sl_multilinestring, "LINESTRING")
#>     rbind(sl_linestring, sl_mlines)
#> }
#> <bytecode: 0x55ce1ee7a148>
#> <environment: namespace:stplanr>

Aside from being clunky, this approach has the additional flaw of not working. It works fine for lines 2 to 6 in routes_fast_sf, as illustrated below.

r6 = routes_fast_sf[2:6, ]
rnet6 = overline_sf(r6, attrib = "value")
plot(rnet6)

A problem arises when we try to run the same command on the same data, but with one more line added. That will be the topic of the next post on aggregating lines to form route networks, which will also introduce a new function overline2(), developed by my colleague Malcolm Morgan.

Another benchmark

To provide a taster of the new function, let’s see it in action in a benchmark on this latest route network consisting of the five lines illustrated in the previous plot. Again, we use bench::mark() to benchmark each approach:

bench::mark(check = FALSE,
  overline = stplanr::overline(r6, "value"),
  overline_sf = stplanr::overline_sf(r6, "value"),
  overline2 = stplanr::overline2(r6, "value")
)
#> # A tibble: 3 x 10
#>   expression    min   mean median    max `itr/sec` mem_alloc  n_gc n_itr
#>   <chr>      <bch:> <bch:> <bch:> <bch:>     <dbl> <bch:byt> <dbl> <int>
#> 1 overline   31.4ms 32.6ms 32.1ms 37.1ms      30.6  288.81KB     4    11
#> 2 overline_… 68.9ms 68.9ms 68.9ms 68.9ms      14.5  314.16KB     6     1
#> 3 overline2  33.3ms 33.8ms 33.4ms 35.6ms      29.6    3.04MB     7     8
#> # … with 1 more variable: total_time <bch:tm>

Again the results are not Earth-shattering and should be taken with a large pinch of salt: performance should not be evaluated against these relatively tiny datasets, but against the scale of datasets that the functions were designed to handle: city scale route networks involving thousands of overlapping lines. The same principle is used in the benchmarking section of the h2oai website: focus on benchmarks that apply to the datasets you will actually be using. In this case, my hypothesis is that the geometric functions overline() and overline_sf() will perform disproportionately badly as the size of the input dataset grows (to be tested). Still, it is interesting to note that the new overline2() function uses way more memory than the other functions. More next time!