A new version of the {accessibility} R package has finally arrived on CRAN with several new goodies! 🥳 This post gives an overview of some of the new features added to {accessibility} in version 1.1.0.
New accessibility functions
The package includes two new accessibility functions: spatial_availability() and balancing_cost() implement the… well… Spatial Availability and Balancing Cost accessibility measures.
Spatial Availability was originally proposed by Soukhov et al. (2023). The measure takes competition effects into account, and the accessibility levels that result from it are proportional both to the demand at each origin and to the travel cost it takes to reach the destinations. Just like any other accessibility function in the package, spatial_availability() works with any kind of travel cost, such as distance, time and money. In the code below, we use some datasets shipped with the package to demonstrate the function:
library(accessibility)
data_dir <- system.file("extdata", package = "accessibility")
travel_matrix <- readRDS(file.path(data_dir, "travel_matrix.rds"))
land_use_data <- readRDS(file.path(data_dir, "land_use_data.rds"))
spat_availability <- spatial_availability(
  travel_matrix,
  land_use_data,
  opportunity = "jobs",
  travel_cost = "travel_time",
  demand = "population",
  decay_function = decay_exponential(decay_value = 0.1)
)
head(spat_availability)
## id jobs
## 1: 89a88cdb57bffff 186.0876
## 2: 89a88cdb597ffff 140.0738
## 3: 89a88cdb5b3ffff 736.5830
## 4: 89a88cdb5cfffff 900.9284
## 5: 89a88cd909bffff 0.0000
## 6: 89a88cd90b7ffff 204.7962
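For intuition, the logic of the measure can be sketched as follows (my paraphrase, not the exact notation of Soukhov et al. 2023): each destination's opportunities are allocated among all competing origins in proportion to a demand factor and an impedance factor, so each origin ends up with only the share of opportunities it can claim against the competing origins:
\[
V_i = \sum_j O_j \, \frac{F^{p}_{i} \, F^{c}_{ij}}{\sum_k F^{p}_{k} \, F^{c}_{kj}},
\]
where \(O_j\) is the number of opportunities at destination \(j\), \(F^{p}_{i}\) grows with the demand (population) at origin \(i\) and \(F^{c}_{ij}\) is the impedance factor obtained by applying the decay function to the travel cost between \(i\) and \(j\). Because the allocation factors sum to 1 across origins for each destination, the spatial availability values sum back to the total number of opportunities.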
Balancing Cost was originally proposed by Barboza et al. (2021), though under the name Balancing Time. This measure is defined as the travel cost required to reach as many opportunities as there are people in a given origin.
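In symbols (my shorthand, not the authors' notation), the balancing cost of origin \(i\) is the smallest travel cost at which the opportunities reachable from \(i\) match or exceed its population:
\[
BC_i = \min \left\{ c : \sum_j O_j \, \mathbb{1}\!\left[c_{ij} \le c\right] \ge P_i \right\},
\]
where \(O_j\) is the number of opportunities at destination \(j\), \(P_i\) is the population of origin \(i\) and \(c_{ij}\) is the travel cost between them.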
The function includes most of the parameters present in spatial_availability(), plus an additional cost_increment parameter, used to specify the increment that defines the travel cost distribution from which potential balancing costs are picked. For example, an increment of 1 (the default) tends to suit travel time distributions: the function first checks whether any origins reach their balancing cost with a travel time of 0 minutes, then 1 minute, 2, 3, 4, etc. An increment of 1 might be too coarse for a distribution of monetary costs, however, which could benefit from a smaller increment of 0.05 (5 cents), for example. Such an increment makes the function look for balancing costs first at a monetary cost of 0, then 0.05, 0.10, etc. In the example below, we use the default cost increment of 1:
bc <- balancing_cost(
  travel_matrix,
  land_use_data,
  opportunity = "jobs",
  travel_cost = "travel_time",
  demand = "population"
)
head(bc)
## id travel_time
## 1: 89a881a5a2bffff 15
## 2: 89a881a5a2fffff 13
## 3: 89a881a5a67ffff 23
## 4: 89a881a5a6bffff 7
## 5: 89a881a5a6fffff 10
## 6: 89a881a5b03ffff 6
Inequality and poverty measures
Accessibility levels are often used to assess inequalities in the distribution of, and access to, opportunities. In fact, that’s such a common analysis for us at the Access to Opportunities Project that we have implemented functions to calculate the same inequality measures probably dozens of times (here are some links to the functions I can remember in less than 5 seconds: 1; 2; 3).
Implementing the same functions over and over is an obvious waste of time, so we have decided to include functions to calculate inequality and poverty measures in this new {accessibility} version. Currently, palma_ratio() and gini_index() calculate the Palma Ratio and the Gini Index (wow) of an accessibility distribution, and fgt_poverty() calculates accessibility poverty using the Foster-Greer-Thorbecke (FGT) poverty measures.
All inequality and poverty measures work similarly, taking as inputs the accessibility distribution, a sociodemographic dataset and the columns in these datasets that contain the accessibility variable itself and the population variable used to weight the estimates. Each function may also include some specific parameters, mentioned below. Let’s see how they work.
Palma Ratio
This measure is originally defined as the income share of the wealthiest 10% of a population divided by the income share of the poorest 40%. In the transport planning context, it has been adapted as the average accessibility of the wealthiest 10% divided by the average accessibility of the poorest 40%.
palma_ratio() includes an income parameter, in addition to those previously mentioned, used to indicate the column in the sociodemographic dataset with the income variable used to classify the population into socioeconomic groups. This variable should describe income per capita (e.g. mean income per capita, household income per capita), rather than the total amount of income in each cell. With the code below, we first calculate an accessibility distribution and then use it to demonstrate the function:
access <- cumulative_cutoff(
  travel_matrix,
  land_use_data,
  opportunity = "jobs",
  travel_cost = "travel_time",
  cutoff = 30
)
palma <- palma_ratio(
  access,
  sociodemographic_data = land_use_data,
  opportunity = "jobs",
  population = "population",
  income = "income_per_capita"
)
palma
## palma_ratio
## 1: 3.800465
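As a sanity check on the definition above, the sketch below ranks cells by income per capita and compares the population-weighted average accessibility of the two ends of the distribution. This is my own rough approximation, not the package’s implementation: cells that straddle the 40% and 90% population boundaries are not split, so the value will not match the output above exactly.
# merge accessibility estimates with population and income data
dt <- merge(
  access,
  land_use_data[, c("id", "population", "income_per_capita")],
  by = "id"
)
# rank cells from poorest to wealthiest and compute cumulative population share
dt <- dt[order(dt$income_per_capita), ]
cum_share <- cumsum(dt$population) / sum(dt$population)
# population-weighted average accessibility of the wealthiest 10%
# divided by that of the poorest 40%
poorest_40 <- dt[cum_share <= 0.4, ]
wealthiest_10 <- dt[cum_share > 0.9, ]
weighted.mean(wealthiest_10$jobs, wealthiest_10$population) /
  weighted.mean(poorest_40$jobs, poorest_40$population)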
Gini Index
Probably the most widely used inequality measure in the transport planning context, although not without some criticism¹, the Gini Index could not be left without an implementation in {accessibility}:
gini <- gini_index(
  access,
  sociodemographic_data = land_use_data,
  opportunity = "jobs",
  population = "population"
)
gini
## gini_index
## 1: 0.4715251
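For reference, the Gini Index of an accessibility distribution weighted by population can be written as below (my notation, a standard formulation rather than necessarily the package’s exact implementation):
\[
G = \frac{\sum_i \sum_j P_i \, P_j \, |A_i - A_j|}{2 \left(\sum_i P_i\right)^2 \bar{A}},
\]
where \(A_i\) is the accessibility of cell \(i\), \(P_i\) its population and \(\bar{A}\) the population-weighted mean accessibility. A value of 0 indicates a perfectly equal distribution and a value of 1 indicates maximum inequality.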
Foster-Greer-Thorbecke (FGT) poverty measures
fgt_poverty() calculates the FGT metrics, a family of poverty measures originally proposed by Foster, Greer, and Thorbecke (1984) that capture the extent and severity of poverty within an accessibility distribution. The FGT family is composed of three measures that differ based on the \(\alpha\) parameter used to calculate them (either 0, 1 or 2), which also changes their interpretation (a general formula is given after the list below):
- with \(\alpha = 0\) (FGT0), the measure captures the extent of poverty as a simple headcount - i.e. the proportion of people below the poverty line;
- with \(\alpha = 1\) (FGT1), the measure, also known as the “poverty gap index”, captures the severity of poverty as the average percentage distance between the poverty line and the accessibility of individuals below the poverty line;
- with \(\alpha = 2\) (FGT2), the measure simultaneously captures the extent and the severity of poverty by counting the people below the poverty line, weighting each by the size of their accessibility shortfall relative to the poverty line.
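In general terms, the whole family can be written as below (my notation, in a population-weighted form consistent with the inputs the function takes):
\[
FGT_\alpha = \frac{1}{\sum_i P_i} \sum_{i \,:\, A_i < z} P_i \left(\frac{z - A_i}{z}\right)^{\alpha},
\]
where \(z\) is the poverty line, \(A_i\) is the accessibility of cell \(i\) and \(P_i\) its population. With \(\alpha = 0\) the summand reduces to \(P_i\), yielding the headcount ratio, while larger values of \(\alpha\) give progressively more weight to cells that fall further below the poverty line.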
This function includes an additional poverty_line parameter, used to define the poverty line below which individuals are considered to be in accessibility poverty.
poverty <- fgt_poverty(
  access,
  opportunity = "jobs",
  sociodemographic_data = land_use_data,
  population = "population",
  poverty_line = 95368
)
poverty
## FGT0 FGT1 FGT2
## 1: 0.5745378 0.3277383 0.2218769
Multiple cutoffs, decay values, intervals, etc.
Much is said in the accessibility literature about the boundary effect of the modifiable temporal unit problem (MTUP), which basically concerns the choice of travel time thresholds in cumulative opportunities measures (Pereira 2019). Fair point: if you only look at a single travel time threshold, your accessibility estimates may be biased, or you may open the door to opportunistic analyses whose conclusions result from a very specific combination of travel time thresholds and other criteria.
One way to mitigate this issue is to conduct a sensitivity analysis using several different thresholds, whose results are then used to inform conclusions, policy recommendations, etc. To facilitate such analyses, cumulative_cutoff() now accepts a numeric vector in the cutoff parameter; in the previous version, it accepted only a single number. With the code below, we calculate cumulative accessibility levels using travel time thresholds ranging from 30 to 40 minutes:
access <- cumulative_cutoff(
  travel_matrix,
  land_use_data,
  opportunity = "jobs",
  travel_cost = "travel_time",
  cutoff = 30:40
)
access
## id travel_time jobs
## 1: 89a881a5a2bffff 30 14561
## 2: 89a881a5a2bffff 31 16647
## 3: 89a881a5a2bffff 32 23877
## 4: 89a881a5a2bffff 33 24957
## 5: 89a881a5a2bffff 34 29096
## ---
## 9874: 89a88cdb6dbffff 36 36570
## 9875: 89a88cdb6dbffff 37 52321
## 9876: 89a88cdb6dbffff 38 68406
## 9877: 89a88cdb6dbffff 39 82076
## 9878: 89a88cdb6dbffff 40 100458
We can see that for each of our origins we have eleven accessibility estimates (from 30 to 40 minutes, every 1 minute). As one would imagine, the higher the travel time cutoff, the higher the accessibility levels.
An issue that is brought up much less frequently in the accessibility literature, however, is that any accessibility measure is susceptible to ad-hoc choices. For example, the results of gravity measures are completely dependent on the decay parameter used in the impedance function. The value of this parameter can be derived from calibrations based on observed travel behaviour, from normative ad-hoc choices or even from recommended values frequently used in the literature. Still, the debate on the arbitrariness of gravity measures’ estimates is much less common than the one surrounding cumulative measures.
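To make this point concrete, a gravity-based measure with a negative exponential impedance function takes the general form below (standard notation, not specific to the package):
\[
A_i = \sum_j O_j \, e^{-\beta c_{ij}},
\]
where \(O_j\) is the number of opportunities at destination \(j\), \(c_{ij}\) is the travel cost between \(i\) and \(j\) and \(\beta\) is the decay parameter. The estimates hinge entirely on \(\beta\): going from \(\beta = 0.1\) to \(\beta = 0.2\), for example, shrinks the contribution of a destination 30 minutes away from \(e^{-3} \approx 0.05\) to \(e^{-6} \approx 0.0025\), a twenty-fold drop.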
I digress, but the main point here is that all accessibility functions in the package, not only cumulative_cutoff(), now allow us to use multiple decay values, threshold intervals, etc. In other words, running sensitivity analyses to mitigate the impact of ad-hoc choices when calculating accessibility levels is much easier, whatever the measure you choose to use. For example, when calculating accessibility with a negative exponential impedance function, we can use several different decay values to see how they affect our results:
exp_access <- gravity(
  travel_matrix,
  land_use_data,
  decay_function = decay_exponential(decay_value = c(0.1, 0.2, 0.3)),
  opportunity = "schools",
  travel_cost = "travel_time"
)
exp_access
## id decay_function_arg schools
## 1: 89a88cdb57bffff 0.1 1.39145243
## 2: 89a88cdb597ffff 0.1 4.64420315
## 3: 89a88cdb5b3ffff 0.1 4.22307428
## 4: 89a88cdb5cfffff 0.1 2.33540919
## 5: 89a88cd909bffff 0.1 3.29031323
## ---
## 2690: 89a881acda3ffff 0.3 0.20162596
## 2691: 89a88cdb543ffff 0.3 0.02162614
## 2692: 89a88cda667ffff 0.3 0.25812919
## 2693: 89a88cd900fffff 0.3 0.01176093
## 2694: 89a881aebafffff 0.3 0.00000000
Similarly, we can provide several cutoff intervals to cumulative_interval() and several minimum numbers of opportunities to cost_to_closest():
cum_interval <- cumulative_interval(
  travel_matrix = travel_matrix,
  land_use_data = land_use_data,
  interval = list(c(30, 60), c(40, 80)),
  opportunity = "jobs",
  travel_cost = "travel_time"
)
cum_interval
## id interval jobs
## 1: 89a88cdb57bffff [30,60] 231043
## 2: 89a88cdb597ffff [30,60] 146922
## 3: 89a88cdb5b3ffff [30,60] 219865
## 4: 89a88cdb5cfffff [30,60] 340554
## 5: 89a88cd909bffff [30,60] 231401
## ---
## 1792: 89a881acda3ffff [40,80] 403342
## 1793: 89a88cdb543ffff [40,80] 480220
## 1794: 89a88cda667ffff [40,80] 446719
## 1795: 89a88cd900fffff [40,80] 326048
## 1796: 89a881aebafffff [40,80] 0
min_cost <- cost_to_closest(
  travel_matrix,
  land_use_data,
  n = 1:5,
  opportunity = "schools",
  travel_cost = "travel_time"
)
min_cost
## id n travel_time
## 1: 89a881a5a2bffff 1 29
## 2: 89a881a5a2bffff 2 32
## 3: 89a881a5a2bffff 3 36
## 4: 89a881a5a2bffff 4 36
## 5: 89a881a5a2bffff 5 36
## ---
## 4486: 89a88cdb6dbffff 1 24
## 4487: 89a88cdb6dbffff 2 29
## 4488: 89a88cdb6dbffff 3 32
## 4489: 89a88cdb6dbffff 4 37
## 4490: 89a88cdb6dbffff 5 37
Multiple travel costs
Last, but definitely not least, cumulative_cutoff() now accepts a character vector in travel_cost, enabling us to calculate accessibility using multiple travel costs at once. This is especially important when calculating accessibility levels from Pareto frontiers of multiple travel costs, instead of from a simpler travel matrix that considers only one cost. For example, let’s have a look at a Pareto frontier of travel time and monetary cost that is shipped with the package:
pareto_frontier <- readRDS(file.path(data_dir, "pareto_frontier.rds"))
pareto_frontier
## from_id to_id travel_time monetary_cost
## 1: 89a881a5a2bffff 89a881a5a2bffff 0 0
## 2: 89a881a5a2bffff 89a881a5a2fffff 24 0
## 3: 89a881a5a2bffff 89a881a5a2fffff 22 5
## 4: 89a881a5a2bffff 89a881a5a67ffff 8 0
## 5: 89a881a5a2bffff 89a881a5a6bffff 22 0
## ---
## 1323424: 89a88cdb6dbffff 89a88cdb6cbffff 10 0
## 1323425: 89a88cdb6dbffff 89a88cdb6cfffff 18 0
## 1323426: 89a88cdb6dbffff 89a88cdb6d3ffff 7 0
## 1323427: 89a88cdb6dbffff 89a88cdb6d7ffff 19 0
## 1323428: 89a88cdb6dbffff 89a88cdb6dbffff 0 0
pareto_frontier[from_id == "89a881a5a2bffff" & to_id == "89a881a5a2fffff"]
## from_id to_id travel_time monetary_cost
## 1: 89a881a5a2bffff 89a881a5a2fffff 24 0
## 2: 89a881a5a2bffff 89a881a5a2fffff 22 5
We can see that the same origin-destination pair may appear multiple times in the matrix, with each trip alternative presenting different trade-offs of travel time and monetary cost. The first trip alternative between the pair highlighted above is slower but cheaper than the second one, for example.
Calculating accessibility levels from Pareto frontiers with cumulative measures is relatively similar to calculating accessibility from a simpler matrix, but we have to pay attention to a few things: the cost restrictions must be applied simultaneously, and we have to make sure we are not double counting opportunities. Luckily, the new implementation of cumulative_cutoff() deals with these issues for us. If we combine the ability to consider multiple travel costs with the ability to consider multiple cutoffs for each of the specified costs, we get pretty rich and complex results from a single function call:
cum_pareto_access <- cumulative_cutoff(
  pareto_frontier,
  land_use_data = land_use_data,
  opportunity = "jobs",
  travel_cost = c("travel_time", "monetary_cost"),
  cutoff = list(c(20, 30), c(0, 5, 10))
)
cum_pareto_access
## id travel_time monetary_cost jobs
## 1: 89a881a5a2bffff 20 0 397
## 2: 89a881a5a2bffff 20 5 397
## 3: 89a881a5a2bffff 20 10 397
## 4: 89a881a5a2bffff 30 0 846
## 5: 89a881a5a2bffff 30 5 20923
## ---
## 5384: 89a88cdb6dbffff 20 5 264
## 5385: 89a88cdb6dbffff 20 10 264
## 5386: 89a88cdb6dbffff 30 0 543
## 5387: 89a88cdb6dbffff 30 5 1567
## 5388: 89a88cdb6dbffff 30 10 1567
We can see that we have six accessibility estimates for each origin in our Pareto frontier, one for each possible combination of travel time and monetary cost cutoffs (two time cutoffs × three monetary cutoffs).
Currently, cumulative_cutoff() is the only accessibility function that can take multiple travel costs as input. Please follow this GitHub issue for updates on the other functions.
Conclusion
{accessibility} v1.1.0 is filled with goodies that help make accessibility research more methodologically robust and reliable: new accessibility measures, poverty and inequality measures, the ability to consider multiple cutoffs, decay values and cutoff intervals, and the ability to consider multiple travel costs in cumulative_cutoff(). We have already been using these features in our workflow at the Access to Opportunities Project, and we hope you find them useful as well.
References
¹ Please see the “Why use the Palma Ratio?” box in Chapter 6 of our book Introduction to urban accessibility.