A new version of the {accessibility} R package has finally arrived on CRAN with several new goodies! 🥳 This post gives an overview of some of the new features added to {accessibility} in version 1.1.0.
New accessibility functions
The package includes two new accessibility functions: spatial_availability() and balancing_cost() implement the… well… Spatial Availability and Balancing Cost accessibility measures.
Spatial Availability was originally proposed by Soukhov et al. (2023). The measure takes competition effects into account, and the accessibility levels that result from it are proportional both to the demand at each origin and to the travel cost it takes to reach the destinations. Just like any other accessibility function in the package, spatial_availability() works with any kind of travel cost, such as distance, time and money. In the code below, we use some datasets shipped with the package to demonstrate the function:
library(accessibility)
data_dir <- system.file("extdata", package = "accessibility")
travel_matrix <- readRDS(file.path(data_dir, "travel_matrix.rds"))
land_use_data <- readRDS(file.path(data_dir, "land_use_data.rds"))
spat_availability <- spatial_availability(
  travel_matrix,
  land_use_data,
  opportunity = "jobs",
  travel_cost = "travel_time",
  demand = "population",
  decay_function = decay_exponential(decay_value = 0.1)
)
head(spat_availability)
## id jobs
## 1: 89a88cdb57bffff 186.0876
## 2: 89a88cdb597ffff 140.0738
## 3: 89a88cdb5b3ffff 736.5830
## 4: 89a88cdb5cfffff 900.9284
## 5: 89a88cd909bffff 0.0000
## 6: 89a88cd90b7ffff 204.7962
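For intuition, the logic of the measure can be sketched as follows (my paraphrase, not the exact notation of Soukhov et al. 2023): each destination's opportunities are allocated among all competing origins in proportion to a demand factor and an impedance factor, so each origin ends up with only the share of opportunities it can claim against the competing origins:
\[
V_i = \sum_j O_j \, \frac{F^{p}_{i} \, F^{c}_{ij}}{\sum_k F^{p}_{k} \, F^{c}_{kj}},
\]
where \(O_j\) is the number of opportunities at destination \(j\), \(F^{p}_{i}\) grows with the demand (population) at origin \(i\) and \(F^{c}_{ij}\) is the impedance factor obtained by applying the decay function to the travel cost between \(i\) and \(j\). Because the allocation factors sum to 1 across origins for each destination, the spatial availability values sum back to the total number of opportunities.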
Balancing Cost was originally proposed by Barboza et al. (2021), though under the name Balancing Time. This measure is defined as the travel cost required to reach as many opportunities as there are people in a given origin.
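In symbols (my shorthand, not the authors' notation), the balancing cost of origin \(i\) is the smallest travel cost at which the opportunities reachable from \(i\) match or exceed its population:
\[
BC_i = \min \left\{ c : \sum_j O_j \, \mathbb{1}\!\left[c_{ij} \le c\right] \ge P_i \right\},
\]
where \(O_j\) is the number of opportunities at destination \(j\), \(P_i\) is the population of origin \(i\) and \(c_{ij}\) is the travel cost between them.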
The function includes most of the parameters present in spatial_availability(), plus an additional cost_increment parameter, used to specify the increment that defines the travel cost distribution from which potential balancing costs are picked. For example, an increment of 1 (the default) tends to suit travel time distributions: the function first checks whether any origins reach their balancing cost with a travel time of 0 minutes, then 1 minute, 2, 3, 4, etc. An increment of 1 might be too coarse for a distribution of monetary costs, however, which could benefit from a smaller increment of 0.05 (5 cents), for example. Such an increment makes the function look for balancing costs first at a monetary cost of 0, then 0.05, 0.10, etc. In the example below, we use the default cost increment of 1:
bc <- balancing_cost(
  travel_matrix,
  land_use_data,
  opportunity = "jobs",
  travel_cost = "travel_time",
  demand = "population"
)
head(bc)
## id travel_time
## 1: 89a881a5a2bffff 15
## 2: 89a881a5a2fffff 13
## 3: 89a881a5a67ffff 23
## 4: 89a881a5a6bffff 7
## 5: 89a881a5a6fffff 10
## 6: 89a881a5b03ffff 6
Inequality and poverty measures
Accessibility levels are often used to assess inequalities in the distribution of, and access to, opportunities. In fact, that’s such a common analysis for us at the Access to Opportunities Project that we have implemented functions to calculate the same inequality measures probably dozens of times (here are some links to the functions I can remember in less than 5 seconds: 1; 2; 3).
Implementing the same functions over and over is an obvious waste of time, so we have decided to include functions to calculate inequality and poverty measures in this new {accessibility} version. Currently, palma_ratio() and gini_index() calculate the Palma Ratio and the Gini Index (wow) of an accessibility distribution, and fgt_poverty() calculates accessibility poverty using the Foster-Greer-Thorbecke (FGT) poverty measures.
All inequality and poverty measures work similarly, taking as inputs the accessibility distribution, a sociodemographic dataset and the columns in these datasets that contain the accessibility variable itself and the population variable used to weight the estimates. Each function may also include some specific parameters, mentioned below. Let’s see how they work.
Palma Ratio
This measure is originally defined as the income share of the wealthiest 10% of a population divided by the income share of the poorest 40%. In the transport planning context, it has been adapted as the average accessibility of the wealthiest 10% divided by the average accessibility of the poorest 40%.
palma_ratio() includes an income parameter, in addition to those previously mentioned, used to indicate the column in the sociodemographic dataset with the income variable used to classify the population into socioeconomic groups. This variable should describe income per capita (e.g. mean income per capita, household income per capita), rather than the total amount of income in each cell. With the code below, we first calculate an accessibility distribution and then use it to demonstrate the function:
access <- cumulative_cutoff(
  travel_matrix,
  land_use_data,
  opportunity = "jobs",
  travel_cost = "travel_time",
  cutoff = 30
)
palma <- palma_ratio(
  access,
  sociodemographic_data = land_use_data,
  opportunity = "jobs",
  population = "population",
  income = "income_per_capita"
)
palma
## palma_ratio
## 1: 3.800465
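As a sanity check on the definition above, the sketch below ranks cells by income per capita and compares the population-weighted average accessibility of the two ends of the distribution. This is my own rough approximation, not the package’s implementation: cells that straddle the 40% and 90% population boundaries are not split, so the value will not match the output above exactly.
# merge accessibility estimates with population and income data
dt <- merge(
  access,
  land_use_data[, c("id", "population", "income_per_capita")],
  by = "id"
)
# rank cells from poorest to wealthiest and compute cumulative population share
dt <- dt[order(dt$income_per_capita), ]
cum_share <- cumsum(dt$population) / sum(dt$population)
# population-weighted average accessibility of the wealthiest 10%
# divided by that of the poorest 40%
poorest_40 <- dt[cum_share <= 0.4, ]
wealthiest_10 <- dt[cum_share > 0.9, ]
weighted.mean(wealthiest_10$jobs, wealthiest_10$population) /
  weighted.mean(poorest_40$jobs, poorest_40$population)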
Gini Index
Probably the most widely used inequality measure in the transport planning context, although not without some criticism¹, the Gini Index could not be left without an implementation in {accessibility}:
gini <- gini_index(
  access,
  sociodemographic_data = land_use_data,
  opportunity = "jobs",
  population = "population"
)
gini
## gini_index
## 1: 0.4715251
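For reference, the Gini Index of an accessibility distribution weighted by population can be written as below (my notation, a standard formulation rather than necessarily the package’s exact implementation):
\[
G = \frac{\sum_i \sum_j P_i \, P_j \, |A_i - A_j|}{2 \left(\sum_i P_i\right)^2 \bar{A}},
\]
where \(A_i\) is the accessibility of cell \(i\), \(P_i\) its population and \(\bar{A}\) the population-weighted mean accessibility. A value of 0 indicates a perfectly equal distribution and a value of 1 indicates maximum inequality.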
Foster-Greer-Thorbecke (FGT) poverty measures
fgt_poverty() calculates the FGT metrics, a family of poverty measures originally proposed by Foster, Greer, and Thorbecke (1984) that capture the extent and severity of poverty within an accessibility distribution. The FGT family is composed of three measures that differ based on the \(\alpha\) parameter used to calculate them (either 0, 1 or 2), which also changes their interpretation (a general formula is given after the list below):
- with \(\alpha = 0\) (FGT0), the measure captures the extent of poverty as a simple headcount - i.e. the proportion of people below the poverty line;
- with \(\alpha = 1\) (FGT1), the measure, also known as the “poverty gap index”, captures the severity of poverty as the average percentage distance between the poverty line and the accessibility of individuals below the poverty line;
- with \(\alpha = 2\) (FGT2), the measure simultaneously captures the extent and the severity of poverty by counting the people below the poverty line, weighting each by the size of their accessibility shortfall relative to the poverty line.
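In general terms, the whole family can be written as below (my notation, in a population-weighted form consistent with the inputs the function takes):
\[
FGT_\alpha = \frac{1}{\sum_i P_i} \sum_{i \,:\, A_i < z} P_i \left(\frac{z - A_i}{z}\right)^{\alpha},
\]
where \(z\) is the poverty line, \(A_i\) is the accessibility of cell \(i\) and \(P_i\) its population. With \(\alpha = 0\) the summand reduces to \(P_i\), yielding the headcount ratio, while larger values of \(\alpha\) give progressively more weight to cells that fall further below the poverty line.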
This function includes an additional poverty_line parameter, used to define the poverty line below which individuals are considered to be in accessibility poverty.
poverty <- fgt_poverty(
  access,
  opportunity = "jobs",
  sociodemographic_data = land_use_data,
  population = "population",
  poverty_line = 95368
)
poverty
## FGT0 FGT1 FGT2
## 1: 0.5745378 0.3277383 0.2218769
Multiple cutoffs, decay values, intervals, etc.
Much is said in the accessibility literature about the boundary effect of the modifiable temporal unit problem (MTUP), which basically concerns the choice of travel time thresholds in cumulative opportunities measures (Pereira 2019). Fair point: if you only look at a single travel time threshold, your accessibility estimates may be biased, or you may open the door to opportunistic analyses whose conclusions result from a very specific combination of travel time thresholds and other criteria.
One way to mitigate this issue is to conduct a sensitivity analysis using several different thresholds, whose results are then used to inform conclusions, policy recommendations, etc. To facilitate such analyses, cumulative_cutoff() now accepts a numeric vector in the cutoff parameter; in the previous version, it accepted only a single number. With the code below, we calculate cumulative accessibility levels using travel time thresholds ranging from 30 to 40 minutes:
access <- cumulative_cutoff(
  travel_matrix,
  land_use_data,
  opportunity = "jobs",
  travel_cost = "travel_time",
  cutoff = 30:40
)
access
## id travel_time jobs
## 1: 89a881a5a2bffff 30 14561
## 2: 89a881a5a2bffff 31 16647
## 3: 89a881a5a2bffff 32 23877
## 4: 89a881a5a2bffff 33 24957
## 5: 89a881a5a2bffff 34 29096
## ---
## 9874: 89a88cdb6dbffff 36 36570
## 9875: 89a88cdb6dbffff 37 52321
## 9876: 89a88cdb6dbffff 38 68406
## 9877: 89a88cdb6dbffff 39 82076
## 9878: 89a88cdb6dbffff 40 100458
We can see that for each of our origins we have eleven accessibility estimates (from 30 to 40 minutes, every 1 minute). As one would imagine, the higher the travel time cutoff, the higher the accessibility levels.
An issue that is brought up much less frequently in the accessibility literature, however, is that any accessibility measure is susceptible to ad-hoc choices. For example, the results of gravity measures are completely dependent on the decay parameter used in the impedance function. The value of this parameter can be derived from calibrations based on observed travel behaviour, from normative ad-hoc choices or even from recommended values frequently used in the literature. Still, the debate on the arbitrariness of gravity measures’ estimates is much less common than the one surrounding cumulative measures.
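To make this point concrete, a gravity-based measure with a negative exponential impedance function takes the general form below (standard notation, not specific to the package):
\[
A_i = \sum_j O_j \, e^{-\beta c_{ij}},
\]
where \(O_j\) is the number of opportunities at destination \(j\), \(c_{ij}\) is the travel cost between \(i\) and \(j\) and \(\beta\) is the decay parameter. The estimates hinge entirely on \(\beta\): going from \(\beta = 0.1\) to \(\beta = 0.2\), for example, shrinks the contribution of a destination 30 minutes away from \(e^{-3} \approx 0.05\) to \(e^{-6} \approx 0.0025\), a twenty-fold drop.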
I digress, but the main point here is that all accessibility functions in the package, not only cumulative_cutoff(), now allow us to use multiple decay values, threshold intervals, etc. In other words, running sensitivity analyses to mitigate the impact of ad-hoc choices when calculating accessibility levels is much easier, whatever the measure you choose to use. For example, when calculating accessibility with a negative exponential impedance function, we can use several different decay values to see how they affect our results:
exp_access <- gravity(
  travel_matrix,
  land_use_data,
  decay_function = decay_exponential(decay_value = c(0.1, 0.2, 0.3)),
  opportunity = "schools",
  travel_cost = "travel_time"
)
exp_access
## id decay_function_arg schools
## 1: 89a88cdb57bffff 0.1 1.39145243
## 2: 89a88cdb597ffff 0.1 4.64420315
## 3: 89a88cdb5b3ffff 0.1 4.22307428
## 4: 89a88cdb5cfffff 0.1 2.33540919
## 5: 89a88cd909bffff 0.1 3.29031323
## ---
## 2690: 89a881acda3ffff 0.3 0.20162596
## 2691: 89a88cdb543ffff 0.3 0.02162614
## 2692: 89a88cda667ffff 0.3 0.25812919
## 2693: 89a88cd900fffff 0.3 0.01176093
## 2694: 89a881aebafffff 0.3 0.00000000
Similarly, we can provide several cutoff intervals to cumulative_interval() and several minimum numbers of opportunities to cost_to_closest():
cum_interval <- cumulative_interval(
  travel_matrix = travel_matrix,
  land_use_data = land_use_data,
  interval = list(c(30, 60), c(40, 80)),
  opportunity = "jobs",
  travel_cost = "travel_time"
)
cum_interval
## id interval jobs
## 1: 89a88cdb57bffff [30,60] 231043
## 2: 89a88cdb597ffff [30,60] 146922
## 3: 89a88cdb5b3ffff [30,60] 219865
## 4: 89a88cdb5cfffff [30,60] 340554
## 5: 89a88cd909bffff [30,60] 231401
## ---
## 1792: 89a881acda3ffff [40,80] 403342
## 1793: 89a88cdb543ffff [40,80] 480220
## 1794: 89a88cda667ffff [40,80] 446719
## 1795: 89a88cd900fffff [40,80] 326048
## 1796: 89a881aebafffff [40,80] 0
min_cost <- cost_to_closest(
  travel_matrix,
  land_use_data,
  n = 1:5,
  opportunity = "schools",
  travel_cost = "travel_time"
)
min_cost
## id n travel_time
## 1: 89a881a5a2bffff 1 29
## 2: 89a881a5a2bffff 2 32
## 3: 89a881a5a2bffff 3 36
## 4: 89a881a5a2bffff 4 36
## 5: 89a881a5a2bffff 5 36
## ---
## 4486: 89a88cdb6dbffff 1 24
## 4487: 89a88cdb6dbffff 2 29
## 4488: 89a88cdb6dbffff 3 32
## 4489: 89a88cdb6dbffff 4 37
## 4490: 89a88cdb6dbffff 5 37
Multiple travel costs
Last, but definitely not least, cumulative_cutoff() now accepts a character vector in travel_cost, enabling us to calculate accessibility using multiple travel costs at once. This is especially important when calculating accessibility levels from Pareto frontiers of multiple travel costs, instead of from a simpler travel matrix that considers only one cost. For example, let’s have a look at a Pareto frontier of travel time and monetary cost that is shipped with the package:
pareto_frontier <- readRDS(file.path(data_dir, "pareto_frontier.rds"))
pareto_frontier
## from_id to_id travel_time monetary_cost
## 1: 89a881a5a2bffff 89a881a5a2bffff 0 0
## 2: 89a881a5a2bffff 89a881a5a2fffff 24 0
## 3: 89a881a5a2bffff 89a881a5a2fffff 22 5
## 4: 89a881a5a2bffff 89a881a5a67ffff 8 0
## 5: 89a881a5a2bffff 89a881a5a6bffff 22 0
## ---
## 1323424: 89a88cdb6dbffff 89a88cdb6cbffff 10 0
## 1323425: 89a88cdb6dbffff 89a88cdb6cfffff 18 0
## 1323426: 89a88cdb6dbffff 89a88cdb6d3ffff 7 0
## 1323427: 89a88cdb6dbffff 89a88cdb6d7ffff 19 0
## 1323428: 89a88cdb6dbffff 89a88cdb6dbffff 0 0
pareto_frontier[from_id == "89a881a5a2bffff" & to_id == "89a881a5a2fffff"]
## from_id to_id travel_time monetary_cost
## 1: 89a881a5a2bffff 89a881a5a2fffff 24 0
## 2: 89a881a5a2bffff 89a881a5a2fffff 22 5
We can see that the same origin-destination pair may appear multiple times in the matrix, with each trip alternative presenting different trade-offs of travel time and monetary cost. The first trip alternative between the pair highlighted above is slower but cheaper than the second one, for example.
Calculating accessibility levels from Pareto frontiers with cumulative measures is relatively similar to calculating accessibility from a simpler matrix, but we have to pay attention to a few things: the cost restrictions must be applied simultaneously, and we have to make sure we are not double counting opportunities. Luckily, the new implementation of cumulative_cutoff() deals with these issues for us. If we combine the ability to consider multiple travel costs with the ability to consider multiple cutoffs for each of the specified costs, we get pretty rich and complex results from a single function call:
cum_pareto_access <- cumulative_cutoff(
  pareto_frontier,
  land_use_data = land_use_data,
  opportunity = "jobs",
  travel_cost = c("travel_time", "monetary_cost"),
  cutoff = list(c(20, 30), c(0, 5, 10))
)
cum_pareto_access
## id travel_time monetary_cost jobs
## 1: 89a881a5a2bffff 20 0 397
## 2: 89a881a5a2bffff 20 5 397
## 3: 89a881a5a2bffff 20 10 397
## 4: 89a881a5a2bffff 30 0 846
## 5: 89a881a5a2bffff 30 5 20923
## ---
## 5384: 89a88cdb6dbffff 20 5 264
## 5385: 89a88cdb6dbffff 20 10 264
## 5386: 89a88cdb6dbffff 30 0 543
## 5387: 89a88cdb6dbffff 30 5 1567
## 5388: 89a88cdb6dbffff 30 10 1567
We can see that we have six accessibility estimates for each origin in our Pareto frontier, one for each possible combination of travel time and monetary cost cutoffs (two time cutoffs × three monetary cutoffs).
Currently, cumulative_cutoff() is the only accessibility function that can take multiple travel costs as input. Please follow this GitHub issue for updates on the other functions.
Conclusion
{accessibility} v1.1.0 is filled with goodies that help make accessibility research more methodologically robust and reliable: new accessibility measures, poverty and inequality measures, the ability to consider multiple cutoffs, decay values and cutoff intervals, and the ability to consider multiple travel costs in cumulative_cutoff(). We have already been using these features in our workflow at the Access to Opportunities Project, and we hope you find them useful as well.
References
¹ Please see the “Why use the Palma Ratio?” box in Chapter 6 of our book Introduction to urban accessibility.