Hurricanes are one of the most destructive weather events that we face. From 1980 to 2021, hurricanes are estimated to have caused $1.1 trillion in damage, averaging approximately $20.5 billion per storm (NOAA, n.d.). A changing climate brings the potential for an increase in hurricane intensity, frequency, or both, and with it a potential increase in loss of life. To limit the potential loss of life, it is critical that coastal cities have accessible and effective evacuation plans and routes for all citizens, regardless of demographic. This study aims to examine the relationship between neighborhood demographics, specifically income and race, and access to primary evacuation routes in Virginia Beach, VA. The main question that this project will answer is: how do income and race affect or determine people’s access to major evacuation routes (in this case, interstates and highways) based on distance? To answer this question, geographically weighted regression (GWR) was used to examine the influence of the two independent variables, percentage of non-white residents and median household income, both at the block group level, on the distance required to travel to the closer of two evacuation points. The GWR model proves to be a good fit and shows that percentage of non-white residents has an inverse relationship with distance, whereas median household income has little to no influence on distance.
Hurricanes are one of the most destructive weather events that we face. From 1980 to 2021, hurricanes are estimated to have caused $1.1 trillion in damage, averaging approximately $20.5 billion per storm (NOAA, n.d.). As our climate continues to change and oceans warm, more moisture will be in the air, leading to more intense rain and weather events (Colbert, 2022), especially hurricanes. More intense hurricanes combined with rising sea levels mean increased damage and potentially increased loss of life. To limit the potential loss of life, it is critical that coastal cities have accessible and effective evacuation plans and routes for all citizens, regardless of demographic.
This study aims to examine the relationship between neighborhood demographics, specifically income and race, and access to primary evacuation routes in Virginia Beach, VA. Previous studies have examined the relationship between income and evacuation for disasters. One such study polled residents in the Hampton Roads area, which consists of Virginia Beach, Norfolk, Chesapeake, and Newport News, VA, among others, asking how likely a household would be to evacuate based on several factors, including income. The study found that income was a marginal contributor to the decision to evacuate (Diaz et al., 2023). Another study examined the relationship between income and the decision to evacuate using a post-event analysis of location data combined with ACS block group information after Hurricane Harvey in Houston, TX. This study found a positive correlation between evacuation and non-poor white neighborhoods (Deng et al., 2021). Neither study examined the relationship between income and access (defined as distance) to primary evacuation routes.
The main question that this project will answer is: how do income and race affect or determine people’s access to major evacuation routes (in this case, interstates and highways) based on distance? The null hypothesis to be tested is that income level and race have no impact on access to major evacuation routes, and that the distance to primary evacuation routes, regardless of income and race, is random.
To answer this question, two primary types of data will be required:
neighborhood demographic data and transportation network data. For
neighborhood demographic data, data from the 2021 American Community
Survey (ACS) will be utilized. The 2021 ACS provides various types of
data, such as social, housing, and demographic characteristics, as well
as economic characteristics from 2017-2021. Specifically, this study
will examine median household income and race for current residence at
the block group level. The data will be accessed in R using
the tidycensus package.
Transportation network data is available in various locations, such
as through the Virginia Beach Open Data Portal and OpenStreetMap. Given the
type of analysis that will be conducted, and the R package
used (osrm, an interface to the OpenStreetMap-based routing service OSRM),
OpenStreetMap data will be used. Per the Virginia Department of
Emergency Management, evacuation routes consist primarily of interstates,
highways, and state roads. While this project is focused on Virginia
Beach, the transportation network will not be limited to the confines of
the Virginia Beach city limits.
Virginia Beach is expansive, and the primary evacuation routes extend well beyond city limits, as can be seen in Figure 1. Limiting the transportation network to just the confines of city limits would introduce unnecessary artificiality and not accurately reflect reality, especially given the choice of evacuation points as discussed below (i.e., the shortest route to evacuation points along primary evacuation routes will be through neighboring cities). To account for this, the transportation network will include all roads in Virginia Beach and the neighboring cities of Norfolk, Chesapeake, Suffolk, and Hampton, up to and including the I-64/I-664 interchange (Evac Point 1) in Hampton and the US-58 split (Evac Point 2) in Suffolk.
To answer the question posed and determine whether the null hypothesis
should be rejected, various packages in R will
be utilized, the flow of which can be seen in Figure 2.
Figure 2. Processing flow of assessing access to evacuation
routes based on income and race. Yellow objects depict
databases, light red depict data, gray depicts processes, blue depicts
output objects, and green depicts display.
The first step in the data analysis is to gather the demographic data from ACS. In order to use the data, it first needs to be manipulated to remove unnecessary columns and converted from polygons to centroids in order to compute distances.
As with any good code, the first step is to load required libraries.
library(tidycensus)
library(tidyverse)
library(here)
library(ggplot2)
library(scales)
library(psych)
library(sf)
library(tmap)
library(DT)
library(osrm)
library(spgwr)
library(RColorBrewer)
With the libraries loaded, the appropriate datasets for median household income and race for Virginia Beach at the block group level can be obtained. The naming convention for these datasets is based on codes from Census.gov. The data this research is interested in are race at the block group level, which is represented by B02001_001 through B02001_008, and median household income at the block group level, which is represented by B19049_001.
# Define variables for the ACS race data
vaBeach_vars <- c("B02001_001", # Total population
"B02001_002", # White alone
"B02001_003", # Black or African American alone
"B02001_004", # American Indian and Alaska Native alone
"B02001_005", # Asian alone
"B02001_006", # Native Hawaiian and Other Pacific Islander alone
"B02001_007", # Some Other Race alone
"B02001_008", # Two or More Races
"B19049_001") # Median HH Income
# Get ACS data for race and income at the block group level for a specific geography
# (e.g., Virginia Beach, VA) in wide format
vaBeach_ACS <- get_acs(geography = "block group",
variables = vaBeach_vars,
state = "VA",
county = "Virginia Beach city",
year = 2021,
geometry = TRUE,
output = "wide",
progress_bar = FALSE)
# Calculate percents for white and non-white
# then clean up data by dropping unneeded and renaming
vaBeach_ACS <- vaBeach_ACS %>%
mutate(total_non_white = B02001_003E + B02001_004E +
B02001_005E + B02001_006E + B02001_007E + B02001_008E,
per_non_white = total_non_white/B02001_001E * 100,
per_white = B02001_002E/B02001_001E * 100) %>%
rename("total_pop" = "B02001_001E",
"white_pop" = "B02001_002E",
"median_HH" = "B19049_001E") %>%
select(-B02001_001M, -B02001_002M,
-B02001_003E, -B02001_003M,
-B02001_004E, -B02001_004M,
-B02001_005E, -B02001_005M,
-B02001_006E, -B02001_006M,
-B02001_007E, -B02001_007M,
-B02001_008E, -B02001_008M,
-B19049_001M)
vaBeach_ACS_centroid <- st_point_on_surface(vaBeach_ACS) # Creating centroids (one point guaranteed to fall inside each polygon)
First, a list of these variables is defined and then
get_acs() from the tidycensus package is used
to access them. With this, the total non-white population for each block group is
calculated, the percentages for white and non-white residents are
determined for each block group, and the unnecessary columns are dropped
to make the resulting dataframe manageable. The resulting dataframe can
be seen in Table 1.
datatable(vaBeach_ACS)
Table 1. Final Spatial Data Frame for Virginia Beach ACS Data at the Block Group. The table shows the remaining necessary variables after manipulation and cleaning. The resulting variables are total population, total white, total non-white and their respective percentages, as well as median household income, all at the block group level.
Finally, in order to calculate distances from each block group, block group polygons were converted to centroids, which assigns all the variables for the spatial object to a point. To visualize the outcome of the process, median household income is mapped as polygons with the centroids overlaid. Of note, a few of the block groups did not have values for median household income. These include block groups associated with Naval Air Station (NAS) Oceana, in the center of Virginia Beach, block groups along the Virginia Beach boardwalk, a predominantly tourist location, and others. While the missing value for NAS Oceana makes sense, a few of the others do not, given these block groups contain residents, as depicted by the percent populations.
# Static plot of income and centroids for Virginia Beach
pal.1 <- brewer.pal(4, "YlOrRd")
ggplot() +
geom_sf(data = vaBeach_ACS,
aes(fill = median_HH),
lwd = 0) +
scale_fill_viridis_c(option = "plasma", begin = 0.1) +
guides(fill=guide_legend(title = "Median Household Income ($)")) +
geom_sf(data = vaBeach_ACS_centroid,
color = "white",
size = 0.25) +
theme_minimal() +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) + # Rotating Longitude to be readable
labs(title = "Median Household Income with Centroids",
subtitle = "Virginia Beach, Virginia")
Figure 3. Median Household Income and Associated
Centroids for Virginia Beach, VA. Map depicts the
median household income for block groups in Virginia Beach along with
associated centroids. Of note, some of the block groups do not have
median household incomes but do contain percentage of
residents.
# Static plot of percent non-white residents and centroids for Virginia Beach
ggplot() +
geom_sf(data = vaBeach_ACS,
aes(fill = per_non_white),
lwd = 0) +
scale_fill_viridis_c(option = "plasma", begin = 0.1) +
guides(fill=guide_legend(title = "Percent Non-White (%)")) +
geom_sf(data = vaBeach_ACS_centroid,
color = "white",
size = 0.25) +
theme_minimal() +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) + # Rotating Longitude to be readable
labs(title = "Percentage of Non-White Residents with Centroids",
subtitle = "Virginia Beach, Virginia")
Figure 4. Percentage of Non-White Residents and
Associated Centroids for Virginia Beach, VA. Map
depicts the percentage of non-white residents for block groups in
Virginia Beach along with associated centroids.
With the ACS data gathered and converted into a usable format for distance calculations (centroids), the next step is to determine the shortest distance from each block group to one of two primary evacuation points, identified in Figure 1 as Evac Points 1 and 2. Network distance is chosen as more representative of how an evacuee would travel along a road network vice Euclidean distance, which is the shortest straight-line path between two points. An initial desire was to calculate the distance from each block group to the closest entry point to one of the main evacuation routes outlined in Figure 1 (e.g., distance from a block group to the entrance ramp of I-264). While this is a valid option, it would have required knowing the coordinates for every entry point and then iterating through a loop to determine which was shortest. For ease of calculation, the two primary points were chosen, and even though a loop is still required to determine the shortest path, it is only a comparison between two points vice hundreds.
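To make the contrast with network distance concrete, the straight-line alternative can be sketched in base R. The following is only an illustration and is not part of the study's workflow: `haversine_km()` is a helper written for this example, the 6371 km Earth radius is an assumed mean value, and the origin coordinates are hypothetical. The evacuation point coordinates are the ones used in this study.

```r
# Great-circle (straight-line) distance in km between lon/lat points,
# using the haversine formula and an assumed mean Earth radius of 6371 km.
haversine_km <- function(lon1, lat1, lon2, lat2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius_km * asin(pmin(1, sqrt(a)))
}

# Evacuation point coordinates as used in this study
evac <- data.frame(point_id = c("I64", "US58"),
                   lon = c(-76.3837, -76.5183),
                   lat = c(37.0334, 36.7537))

# Hypothetical origin, chosen only for illustration
origin <- c(lon = -76.05, lat = 36.80)

# Straight-line distance from the origin to each evacuation point
straight_line <- haversine_km(origin[["lon"]], origin[["lat"]],
                              evac$lon, evac$lat)
names(straight_line) <- evac$point_id
straight_line
```

Because roads rarely follow a straight line, and water crossings in Hampton Roads are limited to bridges and tunnels, network distances would be expected to exceed these straight-line values.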
# Setting evacuation points
evacpoints <- data.frame(
point_id = c("I64", "US58"),
lon = c(-76.3837, -76.5183),
lat = c(37.0334, 36.7537))
# Projecting and turning evac points into sf object
projcrs <- "+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"
evacpoints <- st_as_sf(x = evacpoints,
coords = c("lon", "lat"),
crs = projcrs)
Prior to conducting an analysis of the shortest distance, the
evacuation points are created. These two points were chosen based on
their location and the convergence of multiple evacuation routes into one.
It is acknowledged there may be other evacuation routes depending on the
hurricane, such as south through North Carolina, but the evacuation
routes used here are defined and sanctioned by the Virginia Department of
Emergency Management. With the evacuation points defined, the distances
to each from block groups can be calculated using
osrmRoute() from the osrm package.
The osrmRoute function works by taking a source
(src), in this instance each block group, and calculating
the distance to a destination (dst). This package has the
ability to provide a best route to multiple destinations. Here, the
concern is not with the shortest route between multiple destinations, but
the shortest distance from multiple starting points to one of two
destinations. To accomplish this, the block group centroids are iterated
through a for loop calculating the distance to each evacuation point. A
comparison is then done to determine which of the two routes is shorter,
with the associated values of distance, duration, and route stored in
the original dataframe consisting of polygons.
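The comparison step can also be expressed without an explicit if/else once both distances are known. The sketch below uses hypothetical distances for four block groups (in the study these values come from osrmRoute()) and picks the closer evacuation point per row. The object names are illustrative only.

```r
# Hypothetical network distances (km) from four block groups to each
# evacuation point; in the study these come from osrmRoute().
dist_km <- cbind(I64  = c(41.2, 55.0, 38.7, 62.3),
                 US58 = c(43.9, 48.1, 40.2, 57.5))

# Column index of the smaller distance in each row (1 = I64, 2 = US58)
nearest <- apply(dist_km, 1, which.min)

# Keep the shorter distance and the label of the matching evacuation point
closest <- data.frame(
  distance = dist_km[cbind(seq_len(nrow(dist_km)), nearest)],
  route    = factor(colnames(dist_km)[nearest], levels = colnames(dist_km))
)
closest
```

One difference from the loop used in this study: on an exact tie, `which.min()` returns the first column (I64), whereas the loop's strict `<` comparison assigns ties to US58.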
An interesting finding is that the majority of the block groups show US-58 as the closest evacuation point. Given the throughput and accessibility of the interstate system, it was assumed I-64 would be the closest evacuation point based on distance. Examining Figure 5, block groups in northern Virginia Beach have distances to evacuation points of around 40 to 50 km, whereas those in southern Virginia Beach increase to 60 and 70 km. A comparison was also completed for the closest evacuation point based on duration of travel, with very little change from the distance comparison.
# Dropping NA values from our centroids
vaBeach_ACS_centroid <- na.omit(vaBeach_ACS_centroid)
# Creating a new spatial dataframe to store distance and duration of travel
vaBeach_ACS_shortdist <- vaBeach_ACS %>%
na.omit() %>%
add_column(duration = NA,
distance = NA,
route = NA)
# Determine the number of block groups for the for loop
n <- nrow(vaBeach_ACS_shortdist)
# For loop to calculate the shortest route based on distance
for (x in seq_len(n)) {
  # Calculating the route to both points and saving each as a variable
  route64 <- osrmRoute(src = vaBeach_ACS_centroid[x, ], dst = evacpoints[1, ])
  route58 <- osrmRoute(src = vaBeach_ACS_centroid[x, ], dst = evacpoints[2, ])
  # Determining which distance is shorter and storing the duration, distance,
  # and evac point (columns are addressed by name rather than by position)
  if (route64$distance < route58$distance) {
    vaBeach_ACS_shortdist$duration[x] <- route64$duration
    vaBeach_ACS_shortdist$distance[x] <- route64$distance
    vaBeach_ACS_shortdist$route[x] <- "I64"
  } else {
    vaBeach_ACS_shortdist$duration[x] <- route58$duration
    vaBeach_ACS_shortdist$distance[x] <- route58$distance
    vaBeach_ACS_shortdist$route[x] <- "US58"
  }
}
# Converting the route to a factor
vaBeach_ACS_shortdist$route <- as.factor(vaBeach_ACS_shortdist$route)
ggplot() +
geom_sf(data = vaBeach_ACS_shortdist,
aes(fill = route),
lwd = 0) +
guides(fill=guide_legend(title = "Evacuation Point")) +
theme_minimal() +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) + # Rotating Longitude to be readable
labs(title = "Closest Evacuation Point for Each Block Group",
subtitle = "Virginia Beach, Virginia")
Figure 5. Distance to the Closest Evacuation Point based
on Distance. This map depicts the closest evacuation
point, as defined in Figure 1, for each block group in Virginia Beach.
Interestingly, the shortest distance for most of the block groups in
Virginia Beach is US-58 to the west vice I-64 to the north.
# Static plot of distance and centroids for Virginia Beach
ggplot() +
geom_sf(data = vaBeach_ACS_shortdist,
aes(fill = distance),
lwd = 0) +
scale_fill_viridis_c(option = "plasma", begin = 0.1) +
guides(fill=guide_legend(title = "Shortest Distance (km)")) +
geom_sf(data = vaBeach_ACS_centroid,
color = "white",
size = 0.25) +
theme_minimal() +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) + # Rotating Longitude to be readable
labs(title = "Shortest Distance to an Evacuation Point with Centroids",
subtitle = "Virginia Beach, Virginia")
Figure 6. Shortest Distance to Evacuation Points and
Associated Centroids for Virginia Beach, VA. Map
depicts the shortest distance to an evacuation point for block groups in
Virginia Beach along with associated centroids. Of note, some of the
block groups do not have median household incomes and were therefore
excluded from the distance calculation.
With the required data manipulated, distances calculated, and dataframe formatted, analysis can begin, starting with the basic descriptive statistics. First, while data was collected for both white and non-white residents, the two percentages are perfectly collinear (they sum to 100%, so one fully determines the other), and therefore only the percentage of non-white residents is examined.
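The collinearity between the two percentages can be illustrated with a short sketch. The values below are hypothetical and serve only to show that two shares summing to 100 are perfectly negatively correlated.

```r
# Hypothetical percentages of white residents for five block groups
per_white     <- c(82.5, 64.0, 47.3, 71.8, 55.1)
# The non-white share is fully determined by the white share
per_non_white <- 100 - per_white

# Correlation is -1 (up to floating-point rounding)
cor(per_white, per_non_white)
```

Including both variables in the same regression would therefore add no information and would make the coefficients impossible to estimate separately.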
# Getting descriptive stats and dropping geom
desc_stats <- vaBeach_ACS_shortdist %>%
st_drop_geometry() %>%
describe()
datatable(desc_stats)
Table 2. Descriptive Statistics for Virginia
Beach. Table shows the descriptive statistics for
Virginia Beach. Those we are most concerned with are median household
income, percent non-white, and distance.
Table 2 shows the descriptive statistics for all the data, but this study is only concerned with three of the variables: the two independent variables, median_HH (median household income) and per_non_white (percentage of non-white residents), and the dependent variable, distance. For household income, the mean is approximately $89,500 and the median is $84,050. A mean above the median indicates a right-skewed distribution, with most block groups toward the lower end of the income range. A right-skewed profile is also true for the percentage of non-white residents, which has a mean of 34.32% and a median of 32.31%, and for distance, with a mean of 43.2 km and a median of 42.1 km. This can also be seen when examining the histograms in Figures 7-9.
For median household income, the preponderance of values lies between $50k and ~$100k, with very few outliers in the $200k and above range. Percentage of non-white residents varies, with non-white residents making up between 10% and 70% of the population in the majority of block groups. Finally, distance to an evacuation point varies between approximately 35 km and 50 km, with a few outliers at 60 km and higher, as can also be seen in Figure 6.
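The link between a mean that exceeds the median and right skew can be checked on a small sample. The sketch below uses hypothetical values and a simple moment-based skewness written for this example. The skew reported by psych::describe() may use a slightly different small-sample correction.

```r
# Hypothetical right-skewed sample (a long tail of large values)
x <- c(1, 2, 2, 3, 3, 3, 4, 10)

# Moment-based skewness: third central moment over the 1.5 power of
# the second central moment (no small-sample correction)
skewness <- function(v) {
  m <- mean(v)
  mean((v - m)^3) / mean((v - m)^2)^1.5
}

# Mean above median and positive skewness both point to right skew
c(mean = mean(x), median = median(x), skew = skewness(x))
```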
# Calculating number of bins using median HH income
bin_number <- sqrt(desc_stats[c("median_HH"), c("n")]) %>%
round(digits = 0)
# Histogram plot of median household income
ggplot(vaBeach_ACS_shortdist, aes(x = median_HH, color = "red")) +
geom_histogram(bins = bin_number) +
scale_x_continuous(labels = label_dollar()) +
theme_classic() +
theme(legend.position = "none") +
labs(x = "Median Household Income ($)",
y = "Count")
Figure 7. Histogram of Median Household Income per Block
Group. Histogram depicts the median household income
in Virginia Beach. Based on the plot, and from the descriptive
statistics, the data is skewed slightly right, indicating most block
groups sit toward the lower end of the income range.
# Histogram plot of percent non-white residents
ggplot(vaBeach_ACS_shortdist, aes(x = per_non_white, color = "red")) +
geom_histogram(bins = bin_number) +
theme_classic() +
theme(legend.position = "none") +
labs(x = "Percentage of Non-White Residents (%)",
y = "Count")
Figure 8. Histogram of Percentage of Non-White Residents
per Block Group. Histogram depicts the percentage of
non-white residents per block group in Virginia Beach. Based on the
plot, and from the descriptive statistics, the data is skewed slightly
right, which indicates there is a higher number of block groups where
non-white residents are the minority.
# Histogram plot of distance to evacuation points
ggplot(vaBeach_ACS_shortdist, aes(x = distance, color = "red")) +
geom_histogram(bins = bin_number) +
theme_classic() +
theme(legend.position = "none") +
labs(x = "Distance to Evacuation Points (km)",
y = "Count")
Figure 9. Histogram of Evacuation Distance per Block
Group. Histogram depicts the distance to evacuation
points per block group. Based on the plot, and from the descriptive
statistics, the data is skewed slightly right, indicating there are more
block groups with shorter distances to evacuate than longer.
Next, a simple correlation between distance and median household income and then distance and percent non-white residents is plotted. In Figure 10, which depicts distance versus median household income, there is a slightly positive correlation, indicating that as income increases, distance to an evacuation point increases. In Figure 11, which is distance to an evacuation point versus percentage of non-white residents, there is a slightly negative correlation, indicating as the percentage of non-white residents decreases, distance increases.
# Plot of median HH income versus distance with a simple best fit line
plot(vaBeach_ACS_shortdist$distance, vaBeach_ACS_shortdist$median_HH,
main = "Regression for Distance and Median Household Income",
xlab = "Travel Distance (km)", ylab = "Median Household Income ($)")
abline(lm(median_HH~distance, data = vaBeach_ACS_shortdist), col = "red")
Figure 10. Correlation Plot of Distance vs Median
Household Income per Block Group. A slightly positive
correlation between distance to an evacuation point and income can be
seen. As income increases, so does the distance to evacuate.
# Plot of percent non-white versus distance with a simple best fit line
plot(vaBeach_ACS_shortdist$distance, vaBeach_ACS_shortdist$per_non_white,
main = "Regression for Distance and Percentage of Non-White Residents",
xlab = "Travel Distance (km)", ylab = "Percentage Non-White Residents")
abline(lm(per_non_white~distance, data = vaBeach_ACS_shortdist), col = "red")
Figure 11. Correlation Plot of Distance vs Percent
Non-White Residents per Block Group. A slightly
negative correlation between distance to an evacuation point and
percentage of non-white residents can be seen. As percentage of non-white
residents decreases, distance increases.
Geographically weighted regression (GWR) is a local model used to examine how relationships between variables change from place to place. GWR fits an individual regression model at each location, weighting observations by their distance from the location being examined. This allows for an examination of how geography influences the data, potentially uncovering spatial nuances which a global model might miss.
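The idea of a distance-weighted local fit can be sketched in base R with simulated data. This is a minimal illustration under stated assumptions, not the spgwr implementation used in this study: the data are simulated, the Gaussian kernel and the bandwidth of 0.2 are chosen for the example, and `local_fit()` is a hypothetical helper.

```r
set.seed(1)

# Simulated locations and a predictor for 50 observations
n <- 50
xcoord <- runif(n)
ycoord <- runif(n)
predictor <- rnorm(n)

# The true slope drifts from about 1 in the west to about 3 in the east
response <- (1 + 2 * xcoord) * predictor + rnorm(n, sd = 0.1)

# Weighted least squares centered on observation i, with Gaussian
# weights that decay with distance from that observation
local_fit <- function(i, bandwidth) {
  d <- sqrt((xcoord - xcoord[i])^2 + (ycoord - ycoord[i])^2)
  w <- exp(-0.5 * (d / bandwidth)^2)
  coef(lm(response ~ predictor, weights = w))
}

west <- which.min(xcoord)
east <- which.max(xcoord)

slope_west   <- local_fit(west, bandwidth = 0.2)[["predictor"]]
slope_east   <- local_fit(east, bandwidth = 0.2)[["predictor"]]
slope_global <- coef(lm(response ~ predictor))[["predictor"]]

# The global fit reports one slope; the local fits recover the drift
c(west = slope_west, global = slope_global, east = slope_east)
```

A single global slope averages over the whole study area, while the local fits differ between the western and eastern edges, which is the kind of spatial variation GWR is designed to expose.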
Prior to running GWR, a simple global linear regression was conducted to obtain a sense of the data as well as to compare to GWR. As described earlier, the two independent variables, median household income and percentage of non-white residents, are used as predictors for the dependent variable, distance to an evacuation point.
# Create linear regression model
model <- lm(distance ~ per_non_white + median_HH, data = vaBeach_ACS_shortdist)
summary(model)
##
## Call:
## lm(formula = distance ~ per_non_white + median_HH, data = vaBeach_ACS_shortdist)
##
## Residuals:
## Min 1Q Median 3Q Max
## -12.6292 -5.1148 -0.7206 4.1073 28.6158
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.502e+01 1.598e+00 28.167 < 2e-16 ***
## per_non_white -9.229e-02 2.018e-02 -4.574 6.95e-06 ***
## median_HH 1.506e-05 1.222e-05 1.232 0.219
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.589 on 308 degrees of freedom
## Multiple R-squared: 0.1059, Adjusted R-squared: 0.1001
## F-statistic: 18.25 on 2 and 308 DF, p-value: 3.245e-08
From the model, the linear regression equation is \(\hat{y} = 45.02 - 9.23 \times 10^{-2} x_{1} + 1.51 \times 10^{-5} x_{2}\), where \(x_{1}\) is percent non-white residents and \(x_{2}\) is median household income. The multiple R-squared is equal to 0.1059, meaning the two independent variables explain only about 10.6% of the variance in distance and do not do a good job of predicting distance using linear regression.
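As a worked example of the fitted equation, the coefficients reported in the model summary can be applied to a single block group. The block group values below (30% non-white residents, $85,000 median household income) are hypothetical and chosen only for illustration.

```r
# Coefficients as reported by summary(model)
b <- c(intercept = 45.02, per_non_white = -9.229e-02, median_HH = 1.506e-05)

# Hypothetical block group: 30% non-white residents, $85,000 median income
# (the leading 1 multiplies the intercept)
x <- c(1, 30, 85000)

# Predicted distance to the closest evacuation point, in km
predicted_km <- sum(b * x)
predicted_km
```

The prediction is about 43.5 km. The income term contributes roughly 1.3 km of that total, compared with roughly -2.8 km from the percent non-white term.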
To determine the significance of each of the variables, the t-values can be examined. With 308 residual degrees of freedom (n - 3, as reported in the model summary), the two-tailed critical t-value at 0.999 confidence is approximately 3.32, meaning a t-value larger than this in absolute value would occur by chance less than 0.1% of the time. The absolute value of the t-value for percentage of non-white residents is 4.574, meaning it is a rare event and is significant. Median household income, on the other hand, is not significant: its t-value of 1.232 corresponds to a p-value of 0.219, well above the conventional 0.05 threshold. Critical t-values were obtained from simulation-math.com.
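The critical value and p-values can also be reproduced directly in R rather than looked up. The sketch below assumes a two-sided test and uses the residual degrees of freedom and t-statistics reported in the model summary.

```r
# Residual degrees of freedom reported by summary(model)
df_resid <- 308

# Two-sided critical t-value at the 0.1% significance level
t_crit <- qt(1 - 0.001 / 2, df = df_resid)

# Two-sided p-values for the reported t-statistics
t_stats  <- c(per_non_white = -4.574, median_HH = 1.232)
p_values <- 2 * pt(-abs(t_stats), df = df_resid)

t_crit
p_values
```

The p-values obtained this way should agree with the Pr(>|t|) column of the summary output, up to the rounding of the reported t-statistics.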
Next, examining the Residuals versus Fitted confirms that the R-squared is weak as the data is dispersed around the centerline. Had the R-squared been strong, it would have appeared more tightly grouped about the centerline. As for the Q-Q plot, the data points curve upwards at the beginning and end of centerline, indicating a slightly right-skewed dataset.
# Plot model diagnostics side by side (1 row, 2 columns)
par(mfrow = c(1, 2))
plot(model, which = c(1, 2))
Figure 12. Plot of Linear Regression Model
Values. The plot on the left is of residuals versus
fitted values. The dispersion of the points about the center line
indicates a weak R-squared, which coincides with the value 0.1059. The
right plot is the Q-Q plot, which describes the distribution of the
data. The curves at the top and bottom indicate slightly right-skewed
data.
Finally for the linear regression model, a map of fitted values and residuals shows that the model does not perform well, specifically in the northwest of Virginia Beach, where distances are over predicted, and in the central to southern portions of Virginia Beach, where distances are under predicted.
# Calculate residuals and fitted values from the regression model,
# combine with original data and plot
resids <- residuals(model)
fitted_values <- fitted.values(model)
lm_values <- c("fitted_values", "residuals")
lm_labels <- c("Fitted Values", "Residuals")
# Add residuals and fitted values to the original spatial dataframe
# (column names match lm_values; no hard-coded row count is needed)
map.model <- cbind(vaBeach_ACS_shortdist,
                   data.frame(fitted_values = unname(fitted_values),
                              residuals = unname(resids)))
# Setting color palette
pal <- rev(brewer.pal(9, "RdYlBu"))
tmap_mode("view") # setting mapviewer to interactive
# Plotting residuals versus predicted values
tm_shape(map.model) +
tm_polygons(lm_values,
n = 9,
style = "jenks",
midpoint = NA,
lwd = 0.5,
palette = pal,
border.col = "black",
title = lm_labels) +
tmap_options(basemaps = "CartoDB.Positron") +
tm_facets(nrow = 1, sync = TRUE)
Figure 13. Linear Regression Model - Fitted Values
and Residuals. The maps depict the fitted, or
predicted, values on the left with the residuals, or errors, on the
right. From the residuals map, the model over predicts in the
northeastern portion of Virginia Beach and under predicts in the central
to southern portion.
With the linear regression model complete, a GWR model is created to see if it does a better job of predicting distance based on the two independent variables, percent non-white residents and median household income. First, the bandwidth, or size of the neighborhood for each local regression equation, is calculated. This bandwidth is then used to create the GWR model, the results of which are stored in a spatial data frame (SDF).
# Obtaining Coordinates to use with GWR models
coords <- st_coordinates(vaBeach_ACS_centroid)
# Calculate bandwidth for GWR with an adaptive kernel
gwrBW <- gwr.sel(distance ~ per_non_white + median_HH,
data = vaBeach_ACS_shortdist,
coords = cbind(coords),
longlat = TRUE,
adapt = TRUE,
verbose = FALSE)
# Create GWR Model with Distance as dependent and
# per non-white and median household income as independent
gwrModel <- gwr(distance ~ per_non_white + median_HH,
data = vaBeach_ACS_shortdist,
coords = cbind(coords),
longlat = TRUE,
adapt = gwrBW,
hatmatrix = TRUE,
se.fit = TRUE)
# Convert SDF with model results to DF
results <- as.data.frame(gwrModel$SDF)
In order to plot the GWR model results, the results are
joined with the original polygons. The first plot is of local R-squared
(localR2). The local R-squared indicates how well the GWR
model predicts the values for each block group. Values close to 1
indicate the model is a good predictor, whereas those close to 0
indicate a poor predictor. For this plot, along with the rest of the
plots, Jenks natural breaks were chosen as breakpoints with a total of 9 bins,
a value settled on after multiple attempts to find an
appropriate number of bins to properly represent the data.
Based on the plot of local R-squared values, the GWR model appears to be a good predictor of the dependent variable based on the two independent variables, percentage of non-white residents and median household income per block group. With the exception of a single negative R-squared value in the northern portion of Virginia Beach, and a few values less than 0.5 (approximately 6 block groups), the majority of the local R-squared values are 0.5 or higher with a large portion 0.635 and higher. This indicates for the majority of Virginia Beach, the GWR model is a good fit.
# Combine results with the original dataframe
gwrMap <- cbind(vaBeach_ACS_shortdist, as.matrix(results))
tm_shape(gwrMap) +
tm_polygons("localR2",
n = 9,
style = "jenks",
midpoint = NA,
palette = pal,
border.col = "black",
lwd = 0.5,
title = "Local R2")
Figure 14. Local R-Squared Values from
GWR. This plot depicts the local R-squared values for
each block group in Virginia Beach. Values close to 1 indicate the model
has a good fit, whereas those close to 0 indicate a poor fit. The
majority of values are 0.635 or higher, indicating the model does a good
job of predicting distance to an evacuation point based on median
household income and percentage of non-white residents. Of note, there
is a single negative local R-squared, indicating poor fit or an issue
with the model for that block group.
Next, the predicted values and residuals are examined. The residuals plot, located on the right, shows that the GWR model, like the linear regression model, still over predicts distances in some block groups in the northwestern corner of Virginia Beach and slightly under predicts distances in the southern portion of the city. That said, the range of residuals is much smaller than that of the linear regression model (approximately -3 km to 6.7 km for GWR vice -12.6 km to 28.6 km for linear regression), indicating the GWR model is a much better fit.
facets.1 <- c("pred", "gwr.e")
labels.1 <- c("Predicted Values", "Residuals")
tm_shape(gwrMap) +
tm_polygons(facets.1,
n = 9,
midpoint = NA,
style = "jenks",
palette = pal,
border.col = "black",
lwd = 0.5,
title = labels.1) +
tm_facets(nrow = 1, sync = TRUE)
Figure 15. Plot of Predicted Values and Residuals
with GWR. This plot depicts the predicted distances
based on the two independent variables, percentage of non-white
residents and median household income, on the left, and the residuals,
or errors, on the right. Based on the residuals, the GWR model is a good
fit with errors only ranging from -3km to 6.7km.
Finally, we examine the coefficients for each independent variable in each block group. The larger the coefficient, the more influence that independent variable has for the specific block group in determining the dependent variable. The smaller the coefficient, the less impact the independent variable has. Regarding the sign of the coefficient, negative values have an inverse relationship whereas positive coefficients indicate a direct relationship.
The plot on the left shows the coefficients for percentage of non-white residents per block group, with values ranging between -0.413 and 0.324. For block groups with negative values, such as those in the southern portion of Virginia Beach, as the percentage of non-white residents decreases, the distance to an evacuation point increases. Examining the spatial distribution of these coefficients, the majority of block groups have a negative coefficient or one very close to zero. This indicates that in many block groups there is a negative correlation or, where the coefficient is close to 0, that the percentage of non-white residents plays little role in determining distance. There are some outliers, such as in the northeast portion of the city, where coefficients of 0.32 and 0.22 indicate that as the percentage of non-white residents increases, so does the distance.
Regarding the independent variable median household income, whose coefficients are plotted on the right, the values range from \(-1.4 \times 10^{-5}\) to \(1.4 \times 10^{-5}\). Despite the variability throughout Virginia Beach, which may give the impression of influence, these values are extremely close to 0 and their influence is essentially negligible, which is in line with the findings from the linear regression analysis.
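# Map the local GWR coefficients for each independent variable
facets.2 <- c("per_non_white.1", "median_HH.1")
labels.2 <- c("Percent Non-White Coefficients", "Median Household Income Coefficients")
tm_shape(gwrMap) +
  tm_polygons(facets.2,
              n = 9,
              style = "jenks",      # Jenks natural breaks classification
              midpoint = 0,         # center the palette on a coefficient of zero
              palette = pal,
              border.col = "black",
              lwd = 0.5,
              title = labels.2) +
  tm_facets(nrow = 1, sync = TRUE)  # one row, panels pan and zoom together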
Figure 16. Plot of Independent Variable Coefficients
from GWR Model. The plot on the left depicts the
coefficients for the percentage of non-white residents. The
preponderance of the values are negative or close to zero, indicating an
inverse relationship between percentage of non-white residents and
distance or, in the case of values close to 0, little influence. The
right plot depicts the coefficients for median household income. These
values are extremely small, indicating little to no influence on the
dependent variable.
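To put the two coefficient ranges on a common footing, each coefficient can be multiplied by a plausible change in its predictor. The sketch below does this for the largest-magnitude local coefficients reported in the text. It assumes distance is in kilometers, income is in dollars, and percentage of non-white residents is on a 0 to 100 scale; these units are assumptions, and `effect_km` is a hypothetical helper rather than part of the original analysis.

```r
# Approximate change in predicted distance (km) for a change in one predictor,
# holding the other predictor fixed: coefficient * change in predictor.
effect_km <- function(coefficient, delta_predictor) {
  coefficient * delta_predictor
}

# Largest-magnitude local coefficients quoted in the text.
income_coef    <- 1.4e-5   # ASSUMED units: km per dollar
non_white_coef <- -0.413   # ASSUMED units: km per percentage point

income_effect    <- effect_km(income_coef, 10000)  # income higher by $10,000
non_white_effect <- effect_km(non_white_coef, 10)  # % non-white higher by 10 points

print(c(income = income_effect, non_white = non_white_effect))
```

Under these assumed units, even the largest income coefficient shifts the predicted distance by about 0.14 km per $10,000, compared with roughly 4 km per 10 percentage points for the largest negative non-white coefficient. If the percentage were stored as a fraction between 0 and 1, the second figure would need to be rescaled accordingly.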
The intent of this research was to determine if demographics, specifically race and median household income, play a significant role in determining the distance to one of two evacuation points. First, a simple analysis compared the calculated distance to both median household income and percentage of non-white residents. As seen in Figure 10, median household income shows a slightly positive correlation with distance, meaning the greater the income, the farther the distance to travel. Figure 11, which depicts percentage of non-white residents versus distance, shows a negative correlation, indicating that as the percentage of non-white residents increases, the distance to travel decreases.
With the simple review complete, a linear regression was run to test the effectiveness of such a model. Based on the R-squared values, as well as the residuals, it was determined that the linear regression model was not a good fit, but it did provide insight into the influence of the two independent variables. From the linear regression, the percentage of non-white residents was a good indicator of distance to an evacuation point whereas median household income was not.
Finally, a GWR model was created to examine the influence of the independent variables on distance to an evacuation point. Based on the residuals seen in Figure 15, the GWR model does a much better job of predicting distance to an evacuation point than the linear regression model. Examining the coefficients for each independent variable confirms the results of the linear regression model: percentage of non-white residents is a good indicator of distance to travel, while the influence of median household income is almost negligible.
Based on these results, the null hypothesis can be partially rejected. It appears that race does play a role in determining distance to an evacuation point, with block groups with a higher percentage of non-white residents having a shorter distance to travel, so this portion of the null hypothesis can be rejected. Median household income plays little role in determining the distance needed to travel, so this portion of the null hypothesis cannot be rejected.
This project was an extremely interesting undertaking. The tools
involved, many of which were used for the first time, such as
osrm for routing and spgwr for the GWR model,
were easy to learn and worked extremely well. An interesting discovery
was that income played little role in determining distance to an
evacuation point. This coincides with the study by Diaz et al.
(2023), which found that income played little role in the decision to
evacuate.
One struggle with osrm was reviewing the
transportation network data for accuracy and quality. Due to the size
of the data and the desire to include all roads, this would have been a
herculean task and too great for a term project. Also, attempting to
determine the closest point to access one of the major evacuation
routes, for example the distance to an entrance ramp for I-64, was a
challenge. As osrm is understood to work, a list of
destination locations must be known to go along with the list of source
locations. This would be possible but would require having coordinates
for every intersection and entrance ramp for all the evacuation routes,
and then require for loops within for loops, increasing compute time.
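One possible way around the nested loops, offered here as an untested sketch rather than something used in this project, is the `osrmTable()` function in osrm, which returns a full source-by-destination matrix in a single request. The nearest access point for each block group is then just the row-wise minimum. In the sketch, `nearest_destination` and `nearest_ramp` are hypothetical helpers, `centroids` and `ramps` are assumed point layers that would still need to be assembled, and the toy matrix is invented for illustration.

```r
# For each source (matrix row), find the nearest destination (matrix column).
# Stops with an error if a row has no non-missing distance.
nearest_destination <- function(dist_matrix) {
  idx <- vapply(seq_len(nrow(dist_matrix)),
                function(i) which.min(dist_matrix[i, ]),
                integer(1))
  data.frame(nearest  = unname(idx),
             distance = dist_matrix[cbind(seq_len(nrow(dist_matrix)), idx)])
}

# HYPOTHETICAL wrapper, defined but not run here: `centroids` and `ramps` are
# assumed to be sf point layers or data frames of lon/lat coordinates.
# Requires the osrm package and a reachable OSRM server.
nearest_ramp <- function(centroids, ramps) {
  tbl <- osrm::osrmTable(src = centroids, dst = ramps, measure = "distance")
  nearest_destination(tbl$distances)  # osrm reports distances in meters
}

# Toy 2 x 3 distance matrix (meters): 2 block groups, 3 candidate ramps.
toy <- matrix(c(5200, 8100, 3000,
                9400, 1200, 7700), nrow = 2, byrow = TRUE)
result <- nearest_destination(toy)
print(result)
```

Coordinates for every candidate ramp or intersection would still be required, and public OSRM servers may limit the size of table requests, so a large source or destination list might need to be sent in batches or to a local server.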
Finally, block group centroids were used to determine distance to an
evacuation point, which may not be a good representation of the entire
block group. The route function in osrm works by finding
the closest road to the source point provided. For small block groups,
such as those in northern Virginia Beach, the distance between the
centroid and the edges is likely small and negligible. For large block
groups, such as those in southern Virginia Beach, the difference between
the southern and northern portions of the block group can be measured in
multiple kilometers and thus introduce significant error or
artificiality when using only centroids. In other words, for these large
block groups, the distance for those in the northern portion of the
block group would be significantly less than for those at the center or
in the southern portion.
In addition to the struggle with osrm, calculating the
GWR model introduced some issues based on the few variables chosen, and
additional independent variables would have added value to the project.
The initial proposal called for just examining median household income,
and due to the ease of use of tidycensus to gather ACS
data, race was added. This turned out to be a good choice as income
played little role whereas race played a more vital role. Other
variables to consider may be home prices or level of education.
Finally, one assumption made in determining distance for residents to travel is that each resident or household had a private means to travel, or a personal vehicle capable of traveling on highways and interstates. It is very possible that some of the residents, especially those in poorer neighborhoods, may not have access to a vehicle, meaning even a short distance may be insurmountable if a vehicle isn’t available. It would be beneficial to also examine ability to evacuate based on vehicle access, and barring that, examine access to public transportation or shelters.
Colbert, A. (2022). A Force of Nature: Hurricanes in a Changing
Climate. NASA. Retrieved Aug 27 from
https://climate.nasa.gov/news/3184/a-force-of-nature-hurricanes-in-a-changing-climate/
Deng, H., Aldrich, D. P., Danziger, M. M., Gao, J., Phillips, N. E., Cornelius, S. P., & Wang, Q. R. (2021). High-resolution human mobility data reveal race and wealth disparities in disaster evacuation patterns. Humanities and Social Sciences Communications, 8(1). https://doi.org/10.1057/s41599-021-00824-8
Diaz, R., Acero, B., Behr, J. G., & Hutton, N. S. (2023). Impacts of household vulnerability on hurricane logistics evacuation under COVID-19: The case of U.S. Hampton Roads. Transportation Research Part E: Logistics and Transportation Review, 176, 103179. https://doi.org/10.1016/j.tre.2023.103179
NOAA. (n.d.). Hurricane Costs. Retrieved Aug 27 from https://coast.noaa.gov/states/fast-facts/hurricane-costs.html