The Social Vulnerability Index (SoVI) is a widely used tool for assessing community vulnerability to natural hazards by aggregating social and demographic variables. This study evaluates the relationship between SoVI and disaster outcomes, specifically property damage per capita from Hurricane Harvey, using American Community Survey (ACS) and Spatial Hazard Events and Losses Database for the United States (SHELDUS) data. SoVI was calculated at the county level using both national and regional datasets, followed by ordinary least squares regression to analyze its correlation with property damage. Results indicate a weak positive relationship between SoVI and property damage per capita, but the relationship was not statistically significant in either dataset. Challenges in replicating SoVI included ambiguities in variable selection, subjectivity in determining component cardinality, and concerns with potential inconsistencies and spatial granularity in SHELDUS data. These findings suggest that while SoVI provides useful insights into social vulnerability, its application as a predictor of disaster outcomes is limited without accounting for additional factors. The study highlights the need for improved methods to refine SoVI calculations and validate its effectiveness.
The Social Vulnerability Index (SoVI) aims to measure the vulnerability of communities to natural hazards by examining various social and demographic factors to assist policymakers and planners in risk assessment and disaster management with a standardized and repeatable method (Cutter & Finch, 2008). SoVI aggregates various variables, such as income, age, race, transportation access, and disability status, typically gathered from census data, to create a composite score, which is then applied to assess the susceptibility of communities to disasters. Since many variables and complex datasets are involved in this process, principal component analysis (PCA) is used to reduce complex data sets into lower dimensions to reveal the hidden, simplified structures that often underlie them (Shlens, 2014).
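As a minimal illustration of the dimension-reduction step, the sketch below runs PCA on simulated data with base R's `prcomp()`. The indicator names and values are hypothetical stand-ins, not the ACS variables used in this study, and the two latent factors are an assumption made only to give the data some structure.

```r
# Minimal PCA sketch on simulated data (hypothetical variables, not study data)
set.seed(42)
n <- 200

# Two latent factors drive six observed indicators
wealth <- rnorm(n)
age    <- rnorm(n)
indicators <- data.frame(
  pct_poverty   = -0.8 * wealth + rnorm(n, sd = 0.5),
  median_income =  0.9 * wealth + rnorm(n, sd = 0.5),
  pct_no_car    = -0.7 * wealth + rnorm(n, sd = 0.6),
  pct_over_65   =  0.8 * age    + rnorm(n, sd = 0.5),
  median_age    =  0.9 * age    + rnorm(n, sd = 0.5),
  pct_under_5   = -0.7 * age    + rnorm(n, sd = 0.6)
)

# Center and scale each variable, then extract principal components
pca <- prcomp(indicators, center = TRUE, scale. = TRUE)

# Share of total variance captured by each component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
round(var_explained, 3)

# Cumulative share captured by the first two components
cum_two <- sum(var_explained[1:2])
cum_two
```

Because the six indicators are built from two underlying factors, most of the variance concentrates in the first two components, which is the kind of simplified structure PCA is meant to reveal.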
Since its creation, SoVI has been a critical tool in disaster risk management in the U.S. and abroad (Bronfman et al., 2021; Fekete, 2009). However, it still has some important drawbacks. Social vulnerability cannot be directly measured and quantified, so it must be inferred from indirect indicators. Distilling these complex and diverse social variables into a single index through PCA raises issues with its internal consistency (whether different measurements produce similar results) (Spielman et al., 2020) and its sensitivity to variable selection and geographic context (Flanagan et al., 2021; Rufat et al., 2019). Also, the social vulnerability of a community may differ based on the type of hazard encountered, so it may not be a one-size-fits-all index (Rufat et al., 2019; Spielman et al., 2020; Tellman et al., 2020). Finally, the spatial scale may influence the index, as SoVI may differ at the regional, county, or local level (Hinojos et al., 2023). Given the widespread use of SoVI by policymakers and planners, combined with the challenges and concerns raised, it is critical that SoVI be verified and validated. Despite SoVI’s widespread use and acceptance as a predominant tool, validating its assessments remains a challenge from a research perspective. Various methods have been used in an attempt to validate SoVI, such as spatial autocorrelation, post-hazard analysis, and regression models, but there is no universally accepted standard. The inability to validate a critical tool underscores the need for continued research into validation methods that can be applied consistently.
Various validation methods have been used since SoVI’s inception to test the validity of the index. One method is to utilize spatial autocorrelation, such as Moran’s I and Local Indicators of Spatial Association (LISA), due to their ability to quantify and visualize spatial patterns. The use of spatial autocorrelation showed social vulnerability is not randomly distributed but instead clustered; additionally, the research showed vulnerability status in some areas remained persistent, while the status of other areas changed over time with shifting demographics and socioeconomic conditions (Bronfman et al., 2021; Park & Xu, 2020). The use of spatial autocorrelation also showed using localized variables and considering higher spatial resolution (i.e., block groups or census tracts versus counties or their equivalent outside of the U.S.) to calculate SoVI could be more effective in capturing areas at risk (Hinojos et al., 2023; Park & Xu, 2021). Regarding indicators, research found income, housing, race, and age were key factors in determining vulnerability.
Post-hazard analysis is another method used to validate and verify social vulnerability. These studies utilized surveys to assess the impacts of natural hazards on different communities: one in Houston, TX, after Hurricane Harvey (Griego et al., 2020) and the other in Germany, which focused on significant river flooding in 2002 (Fekete, 2009). These studies found the survey approach assisted in determining variables, such as income, housing quality, and age, which may identify patterns in social vulnerability within communities affected by natural disasters. While these surveys effectively highlighted a correlation between these variables and disaster impacts, they did not necessarily establish causation (Fekete, 2009). These studies also faced the risk of recall bias in the household surveys, where participants might not accurately remember or report the details of their experiences during the disaster, likely due to negative or positive experiences, leading to inaccuracies and inconsistencies (Griego et al., 2020). They also required fieldwork and sufficient participation to produce a suitable sample size.
A final common method to validate and verify social vulnerability indexes is to utilize regression models with post-hazard empirical data (i.e., damage and fatality assessments). Various methods of regression analysis have been utilized, such as ordinary least squares (OLS) (Rufat et al., 2019; Yoon, 2012), logistic regression (Rufat et al., 2019), and zero-inflated negative binomial (ZINB) regression (Zahran et al., 2008). Previous research validated SoVI using post-disaster data from the Federal Emergency Management Agency (FEMA), which provides requests for disaster assistance, or the Spatial Hazard Events and Losses Database for the United States (SHELDUS), which provides property damage in dollars and fatalities. These studies aimed to determine if a correlation existed between specific vulnerability factors (e.g., income and race) and disaster outcomes, such as FEMA assistance or property damage and fatalities. While each of these studies found a correlation between SoVI variables (specifically income, housing quality, race, and age) and disaster outcomes, they also identified challenges. One primary challenge is data availability: the data were found to be incomplete, inconsistent, or not at an appropriate resolution (Tellman et al., 2020; Yoon, 2012; Zahran et al., 2008). Additionally, researchers determined that what may apply to one type of hazard or geographic location may not apply to other types of hazards or locations (Rufat et al., 2019; Tellman et al., 2020; Zahran et al., 2008).
While SoVI is a widely accepted index for determining socially vulnerable areas in our communities, it is an indirect measurement that lacks a consistent method for validation and verification. Previous research has attempted to validate SoVI, each having strengths and weaknesses, but no standard method exists. Of the previous research investigated, those using post-disaster outcomes and regression analysis seem the most promising as they utilize readily available empirical data and proven regression analysis methods.
The intent of this research is to expound upon these methods. It will first calculate SoVI at the county level using American Community Survey (ACS) data, as well as data from other sources where applicable, for the entire United States and then using data from only Texas, Louisiana, and Mississippi. From there, SHELDUS data from Hurricane Harvey along the Gulf Coast will be used to compare SoVI values to property damage. By analyzing whether and how SoVI correlates with a disaster outcome, this research will attempt to determine how well SoVI captures actual social vulnerability.
Two primary datasets were used in this research: the American Community Survey (ACS) and the Spatial Hazard Events and Losses Database for the United States (SHELDUS).
The Spatial Hazard Events and Losses Database for the United States (SHELDUS), managed by Arizona State University, is a detailed database that records the impacts of natural hazards across U.S. counties. SHELDUS includes information on events such as hurricanes, floods, wildfires, and earthquakes, providing data on property damage, crop losses, injuries, and fatalities at the county level. For this research, SHELDUS data from Hurricane Harvey in 2017 was used. Of note, SHELDUS is not a free service.
With SoVI calculated for the Texas, Louisiana, and Mississippi region using two different datasets, the next step is to determine how well SoVI correlates with disaster data. For this analysis, the SHELDUS dataset was used to determine the correlation between SoVI and disaster data through ordinary least squares (OLS) regression. SHELDUS is a comprehensive database of natural disasters in the U.S. and includes information on the type of disaster, the date it occurred, and the location. The dataset also includes information on property damage, which can be adjusted for inflation and expressed per capita for each county, as well as injuries and deaths. For this project, the initial focus is on property damage per capita in 2023 dollars.
# Read in the SHELDUS data and keep the per capita outcome columns
sheldus <- read_csv(here("data/SHELDUS_Harvey", "SHELDUS_harvey.csv")) %>%
  select('County FIPS', 'PropertyDmgPerCapita(ADJ 2023)', 'InjuriesPerCapita',
         'FatalitiesPerCapita') %>%
  # Rename columns to match the SoVI dataframes
  rename(GEOID = 'County FIPS', Prop_DMG = 'PropertyDmgPerCapita(ADJ 2023)',
         Injuries = 'InjuriesPerCapita', Fatalities = 'FatalitiesPerCapita') %>%
  # Strip the single quotes that wrap each FIPS code
  mutate(GEOID = gsub("^'|'$", "", GEOID))
# Merge SHELDUS data with SoVI data, retaining counties without a matching
# SHELDUS record (their damage values become NA), then keep TX, LA, and MS
us.sovi.sheldus <- left_join(counties.US, sheldus, by = "GEOID") %>%
  filter(substr(GEOID, 1, 2) %in% c("48", "22", "28"))
txlams.sovi.sheldus <- left_join(counties.TXLAMS, sheldus, by = "GEOID") %>%
  filter(substr(GEOID, 1, 2) %in% c("48", "22", "28"))
The SHELDUS data was purchased from the Spatial Hazard Events and Losses Database for the United States (SHELDUS) website, which is managed by Arizona State University. The data was filtered to include only information related to Hurricane Harvey, with a date range spanning a few days prior to the event, which occurred from August 25–29, 2017, and several months after to ensure all information was captured. Property damage values were adjusted to 2023 dollars automatically. The SHELDUS data was then joined with county-level dataframes containing SoVI scores and further refined to include only counties in Texas, Louisiana, and Mississippi.
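The join described above depends on `GEOID` being a clean five-character string in both tables. The sketch below walks through the same cleaning, joining, and filtering steps on a few made-up rows. It uses base R's `merge()` with `all.x = TRUE` as a stand-in for `left_join()`, and the SoVI and damage values are hypothetical.

```r
# Hypothetical rows illustrating the GEOID cleaning and left join (not study data)
sovi <- data.frame(
  GEOID = c("48201", "22019", "28033", "06037"),
  SoVI  = c(1.2, -0.4, 0.3, 2.1),
  stringsAsFactors = FALSE
)
damage <- data.frame(
  GEOID    = c("'48201'", "'22019'"),   # FIPS codes wrapped in single quotes
  Prop_DMG = c(15000, 800),
  stringsAsFactors = FALSE
)

# Strip a leading or trailing single quote, as in the mutate() step
damage$GEOID <- gsub("^'|'$", "", damage$GEOID)

# Keep every SoVI county; counties without a damage record get NA
joined <- merge(sovi, damage, by = "GEOID", all.x = TRUE)

# Restrict to Texas (48), Louisiana (22), and Mississippi (28)
joined <- joined[substr(joined$GEOID, 1, 2) %in% c("48", "22", "28"), ]
joined
```

Counties that remain in the table with `NA` damage are later dropped by `lm()`, which is why the regression output reports observations deleted due to missingness.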
Figure 3.4 illustrates property damage per capita from Hurricane Harvey (adjusted to 2023 dollars). The majority of the highest per capita damage occurred along the Texas Gulf Coast, particularly around Houston, TX, and extended into western Louisiana. However, reported damages are observed as far north as DeSoto County in northern Mississippi and as far west as Ector County in western Texas. For visualization, the data was categorized into four quantile classes, with an additional group for “NA” or “No Data.” The map highlights the spatial distribution of property damage across these five groups.
Figure 3.4. Property Damage from Hurricane Harvey
(2017). The map shows the distribution of property damage per capita in
2023 dollars from Hurricane Harvey in the Texas, Louisiana, and
Mississippi region. The property damage is categorized into five groups:
four quantile classes and a no-data group. The majority of the damage per
capita occurs in the vicinity of Houston, TX.
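The five map groups can be reproduced in outline as four quantile classes plus an explicit no-data level. The following is a minimal sketch using base R's `quantile()` and `cut()`. The damage values are hypothetical, and the class labels `Q1` to `Q4` and `No Data` are assumptions rather than the labels used in the figure.

```r
# Hypothetical per capita damage values, including counties with no record
prop_dmg <- c(120, 4500, NA, 980, 15000, 60, NA, 2300, 310, 7200)

# Quartile break points computed from the non-missing values
breaks <- quantile(prop_dmg, probs = seq(0, 1, 0.25), na.rm = TRUE)

# Assign each county to one of four quantile classes
dmg_class <- cut(prop_dmg, breaks = breaks, include.lowest = TRUE,
                 labels = c("Q1", "Q2", "Q3", "Q4"))

# Add an explicit fifth level for counties without data
dmg_class <- addNA(dmg_class)
levels(dmg_class)[is.na(levels(dmg_class))] <- "No Data"

table(dmg_class)
```

Setting `include.lowest = TRUE` keeps the county with the minimum value in the first class instead of turning it into a missing value.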
To explore the relationship between social vulnerability and property damage, scatter plots were created for both the U.S. dataset (Figure 3.5) and the Texas, Louisiana, and Mississippi dataset (Figure 3.6) using ordinary least squares regression. These plots illustrate the relationship between property damage per capita from Hurricane Harvey and the SoVI scores. A slight positive trend is observed, suggesting that counties with higher social vulnerability tend to experience greater property damage. This pattern appears consistent across both datasets, implying that social vulnerability could be an important factor influencing the extent of damage communities face during natural disasters.
ggplot(us.sovi.sheldus, aes(x = Prop_DMG, y = SoVI)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE) +
labs(title = "Property Damage vs. SoVI (US)",
x = "Property Damage Per Capita",
y = "SoVI")
Figure 3.5. Property Damage vs. SoVI (U.S. Dataset). The
figure shows the relationship between property damage per capita in 2023
dollars from Hurricane Harvey and the Social Vulnerability Index (SoVI)
calculated using the entire U.S. dataset. The plot shows a positive
relationship between property damage and social
vulnerability.
ggplot(txlams.sovi.sheldus, aes(x = Prop_DMG, y = SoVI)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE) +
labs(title = "Property Damage vs. SoVI (TX-LA-MS)",
x = "Property Damage Per Capita",
y = "SoVI")
Figure 3.6. Property Damage vs. SoVI (TX-LA-MS
Dataset). The figure shows the relationship between property damage per
capita in 2023 dollars from Hurricane Harvey and the Social
Vulnerability Index (SoVI) calculated using the TX, LA, and MS dataset.
The plot shows a positive relationship between property damage and
social vulnerability.
To quantify the relationship between social vulnerability and property damage, ordinary least squares (OLS) regression was used to examine the correlation between the two variables in both datasets. The regression analysis provides a p-value to determine the statistical significance of the relationship. A low p-value (typically < 0.05) indicates that the relationship is statistically significant. The direction and strength of the relationship are described by the correlation coefficient, which in a single-predictor model is the square root of R-squared carrying the sign of the slope; a value of 1 indicates a perfect positive relationship, -1 indicates a perfect negative relationship, and 0 indicates no linear relationship.
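The link between the regression output and the correlation coefficient can be checked on simulated data. In the sketch below, the data are hypothetical and the sample size of 67 is chosen only to resemble the number of counties with damage records. With a single predictor, the squared Pearson correlation equals R-squared, and the p-value for the slope equals the p-value of the correlation test.

```r
# Simulated data (hypothetical, not study data)
set.seed(1)
x <- rnorm(67)
y <- 0.3 * x + rnorm(67)

fit <- lm(y ~ x)
s   <- summary(fit)

# Pearson correlation between predictor and response
r <- cor(x, y)

# With one predictor, R-squared equals the squared correlation
r_squared <- s$r.squared

# The slope's p-value matches the p-value of the correlation test
p_slope <- s$coefficients["x", "Pr(>|t|)"]
p_cor   <- cor.test(x, y)$p.value

c(r = r, r_squared = r_squared, p_slope = p_slope, p_cor = p_cor)
```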
An OLS regression was performed to examine the relationship between property damage per capita (response variable) and SoVI (predictor variable) for counties impacted by Hurricane Harvey in Texas, Louisiana, and Mississippi, with SoVI calculated using the entire U.S. dataset. The analysis suggests a marginal positive relationship between SoVI and property damage per capita. While counties with higher social vulnerability appear to have slightly higher property damage, the relationship is not statistically significant at the conventional level (p = 0.0562), as the p-value is slightly above the nominal threshold of 0.05, and the model explains only a small portion (5.5%) of the variability in property damage, as seen in Table 4.1. This indicates that while social vulnerability may play a role, additional factors likely contribute to property damage outcomes and should be explored in further analysis.
# OLS regression of US County Dataset and SHELDUS Prop DMG
us.model <- lm(Prop_DMG ~ SoVI, data = us.sovi.sheldus)
summary(us.model)
##
## Call:
## lm(formula = Prop_DMG ~ SoVI, data = us.sovi.sheldus)
##
## Residuals:
## Min 1Q Median 3Q Max
## -14090 -8765 -5185 3782 78539
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8828.1 2101.6 4.201 8.27e-05 ***
## SoVI 1039.7 534.7 1.944 0.0562 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 16390 on 65 degrees of freedom
## (333 observations deleted due to missingness)
## Multiple R-squared: 0.05496, Adjusted R-squared: 0.04042
## F-statistic: 3.78 on 1 and 65 DF, p-value: 0.05619
Table 4.1. OLS Regression of U.S. County Data. The
table shows the results of the ordinary least squares (OLS) regression
analysis examining the relationship between property damage per capita
from Hurricane Harvey and SoVI calculated using the entire U.S. dataset.
The results (p = 0.0562) indicate a positive relationship between property
damage and social vulnerability that falls just short of statistical
significance at the 0.05 level.
As with the U.S. dataset, an OLS regression was performed to examine the relationship between property damage per capita (response variable) and SoVI (predictor variable) for counties impacted by Hurricane Harvey in Texas, Louisiana, and Mississippi, this time with SoVI calculated using the Texas, Louisiana, and Mississippi dataset. The analysis again shows a positive but weak relationship, indicating counties with higher social vulnerability tend to have higher property damage. However, the relationship is not statistically significant (p = 0.0797) and the model explains only a small portion (4.6%) of the variability in property damage, as seen in Table 4.2. Again, this suggests that while social vulnerability may influence property damage outcomes, other factors likely play a role and should be explored in further analysis.
# OLS regression of TX, LA, MS County Dataset and SHELDUS Prop DMG
txlams.model <- lm(Prop_DMG ~ SoVI, data = txlams.sovi.sheldus)
summary(txlams.model)
##
## Call:
## lm(formula = Prop_DMG ~ SoVI, data = txlams.sovi.sheldus)
##
## Residuals:
## Min 1Q Median 3Q Max
## -15741 -7672 -5517 741 79300
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9293.6 2227.6 4.172 9.14e-05 ***
## SoVI 973.2 546.7 1.780 0.0797 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 16470 on 65 degrees of freedom
## (333 observations deleted due to missingness)
## Multiple R-squared: 0.04649, Adjusted R-squared: 0.03182
## F-statistic: 3.169 on 1 and 65 DF, p-value: 0.0797
Table 4.2. OLS Regression of TX-LA-MS County Data.
The table shows the results of the ordinary least squares (OLS)
regression analysis examining the relationship between property damage
per capita from Hurricane Harvey and SoVI calculated using the TX-LA-MS
dataset. The results (p = 0.0797) indicate that there is no statistically
significant relationship between property damage and social
vulnerability.
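The summary statistics in Tables 4.1 and 4.2 are linked by simple identities, which allows the reported values to be cross-checked. The sketch below recomputes the t statistic, p-value, F statistic, adjusted R-squared, and implied correlation coefficient from the estimates, standard errors, R-squared values, and residual degrees of freedom transcribed from the two tables. The column names are illustrative.

```r
# Values transcribed from Tables 4.1 and 4.2
models <- data.frame(
  model    = c("US", "TX-LA-MS"),
  estimate = c(1039.7, 973.2),
  std_err  = c(534.7, 546.7),
  r2       = c(0.05496, 0.04649),
  df_resid = c(65, 65)
)

# t statistic is the estimate divided by its standard error
models$t_value <- models$estimate / models$std_err

# Two-sided p-value from the t distribution
models$p_value <- 2 * pt(-abs(models$t_value), df = models$df_resid)

# With one predictor, the F statistic is the squared t statistic
models$f_value <- models$t_value^2

# Adjusted R-squared with n = df_resid + 2 observations and one predictor
n <- models$df_resid + 2
models$adj_r2 <- 1 - (1 - models$r2) * (n - 1) / models$df_resid

# Correlation coefficient implied by R-squared and the slope's sign
models$r <- sign(models$estimate) * sqrt(models$r2)

print(models, digits = 4)
```

The recomputed values agree with the tables to rounding, and the implied correlation coefficients are roughly 0.23 and 0.22, which is consistent with a weak positive relationship in both datasets.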
This research aimed to evaluate the relationship between social vulnerability, as measured by SoVI, and disaster outcomes, specifically property damage per capita from Hurricane Harvey. The OLS regression analysis revealed a positive relationship between SoVI and property damage per capita, but the relationship was not statistically significant in either dataset (SoVI calculated with U.S.-wide and TX, LA, and MS county-level ACS data). These findings suggest that social vulnerability alone is not a strong predictor of property damage outcomes. The weak relationship may indicate that other factors, such as local infrastructure or disaster preparedness, play a more significant role. Alternatively, discrepancies in the recreation of SoVI could have contributed to the results.
Prior to conducting an OLS regression analysis, SoVI values were calculated at the county level using the SoVI “recipe” provided by the University of South Carolina College of Arts and Sciences. This was an attempt to recreate the methods used by Cutter and others to quantify social vulnerability. It took up the bulk of the research, beginning with an initial attempt to recreate the 2014 SoVI values provided in the “recipe”, which was unsuccessful. Recreating SoVI proved to be a challenge as neither the “recipe” nor other literature was as clear as anticipated. The “recipe” provided a general outline of the steps to follow but did not provide specific details on how to calculate the index. This led to some ambiguity and subjectivity in the process.
The SoVI “evolution” outlines 29 variables for calculating SoVI. Selecting these variables is a critical and foundational step, but it involves several challenges. While some variables, such as the percentage of Black or Hispanic populations, are directly tied to ACS data, others require additional computation or interpretation.
For example, the “recipe” suggests including the percentage of children living in two-parent families. The closest ACS proxy is “B09002_002: Children - Married Couples,” which excludes non-married couples and may underestimate this population. Similarly, the suggested variable “Nursing Home Residents per capita” does not have a direct equivalent in ACS data. The closest proxy, “B09019_038: Population in Group Quarters,” lacks age specificity and may include individuals in non-nursing home facilities, further complicating its use.
Additionally, due to data availability, 2024 hospital data was used instead of 2017 data. These substitutions introduce potential discrepancies in the recreated SoVI values, highlighting the ambiguity and subjectivity inherent in variable selection.
Following variable selection, variables were normalized and standardized as prescribed by the SoVI “recipe.” However, determining the cardinality of each principal component after PCA and varimax rotation presented significant challenges. The “recipe” recommends using a loading threshold of 0.7 but suggests that a threshold of 0.5 is acceptable in some cases. With no clear guidance on when to apply each threshold, the decision becomes subjective. In this study, a 0.7 threshold was used, but the choice may have influenced the results, as applying a 0.5 threshold could yield different cardinalities.
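The effect of the loading threshold can be explored with a short sketch. The example below simulates six indicators, extracts two components with `prcomp()`, applies `varimax()` from the stats package, and counts how many variables load on each rotated component at or above 0.7 and 0.5 in absolute value. The data, the choice of two components, and the variable names are all hypothetical; this is not the procedure or data used in this study.

```r
# Simulated indicators (hypothetical, not study data)
set.seed(7)
n <- 300
f1 <- rnorm(n)
f2 <- rnorm(n)
X <- data.frame(
  v1 = 0.9 * f1 + rnorm(n, sd = 0.4),
  v2 = 0.8 * f1 + rnorm(n, sd = 0.6),
  v3 = 0.6 * f1 + rnorm(n, sd = 0.8),
  v4 = 0.9 * f2 + rnorm(n, sd = 0.4),
  v5 = 0.8 * f2 + rnorm(n, sd = 0.6),
  v6 = 0.6 * f2 + rnorm(n, sd = 0.8)
)

pca <- prcomp(X, center = TRUE, scale. = TRUE)

# Loadings for the first two components: eigenvectors scaled by the
# component standard deviations
k <- 2
raw_loadings <- pca$rotation[, 1:k] %*% diag(pca$sdev[1:k])

# Varimax rotation from the stats package
rotated <- varimax(raw_loadings)$loadings
L <- unclass(rotated)

# Variables counted as dominant under each threshold
dominant_07 <- abs(L) >= 0.7
dominant_05 <- abs(L) >= 0.5

colSums(dominant_07)
colSums(dominant_05)
```

Any variable that qualifies at 0.7 also qualifies at 0.5, so lowering the threshold can only add variables to a component's interpretation. Those additional variables are the ones that may pull the assigned cardinality in a different direction.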
The research literature provides guidance on which variables increase vulnerability and which decrease it, helping to determine overall cardinality. Still, the process can be extremely subjective. For example, in Table 3.1, Dimension 4 includes a negative loading for “Percent Female” and a positive loading for “Nursing Home Residents per capita.” Literature suggests that “Percent Female” increases vulnerability, supporting a negative cardinality. However, the effect of nursing home residents is less clear: while their presence could indicate increased vulnerability due to age and need for assistance, it could also represent reduced vulnerability due to available care. Determining which variable has greater influence, or whether the overall cardinality should change, requires subjective interpretation.
This ambiguity extends to other potential scenarios, such as correlated variables. For instance, if “Percent Rich” and “Percent Poverty” appear in the same component with positive loadings, one might question which variable dominates and whether the cardinality should reflect a positive or negative influence. The “recipe” provides no detailed guidance on resolving such conflicts, leaving these decisions to researcher subjectivity and introducing potential inconsistencies.
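To show how a single cardinality decision can propagate into the final index, the sketch below combines hypothetical component scores under two readings of an ambiguous third component. It assumes the index is an unweighted sum of signed component scores; the scores, county labels, and the `combine()` helper are illustrative only.

```r
# Hypothetical component scores for five counties (not study data)
scores <- matrix(
  c( 1.2, -0.5,  0.3,
    -0.8,  1.1, -0.2,
     0.4,  0.6,  1.5,
    -1.0, -0.9, -0.7,
     0.2, -0.3, -0.9),
  ncol = 3, byrow = TRUE,
  dimnames = list(paste0("county_", 1:5), c("PC1", "PC2", "PC3"))
)

# Combine components into an index given one cardinality (+1 or -1) each.
# An unweighted additive sum is assumed here.
combine <- function(scores, cardinality) {
  as.vector(scores %*% cardinality)
}

# Two defensible readings of an ambiguous third component
index_a <- combine(scores, c(1, 1,  1))
index_b <- combine(scores, c(1, 1, -1))

# Compare how counties rank under each reading (1 = most vulnerable)
data.frame(
  county = rownames(scores),
  rank_a = rank(-index_a),
  rank_b = rank(-index_b)
)
```

In this toy case, flipping the sign of the third component changes the most vulnerable county from `county_3` to `county_5`, which illustrates why undocumented cardinality choices make a recreated index difficult to compare with the original.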
An important consideration when calculating SoVI is deciding the appropriate spatial extent: should the analysis use data from the entire U.S. or focus on local data? Previous research, such as that by Spielman et al. (2020), demonstrated that the spatial extent significantly impacts the results, a finding corroborated in this study (see Figure 3.3). This variation arises because PCA compares all data within the chosen extent, potentially amplifying national-level disparities while overlooking regional specifics. For example, the cost of living in DeSoto County, MS, is likely much lower than in Los Angeles County, CA. Since cost of living and other factors often vary by region, comparing counties across the entire U.S. may not appropriately represent local vulnerabilities. Expanding the spatial extent to include neighboring states rather than restricting the analysis to a single state might offer a more balanced comparison. However, the optimal extent remains unclear and requires further exploration.
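The dependence on spatial extent enters as early as the standardization step, before PCA is applied. The sketch below standardizes the same hypothetical income values against a national reference set and against a regional one. The county labels and dollar amounts are made up for illustration.

```r
# Hypothetical median household incomes in dollars (not study data)
income <- c(
  region_a = 42000, region_b = 48000, region_c = 55000,  # regional counties
  other_a  = 75000, other_b  = 98000, other_c  = 120000  # rest of the nation
)
regional <- c("region_a", "region_b", "region_c")

# Z-scores when the reference set is the whole nation
z_national <- (income - mean(income)) / sd(income)

# Z-scores when the reference set is only the region
z_regional <- (income[regional] - mean(income[regional])) / sd(income[regional])

round(rbind(national = z_national[regional], regional = z_regional), 2)
```

Here `region_c` sits below the national mean but above the regional mean, so its standardized income changes sign depending on the extent. Every input variable is subject to the same shift, which carries through to the component scores and the final index.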
To determine the relationship between social vulnerability and disaster outcomes, property damage data from the SHELDUS database was used. The SHELDUS database provides comprehensive county-level information on natural disasters, including property damage, injuries, and fatalities. However, it has limitations, such as potential under-reporting or inconsistencies in data collection. For example, property damage values may not account for uninsured losses and are often based on estimates rather than verified costs. Also, disaster-related injuries and fatalities may be under-counted, resulting in incomplete datasets. Additionally, SHELDUS evenly distributes property damage estimates across all counties affected by a disaster, which can introduce spatial autocorrelation and mask the true distribution of impacts (Yoon, 2012). Furthermore, as Tellman et al. (2020) note, these damage estimates often rely on “guesstimates,” with variability in how counties define and report damage, leading to potential inaccuracies of up to 40%. Higher damage estimates may also not indicate higher vulnerability. As Rufat et al. (2019) found in their study of Hurricane Sandy, higher property values were associated with higher damage but decreased social vulnerability. It is possible this same relationship is present in the Houston area. Due to the lack of higher spatial granularity in the SHELDUS data (i.e., not available at the block group or census tract level), it is difficult to determine if this is the case.
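The consequence of splitting an event total evenly across counties, as Yoon (2012) describes, can be seen with a small worked example. The event total and populations below are hypothetical.

```r
# Hypothetical event: one damage total split evenly across affected counties
event_total <- 9e6   # total reported damage in dollars (made up)
counties <- data.frame(
  county     = c("large", "medium", "small"),
  population = c(1000000, 100000, 10000)
)

# Equal share of the event total assigned to every affected county
counties$damage <- event_total / nrow(counties)

# Per capita damage then depends only on population
counties$damage_per_capita <- counties$damage / counties$population
counties
```

Each county is assigned the same damage, yet the smallest county shows a per capita figure 100 times that of the largest. Under this allocation rule, differences in damage per capita can reflect population size rather than actual losses, which would weaken any regression on SoVI.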
This research highlights several areas for improvement in the calculation and validation of SoVI. If using SoVI, future studies should prioritize refining variable selection to address ambiguities, create a consistent and repeatable process, and incorporate more granular spatial data, such as census tracts, to better capture localized vulnerability patterns. Another option would be to utilize the Centers for Disease Control and Prevention (CDC) Social Vulnerability Index (SVI) to test validation methods. Additionally, exploring hazard-specific variables and alternative statistical methods, such as geographically weighted PCA, could enhance the index’s relevance and accuracy. Finally, applying SoVI to diverse disaster events and integrating more detailed empirical data, such as FEMA disaster assistance records, would provide a more comprehensive assessment of its effectiveness.
2.1 Social Vulnerability Data
The ACS is an ongoing survey conducted by the U.S. Census Bureau that provides essential data on a wide range of demographic, social, economic, and housing characteristics of the U.S. population. For calculating the Social Vulnerability Index (SoVI), ACS data is crucial because it supplies detailed information on factors like income, age, race, housing, and transportation access — all of which can be key indicators of a community’s vulnerability. For the purposes of this research, data from the ACS 5-Year Estimates at the county level were used, specifically for 2017 and 2014, the latter of which was used to validate the SoVI calculation method.
In addition to the ACS data for SoVI, hospital locations were acquired from the Homeland Infrastructure Foundation-Level Data (HIFLD) portal. Unfortunately, the only available data was for 2024. This data was used to calculate the concentration of hospitals in each county.