Geographical distribution of neurosurgeons
Get the data
First, we’ll load in some of the data that’s already been created. To begin, let’s get the shapefiles of all the counties in the continental US using the tigris package
multiCounties <- tigris::counties(state = state.abb, cb=T) %>%
filter(STATEFP!="02", STATEFP!="15")
# Save a subset with a smaller geography
subset_states <- c("tx", "la", "ok")
some_counties <- tigris::counties(state = subset_states, cb=T)
We’ll also grab some data about TBI mortality. This was downloaded using the CDC’s WISQARS database, and has the crude and age-adjusted rates per 100,000 Population for each county of fatal TBI injuries (all intents, all mechanisms, all ages, all race/ethnicity, both sexes) from 2008-2014. It’s based on the 2000 standard population (all races, both sexes). Data was downloaded on 1/17/2020
TBIs <- read_csv("~/Github/NSG_geography/Data/TBI_Deaths.csv")
Finally, we need to know the location of the providers. Using data from the 2017 Medicare Provider Utilization and Payment Data: Physician and Other Supplier, I selected all neurosurgeons and grabbed their NPI1 along with their first and last name. I used the Google Maps API to geocode2 each provider, and this file contains their longitude and latitude
provs <- read_csv("~/Github/NSG_geography/Google_results/geo_google_raw.csv") %>%
filter(geoState!="hi", geoState!="ak") %>%
st_as_sf(coords = c("lon", "lat"), crs=4269)
Find closest provider for each county
With all of this data available, now we need to find the closest provider for each county. I made a function below to acomplish this
find_closest <- function(county_sf, prov_sf, centroid=T) {
tic() # time the function
library(testthat) ## Testing for bugs
test_that("inputs are sf objects", {
expect_s3_class(county_sf, "sf")
expect_s3_class(prov_sf, "sf")
})
test_that("Correct geometries", {
county_sf_geo <- as.character(st_geometry_type(county_sf)[[1]])
expect_equal(county_sf_geo, "MULTIPOLYGON")
prov_sf_geo <- as.character(st_geometry_type(prov_sf)[[1]])
expect_equal(prov_sf_geo, "POINT")
})
# Finds index of nearest provider for each county
i <- st_nearest_feature(county_sf, prov_sf)
p <- prov_sf[i,]
# Calculate distance for each element
if(centroid){
distance <- st_distance(p, st_centroid(county_sf), by_element=T)
} else({
distance <- st_distance(p, county_sf, by_element=T)
})
# Create a data frame with results
df <- tibble(
GEOID = county_sf$GEOID,
npi = p$npi,
providerCity = p$geoCity,
providerST = p$geoState,
Distance = units::set_units(distance, mi),
Distance_m = distance
) %>%
mutate(Distance = as.numeric(Distance))
# Join to county_sf to make sf object
df_sf <- geo_join(county_sf, df, by="GEOID")
toc() # end timer
return(df_sf)
}
Maps
## 2.2 sec elapsed
## 13.028 sec elapsed
We can see there is a correlation among counties with a higher mortality rate and those with further distances to the nearest neurosurgeon
National Provider Identifier: unique 10-digit identification number issued to health care providers in the United States by the Centers for Medicare and Medicaid Services↩
By “geocode”, I mean that I queried the Google Maps API with the string
{First Name}
+{Last Name}
+neurosurgeon
. For the vast majority of providers/NPIs, this returned the a result, including the longitude, latitude, state, city, and country of the provider. Additional details can be found in Geocode_with_google.Rmd file↩