Geographical distribution of neurosurgeons

Get the data

First, we’ll load in some of the data that’s already been created. To begin, let’s get the shapefiles of all the counties in the continental US using the tigris package

multiCounties <- tigris::counties(state = state.abb, cb=T) %>%
  filter(STATEFP!="02", STATEFP!="15")


# Save a subset with a smaller geography
subset_states <- c("tx", "la", "ok")
some_counties <- tigris::counties(state = subset_states, cb=T)

We’ll also grab some data about TBI mortality. This was downloaded using the CDC’s WISQARS database, and has the crude and age-adjusted rates per 100,000 Population for each county of fatal TBI injuries (all intents, all mechanisms, all ages, all race/ethnicity, both sexes) from 2008-2014. It’s based on the 2000 standard population (all races, both sexes). Data was downloaded on 1/17/2020

TBIs <- read_csv("~/Github/NSG_geography/Data/TBI_Deaths.csv")

Finally, we need to know the location of the providers. Using data from the 2017 Medicare Provider Utilization and Payment Data: Physician and Other Supplier, I selected all neurosurgeons and grabbed their NPI1 along with their first and last name. I used the Google Maps API to geocode2 each provider, and this file contains their longitude and latitude

provs <- read_csv("~/Github/NSG_geography/Google_results/geo_google_raw.csv") %>%
  filter(geoState!="hi", geoState!="ak") %>%
  st_as_sf(coords = c("lon", "lat"), crs=4269)

Find closest provider for each county

With all of this data available, now we need to find the closest provider for each county. I made a function below to acomplish this

find_closest <- function(county_sf, prov_sf, centroid=T) {
  tic() # time the function
  
  library(testthat) ## Testing for bugs
  test_that("inputs are sf objects", {
    expect_s3_class(county_sf, "sf")
    expect_s3_class(prov_sf, "sf")
  })
  test_that("Correct geometries", {
    county_sf_geo <- as.character(st_geometry_type(county_sf)[[1]])
    expect_equal(county_sf_geo, "MULTIPOLYGON")
    prov_sf_geo <- as.character(st_geometry_type(prov_sf)[[1]])
    expect_equal(prov_sf_geo, "POINT")
  })
  
  
  # Finds index of nearest provider for each county
  i <- st_nearest_feature(county_sf, prov_sf)
  p <- prov_sf[i,]
  
  # Calculate distance for each element
  if(centroid){
    distance <- st_distance(p, st_centroid(county_sf), by_element=T)
  } else({
    distance <- st_distance(p, county_sf, by_element=T)
  })
  
  
  # Create a data frame with results
  df <- tibble(
    GEOID         = county_sf$GEOID,
    npi           = p$npi,
    providerCity  = p$geoCity,
    providerST    = p$geoState,
    Distance      = units::set_units(distance, mi),
    Distance_m    = distance
  ) %>%
  mutate(Distance = as.numeric(Distance))
  
  # Join to county_sf to make sf object
  df_sf <- geo_join(county_sf, df, by="GEOID") 
  
  toc() # end timer
  return(df_sf)
}

Maps

## 2.2 sec elapsed

## 13.028 sec elapsed

We can see there is a correlation among counties with a higher mortality rate and those with further distances to the nearest neurosurgeon


  1. National Provider Identifier: unique 10-digit identification number issued to health care providers in the United States by the Centers for Medicare and Medicaid Services

  2. By “geocode”, I mean that I queried the Google Maps API with the string {First Name} + {Last Name} + neurosurgeon. For the vast majority of providers/NPIs, this returned the a result, including the longitude, latitude, state, city, and country of the provider. Additional details can be found in Geocode_with_google.Rmd file

Hunter Ratliff, MD, MPH
Hunter Ratliff, MD, MPH
Infectious Diseases Fellow

My research interests include epidemiology, social determinants of health, and reproducible research.

Related