Online geocoding in R using the NYS GIS server

Previously I wrote a post on using the NYS ESRI geocoding server in python. I recently wrote a function in R to do the same. The base url server has changed since I wrote the Python post, but it is easy to update that (the JSON returned doesn’t change.) This should also be simple to update for other ESRI servers, just change the base variable in the first function. This uses the httr package to get the url and the jsonlite package to parse the response.

#################################################################
#Functions for geocoding using online NYS GIS Esri API, https://gis.ny.gov/
library(httr)
library(jsonlite)

#getting a single address, WKID 4326 is WGS 1984, so returns lat/lon
get_NYSAdd <- function(address,WKID='4326'){
  base <- "http://gisservices.dhses.ny.gov/arcgis/rest/services/Locators/Street_and_Address_Composite/GeocodeServer/findAddressCandidates"
  soup <- GET(url=base,query=list(SingleLine=address,maxLocations='1',outSR=WKID,f='pjson'))
  dat <- fromJSON(content(soup,as='text'),simplifyVector=TRUE)$candidates
  return(dat)
}
#looping over a vector of addresses, parsing, and returning a data frame
geo_NYSAdd <- function(addresses,...){
  #make empy matrix
  l <- length(addresses)
  MyDat <- data.frame(matrix(nrow=l,ncol=3))
  names(MyDat) <- c("Address","Lon","Lat")
  for (i in 1:l){
    x <- get_NYSAdd(address=addresses[i],...)
    if (length(x) > 0){
      MyDat[i,1] <- x[,1]
      MyDat[i,2] <- x[,2][1]
      MyDat[i,3] <- x[,2][2]
    }
  }
  MyDat$OrigAdd <- addresses
  return(MyDat)
}
#################################################################

The first function takes a single address, gets and parses the returning JSON. The second function loops over a list of addresses and returns a data frame with the original addresses, the matched address, and the lat/lon coordinates. I use a loop instead of an apply type function because with the web server you really shouldn’t submit large jobs that it would take along time anyway. The NYS server is free and has no 2,500 limit, but I wouldn’t submit jobs much bigger than that though.

AddList <- c("100 Washington Ave, Albany, NY","100 Washington Ave Ext, Albany, NY",
             "421 New Karner Rd., Albany, NY","Washington Ave. and Lark St., Albany, NY","poop")
GeoAddresses <- geo_NYSAdd(addresses=AddList)
GeoAddresses

We can compare these to what the google geocoding api returns (using the ggmap package):

library(ggmap)
googleAddresses <- geocode(AddList,source="google")
GeoAddresses$G_lon <- googleAddresses$lon
GeoAddresses$G_lat <- googleAddresses$lat
GeoAddresses

And we can see that the nonsense "poop" address was actually geocoded! See some similar related funny results from the google maps geocoding via StackMaps.

We can also see some confusion between Washington Ave. Ext as well. The NYS online server should theoretically have more up to date data than Google, but as above shows it is not always better. To do geocoding well takes some serious time to examine the initial addresses and the resulting coordinates in my experience.

To calculate the great circle distance between the coordinates we can use the spDists function in the sp library.

library(sp)
spDists(x = as.matrix(GeoAddresses[1:4,c("Lon","Lat")]),
        y = as.matrix(GeoAddresses[1:4,c("G_lon","G_lat")]),
        longlat=TRUE,diagonal=TRUE) #distance in kilometers

But really, we should just project the data and calculate the Euclidean distance (see the proj4 library). Note that using the law of cosines is typically not recommended for very small distances, so the last distance is suspect. (For reference I point to some resources and analysis showing how to calculate great circle distances in SPSS on Nabble recently.)

Leave a comment

3 Comments

  1. Scott

     /  April 6, 2022

    This is a great post and made it so easy to geocode with this free service! One update is the url for the NYS geocoder has changed.

    Use “https://gisservices.its.ny.gov/arcgis/rest/services/Locators/Street_and_Address_Composite/GeocodeServer/findAddressCandidates” for the base argument.

    Cheers

    Reply
  1. Some GIS data scraping adventures: Banksy graffiti and gang locations in NYC | Andrew Wheeler
  2. Data sources for crime generators | Andrew Wheeler

Leave a Reply to Scott Cancel reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: