Previously I wrote a post on using the NYS ESRI geocoding server in python. I recently wrote a function in R to do the same. The base url server has changed since I wrote the Python post, but it is easy to update that (the JSON returned doesn’t change.) This should also be simple to update for other ESRI servers, just change the base
variable in the first function. This uses the httr package to get the url and the jsonlite package to parse the response.
#################################################################
#Functions for geocoding using online NYS GIS Esri API, https://gis.ny.gov/
library(httr)
library(jsonlite)
#getting a single address, WKID 4326 is WGS 1984, so returns lat/lon
get_NYSAdd <- function(address,WKID='4326'){
base <- "http://gisservices.dhses.ny.gov/arcgis/rest/services/Locators/Street_and_Address_Composite/GeocodeServer/findAddressCandidates"
soup <- GET(url=base,query=list(SingleLine=address,maxLocations='1',outSR=WKID,f='pjson'))
dat <- fromJSON(content(soup,as='text'),simplifyVector=TRUE)$candidates
return(dat)
}
#looping over a vector of addresses, parsing, and returning a data frame
geo_NYSAdd <- function(addresses,...){
#make empy matrix
l <- length(addresses)
MyDat <- data.frame(matrix(nrow=l,ncol=3))
names(MyDat) <- c("Address","Lon","Lat")
for (i in 1:l){
x <- get_NYSAdd(address=addresses[i],...)
if (length(x) > 0){
MyDat[i,1] <- x[,1]
MyDat[i,2] <- x[,2][1]
MyDat[i,3] <- x[,2][2]
}
}
MyDat$OrigAdd <- addresses
return(MyDat)
}
#################################################################
The first function takes a single address, gets and parses the returning JSON. The second function loops over a list of addresses and returns a data frame with the original addresses, the matched address, and the lat/lon coordinates. I use a loop instead of an apply type function because with the web server you really shouldn’t submit large jobs that it would take along time anyway. The NYS server is free and has no 2,500 limit, but I wouldn’t submit jobs much bigger than that though.
AddList <- c("100 Washington Ave, Albany, NY","100 Washington Ave Ext, Albany, NY",
"421 New Karner Rd., Albany, NY","Washington Ave. and Lark St., Albany, NY","poop")
GeoAddresses <- geo_NYSAdd(addresses=AddList)
GeoAddresses
We can compare these to what the google geocoding api returns (using the ggmap package):
library(ggmap)
googleAddresses <- geocode(AddList,source="google")
GeoAddresses$G_lon <- googleAddresses$lon
GeoAddresses$G_lat <- googleAddresses$lat
GeoAddresses
And we can see that the nonsense "poop" address was actually geocoded! See some similar related funny results from the google maps geocoding via StackMaps.
We can also see some confusion between Washington Ave. Ext as well. The NYS online server should theoretically have more up to date data than Google, but as above shows it is not always better. To do geocoding well takes some serious time to examine the initial addresses and the resulting coordinates in my experience.
To calculate the great circle distance between the coordinates we can use the spDists
function in the sp library.
library(sp)
spDists(x = as.matrix(GeoAddresses[1:4,c("Lon","Lat")]),
y = as.matrix(GeoAddresses[1:4,c("G_lon","G_lat")]),
longlat=TRUE,diagonal=TRUE) #distance in kilometers
But really, we should just project the data and calculate the Euclidean distance (see the proj4 library). Note that using the law of cosines is typically not recommended for very small distances, so the last distance is suspect. (For reference I point to some resources and analysis showing how to calculate great circle distances in SPSS on Nabble recently.)
Scott
/ April 6, 2022This is a great post and made it so easy to geocode with this free service! One update is the url for the NYS geocoder has changed.
Use “https://gisservices.its.ny.gov/arcgis/rest/services/Locators/Street_and_Address_Composite/GeocodeServer/findAddressCandidates” for the base argument.
Cheers