# Geocoding with census data and the Census API

For my online GIS class I have a tutorial on creating an address locator using street centerline data in ArcGIS. Eventually I would like to put all of my class online, but for now I am just sharing that one, as I’ve forwarded it a lot recently.

That tutorial used local street centerline data in Dallas that you can download from Dallas’s open data site. It also gives directions on how to use an online ESRI geocoding service, which Dallas has. But what if those are not an option? A student recently wanted to geocode data from San Antonio, and the only street data file the city publicly provides lacks the beginning and ending street numbers.

Once you download street data that includes the beginning and ending street numbers, you can follow along with that tutorial the same way as with the local public data.

Previously I’ve written about using the Google geocoding API. If you just have crime data from one jurisdiction, it is simple to make a geocoder for just that locality. But if you have data for many cities (say, if you were geocoding home addresses), this can be more difficult. An alternative online API to Google that does not have daily limits is the Census Geocoding API.
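To make the request itself concrete, here is a rough base R sketch of how a single-address query URL could be assembled. The endpoint path and parameter names are my understanding of the Census geocoder's address lookup, and the helper name `build_census_url` is made up for illustration; treat it as a sketch rather than a reference.

```r
# Hypothetical helper: builds the query URL for one address lookup.
# Assumes the Census geocoder's "locations/address" endpoint and its
# street/city/state/zip/benchmark/format query parameters.
build_census_url <- function(street, city, state, zip = "", benchmark = 4) {
  base <- "https://geocoding.geo.census.gov/geocoder/locations/address"
  params <- c(street = street, city = city, state = state, zip = zip,
              benchmark = benchmark, format = "json")
  # Percent-encode each value (spaces, commas, etc.) so the URL is valid
  encoded <- vapply(params, URLencode, character(1), reserved = TRUE)
  paste0(base, "?", paste(names(encoded), encoded, sep = "=", collapse = "&"))
}

# Print the URL for one of the example addresses; paste it into a browser
# to see the raw JSON the API sends back
print(build_census_url("450 W Harwood Rd", "Hurst", "TX"))
```

Looking at the raw JSON in a browser is a quick way to check what fields come back before writing any parsing code.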

Here is a simple example in R of calling the census API and geocoding a list of addresses.

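```r
library(httr)
library(jsonlite)

#geocode a single address (helper name, endpoint URL, and D_dat lookup are assumptions)
get_CensusAdd <- function(street,city,state,zip,benchmark=4){
  base <- "https://geocoding.geo.census.gov/geocoder/locations/address"
  soup <- GET(url=base,query=list(street=street,city=city,state=state,zip=zip,format='json',benchmark=benchmark))
  dat <- fromJSON(content(soup,as='text',encoding='UTF-8'), simplifyVector=TRUE)
  D_dat <- dat$result$addressMatches #empty list when nothing matched
  if (length(D_dat) > 1){
    #returns the matched address, then lon (x) and lat (y)
    return(c(D_dat['matchedAddress'],D_dat['coordinates'][[1]]))
  }
  else {return(c('',NA,NA))} #no match
}

#now create function to loop over data frame and return set of addresses
geo_CensusTIGER <- function(street,city,state,zip,sleep=1,benchmark=4){
  #make empty data frame to hold matched address, lon, lat
  l <- length(street)
  MyDat <- data.frame(matrix(nrow=l,ncol=3))
  names(MyDat) <- c("MatchedAdd","Lon","Lat")
  for (i in seq_len(l)){
    x <- get_CensusAdd(street=street[i],city=city[i],state=state[i],zip=zip[i],benchmark=benchmark)
    if (length(x) > 0){
      MyDat[i,1] <- x[1]
      MyDat[i,2] <- x[2]
      MyDat[i,3] <- x[3]
    }
    Sys.sleep(sleep) #pause between requests
  }
  MyDat$street <- street
  MyDat$city <- city
  MyDat$zip <- zip
  MyDat$state <- state
  return(MyDat)
}

## Arbitrary dataframe for an exercise (AddList is an assumed name)
AddList <- data.frame(
  IdNum = c(1,2,3,4,5),
  Address = c("450 W Harwood Rd", "2878 Fake St", "2775 N Collin St", "2775 N Collins St", "Lakewood Blvd and W Shore Dr"),
  City = c("Hurst", "Richardson", "Arlington", "Arlington", "Dallas"),
  State = c("TX", "TX", "TX", "TX", "TX"),
  stringsAsFactors = FALSE
)

#the example data has no zip column, so zip is passed as blank (an assumption)
GeoAddress <- geo_CensusTIGER(street=AddList$Address,city=AddList$City,state=AddList$State,zip=rep('',nrow(AddList)))
```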

If you check the results, you will see that this API does not appear to do fuzzy matching: 2775 N Collin St failed, whereas 2775 N Collins St returned a match. It will also geocode an intersection written with "and", but in my tests "/" did not work (so in R you can simply use `gsub` to replace other intersection separators with `and`). I haven’t experimented with it much, so let me know if you have any other insight into this API.
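As a small sketch of that `gsub` cleanup, the helper below rewrites a few common intersection separators to "and" before the addresses are sent off. The function name `normalize_intersection` is hypothetical, and only "/" is something I actually saw fail; treating "&" and "@" the same way is an assumption about how intersections tend to be recorded.

```r
# Hypothetical helper: rewrite intersection separators as " and ".
# "/" is the case observed to fail; "&" and "@" are assumed to need the same fix.
normalize_intersection <- function(address) {
  # Match the separator plus any surrounding whitespace, so both
  # "A St/B St" and "A St / B St" become "A St and B St"
  gsub("[[:space:]]*[/&@][[:space:]]*", " and ", address)
}

addresses <- c("Lakewood Blvd / W Shore Dr",
               "Lakewood Blvd&W Shore Dr",
               "450 W Harwood Rd")
print(normalize_intersection(addresses))
```

Regular street addresses pass through unchanged, so it should be safe to run this over the whole address column before geocoding.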