All posts in category Crime Analysis

Reducing folium map sizes

Recently for a crimede-coder project I have been building out a custom library to make nice leaflet maps using the python folium library. See the example I have posted on my website. Below is a screenshot:

This map ended up having around 3000 elements in it, and was a total of 8mb. 8mb is not crazy to put on a website, but is at the stage where you can actually notice latency when first rendering the map.

Looking at the rendered html code though it was verbose in a few ways for every element. One is that lat/lon are in crazy precision by default, e.g. [-78.83229390597961, 35.94592660794455]. So a single polygon can have many of those. Six digits of precision for lat/lon is still under 1 meter of precision, which is plenty sufficient for my mapping applications. So you can reduce 8+ characters per lat/lon and not really make a difference to the map (you can technically have invalid polygons doing this, but this is really pedantic and should be fine).

A second part of the rendered folium html map for every object is given a full uuid, e.g. geo_json_a19eff2648beb3d74760dc0ddb58a73d.addTo(feature_group_2e2c6295a3a1c7d4c8d57d001c782482);. This again is not necessary. I end up reducing the 32 length uuids to the first 8 alphanumeric characters.

A final part is that the javascript is not minified – it has quite a bit of extra lines/spaces that are not needed. So here are my notes on using python code to take care of some of those pieces.

To clean up the precision for geometry objects, I do something like this.

import re

# geo is the geopandas dataframe
redg = geo.geometry.set_precision(10**-6).to_json()
# redg still has floats, below regex clips values
rs = r'(\d{2}\.|-\d{2}\.)(\d{6})(\d+)'
re.sub(rs,r'\1\2',redg)

As most of my functions add the geojson objects to the map one at a time (for custom actions/colors), this is sufficient to deal with that step (for markers, can round lat/lon directly). It may make more sense for the set precision to be 10**-5 and then clip the regex. (For these regex’s I am showing there is some risk they will replace something they should not, I think it will be pretty safe though.)

Then to clean up the UUID’s and extra whitespace, what I do is render the final HTML and then use regex’s:

# fol is the folium object
html = fol.get_root()
res = html.script.get_root().render()
# replace UUID with first 8
ru = r'([0-9a-f]{8})[0-9a-f]{4}[0-9a-f]{4}[0-9a-f]{4}[0-9a-f]{12}'
res = re.sub(ru,r'\1',res)
# clean up whitespace
rl = []
for s in res.split('\n'):
    ss = s.strip()
    if len(ss) > 0:
        rl.append(ss)
rlc = '\n'.join(rl)

There is probably a smarter way to do this directly with the folium object for the UUID’s. For whitespace though it would need to be after the HTML is written. You want to be careful with the cleaning up the whitespace step – it is possible you wanted blank lines in say a leaflet popup or tooltip. But for my purposes this is not really necessary.

Doing these two steps in the Durham map reduces the size of the rendered HTML from 8mb to 4mb. So reduced the size of the file by around 4 million characters! The savings will be even higher for maps with more elements.

One last part is my map has redundant svg inserted for the map markers. I may be able to use css to insert the svg, e.g. something like in css .mysvg {background-image: url("vector.svg");}, and then in the python code for the marker svg insert <div class="mysvg"></div>. For dense point maps this will also save quite a few characters. Or you could add in javascript to insert the svg as well (although that would be a bit sluggish I think relative to the css approach, although sluggish after first render if the markers are turned off).

I have not done this yet, as I need to tinker with getting the background svg to look how I want, but could save another 200-300 characters per marker icon. So will save a megabyte in the map for every 3000-5000 markers I am guessing.

The main reason I post webdemo’s on the crimede-coder site is that there a quite a few grifters in the tech space. Not just for data analysis, but for front-end development as well. I post stuff like that so you can go and actually see the work I do and its quality. There are quite a few people now claiming to be “data viz experts” who just embed mediocre Tableau or PowerBI apps. These apps in particular tend to produce very bad maps, so here you can see what I think a good map should look like.

If you want to check out all the interactions in the map, I posted a YouTube video walking through them

Durham hotspot map walkthrough of interactions

Leave a comment

by Andy Wheeler on August 4, 2024 • Permalink

Posted in Crime Analysis, Crime Mapping, data science, Data Visualization, Mapping, Python

Tagged folium, hot spots, leaflet, webmapping

Posted by Andy Wheeler on August 4, 2024

https://andrewpwheeler.com/2024/08/04/reducing-folium-map-sizes/

Using GenAI to describe charts for reports

One of the ideas that has come up with the recent GenAI craze is to use these tools to conduct end-to-end data analysis. So you feed it a dataset + a question and out pops an analysis. Mike Zidar has notes on using a Google tool to do this:

I am not worried about these tools usurping crime analysts though. The reason is that the vast majority of data analysis (crime or business or whatever) is very superficial. The hardest part is not generating the chart, it is knowing what chart to generate, and how it will be used by real people to make real decisions. Often in crime analysis you get the ambiguous “well this will help allocate resources” – when in reality your chart can in no way help dictate any realistic decision a department is going to make.

If you cut out analysts, and just have front line individuals asking google “do crime analysis”, it will be hopelessly superficial, and the front line will just ignore it altogether.

I however do think that GenAI has the ability to become power-tools for super-users. That is, someone who does know what they want to calculate, but uses the computer to help them get that information faster. Not dissimilar to how auto-complete while texting helps you type faster. And here is one use case I have been thinking about – so analysts spend a ton of time automating different products, such as monthly CompStat reports. The reports should have tables/graphs in them, like this chart of gun crimes in DC for example:

Now, most reports you want to also have a plain text summary of what is going on in the chart. Currently when auto-generating reports, it is difficult to mix that plain text in and not make it very superficial using a rule based system. The newest round of many of the GenAI tools lets you upload an image and ask it questions about the image. So this is still very open ended, but has many more guiderails than simply telling the machine “go brr and generate analysis”. You have already decided the chart, you are just asking for a nice description of the chart to fill in your already pre-made metrics.

I did this with ChatGPT, Google’s Gemini, and Claude to see how it does with the gun crime chart, the below Raleigh weekly chart:

And in some cases a more tame monthly time series chart:

Because these models are changing all the time, keep in mind that when you read this the newest models may do even better than what I show here (and see the end section on how you may prompt engineer this to produce better results as well). That said lets check out the results!

ChatGPT

For ChatGPT I used the GPT-4o model, I first just asked about the DC chart “Describe the patterns in this chart”. ChatGPT as it is known, is quite verbose:

I then asked ChatGPT to keep it to two to three sentences, and I think it did very well.

Ok, now to the Raleigh chart with error bars:

So this is OK, it spotted the recent increasing trend. “Frequently exceeded the average of the prior 8 weeks (gray shaded area)” is wrong, there are only 2! But I think the last sentence about notable recent spikes is good.

I then gave it the Durham chart that had the anomalies in early 2019/2020:

And I again think this is quite reasonable. I mean an analyst should probably say “these must be reporting idiosyncrasies”, but I don’t think this is not so bad a description as to be misleading.

All in all very happy with the results for ChatGPT here – these charts are not typical line and bar charts, and ChatGPT interpreted each quite well. At least the description is not so bad that if I did these directly in an automated report it would be embarrassing.

Google

For all of these examples I am using the free tools (that typically have limits that I run out by just these two queries). I did this on 5/22 for Google (which I think is Gemini-1.5, I am not 100% sure). So here is the DC gun crime seasonal chart for Google with the prompt Describe the trends in this chart:

This is very wrong in multiple places. It did not do any better with the Raleigh chart:

There is not strong seasonality here, and it includes some filler “important to note” that is not actually important to note. After this I gave it a more tame monthly crime counts of Robberies in Houston chart to see if Google could redeem itself:

And it flopped pretty hard again. Maybe most charts are increasing, so the model is biased to say increasing (I don’t know).

So again this is Google’s free version, and so the paid may be better (or recent updates may be better). But this isn’t even close to make me want to prompt engineer further.

Claude

I did the tests with Sonnet 3.5 (so around a month after the tests with Google/ChatGPT). I used the shorter 2/3 sentences prompt.

I like this description even slightly better than ChatGPT. How about the Raleigh MV thefts chart:

Similar to ChatGPT it is not quite right but in very subtle ways. It does catch the upward trend. It is wrong in terms of data ends before 2024, and the gray area indicating greater variability is technically true but I would not describe it as noticeable. So again not embarrassingly wrong (like Google), but not quite right either.

How about the anomaly Durham chart:

Very similar to ChatGPT, which I think is again OK.

Prompt Engineering Ideas

So the idea behind prompt engineering is you can ask “Describe this chart” or you can ask “Describe this chart in two to three sentences” and it changes the results (in any of these tools). Subsequently a big part of this is figuring out the prompts that give the most reasonable results. Prompts in these GenAI tools when submitting images have two parts, the image and the text. Do not take me as an expert by any means, but for other analysts here are my guesses as to how to prompt engineer this to maybe return better results.

For the text part, one thing you may do for auto reports is to give examples of the text you want. So for example, you could have a prompt that is “Please fill in the blanks: The chart shows _____ trends over time.”. So provide more guidance as to the structure of how you want to the response to look. Or you could do: “Here is example description 1: …., description 2: ….”. This is how RAG applications work, but with a static report you can give static exemplars, they don’t really need to be dynamically looked up from prior reports.

For the image part of the prompt, in an auto-application you may submit a different image than is actually shown in the report. So for example I may have the X axis for monthly crimes be labeled with the actual months (instead of numbers). Putting all the months in smaller font I bet the GenAI tools will still read it just fine, even if I don’t want it to look like that in the final report.

And I probably shouldn’t include logos (since they are immaterial and just cause extra info that may distract the description), and the text footers. I also think making my legends more descriptive may help guide the tools interpretation. I may remove the title text all together and place the relevant info in the prompt “Here is a chart of Robberies in Houston from …. to … Please describe the chart, including any long term trends, or anomalous spikes (high or low) in any month.” The text prompt may keep the tools on track a bit more with the specific details, but still allow them leeway to interpret the chart without being too rigid.

For the error bar chart, you could insert into the prompt explicit dates they were outside, e.g. “weeks a and b were high, make sure to mention that”. So you could have a mix of explicit anomaly detection, insert those anomalies into the prompt, to just keep the results on track.

It would still be a lot of work to automate a report with such plain text language, but I think it could be a quite reasonable iterative workflow. So you generate the report in a format you can edit, like Word, review it. And then in subsequent reports try to tweak the parameters a wee bit to produce better outputs.

2 Comments

by Andy Wheeler on July 16, 2024 • Permalink

Posted in Crime Analysis, Criminal Justice, data science

Tagged ChatGPT, genai

Posted by Andy Wheeler on July 16, 2024

https://andrewpwheeler.com/2024/07/16/using-genai-to-describe-charts-for-reports/

Wake LP talk on LPRs and javascript hacks in WooCommerce

For some Crime De-Coder news, I will be giving a tech talk on automated license plate readers for the Wake County Libertarian Party on Wednesday July 17th in Raleigh.

See my slides on the CRIME De-Coder website to get a preview of what I will be talking about.

This post will just be a quick one that is hopefully helpful to others. So I use WooCommerce + LuLu to distribute shipping for my print book. For those who have purchased a copy thank you! Many of the paperback purchases will be arriving at your homes very soon. There have been two hiccups with my store website for individuals.

Problem 1 is an error nonce is invalid popping up after trying to add the book to the cart. It is difficult to replicate and is an underlying cache error with WordPress as best I can tell. My advice to fix is go to the individual book page directly to add the book to your cart (paperback, ebook). It seems to mostly happen to me when I am on the main shop page and add directly to cart. If the problem persists for you let me know.

The second problem is that for the print book, to do shipping on LuLu’s end you need to input a phone number. As far as I can tell, most website advice in WooCommerce suggest to pay for an additional plug in to edit this option. I was able to use javascript (and the WPCode free plugin) though to force the phone number to be filled in for free though. Sharing here as I hope it will be helpful to others.

Check out the onload function first, that sets the attribute for either billing-phone or shipping-phone to be required. If you are fast and are looking at the exact right spot of the checkout page, you would be able to see this change from “Phone (Optional)” to “Phone”.

The change_label event listener is to modify the error message when you don’t put in a phone number (by default it confusingly says “Please enter a valid phone number (optional)”. So that part is a bit hacky with attaching the event listener to the entire webpage, but unless you are trying to purchase from a Commodore 64 it should be fine.

<script>
// This script forces the billing/shipping
// phone number to be filled in and not optional
// Andy Wheeler, done via WPCode Extension in Footer
function change_label(){
    var xl = document.getElementById("billing-phone");
    if (xl){
        var ll = xl.nextSibling;
        var nd = ll.nextSibling;
        if (nd) {
            if (nd.getAttribute('role') == 'alert') {
                nd.firstChild.innerText = "Please enter a valid phone number"
            };
        };
    };
    var xs = document.getElementById("shipping-phone");
    if (xs){
        var ls = xs.nextSibling;
        var ns = ls.nextSibling;
        if (ns) {
            if (ns.getAttribute('role') == 'alert') {
                ns.firstChild.innerText = "Please enter a valid phone number"
            };
        };
    };
};

// So click is not working when people
// just use tabs/keyboard select
// not sure how to fix that, but just results in a
// bad red note that says "optional" (but still need
// to fill in
document.addEventListener('click',change_label);

window.onload = function() {
    var x = document.getElementById("billing-phone");
    if (x) {
        var lab = x.nextSibling;
        lab.innerText = "Phone";
        x.setAttribute('aria-label','Phone')
        x.setAttribute('required','')
        // These don't seem to work unfortunately!
        //x.addEventListener("change",change_label);
        //x.setAttribute("onChange","change_label();")
    };
    var x2 = document.getElementById("shipping-phone");
    if (x2) {
        var lab2 = x2.nextSibling;
        lab2.innerText = "Phone";
        x2.setAttribute('aria-label','Phone')
        x2.setAttribute('required','')
    };
};
</script>

Because there is no phone verification, you could technically put in a fake number for these FYI and no one would know. (I have a google voice number I use for instances in which I don’t really want to give out personal.)

Thanks again for those who have purchased a copy – appreciate the support.

Leave a comment

by Andy Wheeler on July 11, 2024 • Permalink

Posted in Crime Analysis, data science, Python

Tagged javascript, woocommerce, wordpress

Posted by Andy Wheeler on July 11, 2024

https://andrewpwheeler.com/2024/07/11/wake-lp-talk-on-lprs-and-javascript-hacks-in-woocommerce/

Some notes on self-publishing a tech book

So my book, Data Science for Crime Analysis with Python, is finally out for purchase on my Crime De-Coder website. Folks anywhere in the world can purchase a paperback or epub copy of the book. You can see this post on Crime De-Coder for a preview of the first two chapters, but I wanted to share some of my notes on self publishing. It was some work, but in retrospect it was worth it. Prior books I have been involved with (Wheeler 2017; Wheeler et al. 2021) plus my peer review experience I knew I did not need help copy-editing, so the notes are mostly about creating the physical book and logistics of selling it.

Academics may wish to go with a publisher for prestige reasons (I get it, I was once a professor as well). But it is quite nice once you have done the legwork to publish it yourself. You have control of pricing, and if you want to make money you can, or have it cheap/free for students.

Here I will detail some of the set up of compiling the book, and then the bit of work to distribute it.

Compiling the documents

So the way I compiled the book is via Quarto. I posted my config notes on how to get the book contents to look how I wanted on GitHub. Quarto is meant to run code at the same time (so works nicely for a learning to code book). But even if I just wanted a more typical science/tech book with text/images/equations, I would personally use Quarto since I am familiar with the set up at this point. (If you do not need to run dynamic code you could do it in Pandoc directly, not sure if there is a way to translate a Quarto yaml config to the equivalent Pandoc code it turns into.)

One thing that I think will interest many individuals – you write in plain text markdown. So my writing looks like:

# Chapter Heading

blah, blah blah

## Subheading

Cool stuff here ....

In a series of text files for each chapter of the book. And then I tell Quarto quarter render, and it turns my writing in those text files into both an Epub and a PDF (and other formats if you cared, such as word or html). You can set up the configuration for the book to be different for the different formats (for example I use different fonts in the PDF vs the epub, nice fonts in one look quite bad in the other). See the _quarto.yml file for the set up, in particular config options that are different for both PDF and Epub.

One thing is that ebooks are hard to format nicely – if I had a book I wanted to redo to be an epub, I would translate it to markdown. There are services online that will translate, they will do a bad job though with scientific texts with many figures (and surely will not help you choose nice fonts). So just learn markdown to translate. Folks who write in one format and save to the other (either Epub/HTML to PDF, or PDF to Epub/HTML) are doing it wrong and the translated format will look very bad. Most advice online is for people who have just books with just text, so science people with figures (and footnotes, citations, hyperlinks, equations, etc.) it is almost all bad advice.

So even for qualitative people, learning how to write in markdown to self-publish is a good skill to learn in my opinion.

Setting up the online store

For awhile I have been confused how SaaS companies offer payment plans. (Many websites just seem to copy from generic node templates.) Looking at the Stripe API it just seems over the top for me to script up all of my own solution to integrate Stripe directly. If I wanted to do a subscription I may need to figure that out, but it ended up being for my Hostinger website I can set up a sub-page that is WordPress (even though the entire website is not), and turn on WooCommerce for that sub-page.

WooCommerce ends up being easy, and you can set up the store to host web-assets to download on demand (so when you purchase it generates a unique URL that obfuscates where the digital asset is saved). No programming involved to set up my webstore, it was all just point and click to set things up one time and not that much work in the end.

I am not sure about setting up any DRM for the epub (so in reality people will purchase epub and share it illegally). I don’t know of a way to prevent this without using Amazon+Kindle to distribute the book. But the print book should be OK. (If there were a way for me to donate a single epub copy to all libraries in the US I would totally do that.)

I originally planned on having it on Amazon, but the low margins on both plus the formatting of their idiosyncratic kindle book format (as far as I can tell, I cannot really choose my fonts) made me decide against doing either the print or ebook on Amazon.

Print on Demand using LuLu

For print on demand, I use LuLu.com. They have a nice feature to integrate with WooCommerce, the only thing I wish shipping was dynamically calculated. (I need to make a flat shipping rate for different areas around the globe the way it is set up now, slightly annoying and will change the profit margins depending on area.)

LuLu is a few more dollars to print than Amazon, but it is worth it for my circumstance I believe. Now if I had a book I expected to get many “random Amazon search buys” I could see wanting it on Amazon. I expect more sales will be via personal advertising (like here on the blog, social media, or other crime analyst events). My Crime De-Coder site (and this blog) will likely be quite high in google searches for some of the keywords fairly quickly, so who knows, maybe just having on personal site is just as many sales.

LuLu does has an option to turn on distribution to other wholesalers (like Barnes & Noble and Amazon) – have not turned that on but maybe I will in the future.

LuLu has a pricing calculator to see how much to print on their website. Paperback and basically the cheapest color option for letter sized paper (which is quite large) is just over $17 for my 310 page book (Amazon was just over $15). For folks if you are less image heavy and more text, you could get away with a smaller size book (and maybe black/white) and I suspect will be much cheaper. LuLu’s printing of this book is higher quality compared to Amazon as well (better printing of the colors and nicer stock for the paperback cover).

Another nice thing about print on demand is I can go in and edit/update the book as I see fit. No need to worry about new versions. Not sure what that exactly means for citing the work (I could always go and change it), you can’t have a static version of record and an easy way to update at the same time.

Other Random Book Stuff

I purchased ISBNs on Bowker, something like 10 ISBNs for $200. (You want a unique ISBN for each type of the book, so you may want three in the end if you have epub/paperback/hardback.) Amazon and LuLu though have options to have them give you an ISBN though, so that may have not been necessary. I set the imprint to be my LLC though in Bowker, so CRIME De-Coder is the publisher.

You don’t technically need an ISBN at all, but it is a simple thing, and there may be ways for me to donate to libraries in the future. (If a University picks it up as a class text, I have been at places you need at least one copy for rent at the Uni library.)

I have not created an index – I may have a go at feeding my book through LLMs and seeing if I can auto-generate a nice index. (I just need a list of key words, after that can just go and find-replace the relevent text in the book to fill in so it auto-compiles an index.) I am not sure that is really necessary though for a how-to book, you should just look at the table of contents to see the individual (fairly small) sections. For epub you can just doing a direct text search, so not sure if people use an index at all in epubs.

Personal Goals

So I debated on releasing the book open source, I do want to try and see if I can make some money though. I don’t have this expectation, but there is potential to get some “data science” spillover, and if that is the case sales could in theory be quite high. (I was surprised in searching the “data science python” market on Amazon, it is definitely not saturated.) Personally I will consider at least 100 sales to be my floor for success. That is if I can sell at least 100 copies, I will consider writing more books. If I can’t sell 100 copies I have a hard time justifying the effort – it would just be too few of people buying the book to have the types of positive spillovers I want.

To make back money relative to the amount of work I put in, I would need more than 1000 sales (which I think is unrealistic). I think 500 sales is about best case, guesstimating the size of the crime analyst community that may be interested plus some additional sales for grad students. 1000 sales it would need to be in the multiple professors using it for a class book over several years. (Which if you are a professor and interested in this for a class let me know, I will give your class a discount.)

Another common way for individuals to make money off of books is not for sales, but to have training’s oriented around the book. I am hoping to do more of that for crime analysts directly in the future, but those opportunities I presume will be correlated with total sales.

I do enjoy writing, but I am busy, so cannot just say “I am going to drop 200 hours writing a book”. I would like to write additional python topics oriented towards crime analysts/criminology grad students like:

GIS analysis in python
Regression
Machine Learning & Optimization
Statistics for Crime Analysis
More advanced project management in python

Having figured out much of this grunt work definitely makes me more motivated, but ultimately in the end need to have a certain level of sales to justify the effort. So please if you like the blog pick up a copy and tell a friend you like my work!

References

Wheeler, A.P. (2024) Data Science for Crime Analysis with Python. CRIME De-Coder. ISBN 979-8-9903770-1-1
Wheeler, A.P. (2017) Geospatial analytics. In SPSS® Statistics for Data Analysis and Visualization. Eds. McCormick, K. & Salcedo J., with contributions by Peck J. & Wheeler, A.P. Wiley. ISBN 978-1119003557
Wheeler, A.P., Herrmann C.R., & Block R.L. (2021). Micro-place homicide patterns in Chicago. Springer. ISBN 978-3-030-61445-4

6 Comments

by Andy Wheeler on July 2, 2024 • Permalink

Posted in Crime Analysis, Criminal Justice, data science, Personal Productivity, Python, writing

Tagged data visualization, ebook

Posted by Andy Wheeler on July 2, 2024

https://andrewpwheeler.com/2024/07/02/some-notes-on-self-publishing-a-tech-book/

Conformal Sets and Setting Recall

I had a friend the other day interested in a hypothesis along the lines of “I think the mix of crime at a location is different”, in particular they think it will be pushed to more lower level property (and fewer violent) based on some local characteristics. I had a few ideas on this – Brantingham (2016) and Lentz (2018) have examples of creating a permutation type test. And I think I could build a regression multinomial type model (similar to Wheeler et al. 2018) to generate a surface of crime category prediction types over a geographic area (e.g. area A has a mix of 50% property and 50% violent, and area B has a mix of 10% violent and 90% property).

Another approach though is pure machine learning and using conformal sets. I have always been confused about them (see my comment on Gelman’s blog) – reading some more about conformal sets though my comments on Andrew Gelman’s post are mostly confused but partly right. In short you can set recall on a particular class using conformal sets, but you cannot set precision (or equivalently set the false positive rate). So here are my notes on that.

For a CJ application of conformal sets, check out Kuchibhotla & Berk (2023). The idea is that you are predicting categorical classes, in the Berk paper it is recidivism classification with three categories {violent,non-violent,no recidivism}. Say we had a prediction for an individual for the three categories as {0.1,0.5,0.4} – you may say that this person has the highest predicted category of non-violent. Conformal sets are different, in that they can return multiple categories based on a decision threshold, e.g. predict {non-violent,no-recidivism} in this example.

My comment on Gelman’s blog is confused, in that I always thought “why not just take the probabilities to get the conformal set”, so if I wanted a conformal set of 90%, the non-violent and no recidivism probabilities add up to 90%, so wouldn’t they count? But that is not what conformal sets give you, conformal sets only make sense in the repeated frequentist sense I do this thing over and over again, what happens. So with conformal sets so you get Prob(Predict > Threshold | True ) = 0.95, or whatever conformal proportion you want. (I like to call this “coverage”, so out of those True outcomes, what threshold will cover 95% of them.)

This blog post with the code examples really helped my understanding, and how conformal sets can be applied to individual categories (which I think makes more sense than the return multiple labels scenario).

I have code to replicate on github, using data from the NIJ recidivism competition (Circo & Wheeler, 2022) as an example. See some of my prior posts for the feature engineering example, but I use the out of bag trick for random forests in lieu of having a separate calibration sample.

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# NIJ Recidivism data with some feature engineering
pdata = pd.read_csv('NIJRecid.csv') # NIJ recidivism data

# Train/test split and fit model
train = pdata[pdata['Training_Sample'] == 1]
test = pdata[pdata['Training_Sample'] == 0]

yvar = 'Recidivism_Arrest_Year1'
xvar = list(pdata)[2:]

# Random forest, need to set OOB to true
# for conformal (otherwise need to use a seperate calibration sample)
rf = RandomForestClassifier(max_depth=5,min_samples_leaf=100,random_state=10,n_estimators=1000,oob_score=True)
rf.fit(train[xvar],train[yvar])

# Out of bag predictions
probs = rf.oob_decision_function_

Now I have intentionally made this as simple as possible (the towards data science post has a small sample quantile correction, plus has a habit of going back and forth between P(y=1) and 1 - P(y=1)). But to get a conformal threshold in this scenario to set the recall at 95% is quite simple:

# conditional predictions for actual 1's
p1 = probs[train[yvar]==1,1]

# recall 95% coverage
k = 95
cover95 = np.percentile(p1,100-k)
print(f'Threshold to have conformal set of {k}% for capturing recidivism')
print(f'{cover95:,.3f}')

# Now can check out of sample
ptest = rf.predict_proba(test[xvar])
out_cover = (ptest[test[yvar]==1,1] > cover95).mean()
print(f'\nOut of sample coverage at {k}%')
print(f'{out_cover:,.3f}')

And this results in for this data:

So this is again recall, or I like to call it the capture rate of the true positives. It is true_positives / (true_positives + false_negatives). This threshold value is estimated purely based on the calibration sample (or here the OOB estimates). The model I will show is not very good, but the conformal sets you still get good performance. So this is quite helpful, having a good estimator (based on exchangeability, so no drift over time). I think in practice though that will not be bad (I by default auto-retrain models I put into production on a regular schedule, e.g. retrain once a month), so I don’t bother monitoring drift.

You can technically do this for each class, so you can have a recall set for the true negatives as well:

# can also set the false negative rate in much the same way
p0 = probs[train[yvar]==0,0]

# false negative rate set to 5%
k = 95
cover95 = np.percentile(p0,100-k)
print(f'Threshold (for 0 class) to have conformal set of {k}% for low risk')
print(f'{cover95:,.3f}')

# Now can check out of sample
out_cover = (ptest[test[yvar]==0,0] > cover95).mean()
print(f'\nOut of sample coverage at {k}%')
print(f'{out_cover:,.3f}')

Note that the threshold here is for P(y=0),

Threshold (for 0 class) to have conformal set of 95% for low risk
0.566

Out of sample coverage at 95%
0.953

Going back to the return multiple labels idea, in this example for predictions (for the positive class) we would have this breakdown (where 1 – 0.566 = 0.434):

P < 0.19 = {no recidivism}
0.19 < P < 0.434 = {no recidivism,recidivism}
0.434 < P = {recidivism}

Which I don’t think is helpful offhand, but it would not be crazy for someone to want to set the recall (for either class on its own) as a requirement in practice. So say we have a model that predicts some very high risk event (say whether to open an investigation into potential domestic terrorist activity). We may want the recall for the positive class to be very high, so even if it is a lot of nothing burgers, we have some FBI agent at least give some investigation into the predicted individuals.

For the opposite scenario, say we are doing release on recognizance for pre-trial in lieu of bail. We want to say, of those who would not go onto recidivate, we only want to “falsely hold pretrial” 5%, so this is a 95% conformal set of True Negative/(True Negative + False Positive) = 0.95. This is what you get in the second example above for no-recidivism.

Note that this is not the false positive rate, which is False Positive/(True Positive + False Positive), which as far as I can tell you cannot determine via conformal sets. If I draw my contingency table (use fp as false positive, tn as true negative, etc.) Conformal sets condition on the columns, whereas the false positive rate conditions on the second row.

          True
          0     1 
       -----------
Pred 0 | tn  | fn |
       ------------
     1 | fp  | tp |
       ------------

So what if you want to set the false positive rate? In batch I know how to set the false positive rate, but this random forest model happens to not be very well calibrated:

# This models calibration is not very good, it is overfit
dfp = pd.DataFrame(probs,columns=['Pred0','Pred1'],index=train.index)
dfp['y'] = train[yvar]
dfp['bins'] = pd.qcut(dfp['Pred1'],10)
dfp.groupby('bins')[['y','Pred1']].sum()

So I go for a more reliable logistic model, which does result in more calibrated predictions in this example:

# So lets do a logit model to try to set the false positive rate
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_recall_curve

# Making a second calibration set
train1, cal1 = train_test_split(train,train_size=10000)
logitm = LogisticRegression(random_state=10,penalty=None,max_iter=100000)
logitm.fit(train1[xvar],train1[yvar])
probsl = logitm.predict_proba(cal1[xvar])

# Can see here that the calibration is much better
dflp = pd.DataFrame(probsl,columns=['Pred0','Pred1'],index=cal1.index)
dflp['y'] = cal1[yvar]
dflp['bins'] = pd.qcut(dflp['Pred1'],10)
dflp.groupby('bins')[['y','Pred1']].sum()

Now the batch way to set the false positive rate, given you have a well calibrated model is as follows. Sort your batch according the predicted probability of the positive class in descending value. Pretend we have a simple set of four cases:

Prob
 0.9
 0.8
 0.5
 0.1

Now if we set the threshold to be 0.6, we would then have {0.9,0.8} as our two predictions, we then estimate that the false positive rate would be (0.1 + 0.2)/2 = 0.3/2, If we set the threshold to be 0.4, we would have a false positive rate estimate of (0.1 + 0.2 + 0.5)/3 = 0.8/3. So this relies on a batch of characteristics that we are predicting, and is not determined beforehand (this is the idea I use in this post on prioritizing audits):

# The batch way to set the false positive rate
ptestl = logitm.predict_proba(test[xvar])
dftp = pd.DataFrame(ptestl,columns=['Pred0','Pred1'],index=test.index)
dftp['y'] = test[yvar]

dftp.sort_values(by='Pred1',ascending=False,inplace=True)
dftp['PredictedFP'] = (1 - dftp['Pred1']).cumsum()
dftp['AcutalFP'] = (dftp['y'] == 0).cumsum()
dftp['CumN'] = np.arange(dftp.shape[0]) + 1
dftp['PredRate'] = dftp['PredictedFP']/dftp['CumN']
dftp['ActualRate'] = dftp['AcutalFP']/dftp['CumN']
dftp.iloc[range(1000,7001,1000)]

What happens if we try to estimate where to set the threshold in the training/calibration data though? NOTE: I have a new blog post showing how to construct a more appropriate estimate of the false positive rate ENDNOTE. In practice, we often need to make decisions one at a time, in the parole case it is not like we hold all parolees in the queue for a month to save and batch process them. So lets use our precision in the calibration sample to get a threshold:

# Using precision to set the threshold (based on calibration set)
fp_set = 0.45
pr_data = precision_recall_curve(cal1[yvar], probsl[:,1])
loc = np.arange(pr_data[0].shape[0])[pr_data[0] > fp_set].min()
thresh_fp = pr_data[2][loc]

print(f'Threshold estimate for FP rate at {fp_set}')
print(f'{thresh_fp:,.3f}')

print(f'\nActual FP rate in test set at threshold {thresh_fp:,.3f}')
test_fprate = 1 - test[yvar][ptest[:,1] > thresh_fp].mean()
print(f'{test_fprate:,.3f}') # this is not a very good estimate!

Which gives us very poor out of sample estimates – we set the false positive rate to 45%, but ends up being 55%:

Threshold estimate for FP rate at 0.45
0.333

Actual FP rate in test set at threshold 0.333
0.549

So I am not sure what the takeaway from that is, whether we need to be doing something else to estimate the false positive rate (like an online learning approach that Chohlas-Wood et al. (2021) discuss). A takeaway though from the NIJ competition I have is that false positives tend to be a noisy measure (and FP for fairness between groups just exacerbates the problem), so maybe we just shouldn’t be worried about false positives at all. In many CJ scenarios, we do not get any on-policy feedback on false positives – think the bail case where you have ROR vs held pre-trial, you don’t observe false positives in that scenario in practice.

Conformal sets though, if you want recall for particular classes are the way to go though. You can also do them for subsets of data, e.g. different conformal thresholds for male/female, minority/white. So have an easy way to accomplish a fairness ideal with post processing. And I may do a machine learning approach to help that friend out with the crime mix in places idea as well (Wheeler & Steenbeek, 2021).

References

Brantingham, P. J. (2016). Crime diversity. Criminology, 54(4), 553-586.
Chohlas-Wood, A., Coots, M., Zhu, H., Brunskill, E., & Goel, S. (2021). Learning to be fair: A consequentialist approach to equitable decision-making. arXiv preprint arXiv:2109.08792.
Circo, G. M., & Wheeler, A. P. (2022). An Open Source Replication of a Winning Recidivism Prediction Model. International Journal of Offender Therapy and Comparative Criminology, Online First
Kuchibhotla, A.K., & Berk, R.A. (2023). Nested conformal prediction sets for classification with applications to probation data. The Annals of Applied Statistics, 17(1), 761-785.
Lentz, T.S. (2018). Crime diversity: reexamining crime richness across spatial scales. Journal of Contemporary Criminal Justice, 34(3), 312-335.
Wheeler, A.P., & Steenbeek, W. (2021). Mapping the risk terrain for crime using machine learning. Journal of Quantitative Criminology, 37, 445-480
Wheeler, A.P., Steenbeek, W., & Andresen, M. A. (2018). Testing for similarity in area‐based spatial patterns: Alternative methods to Andresen’s spatial point pattern test. Transactions in GIS, 22(3), 760-774.

1 Comment

by Andy Wheeler on June 7, 2024 • Permalink

Posted in Crime Analysis, data science, Python

Tagged conformal-prediction, machine-learning

Posted by Andy Wheeler on June 7, 2024

https://andrewpwheeler.com/2024/06/07/conformal-sets-and-setting-recall/

Notes on MMc queues

Recently had a project related to queues at work, so wanted to put some of my notes in a blog post. For a bit of up-front, the notation MMc refers to a queuing system with multiple servers (c), and the inputs are Poisson distributed (the first M), and have exponential service rates M (these Ms can be different though). That is a mouthful, but basically saying events that arrive in independently and have a left skewed distribution of times it takes to resolve those events. (That may seem like a lot of assumptions, they are often reasonable though for many systems, and if not deviations may not be that big of deal to the estimates in practice.)

Main reason for blog post is that the vast majority of stuff online is about MM1 queue systems, so systems that only have 1 server. I basically never deal with this situation. The formulas for multiple servers are much more complicated, so took me a bit to gather code examples and verify correctness. These are notes based on that work.

So for up-front, the group I was dealing with at work had a fundamental problem, their throughput was waaay too small. In this notation, we have:

Number of arrivals per time period, N
Mean time it takes to exit the queue, S
Number of servers, c

So first, you need to have N*S < c! This is simple accounting, so say we are talking about police calls for service, you have on average 5 calls per hour, and they take on average 0.5 hours (30 minutes) to handle. You then need more than 5*0.5 = 2.5 officers to handle this, so a minimum of 3 officers. If you don’t have 3 officers, the queue will grow, you won’t be able to handle all of the calls.

At work I was advising a situation where they were chronically too low of staff serving for a particular project, and it has ballooned over an extended period of time to create an unacceptable backlog. So think S is really tiny and N is very large – at first the too small of servers could cycle through the tickets, but the backlog just slowly grew, and then after months, they had unacceptable wait times. This is a total mess – there is no accounting trick to solve this, you need c > N*S. It makes no sense to talk about anything else like average wait time in the queue unless that condition is met.

OK, so we know you need c > N*S, a common rule of thumb is that capacity should not be over 80%, so that is c > (N*S)/0.8. (This is not for policing, but more common for call centers, see also posts on Erlang-C formulas.) The idea behind 80% it is at the point where wait times (being held in the queue) start to grow.

If you want to get more into the nitty gritty though, such as calculating the actual probability in the queue, average wait time, etc., then you will want to dig into the MMc queue lit. Here I have posted some python notes (that is itself derivative work others have posted). Hoping just posting and giving my thumbs up makes it easier for others.

So first here is an example of using those functions to estimate our queue example above. Note you need to give the inverse of the mean service time for this function.

# queuing functions in python
from queue import MMc, nc

N = 6    # 6 calls per hour
S = 0.5  # calls take 30 minutes to resolve
c = 7    # officers taking calls

# This function expects inverse service average
qS = MMc(N,1/S,c)

# Now can get stats of interest

# This is the probability that when a call comes
# in, it needs to wait for an officer
qS.getQueueProb()

And this prints out 0.0376.... So when a call comes in, we have a 3% probability of having to wait in the queue for an officer to respond. How about how long on average a call will wait in the queue?

# This is how long a call on average needs
# to wait in the queue in minutes
qS.getAvgQueueTime()*60

And this gives 0.28.... The multiplication by 60 goes from hours to minutes, and we are waiting less than 1 minute on average. This seems good, but somewhat counter-intuitively, this is an average of a bunch of calls answered immediately, plus the 3.8% of calls that are held for some time. We can calculate the estimate of if a call is held, how long will it be held on average:

# If a call is queued however, how long to wait?
qS.getAvgQueueTime_Given()*60

And this is a less rosy 17.5 minutes! Queues are tricky. Unless you have a lot of extra capacity, there are going to be wait times. We can also calculate how often all officers will be idle in this setup.

# Idle time, no one taking any calls
qS.getIdleProb()

And this gives rounded 0.05, so we have only 5% idle time in this set up. This is not that helpful though for police planning, you want individual officers to have capacity to do proactive work, that is more you want officers to only spend 40-60% on responding to calls for service, so that suggests c > (N*S)/0.5 is where you want to be. Which is where we are at in this scenario with 7 officers. This is the probability all 7 officers will be idle at once, which does not really matter.

Now you can technically just run this through multiple values of c to get this, but Rosetti (2021) has listed an approximate square root staffing formula that given an input probability wait in the queue, how many servers do you need. So here is that function:

# If you want probability of holding in the queue to only be 3%
est_serv = nc(N,S,0.03)
print(est_serv)

Which prints out 6.387..., so since you need to take the ceiling of this, you will need to 7 officers to keep to that probability (agreeing with the MMc object above).

In terms of values, the nc function will work with very large/small N and S inputs just fine. The MMc function also looks fine, minus one submethod uses a factorial, .getPk (so cannot have very large inputs to that method), but the rest is OK. So if you wanted to do nc(very_big,very_small,0.1) that is fine and should be no floating point issues.

The nc function relies on scipy, but the MMc class is all base python (just the math library). So the MMc functions can really be embedded in any particular python application you want with no real problem.

Rough Estimates for Spatial Police Planning

I have prior work on spatial allocation of patrol units with workload equality constraints (Wheeler, 2018). But generally, you need to first estimate how many units you will have, and after that you can worry about optimally distributing them. The reason for this is that the number of units is much more important, too few and you will have more queuing, in which case the spatial arrangement does not matter at all. Larson & Stevenson (1972) estimate optimal spatial allocation only beats random allocation by 25%.

So for police response times you can think about time waiting in queue, time spent driving to the event, and time spent resolving the event (time to dispatch tends to be quite trivial, but is sometimes included in the wait in the queue part, Verlaan & Ruiter, 2023).

There is somewhat of a relationship between the above “service” time, fewer people have to drive farther, and so service time goes up. But there happens to some simple rules of thumb, if you have N patrol units, you can calculate (2/3)*sqrt(Square Miles)/sqrt(N) = average distance traveled in miles for your jurisdiction (Stenzel, 1993, see page 135 in the PDF). Then you can translate that miles driven to time, by say taking an average of 45 miles per hour. Given a fixed N, you can then just add this into the service time estimate for your given jurisdiction to get a rough estimate of more officers will reduce response times by X amount.

It ends up being though this tends to be trivial relation to the waiting in the queue time (or the typical it takes 30 minutes to resolve police incidents on average). So it is often more important to get rough estimates for that if you want to reduce wait times for calls for service. And this does not even take into account priority levels in calls, but to start simpler folks should figure out a minimum to handle the call stack (whether in policing or in other areas) and then go onto more complicated scenarios.

References

Larson, R.C., & Stevenson, K.A. (1972). On insensitivities in urban redistricting and facility location. Operations Research, 20(3), 595-612.
Rossetti, M.D. (2021). Simulation Modeling and Arena, 3rd and Open Text Edition.
Stenzel, W. (1993). Police Personnel Allocation Manual Users Guide. US Dept of Transportation National Highway Traffic Safety.
Verlaan, T., & Ruiter, S. (2023). Predictors of police response time: a scoping review. Crime Science, 12(1), 19.
Wheeler, A. P. (2018). Creating optimal patrol areas using the p-median model. Policing: An International Journal, 42(3), 318-333.

Leave a comment

by Andy Wheeler on May 29, 2024 • Permalink

Posted in Crime Analysis, Criminal Justice, data science, Python

Tagged queue

Posted by Andy Wheeler on May 29, 2024

https://andrewpwheeler.com/2024/05/29/notes-on-mmc-queues/

New preprint and Monitoring Time Between Events

Will be a long post today, have some updates on a preprint, quotes on Flock cameras, an upcoming webinar, plus some R analysis examples of monitoring time between rare crime events.

Pre-print on JTC and examining the Buffer Zone

For a few updates on other projects, I have a pre-print out with Kim Rossmo, The Journey-to-Crime Buffer Zone: Measurement Issues and Methodological Challenges.

Two parts to this paper. Part 1, to test whether a journey to crime (JTC) distribution conforms to a buffer zone (an area with lower, but non-zero, probability of offending nearby their home for predatory crimes against strangers), it only makes sense to look at an individual offenders JTC. This is because mixtures of multiple offenders can each individually have a buffer, but in the aggregate do not (in particular if offenders have varying travel distances). This is the same point in Van Koppen & De Keijser (1997), and the fact that offenders have different travel distance distributions is pretty well established now (Andresen et al., 2014, Drawve et al., 2015; Townsley & Sidebottom, 2010).

The second part is given we need to examine individual offenders, I worked out estimates of how many observations you need to effectively measure whether a buffer zone exists. I estimate you need around 50 observations when using a gamma distribution to measure the existence of a buffer vs monotonically decreasing. Above graphs shows a kernel density estimator that takes into account to not smear the probability below 0 distance, using a transform trick to calculate the KDE on the log scale and then back transform. Both case studies we look at suggest a more peaked distribution for the buffer than gamma probably makes more sense for those samples, but pretty strong evidence the buffer exists. The code to replicate the methods and papers findings is on Github.

If you are a department and you have a good case study of a prolific offender get in touch, would be happy to add more case studies to the paper. Part of the difficulty is having high fidelity measures, offenders tend to move a lot (Wheeler, 2012), and so it is typically necessary to have an analyst really make sure the home (or nearest anchor node) locations are all correct. In addition to most prolific offenders don’t have that many observations.

Flock Story

For a second update, I had a minor quote in Tyler Duke’s story on Flock Cameras in North Carolina, Camera by camera, North Carolina police build growing network to track vehicles. I think license plate readers are good investments for PDs (see Ozer 2016 for a good example, I actually like the mobile ones in vehicles more than fixed ones, see Wheeler & Phillips, 2018 for a case study in Buffalo using them at low friction road blocks). But I do think more regulation to prevent people doing indiscriminate searches is in order (similar to how doing background checks in most states have state rules).

I get more annoyed by Flock’s advertising that suggests they solve 10% of crime nationwide, which is absurd. It is very poorly done design (Snow & Charpentier, 2024) that does a regression of clearance rates regressed on cameras per officer, and suggests the cameras increase clearance rates by 10%. There is multiple things wrong with this – interpreting regression coefficients incorrectly (an increase in 1 in camera per-officer is quite a few cameras and does not in translate to they increase clearances 10% in toto), confounding in the design (smaller agencies with higher clearance by default will have more cameras per officer), not taking into account weights in the modeling or interpretation (e.g. a 20% increase in a small department and a 0% increase in a large department should not average to an overall 10% increase). Probably the worse part about this though is extrapolating from they have cameras in a few hundred departments to saying they help solve 10% of crime nationwide.

It is extra silly because it does not even matter – it makes close to no material difference to the quality of Flock’s products (which look to me high quality, they certainly don’t need to increase clearance rates by 10% to be worth investing in ALPRs for an agency, ALPRs are so cheap if a single camera helps with say 10-20 arrests they are worth it). If anyone from Flock is listening and wants to fund a real high quality study just let me know and ask, but this work they have put out is ridiculous.

Tyle Duke’s (and the Newsobserver in general) I think do a really good job on various data stories. So highly recommend checking out that and their other work.

Webinar on Monitoring Temporal Crime Trends

I am doing a webinar for the Carolina Crime Analysis Association on Monitoring Temporal Crime Trends at the end of the month on May 31st.

Free for CCAA members and $10 for non-members. I will be going over work I have written in various places before, such as the Poisson Z-score for CompStat reports (Wheeler, 2016). I did a recent blog post on the Crime De-Coder site on using the Poisson distribution to flag outliers for rare crime events, e.g. if you have 0.8 robberies per month, is a month with 3 robberies weird?

For other regional crime analysis groups, if you have requests like that feel free. I am thinking I want to spend more time with regional groups than worrying about the bigger IACA group going forward.

And this Poisson example segways into the final section of the blog post.

Monitoring Time in Between Events

So the above of using the Poisson distribution to say is 3 robberies in a month weird, you have to think about the nature of how a crime analyst will use and act on that information. So in that scenario, that may be an analyst has a monthly CompStat report, and it is useful to say ‘yeah 3 is high, but is consistent with chance variation that is not uncommon’. In this scenario though, if you have counts that are high, it is not the best (although better than nothing) to wait until the end of the month CompStat report.

Another common case though is the analyst is regularly reading reports, and they come in and read a new robbery, and right then and there say “I feel like there are more robberies than usual”. How would they tell then if there are more than you would expect? It does not make sense to wait for the end of the month (you technically can back-calculate in the prior month and use a scan statistic, but I think what I will suggest below is a more diagnostic approach).

Here I will outline an approach by examining the time in between events, which is motivated by a comment Rob Fornango mentioned on LinkedIn.

So there is a duality between the Poisson distribution and the exponential distribution – if you have a mean of 0.8 events per month, the inter-arrival times are exponentially distributed with a mean of 1/0.8. The typical motivation for a Poisson distribution is the inter-arrival times are independent, so you can technically just work with the inter-arrival times directly.

Here is a quick simulation in R to show that you can simulate inter-arrival times, and then turn them into counts per unit time. The counts per unit time will then be Poisson distributed. Note that R you give the lambda term directly in the R parameterization, whereas others (like in scipy) you specify 1/lambda. I know it is not documented well, but I leave as an exercise to the reader who cares enough to figure out what I am doing here and why.

set.seed(10)
pmean <- 0.8
n <- 50000

re <- rexp(n,pmean) # simulating exponential
rec <- cumsum(re)   # translating to times
frec <- floor(rec)  # will aggregate to counts per 1 unit

# factor is to include units with 0 counts
recV <- 0:max(frec)
frec <- factor(frec,levels=recV)
re_tab <- as.data.frame(table(frec))
re_tab$frec <- recV
re_tab$Freq <- factor(re_tab$Freq,levels=0:max(re_tab$Freq))

# Two tables are to aggregate to units, and then get a count
# of counts per unit
count_tab <- as.data.frame(table(re_tab$Freq))
names(count_tab) <- c("Count","ExpSim")
count_tab$Count <- as.numeric(levels(count_tab$Count))
count_tab$PoisExp <- round(dpois(count_tab$Count,pmean)*length(recV))

And this prints out a table that shows very close correspondence between the two.

> print(count_tab)
  Count ExpSim PoisExp
      0  28538   28411
      1  22959   22729
      2   8813    9092
      3   2356    2424
      4    482     485
      5     75      78
      6      5      10
      7      2       1

Ok, with that established, how do we take into account the time in between events, and use that to flag if recent events are occurring too close to each other? Going with my suggestion of using 1/100 or 1/1000 probability to flag an outlier, for a single “time between two events”, you can look at the quantiles of the exponential distribution. So for the 1/100 threshold:

qexp(0.01,0.8)

Gives 0.01256292. Note this is in months, so if we say a month is 30 days, we could then say it is 0.01256292*30, which is 0.38 days, or just over 9 hours. So this is saying basically if you had two robberies on the same shift, given a mean of 0.8 per month in your jurisdiction, that may be worth looking into if they are the same offender. Not terribly helpful as that would be something most analysts would spot without the help of analytics.

But say you had an event with a rate of 0.1 per month (so on average just over one per year). Two events in three days then would be cause for alarm, qexp(0.01,0.1)*30 is just over 3 days.

So that is examining two recent events, you could extend this to several recent events nearby in time (what I think is likely to be more useful for crime analysts). So say you had a crime on Monday, Wednesday, and then Saturday. So two times in between of 2 days and 3 days. I would say the probability of this occurring is:

prod(pexp(c(2,3),0.8/30))

Which R gives as 0.003993034, so around 4 in 1,000. This is the probability of the 2 days multiplied by the probability of 3 days. We can make a graph of the string of three events (so two times in-between) that meet our less than 0.01 chance.

library(ggplot2)

theme_cdc <- function(){
  theme_bw() %+replace% theme(
     text = element_text(size = 16),
     panel.grid.major= element_line(linetype = "longdash"),
     panel.grid.minor= element_blank()
) }

days <- 1:20
df <- expand.grid(x=days,y=days)
df$p <- pexp(df$x,pmean/30)*pexp(df$y,pmean/30)
df <- df[df$p < 0.01,]
p <- ggplot(df,aes(x=x,y=y)) + 
     geom_point(pch=21,fill='grey',size=7.5) +
     labs(x=NULL,y=NULL,title='Nearby Days < 0.01') +
     scale_y_continuous(breaks=days,limits=c(1,max(days))) +
     scale_x_continuous(breaks=days,limits=c(1,max(days))) +
     theme_cdc()

p

So this gives a chart that meets the criteria for days between in 3 nearby events for the 0.8 per month scenario. So if you have times between of 3 and 4 it meets this threshold, as well as 2 and 8, etc. This data for 1+ days it pretty much never gets to the 1/1000 threshold.

You can technically extend this to multiple crimes. We are in the cusum chart territory then. The idea behind cusum charts is if you have an expected value of say 10, in a typical process control chart if you had a bunch of values 12,14,11,13,12, they may not individually alarm. But you can see that the process is consistently above the expected value, which for random data it should fluctuate sometimes below 10 and sometimes above 10. The consistent above the expected value is itself a signal that will alarm in the cusum approach.

I debate on doing more cusum type process control charts with crime data, but they are abit of work to reset (they will always alarm eventually, and then you reset the cumulative statistics and start the process over again) – but in this scenario the reset is not too difficult.

So the approach would be something like:

probs <- pexp(days_between,mean_per_unit)
snorm <- qnorm(probs)
cumvals <- cumsum(snorm)

The cusum approach works like this here. So only start counting if the days between are less than qexp(0.5,pmean), which here is about 26 days. If you have any time in between more than 26 days, you reset the cusum chart. But if you have several events with times less than 26 days, you do the above calculations, and if the cumulative sum gets lower than -4 (so multiple events nearby less than 26 days apart), you alarm. So for example, for our mean of 0.8, if you had a string of 7,6,12,9,10 days in between for crimes:

pmean <- 0.8
days_between <- c(7,6,12,9,10)
probs <- pexp(days_between,pmean/30)
snorm <- qnorm(probs)
cumvals <- cumsum(snorm)

That would alarm on the final crime, even though those are 6 crimes spread apart 44 days.

This is because snorm will have a standard normal distribution, and so the typical alarm rate for cusum charts with mean zero and standard deviation of 1 is +/- 4. You can technically use it for events too far apart as well here, although I don’t know of situations where people would care too much about that (either in crime or other monitoring situations).

This is all more complicated than 5+ in a month example, partly why I haven’t used cusum charts (or days in between) in other examples. But hopefully someone finds that useful to monitor rare events, and not wait for their end of month stats to alert them!

References

Andresen, M. A., Frank, R., & Felson, M. (2014). Age and the distance to crime. Criminology & Criminal Justice, 14, 314-333.
Drawve, G., Walker, J. T., & Felson, M. (2015). Juvenile offenders: An examination of distance-to-crime and crime clusters. Cartography and Geographic Information Science, 42, 122-133.
Ozer, M. (2016). Automatic licence plate reader (ALPR) technology: Is ALPR a smart choice in policing?. The Police Journal, 89(2), 117-132.
Rossmo, D. K., & Wheeler, A. (2024). The Journey-to-Crime Buffer Zone: Measurement Issues and Methodological Challenges. CrimRxiv
Snow, A., & Charpentier, C. (2024) Flock Safety Technologies in Law Enforcement: An Initial Evaluation of Effectiveness in Aiding Police in Real-World Crime Clearance
Townsley, M., & Sidebottom, A. (2010). All offenders are equal, but some are more equal than others: Variation in journeys to crime between offenders. Criminology, 48, 897-917.
Wheeler, A. (2012). The moving home effect: A quasi experiment assessing effect of home location on the offence location. Journal of Quantitative Criminology, 28, 587-606.
Wheeler, A. P. (2016). Tables and graphs for monitoring temporal crime trends: Translating theory into practical crime analysis advice. International Journal of Police Science & Management, 18(3), 159-172.
Wheeler, A. P., & Phillips, S. W. (2018). A quasi-experimental evaluation using roadblocks and automatic license plate readers to reduce crime in Buffalo, NY. Security Journal, 31, 190-207.
Van Koppen, P. J., & De Keijser, J. W. (1997). Desisting distance decay: On the aggregation of individual crime trips. Criminology, 35(3), 505-515.

Leave a comment

by Andy Wheeler on May 12, 2024 • Permalink

Posted in Crime Analysis, Papers, R

Tagged outlier, preprint, temporal

Posted by Andy Wheeler on May 12, 2024

https://andrewpwheeler.com/2024/05/12/new-preprint-and-monitoring-time-between-events/

Power for analyzing Likert items

First for some other updates of interest to folks on the blog. On CRIME De-Coder a blog post, You should be geocoding crime data locally. I give python code to create a local geocoding engine using the arcpy library.

This is a bit more techy, so would typically post this on this blog instead of the CRIME De-Coder one. But, currently the web is sorely lacking in good advice for local geocoding solutions. Vast majority of sites discuss online geocoding APIs (e.g. google or the census geocoder), which I guess are common for web-apps, but they do not make sense for crime analysis. For the few webpages that are actually relevant to describe local solutions (that do not involve calling an online web API), all the exmples use PostGIS that I am aware of. PostGIS is both very difficult to setup and has worse results compared to ESRI. So I know ESRI is a paid for service, but they have reasonable academic and small business pricing (and most police departments already have access), so to me this is a reasonable use case. If you need to geocode 100k cases, the license fee for ESRI is worth it at that point relative to using the web engines.

Definitely do not spend thousands of dollars if you need to batch geocode a few million records. That is something that is worth getting in touch with me about. And so hopefully that gets picked up by search engines and drives a bit more traffic to my consulting website.

A second example I posted some python code to help construct network experiments. So the idea here is you want to spread out the treated nodes so you have a specific allocation of treated, connected to treated (what I call spillover here), and those not connected to treated (the leftover control group). This python code constructs linear programs to accomplish certain treated/not-touched proportions. So this graph shows if you choose to treat 1 person, but have constraints on 1,2,3 leftover.

And then you can apply this to bigger networks, here the network is 311 nodes, and 90 are treated and I want a total of 150 not treated.

Idea derivative from some work Bruce Desmarais discussed on Twitter, but also have thought about this in some discussion with Barak Ariel in focused deterrence style interventions. So hopefully that comes in handy.

My linear programming formulation is not as svelte as the optimal treatment assignment with spillovers formulation, it is 3*N + 2*E decision variables and 5*N + 2*E constraints (where N is the number of nodes and E is the number of un-directed edges). I have a feeling my formulation is redundant, so if I write my constraints smarter can be more like 2N decision variables and 2N + E constraints.

But for my examples I show it solves quite fast as is (and maybe solvers get rid of the cruft in pre-solve), so not worried about that at the moment. Don’t know the typical size networks people use, but I suspect it will work just fine and dandy on typical machines for networks even as large as 10k. (Generally if I keep the model to under 100k decision variables it is only a few minutes to solve the types of problems I show on this blog.)

Power with Likert items

The other day for a grant application we needed to conduct power analysis. Our design was t-test of mean differences for a simple treated/control group randomized experiment with the outcome being a Likert score survey response. (I know, most of the time people create latent scores with Likert items, analyzing the individual items is fine IMO and simpler to specify for a pre-registration analysis.) I am sure others have needed to do similar things, but I could not find code online to help out with this. So I scripted up a simulation in R to do this, and figured sharing would be useful.

So the rub with Likert data, and why you can’t use typical power calculations, is that they have ceiling effects. If most people answer on average 4.5 out of the your scale up to 5, it is difficult to go much higher. Here I simulate data that has that skew (so ceiling effects come into play), and then go through the motions of doing the t-test. So first for some setup, I have a function that rounds and clips data to limited sets of integers, plus some plotting functions.

# power analysis of Likert data
library(ggplot2)

# custom theme
theme_cdc <- function(){
  theme_bw() %+replace% theme(
     text = element_text(size = 16),
     panel.grid.major= element_line(linetype = "longdash"),
     panel.grid.minor= element_blank()
) }

set.seed(10) # setting the random seed

# rounding/clipping data to integers 
clipint <- function(x,min=1,max=5){
    rx <- round(x)
    rx <- ifelse(rx < min,min,rx)
    rx <- ifelse(rx > max,max,rx)
    return(rx)
}

This following function generates the p-values and standard errors, what I will use later in my simulation. Here I use a t-test of mean differences, but it would be fairly easy to say swap out with an ordinal logistic regression if you prefer that. Probably the bigger deal is the simulation generates data using a normal distribution, and then post rounds/clip the data. There is probably a smarter way to generate the data according to the logistic model and ordinal intercepts (Frank Harrell discusses such things a bit on his blog). But this at least takes into account that the data will be skewed, even in the control group, to have more positive outcomes and thus take the ceiling into account.

# this uses OLS to do t-test of mean differences
# generates normal data, but then rounds/clips
sim_ols <- function(n,eff=0.5,int=4,sd=1){
    df <- data.frame(1:n)
    df$treat <- sample(c(0,1),n,replace=TRUE)
    df$latent <- int + eff*df$treat + rnorm(n,sd=sd)
    df$likert <- clipint(df$latent)
    m1 <- lm(likert ~ treat,data=df)
    cd <- coef(summary(m1))
    pval <- cd[2,4]
    se <- cd[2,2]
    return(c(pval,se))
}

Now we can move onto the simulations, this evaluates sample sizes from 100 to 2000 (in increments of 50), effect sizes of 0.1, 0.3, and 0.5, and repeats the simulations 10k times. I then see the power (how often the two-tailed p-value is less than 0.05), as well as the standard error (precision) of the estimates. Effect sizes are in terms of the original Likert scale values, what I take to be much easier to reason about. (I have seen power analyses here use Cohen’s D, which you really can’t get a very large Cohen’s D value due to ceiling effects with this data.)

# running power estimates over different
# sample sizes and effect sizes
samp_sizes <- seq(100,2000,50)
eff_sizes <- c(0.1,0.3,0.5)
rep_size <- 10000
df <- expand.grid(samps_sizes=samp_sizes,eff_sizes=eff_sizes,pow=c(NA),se=c(NA))
for (i in 1:nrow(df)){
    ss <- df[i,1]
    es <- df[i,2]
    stats <- replicate(rep_size,sim_ols(n=ss,eff=es))
    smean <- rowMeans(stats)
    df[i,3] <- mean(stats[1,] < 0.05) # alpha 0.05
    df[i,4] <- smean[2]
}

df$eff_sizes <- as.factor(df$eff_sizes)

The graph of the power shows what you would expect, so with a few hundred samples you can determine an effect size of 0.5, but with a smaller effect size (on the order of 0.1) you will need more than 2k samples.

# Graph of power
powg <- ggplot(data=df,aes(x=samps_sizes,y=pow)) + 
        geom_line(aes(color=eff_sizes)) + 
        geom_point(pch=21,color='white',size=2,aes(fill=eff_sizes)) +
        labs(x='Sample Sizes',y=NULL,title='Power Estimates') +
        scale_y_continuous(breaks=seq(0,1,0.1)) + 
        scale_x_continuous(breaks=seq(100,2000,250)) + 
        scale_color_discrete(name="Effect Sizes") +
        scale_fill_discrete(name="Effect Sizes") + 
        theme_cdc()

Unfortunately, in reality with most survey measures of police data, e.g. rate your officer 1 to 5, a 0.5 effect is a really large increase. In my mapping attitudes paper, some demographics shift global attitudes at max by 0.2, and I doubt most interventions will be that dramatic. So I like plotting the precision of the estimator, which the effect size doesn’t really make a dent here (it could with more severe ceiling effects).

# Graph of Standard Errors (for Precision)
precg <- ggplot(data=df,aes(x=samps_sizes,y=se,color=eff_sizes)) + 
         geom_line(aes(color=eff_sizes)) + 
         geom_point(pch=21,color='white',size=2,aes(fill=eff_sizes)) +
         labs(x='Sample Sizes',y=NULL,title='Precision Estimates') +
         scale_x_continuous(breaks=seq(100,2000,250)) + 
         scale_color_discrete(name="Effect Sizes") +
         scale_fill_discrete(name="Effect Sizes") + 
         theme_cdc()

With field experiments when considering post police contacts (general attitude surveys you have more wiggle room, but still they cost money to survey) you really can’t control the sample size. You are at the whims of whatever events happen in the police departments daily duties. So the best you can do is make approximate plans for “how long am I going to collect measures before I analyze the data”, and “how reasonably precise will my estimates be”.

This particular grant I make arguments we care more about a non-inferiority type test (e.g. can be sure attitudes are not worse, within a particular error level, given our treatment is more cost-effective than business as usual). But if we did an intervention specifically intended to improve attitudes, you may be talking more like 5,000+ contacts to detect an effect of 0.1 (given likely sample non-response), which is still a big effect.

You would gain power by doing a scale (e.g. summing together multiple items or conducting a factor analysis), assuming the intervention effects the underlying latent trait, and piecemeal individual items. But that will have to be for another day simulating data to do that end-to-end analysis.

Despite not being an academic anymore, I have in the hopper more grant collaborations than I did when I was a professor. The NIJ season is winding down, so probably too late to collaborate on any more of those. But if you have other ideas and need quant help with your projects, always feel free to reach out. I enjoy doing these side projects with academics when reasonable funding is available.

Leave a comment

by Andy Wheeler on May 2, 2024 • Permalink

Posted in Crime Analysis, Criminal Justice, Data Visualization, Python, R

Tagged power-analysis, survey

Posted by Andy Wheeler on May 2, 2024

https://andrewpwheeler.com/2024/05/02/power-for-analyzing-likert-items/

Poisson designs and Minimum Detectable Effects

Ian Adam’s posted a working paper the other day on power analysis for analyzing counts, Power Simulations of Rare Event Counts and Introduction to the ‘Power Lift’ Metric (Adams, 2024). I have a few notes I wanted to make in regards to Ian’s contribution. Nothing I say conflicts with what he writes, moreso just the way I have thought about this problem. It is essentially the same issue as I have written about monitoring crime trends (Wheeler, 2016), or examining quasi-experimental designs with count data (Wheeler & Ratcliffe, 2018; Wilson, 2022).

I am going to make two broader points here: point 1, power is solely a property of the aggregate counts in treated vs control, you don’t gain power by simply slicing your data into finer temporal time periods. Part 2 I show an alternative to power, called minimum detectable effect sizes. This focuses more on how wide your confidence intervals are, as opposed to power (which as Ian shows is not monotonic). I think it is easier to understand the implications of certain designs when approached this way – both from “I have this data, what can I determine from it” (a retrospective quasi-experimental design), as well as “how long do I need to let this thing cook to determine if it is effective”. Or more often “how effective can I determine this thing is in a reasonable amount of time”.

Part 1, Establishing it is all about the counts

So lets say you have a treated and control area, where the base rate is 10 per period (control), and 8 per period (treated):

##########
set.seed(10)
n <- 20 # time periods
reduction <- 0.2 # 20% reduced
base <- 10

control <- rpois(n,base)
treat <- rpois(n,base*(1-reduction))

print(cbind(control,treat))
##########

And this simulation produces 20 time periods with values below:

 [1,]      10     6
 [2,]       9     5
 [3,]       5     3
 [4,]       8     8
 [5,]       9     5
 [6,]      10    10
 [7,]      10     7
 [8,]       9    13
 [9,]       8     6
[10,]      13     8
[11,]      10     6
[12,]       8     8
[13,]      11     8
[14,]       7     8
[15,]      10     7
[16,]       6     8
[17,]      12     3
[18,]      15     5
[19,]      10     8
[20,]       7     7

Now we can fit a Poisson regression model, simply comparing treated to control:

##########
outcome <- c(control,treat)
dummy <- rep(0:1,each=n)

m1 <- glm(outcome ~ dummy,family=poisson)
summary(m1)
###########

Which produces:

Call:
glm(formula = outcome ~ dummy, family = poisson)

Deviance Residuals:
     Min        1Q    Median        3Q       Max
-1.69092  -0.45282   0.01894   0.38884   2.04485

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  2.23538    0.07313  30.568  < 2e-16 ***
dummy       -0.29663    0.11199  -2.649  0.00808 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 32.604  on 39  degrees of freedom
Residual deviance: 25.511  on 38  degrees of freedom
AIC: 185.7

Number of Fisher Scoring iterations: 4

In this set of data, the total treated count is 139, and the total control count is 187. Now watch what happens when we fit a glm model on the aggregated data, where we just now have 2 rows of data?

##########
agg <- c(sum(treat),sum(control))
da <- c(1,0)
m2 <- glm(agg ~ da,family=poisson)
summary(m2)
##########

And the results are:

Call:
glm(formula = agg ~ da, family = poisson)

Deviance Residuals:
[1]  0  0

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  5.23111    0.07313  71.534  < 2e-16 ***
da          -0.29663    0.11199  -2.649  0.00808 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 7.0932e+00  on 1  degrees of freedom
Residual deviance: 9.5479e-15  on 0  degrees of freedom
AIC: 17.843

Number of Fisher Scoring iterations: 2

Notice how the treatment effect coefficients and standard errors are the exact same results as they are with the micro observations. This is something people who do regression models often do not understand. Here you don’t gain power by having more observations, power in the Poisson model is determined by the total counts of things you have observed.

If this were not the case, you could just slice observations into finer time periods and gain power. Instead of counts per day, why not per hour? But that isn’t how it works when using Poisson research designs. Counter-intuitive perhaps, you get smaller standard errors when you observe higher counts.

It ends up being the treatment effect estimate in this scenario is easy to calculate in closed form. This is just riffing off of David Wilson’s work (Wilson, 2022).

treat_eff <- log(sum(control)/sum(treat))
treat_se <- sqrt(1/sum(control) + 1/sum(treat))
print(c(treat_eff,treat_se))

Which produces [1] 0.2966347 0.1119903.

For scenarios in which are slightly more complicated, such as treated/control have different number of periods, you can use weights to get the same estimates. Here for example we have 25 periods in treated and 19 periods in the control using the regression approach.

# Micro observations, different number of periods
treat2 <- rpois(25,base*(1 - reduction))
cont2 <- rpois(19,base)
val2 <- c(treat2,cont2)
dum2 <- c(rep(1,25),rep(0,19))
m3 <- glm(val2 ~ dum2,family=poisson)

# Aggregate, estimate rates
tot2 <- c(sum(treat2),sum(cont2))
weight <- c(25,19)
rate2 <- tot2/weight
tagg2 <- c(1,0)
# errors for non-integer values is fine
m4 <- glm(rate2 ~ tagg2,weights=weight,family=poisson) 
print(vcov(m3)/vcov(m4)) # can see these are the same estimates
summary(m4)

Which results in:

>print(vcov(m3)/vcov(m4)) # can see these are the same estimates
            (Intercept)      dum2
(Intercept)   0.9999999 0.9999999
dum2          0.9999999 0.9999992
>summary(m4)

Call:
glm(formula = rate2 ~ tagg2, family = poisson, weights = weight)

Deviance Residuals:
[1]  0  0

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  2.36877    0.07019  33.750  < 2e-16 ***
tagg2       -0.38364    0.10208  -3.758 0.000171 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The treatment effect estimate is similar, where the variance is still dictated by the counts.

treat_rate <- log(rate2[1]/rate2[2])
treat_serate <- sqrt(sum(1/tot2))
print(c(treat_rate,treat_serate))

Which again is [1] -0.3836361 0.1020814, same as the regression results.

Part 2, MDEs

So Ian’s paper has simulation code to determine power. You can do infinite sums with the Poisson distribution to get closer to closed form estimates, like the e-test does in my ptools package. But the simulation approach is fine overall, so just use Ian’s code if you want power estimates.

The way power analysis works, you pick an effect size, then determine the study parameters to be able to detect that effect size a certain percentage of the time (the power, typically set to 0.8 for convenience). An alternative way to think about the problem is how variable will your estimates be? You can then back out the minimum detectable effect size (MDE), given those particular counts. (Another way people talk about this is plan for precision in your experiment.)

Lets do a few examples to illustrate. So say you wanted to know if training reduced conducted energy device (CED) deployments. You are randomizing different units of the city, so you have treated and control. Baseline rates are around 5% per arrest, and say you have 10 arrests per day in each treated/control arm of the study. Around 30 days, you will have ~15 CED usages. Subsequently the standard error of the logged incident rate ratio will be approximately sqrt(1/15 + 1/15) = 0.37. Thus, the smallest effect size you could detect has to be a logged incident rate ratio pretty much double that value.

Presumably we think the intervention will decrease CED uses, so we are looking at an IRR of exp(-0.37*2) = 0.48. So you pretty much need to cut CED usage in half to be able to detect if the intervention worked when only examining the outcomes for one month. (The 2 comes from using a 95% confidence interval.)

If we say we think best case the intervention had a 20% reduction in CED usage, we would then need exp(-se*2) = 0.8. log(0.8) ~ -0.22, so we need a standard error of se = 0.11 to meet this minimum detectable effect. If we have equal counts in each arm, this is approximately sqrt(1/x + 1/x) = 0.11, with rearranging we get 0.11^2 = 2*(1/x), and then 2/(0.11^2) = x = 166. So we want over 160 events in each treated/control group, to be able to detect a 20% reduction.

Now lets imagine a scenario in which one of the arms is fixed, such as retrospective analysis. (Say the control group is prior time periods before training, and 100% of the patrol officers gets the training.) So we have fixed 100 events in the control group, in that scenario, we need to monitor our treatment until we observe sqrt(1/x + 1/100) = 0.11, that 20% reduction standard. We can rearrange this to be 0.11^2 - 1/100 = 1/x, which is x = 1/0.0021 = 476.

When you have a fixed background count, in either in a treated or control arm, that pretty much puts a lower bound on the standard error. In this case with the control arm that has a fixed 100 events, the standard error can never be smaller than sqrt(1/100) = 0.1. So in that case, you can never detect an effect smaller than exp(-0.2).

Another way to think about this is that with smaller effect sizes, you can approximately translate the standard errors to percent point ranges. So if you want to say plan for precision estimates of around +/- 5% – that is a standard error of 0.05. We are going to need sqrt(z) ~ 0.05. At a minimum we need 400 events in one of the treated or control arms, since sqrt(1/400) = 0.05 (and that is only taking into account one of the arms).

For those familiar with survey stats, these are close to the same sample size recommendation for proportions – it is just instead of total sample size, it is the total counts we are interested in. E.g. if you want +/- 5% for sample proportions, you want around 1,000 observations.

And most of the examples of more complicated research designs (e.g. fixed or random effects, overdispersion estimates) will likely make the power lower, not higher, than the back of the envelope estimates here. But they should be a useful starting to know whether a particular experimental design is dead in the water to detect reasonable effect sizes of interest.

If you found this interesting, you will probably find my work on continuous monitoring of crime trends over time also interesting:

This approach relies on very similar Poisson models to what Ian is showing here, you just monitor the process over time and draw the error intervals as you go. For low powered designs, the intervals will just seem hopelessly wide over time.

References

Adams, I. (2024) Power Simulations of Rare Event Counts and Introduction to the ‘Power Lift’ Metric. CrimRxiv
Blattman, C., Green, D., Ortega, D., & Tobón, S. (2018). Place-based interventions at scale: The direct and spillover effects of policing and city services on crime (No. w23941). National Bureau of Economic Research.
Wheeler, A. P. (2016). Tables and graphs for monitoring temporal crime trends: Translating theory into practical crime analysis advice. International Journal of Police Science & Management, 18(3), 159-172.
Wheeler, A.P., & Ratcliffe, J.H. (2018). A simple weighted displacement difference test to evaluate place based crime interventions. Crime Science, 7(1), 11.
Wilson, D. B. (2022). The relative incident rate ratio effect size for count-based impact evaluations: When an odds ratio is not an odds ratio. Journal of Quantitative Criminology, 38(2), 323–34.

1 Comment

by Andy Wheeler on March 18, 2024 • Permalink

Posted in Crime Analysis, data science, R

Tagged Poisson

Posted by Andy Wheeler on March 18, 2024

https://andrewpwheeler.com/2024/03/18/poisson-designs-and-minimum-detectable-effects/

Harmweighted hotspots, using ESRI python API, and Crime De-Coder Updates

Haven’t gotten the time to publish a blog post in a few. There has been a ton of stuff I have put out on my Crime De-Coder website recently. For some samples since I last mentioned here, have published four blog posts:

on what AI regulation in policing would look like
high level advice on creating dashboards
overview of early warning systems for police
types of surveys for police departments

For surveys a few different groups have reached out to me in regards to the NIJ measuring attitudes solicitation (which is essentially a follow up of the competition Gio and myself won). So get in touch if interested (whether a PD or a research group), may try to coordinate everyone to have one submission instead of several competing ones.

To keep up with everything, my suggestion is to sign up for the RSS feed on the site. If you want an email use the if this than that service. (I may have to stop doing my AltAc newsletter emails, it is so painful to send 200 emails and I really don’t care to sign up for another paid for service to do that.)

I also have continued the AltAc newsletter. Getting started with LLMs, using secrets, advice on HTML, all sorts of little pieces of advice every other week.

I have created a new page for presentations. Including, my recent presentation at the Carolina Crime Analysis Association Conference. (Pic courtesy of Joel Caplan who was repping his Simsi product – thank you Joel!)

If other regional IACA groups are interested in a speaker always feel free to reach out.

And finally a new demo on creating a static report using quarto/python. It is a word template I created (I like often generating word documents that are easier to post-hoc edit, it is ok to automate 90% and still need a few more tweaks.)

Harmweighted Hotspots

If you like this blog, also check out Iain Agar’s posts, GIS/SQL/crime analysis – the good stuff. Here I wanted to make a quick note about his post on weighting Crime Harm spots.

So the idea is that when mapping harm spots, you could have two different areas with same high harm, but say one location had 1 murder and one had 100 thefts. So if murder harm weight = 100 and theft harm weight = 1, they would be equal in weight. Iain talks about different transformations of harm, but another way to think about it is in terms of variance. So here assuming Poisson variance (although in practice that is not necessary, you could estimate the variance given enough historical time series data), you would have for your two hotspots:

Hotspot1: mean 1 homicide, variance 1
Hotspot2: mean 100 thefts, variance 100

Weight of 100 for homicides, 1 for theft

Hotspot1: Harmweight = 1*100 = 100
          Variance = 100^2*1 = 10,000
          SD = sqrt(10,000) = 100

Hotspot2: Harmweight = 100*1 = 100
          Variance = 1^2*100 = 100
          SD = sqrt(100) = 10

When you multiply by a constant, which is what you are doing when multiplying by harm weights, the relationship with variance is Var(const*x) = const^2*Var(x). The harm weights add variance, so you may simple add a penalty term, or rank by something like Harmweight - 2*SD (so the lower end of the harm CI). So in this example, the low end of the CI for Hotspot 1 is 0, but the low end of the CI for Hotspot2 is 80. So you would rank Hotspot2 higher, even though they are the same point estimate of harm.

The rank by low CI is a trick I learned from Evan Miller’s blog.

You could fancy this up more with estimating actual models, having multiple harm counts, etc. But this is a quick way to do it in a spreadsheet with just simple counts (assuming Poisson variance). Which I think is often quite reasonable in practice.

Using ESRI Python API

So I knew you could use python in ESRI, they have a notebook interface now. What I did not realize is now with Pro you can simply do pip install arcgis, and then just interact with your org. So for a quick example:

from arcgis.gis import GIS

# Your ESRI url
gis = GIS("https://modelpd.maps.arcgis.com/", username="user_email", password="???yourpassword???")
# For batch geocoding, probably need to do GIS(api_key=<your api key>)

This can be in whatever environment you want, so you don’t even need ArcGIS installed on the system to use this. It is all web-api’s with Pro. To geocode for example, you would then do:

from arcgis.geocoding import geocode, Geocoder, get_geocoders, batch_geocode

# Can search to see if any nice soul has published a geocoding server

arcgis_online = GIS()
items = arcgis_online.content.search('geocoder north carolina', 'geocoding service', max_items=30)

# And we have four
#[<Item title:"North Carolina Address Locator" type:Geocoding Layer owner:ecw31_dukeuniv>,
# <Item title:"Southeast North Carolina Geocoding Service" type:Geocoding Layer owner:RaleighGIS>, 
# <Item title:"Geocoding Service - AddressNC " type:Geocoding Layer owner:nconemap>, 
# <Item title:"ArcGIS World Geocoding Service - NC Extent" type:Geocoding Layer owner:NCDOT.GOV>]

geoNC = Geocoder.fromitem(items[0]) # lets try Duke
#geoNC = Geocoder.fromitem(items[-1]) # NCDOT.GOV
# can also do directly from URL
# via items[0].url
# url = 'https://utility.arcgis.com/usrsvcs/servers/8caecdf6384144cbafc9d56944af1ccf/rest/services/World/GeocodeServer'
# geoNC = Geocoder(url,gis)

# DPAC
res = geocode('123 Vivian Street, Durham, NC 27701',geocoder=geoNC, max_locations=1)
print(res[0])

Note you cannot cache the geocoding results. To do that, you need to use credits and probably sign in via a token and not a username password.

# To cache, need a token
r2 = geocode('123 Vivian Street, Durham, NC 27701',geocoder=geoNC, max_locations=1,for_storage=True)

# If you have multiple addresses, use batch_geocode, again need a token
#dc_res = batch_geocode(FullAddressList, geocoder=geoNC)

Geocoding to this day is still such a pain. I will need to figure out if you can make a local geocoding engine with ESRI and then call that through Pro (I mean I know you can, but not sure pricing for all that).

Overall being able to work directly in python makes my life so much easier, will need to dig more into making some standard dashboards and ETL processes using ESRI’s tools.

I have another post that has been half finished about using the ESRI web APIs, hopefully will have time to put that together before another 6 months passes me by!

Leave a comment

by Andy Wheeler on March 3, 2024 • Permalink

Posted in Crime Analysis, Crime Mapping, Criminal Justice, data science, geocoding, Python

Tagged ESRI, speaking

Posted by Andy Wheeler on March 3, 2024

https://andrewpwheeler.com/2024/03/03/harmweighted-hotspots-using-esri-python-api-and-crime-de-coder-updates/

Search for:
Recent Posts
Categories
Categories
Site RSS Feeds
- RSS - Posts
- RSS - Comments
Follow Blog via Email

Enter your email address to follow this blog and receive notifications of new posts by email.

Email Address:

Join 400 other subscribers
aoristic big-data cartography census choropleth citeulike color consulting cost-benefit courses crime-mapping crime-trends Crime Analysis Criminal Justice data-manipulation data visualization deep-learning ESRI excel flow-data folium geocoding github google-streetview-api grammar of graphics group-based-trajectory gun-violence healthcare homicide-rates hot spots hypothesis-testing linear programming logistic-regression machine-learning MACRO mapping matplotlib meta network NetworkX officer-involved-shooting open-science paper Papers peer-review Poisson prediction Predictive-Policing preprint presentation Python Python-programability pytorch quasi-experiment r recidivism regression resources scholarly scraping seaborn shootings simulation small-multiples social-media social-networking SPSS stackexchange Stata statistics survey time-series uncertainty wdd web-scraping
Top Posts & Pages
Stack Exchange

All posts in category Crime Analysis

ChatGPT

Google

Claude

Prompt Engineering Ideas

Compiling the documents

Setting up the online store

Print on Demand using LuLu

Other Random Book Stuff

Personal Goals

References

References

Rough Estimates for Spatial Police Planning

References

Pre-print on JTC and examining the Buffer Zone

Flock Story

Webinar on Monitoring Temporal Crime Trends

Monitoring Time in Between Events

References

Power with Likert items

Part 1, Establishing it is all about the counts

Part 2, MDEs

References

Harmweighted Hotspots

Using ESRI Python API

Recent Posts

Categories

Site RSS Feeds

Follow Blog via Email

Top Posts & Pages

Stack Exchange