All posts in category Crime Analysis

Large Language Models for Mortals book

I have published a new book, Large Language Models for Mortals: A Practical Guide for Analysts with Python. The book is available to purchase in my store, either as a paperback (for $59.99) or an epub (for $49.99).

The book is a tutorial on using python with all the major LLM foundation model providers (OpenAI, Anthropic, Google, and AWS Bedrock). The book goes through the basics of API calls, structured outputs, RAG applications, and tool-calling/MCP/agents. The book also has a chapter on LLM coding tools, with example walk throughs for GitHub Copilot, Claude Code (including how to set it up via AWS Bedrock), and Google’s Antigravity editor. (It also has a few examples of local models, which you can see Chapter 2 I discuss them before going onto the APIs in Chapter 3).

You can review the first 60 some pages (PDF link here if on Iphone).

llms_mortals_preview Download

While many of the examples in the book are criminology focused, such as extracting out crime elements from incident narratives, or summarizing time series charts, the lessons are more general and are relevant to anyone looking to learn the LLM APIs. I say “analyst” in the title, but this is really relevant to:

traditional data scientists looking to expand into LLM applications
PhD students (in all fields) who would like to use LLM applications in their work
analysts looking to process large amounts of unstructured textual data

Basically anyone who wants to build or create LLM applications, this is the book to help you get started.

I wrote this book partially out of fear – the rapid pace of LLM development has really upended my work as a data scientist. It is really becoming the most important set of skills (moreso than traditional predictive machine learning) in just the past year or two. This book is the one I wish I had several years ago, and will give analysts a firm grounding in using LLMs in realistic applications.

Again, the book is available in:

For purchase worldwide. Here are all the sections in the book – whether you are an AWS or Google shop, or want to learn the different database alternatives for RAG, or want more self contained examples of agents with python code examples for OpenAI, Anthropic, or Google, this should be a resource you highly consider purchasing.

To come are several more blog posts in the near future, how I set up Claude Code to help me write (and not sound like a robot). How to use conformal inference and logprobs to set false positive rates for classification with LLM models, and some pain points with compiling a Quarto book with stochastic outputs (and points of varying reliability for each of the models).

But for now, just go and purchase the book!

Below is the table of contents to review – it is over 350 pages for the print version (in letter paper), over 250 python code snippets and over 80 screenshots.

			
Large Language Models for Mortals: A Practical Guide for Analysts with Python
by Andrew Wheeler
TABLE OF CONTENTS
Preface
    Are LLMs worth all the hype?
    Is this book more AI Slop?
    Who this book is for
    Why write this book?
    What this book covers
    What this book is not
    My background
    Materials for the book
    Feedback on the book
    Thank you
Basics of Large Language Models
1 What is a language model?
2 A simple language model in PyTorch
3 Defining the neural network
4 Training the model
5 Testing the model
6 Recapping what we just built
Running Local Models from Hugging Face
1 Installing required libraries
2 Downloading and using Hugging Face models
3 Generating embeddings with sentence transformers
4 Named entity recognition with GLiNER
5 Text Generation
6 Practical limitations of local models
Calling External APIs
1 GUI applications vs API access
2 Major API providers
3 Calling the OpenAI API
4 Controlling the Output via Temperature
5 Reasoning
6 Multi-turn conversations
7 Understanding the internals of responses
8 Embeddings
9 Inputting different file types
10 Different providers, same API
11 Calling the Anthropic API
12 Using extended thinking with Claude
13 Inputting Documents and Citations
14 Calling the Google Gemini API
15 Long Context with Gemini
16 Grounding in Google Maps
17 Audio Diarization
18 Video Understanding
19 Calling the AWS Bedrock API
20 Calculating costs
Structured Output Generation
1 Prompt Engineering
2 OpenAI with JSON parsing
3 Assistant Messages and Stop Sequences
4 Ensuring Schema Matching Using Pydantic
5 Batch Processing For Structured Data Extraction using OpenAI
6 Anthropic Batch API
7 Google Gemini Batch
8 AWS Bedrock Batch Inference
9 Testing
10 Confidence in Classification using LogProbs
11 Alternative inputs and outputs using XML and YAML
12 Structured Workflows with Structured Outputs
Retrieval-Augmented Generation (RAG)
1 Understanding embeddings
2 Generating Embeddings using OpenAI
3 Example Calculating Cosine similarity and L2 distance
4 Building a simple RAG system
5 Re-ranking for improved results
6 Semantic vs Keyword Search
7 In-memory vector stores
8 Persistent vector databases
9 Chunking text from PDFs
10 Semantic Chunking
11 OpenAI Vector Store
12 AWS S3 Vectors
13 Gemini and BigQuery SQL with Vectors
14 Evaluating retrieval quality
15 Do you need RAG at all?
Tool Calling, Model Context Protocol (MCP), and Agents
1 Understanding tool calling
2 Tool calling with OpenAI
3 Multiple tools and complex workflows
4 Tool calling with Gemini
5 Returning images from tools
6 Using the Google Maps tool
7 Tool calling with Anthropic
8 Error handling and model retry
9 Tool Calling with AWS Bedrock
10 Introduction to Model Context Protocol (MCP)
11 Connecting Claude Desktop to MCP servers
12 Examples of Using the Crime Analysis Server in Claude Desktop
13 What are Agents anyway?
14 Using Multiple Tools with the OpenAI Agents SDK
15 Composing and Sequencing Agents with the Google Agents SDK
16 MCP and file searching using the Claude Agents SDK
17 LLM as a Judge
Coding Tools and AI-Assisted Development
1 Keeping it real with vibe coding
2 VS Code and GitHub Install
3 GitHub Copilot
4 Claude Code Setup
5 Configuring API access
6 Using Claude Code to Edit Files
7 Project context with CLAUDE.md
8 Using an MCP Server
9 Custom Commands and Skills
10 Session Management
11 Hooks for Testing
12 Claude Headless Mode
13 Google Antigravity
14 Best practices for AI-assisted coding
Where to next?
1 Staying current
2 What to learn next?
3 Forecasting the near future of foundation models
4 Final thoughts

		

Leave a comment

by Andy Wheeler on February 11, 2026 • Permalink

Posted in Crime Analysis, data science, Python, writing

Tagged book, epub

Posted by Andy Wheeler on February 11, 2026

https://andrewpwheeler.com/2026/02/11/large-language-models-for-mortals-book/

Part time product design positions to help with AI companies

Recently on the Crime Analysis sub-reddit an individual posted about working with an AI product company developing a tool for detectives or investigators.

The Mercor platform has many opportunities that may be of interest to my network, so I am sharing them here. These include not only for investigators, but GIS analysts, writers, community health workers, etc. (The eligibility interviewers I think if you had any job in gov services would likely qualify, it is just reviewing questions.)

All are part time (minimum of 15 hours per week), remote, and can be in the US, Canada, or UK. (But cannot support H1-B or OPT visas in the US).

Additional for professionals looking to get into the tech job market, see these two resources:

I actually just hired my first employee at Crime De-Coder. Always feel free to reach out if you think you would be a good fit for the types of applications I am working on (python, GIS, crime analysis experience). I will put you in the list to reach out to when new opportunities are available.

Detectives and Criminal Investigators

Referral Link

$65-$115 hourly

Mercor is recruiting Detectives and Criminal Investigators to work on a research project for one of the world’s top AI companies. This project involves using your professional experience to design questions related to your occupation as a Detective and Criminal Investigator. Applicants must:

Have 4+ years full-time work experience in this occupation;
Be based in the US, UK, or Canada
minimum of 15 hours per week

Community Health Workers

Referral Link

$60-$80 hourly

Mercor is recruiting Community Health Workers to work on a research project for one of the world’s top AI companies. This project involves using your professional experience to design questions related to your occupation as a Community Health Worker. Applicants must:

Have 4+ years full-time work experience in this occupation;
Be based in the US, UK, or Canada
minimum 15 hours per week

Writers and Authors

Referral Link

$60-$95 hourly

Mercor is recruiting Writers and Authors to work on a research project for one of the world’s top AI companies. This project involves using your professional experience to design questions related to your occupation as a Writer and Author.

Applicants must:

Have 4+ years full-time work experience in this occupation;
Be based in the US, UK, or Canada
minimum 15 hours per week

Eligibility Interviewers, Government Programs

Referral Link

$60-$80 hourly

Mercor is recruiting Eligibility Interviewers, Government Programs to work on a research project for one of the world’s top AI companies. This project involves using your professional experience to design questions related to your occupation as a Eligibility Interviewers, Government Program. Applicants must:

Have 4+ years full-time work experience in this occupation;
Be based in the US, UK, or Canada
minimum 15 hours per week

Cartographers and Photogrammetrists

Referral Link

$60-$105 hourly

Mercor is recruiting Cartographers and Photogrammetrists to work on a research project for one of the world’s top AI companies. This project involves using your professional experience to design questions related to your occupation as a Cartographer and Photogrammetrist. Applicants must:

Have 4+ years full-time work experience in this occupation;
Be based in the US, UK, or Canada
minimum 15 hours per week

Geoscientists, Except Hydrologists and Geographers

$85-$100 hourly

Referral Link

Mercor is recruiting Geoscientists, Except Hydrologists and Geographers to work on a research project for one of the world’s top AI companies. This project involves using your professional experience to design questions related to your occupation as a Geoscientists, Except Hydrologists and Geographers Applicants must:

Have 4+ years full-time work experience in this occupation;
Be based in the US, UK, or Canada
minimum of 15 hours per week

Leave a comment

by Andy Wheeler on January 15, 2026 • Permalink

Posted in Crime Analysis, Crime Mapping, Criminal Justice, data science, writing

Tagged AI, product-design, tech-jobs

Posted by Andy Wheeler on January 15, 2026

https://andrewpwheeler.com/2026/01/15/part-time-product-design-positions-to-help-with-ai-companies/

Year in Review 2025 and AI Predictions

For a brief year in review, total views for the two different websites have decreased in the past year. For this blog, I am going to be a few thousand shy of 100,000 views. (2023 I had over 150k views, and 2024 I had over 140k views.) For the Crime De-Coder site, I am going to only get around 15k views.

Part of it is I posted less, this will be the 21st blog post this year on the personal blog (2023 had 46 and 2024 had 32 posts). The Crime De-Coder site had 12 blog posts, so pretty consistent with the prior year. Both are pretty bursty, with large bouts of traffic coming from if I post something to Hacker News I can get 1k to 10k views in a day or two if it makes it to the front page. So the 2024 stats for the crime de-coder was a few of those Hacker News bumps I did not get in 2025.

Some of it could legitimately be traditional Google search being usurped by the gen AI tools. This is the first year I had appreciable referrals from chatgpt, but they are less than 1000. The other tools are trivial amount of referrals. If I worried about SEO more, I would have more updating/regular content (as old pages are devalued quite a bit by google, and it seems to be getting more severe over time).

I have upped my use of the free tools quite a bit. ChatGPT knows me pretty well, and I use Claude Desktop almost every day as well.

An IAM policy scroll is more of a nightmare, and I definitely ask more python questions than R, but the cartoon desk is pretty close to spot on. I am close to paying for Anthropic subscription for Claude code credits (currently use pay as I go via Bedrock, and this is the first month I went over $20).

What pages on the blog are popular I can never be sure of. My most popular post last year was Downloading Police Employment Trends from the FBI Data Explorer. A 2023 post, that had random times where it would have several hundred visits in a short hour span. (Some bot collecting sites? I do not know.) If it is actual people, you would want to check out my Sworn Dashboard site, where you can look at trends for PDs much easier than downloading all the data yourself!

One thing that has grown though, I do short form posting on LinkedIn on my crime de-coder page. Impressions total for the year is over 340k (see the graph), and I currently am a few shy of 4400 followers.

LinkedIn is nice because it can be slightly longer form than the other social media sites. I would suggest you follow me there (in addition to signing up for RSS feeds for the two sites). That is the easiest way to follow my work.

I also took over as a moderator of the Crime Analysis Reddit forum, it is better than the IACA forums in my opinion, so encourage folks to post there for crime analysis questions.

Crime De-Coder Work

Crime De-Coder work has been steady (but not increasing). Similar to last year had several consulting gigs conducting crime analysis for premises liability cases (and one other case I may share my opinions once it is over), and doing some small projects with non-profits and police departments.

One big project was a python training in Austin.

The Python Book (which I also translated to Spanish/French), had a trickle of new sales. 2024 had around 100 sales and 2025 had around 50 sales. It is close to 2/3 print sales and 1/3 epub, so definately folks should have physical prints if you are selling books still.

Doing trainings basically makes writing the book worth it, but I do hope eventually the book makes it way into grad school curriculum’s. (Only one course so far.) I have pitched to grad schools to have me run a similar bootcamp to what I do for crime analysts, so if interested let me know.

The biggest new thing was Crime De-Coder got an Arnold Grant. Working with Denver PD on an experiment to evaluate a chronic offender initiative.

At the Day Gig

At my day gig, I was officially promoted to a senior manager and then quickly to a director position. Hence you get posts like what to show in your tech resume and notes on project management.

One of the reasons I am big on python – it is the dominant programming language in data science. It is hard for me to recruit from my network, as majority of individuals just know a little R (if you were a hard core R person, had packages/well executed public repo’s, I could more easily think you will be able to migrate to python to work on my team).

So learn python if you want to be a data scientist is my advice (and see other job market advice at my archived newsletter).

AI Predictions

At the day gig, my work went from 100% traditional supervised machine learning models to more like 50/50 traditional vs generative AI applications. The genAI hype is real, but I think it is worthwhile putting my thoughts to paper.

The biggest question is will AI take all of our jobs? I think a more likely end scenario is the AI tools just become better at helping humans do tasks. The leap from helping a human do something faster vs an AI tool doing it 100% on its own with 0 human input is hard. The models are getting incrementally better, but I think to fully replace people in a substantive way will require another big advancement in fundamental capabilities. Making a human 10x more productive is easier and still will make the AI companies a ton of money.

Sometimes people view the 10x idea and say that will take jobs, just not 100% of jobs. That is a view though that there is only a finite amount of work to be done. That assumption is clearly not true, and being able to do work faster/cheaper just induces demand for more potential work. The example with calculators making more banking jobs, not less, is basically the same example.

One of the critiques of the current systems is they are overvalued, so we are in a bubble. I do not remember where I read it, but one estimate was if everyone in the US spent $1 a day on the different AI tools, that would justify the current valuations for OpenAI, Anthropic, NVIDIA, etc. I think that is totally doable, we spend a few thousand a workday at Gainwell on the foundation models for example for a few projects, and we are just going to continue to roll out more and more. Gainwell is a company with around 6k employees for reference, and our current AI applications touch way less than 1k of those employees. We have plenty of room to grow those applications.

It is super hard though to build systems to help people do things faster. And we are talking like “this thing that used to take 30 minutes now takes 15 minutes”. If you have 100 people doing that thing all the time though, the costs of the models are low enough it is an easy win.

And this mostly only holds true for knowledge economy work that can be all done via software. There just still needs to be fundamental improvements to robotics to be able to do physical things. The tailor’s job is safe for the foreseeable future.

The change in the data science landscape to more generative AI applications definitely requires social scientists and analysts to up their game though to learn a new set of tools. I do have another book in the works to address that, so hopefully you will see that early next year.

Leave a comment

by Andy Wheeler on December 24, 2025 • Permalink

Posted in Crime Analysis, Criminal Justice, Personal Productivity, Python, R, scholarly

Tagged year-in-review

Posted by Andy Wheeler on December 24, 2025

https://andrewpwheeler.com/2025/12/24/year-in-review-2025-and-ai-predictions/

Advice for crime analyst to break into data science

I recently received a question about a crime analyst looking to break into data science. Figured it would be a good topic for my advice in a blog post. I have written many resources over the years targeting recent PhDs, but the advice for crime analysts is not all that different. You need to pick up some programming, and likely some more advanced tech skills.

For background, the individual had SQL + Excel skills (which many analysts may just have Excel). Vast majority of analyst roles, you should be quite adept at SQL. But just SQL is not sufficient for even an entry level data science role.

For entry data science, you will need to demonstrate competency in at least one programming language. The majority of positions will want you to have python skills. (I wrote an entry level python book exactly for someone in your position.)

You likely will also need to demonstrate competency in some machine learning or using large language models for data science roles. It used to be Andrew Ng’s courses were the best recommendation (I see he has a spin off DeepLearningAI now). So that is second hand though, I have not personally taken them. LLMs are more popular now, so prioritizing learning how to call those APIs, build RAG systems, prompt engineering I think is going to make you slightly more marketable than traditional machine learning.

I have personally never hired anyone in a data science role without a masters. That said, I would not have a problem if you had a good portfolio. (Nice website, Github contributions, etc.)

You should likely start just looking and applying to “analyst” roles now. Don’t worry about if they ask for programming you do not have experience in, just apply. Many roles the posting is clearly wrong or totally unrealistic expectations.

Larger companies, analyst roles can have a better career ladder, so you may just decide to stay in that role. If not, can continue additional learning opportunities to pursue a data science career.

Remote is more difficult than in person, but I would start by identifying companies that are crime analysis adjacent (Lexis Nexis, ESRI, Axon) and start applying to current open analyst positions.

For additional resources I have written over the years:

alt ac newsletter, not running anymore but has most of my advice on those webpages
blog post on different types of tech jobs, approximate salaries
resume advice

The alt-ac newsletter has various programming and job search tips. THe 2023 blog post goes through different positions (if you want, it may be easier to break into project management than data science, you have a good background to get senior analyst positions though), and the 2025 blog post goes over how to have a portfolio of work.

Cover page, data science for crime analysis with python

Leave a comment

by Andy Wheeler on November 21, 2025 • Permalink

Posted in Crime Analysis, Crime Mapping, Criminal Justice, data science, Python

Tagged job-advice

Posted by Andy Wheeler on November 21, 2025

https://andrewpwheeler.com/2025/11/21/advice-for-crime-analyst-to-break-into-data-science/

I translated my book for $7 using openai

The other day an officer from the French Gendarmerie commented that they use my python for crime analysis book. I asked that individual, and he stated they all speak English. But given my book is written in plain text markdown and compiled using Quarto, it is not that difficult to pipe the text through a tool to translate it to other languages. (Knowing that epubs under the hood are just html, it would not suprise me if there is some epub reader that can use google translate.)

So you can see now I have available in the Crime De-Coder store four new books:

French version (paperback and epub)
Spanish version (paperback and epub)

ebook versions are normally $39.99, and print is $49.99 (both available worldwide). For the next few weeks, can use promo code translate25 (until 11/15/2025) to purchase epub versions for $19.99.

If you want to see a preview of the books first two chapters, here are the PDFs:

And here I added a page on my crimede-coder site with testimonials.

As the title says, this in the end cost (less than) $7 to convert to French (and ditto to convert to Spanish).

Here is code demo’ing the conversion. It uses OpenAI’s GPT-5 model, but likely smaller and cheaper models would work just fine if you did not want to fork out $7. It ended up being a quite simple afternoon project (parsing the markdown ended up being the bigger pain).

So the markdown for the book in plain text looks like this:

It ends up that because markdown uses line breaks to denote different sections, that ends up being a fairly natural break to do the translation. These GenAI tools cannot repeat back very long sequences, but a paragraph is a good length. Long enough to have additional context, but short enough for the machine to not go off the rails when trying to just return the text you input. Then I just have extra logic to not parse code sections (that start/end with three backticks). I don’t even bother to parse out the other sections (like LaTeX or HTML), and I just include in the prompt to not modify those.

So I just read in the quarto document, split by “”, then feed in the text sections into OpenAI. I did not test this very much, just use the current default gpt-5 model with medium reasoning. (It is quite possible a non-reasoning smaller model will do just as well. I suspect the open models will do fine.)

You will ultimately still want someone to spot check the results, and then do some light edits. For example, here is the French version when I am talking about running code in the REPL, first in English:

Running in the REPL

Now, we are going to run an interactive python session, sometimes people call this the REPL, read-eval-print-loop. Simply type python in the command prompt and hit enter. You will then be greeted with this screen, and you will be inside of a python session.

And then in French:

Exécution dans le REPL

Maintenant, nous allons lancer une session Python interactive, que certains appellent le REPL, boucle lire-évaluer-afficher. Tapez simplement python dans l’invite de commande et appuyez sur Entrée. Vous verrez alors cet écran et vous serez dans une session Python.

So the acronym is carried forward, but the description of the acronym is not. (And I went and edited that for the versions on my website.) But look at this section in the intro talking about GIS:

There are situations when paid for tools are appropriate as well. Statistical programs like SPSS and SAS do not store their entire dataset in memory, so can be very convenient for some large data tasks. ESRI’s GIS (Geographic Information System) tools can be more convenient for specific mapping tasks (such as calculating network distances or geocoding) than many of the open source solutions. (And ESRI’s tools you can automate by using python code as well, so it is not mutually exclusive.) But that being said, I can leverage python for nearly 100% of my day to day tasks. This is especially important for public sector crime analysts, as you may not have a budget to purchase closed source programs. Python is 100% free and open source.

And here in French:

Il existe également des situations où les outils payants sont appropriés. Les logiciels statistiques comme SPSS et SAS ne stockent pas l’intégralité de leur jeu de données en mémoire, ils peuvent donc être très pratiques pour certaines tâches impliquant de grands volumes de données. Les outils SIG d’ESRI (Système d’information géographique) peuvent être plus pratiques que de nombreuses solutions open source pour des tâches cartographiques spécifiques (comme le calcul des distances sur un réseau ou le géocodage). (Et les outils d’ESRI peuvent également être automatisés à l’aide de code Python, ce qui n’est pas mutuellement exclusif.) Cela dit, je peux m’appuyer sur Python pour près de 100 % de mes tâches quotidiennes. C’est particulièrement important pour les analystes de la criminalité du secteur public, car vous n’avez peut‑être pas de budget pour acheter des logiciels propriétaires. Python est 100 % gratuit et open source.

So it translated GIS to SIG in French (Système d’information géographique). Which seems quite reasonable to me.

I paid an individual to review the Spanish translation (if any readers are interested to give me a quote for the French version copy-edits, would appreciate it). She stated it is overall very readable, but just has many minor things. Here is a a sample of suggestions:

Total number of edits she suggested were 77 (out of 310 pages).

If you are interested in another language just let me know. I am not sure about translation for the Asian languages, but I imagine it works OK out of the box for most languages that are derivative of Latin. Another benefit of self-publishing, I can just have the French version available now, but if I am able to find someone to help with the copy-edits I will just update the draft after I get their feedback.

Leave a comment

by Andy Wheeler on October 25, 2025 • Permalink

Posted in Crime Analysis, Crime Mapping, data science, Python, writing

Tagged book, epub, generative-ai, translation

Posted by Andy Wheeler on October 25, 2025

https://andrewpwheeler.com/2025/10/25/i-translated-my-book-for-7-using-openai/

Reloading classes in python and shared borders

For some housekeeping, if you are not signed up, also make sure to sign up for the RSS feed of my crime de-coder blog. I have not been cross posting here consistently. For the last few posts:

Using the SPPT to Monitor Changes in Crime (soft launching my crimepy python package)
Cost-benefit analysis of Gun Shot Detection Tech (this is a repost from the Criminal Justician series on the American Society of Evidence Based Policing)
Using LLMs to Extract Data from Text
Finding outliers in proportions

For ASEBP, conference submissions for 2026 are open. (I will actually be going to this in 2026, submitted a 15 minute talk on planning experiments.)

Today will just be a quick post on two pieces of code I thought might be useful to share. The first is useful for humans, when testing code in functions, you can use the importlib library to reload functions. That is, imagine you have code:

import crimepy as cpy

test = cpy.func1(....)

And then when you run this, you see that func1 has an error. You can edit the source code, and then run:

from importlib import reload

reload(cpy)
test = cpy.func1(....)

This is my preferred approach to testing code. Note you need to import a library and reload the library. Not from crimepy import *, this will not work unfortunately.

Recently was testing out code that took quite a while to run, my Patrol Districting. This is a class, and I was editing my methods to make maps. The code itself takes around 5 minutes to run it through when remaking the entire class. What I found was a simpler approach, I can dump out the file to pickle, then reload the library, then load the pickle object. The may the pickle module works, it pulls the method definition from the global environment (pickle just saves the dict items under the hood). So the code looked like this:

from crimepy import pmed
...
pmed12 = pmed.pmed(...)
pmed12.map_plot() # causes error

And then instead of using the reload method as is (which would require me to create an entirely new object), use this approach for de-bugging:

# save file
import pickle
with open('pmed12.pkl', 'wb') as file:
    # Dump data with highest protocol for best performance
    pickle.dump(pmed12, file)

from importlib import reload
# edit method
reload(pmed)

# reload the object
with open('pmed12.pkl', 'rb') as file:
    # Load the pickled data
    pmed12_new = pickle.load(file)

# retest the method
pmed12_new.map_plot()

Writing code itself is often not the bottleneck – testing is. So figuring out ways to iterate testing faster is often worth the effort (I might have saved a day or two of work if I did this approach sooner when debugging that code).

The second code snippet is useful for the machines; I have been having Claude help me write quite a bit of the crimepy work. Here was one though it was having trouble with – calculating the shared border length between two polygons. Basically it went down an overly complicated path to get the exact calculation, whereas here I have an approximation using tiny buffers that works just fine and is much simpler.

def intersection_length(poly1,poly2,smb=1e-15):
    '''
    Length of the intersection between two shapely polygons
    
    poly1 - shapely polygon
    poly2 - shapely polygon
    smb - float, defaul 1e-15, small distance to buffer
    
    The way this works, I compute a very small buffer for
    whatever polygon is simpler (based on length)
    then take the intersection and divide by 2
    so not exact, but close enough for this work
    '''
    # buffer the less complicated edge of the two
    if poly1.length > poly2.length:
        p2, p1 = poly1, poly2
    else:
        p1, p2 = poly1, poly2
    # This basically returns a very skinny polygon
    pb = p1.buffer(smb,cap_style='flat').intersection(p2)
    if pb.is_empty:
        return 0.0
    elif hasattr(pb, 'length')
        return (pb.length-2*smb)/2
    else:
        return 0.0

And then for some tests:

from shapely.geometry import Polygon

poly1 = Polygon([(0, 0), (4, 0), (4, 3), (0, 3)])  # Rectangle
poly2 = Polygon([(2, 1), (6, 1), (6, 4), (2, 4)])  # Overlapping rectangle

intersection_length(poly1,poly2) # should be close to 0

poly3 = Polygon([(0, 0), (2, 0), (2, 2), (0, 2)])
poly4 = Polygon([(2, 0), (4, 0), (4, 2), (2, 2)])

intersection_length(poly3,poly4) # should be close to 2

poly5 = Polygon([(0, 0), (2, 0), (2, 2), (0, 2)])
poly6 = Polygon([(2, 0), (4, 0), (4, 3), (1, 3), (1, 2), (2, 2)])

intersection_length(poly5,poly6) # should be close to 3

poly7 = Polygon([(0, 0), (2, 0), (2, 2), (0, 2)])
poly8 = Polygon([(3, 0), (5, 0), (5, 2), (3, 2)])

intersection_length(poly7,poly8) # should be 0

Real GIS data often has imperfections (polygons that do not perfectly line up). So using the buffer method (and having an option to increase the buffer size) can often help smooth out those issues. It will not be exact, but the inexactness we are talking about will often be to well past the 10th decimal place.

Leave a comment

by Andy Wheeler on August 26, 2025 • Permalink

Posted in Crime Analysis, data science, Mapping, Python

Posted by Andy Wheeler on August 26, 2025

https://andrewpwheeler.com/2025/08/26/reloading-classes-in-python-and-shared-borders/

The difference between models, drive-time vs fatality edition

Easily one of the most common critiques I make when reviewing peer reviewed papers is the concept, the difference between statistically significant and not statistically significant is not itself statistically significant (Gelman & Stern, 2006).

If you cannot parse that sentence, the idea is simple to illustrate. Imagine you have two models:

Model     Coef  (SE)  p-value
  A        0.5  0.2     0.01
  B        0.3  0.2     0.13

So often social scientists will say “well, the effect in model B is different” and then post-hoc make up some reason why the effect in Model B is different than Model A. This is a waste of time, as comparing the effects directly, they are quite similar. We have an estimate of their difference (assuming 0 covariance between the effects), as

Effect difference = 0.5 - 0.3 = 0.2
SE of effect difference = sqrt(0.2^2 + 0.2^2) = 0.28

So when you compare the models directly (which is probably what you want to do when you are describing comparisons between your work and prior work), this is a bit of a nothing burger. It does not matter that Model B is not statistically significant, a coefficient of 0.3 is totally consistent with the prior work given the standard errors of both models.

Reminded again about this concept, as Arredondo et al. (2025) do a replication of my paper with Gio on drive time fatalities and driving distance (Circo & Wheeler, 2021). They find that distance (whether Euclidean or drive time) is not statistically significant in their models. Here is the abstract:

Gunshot fatality rates vary considerably between cities with Baltimore, Maryland experiencing the highest rate in the U.S.. Previous research suggests that proximity to trauma care influences such survival rates. Using binomial logistic regression models, we assessed whether proximity to trauma centers impacted the survivability of gunshot wound victims in Baltimore for the years 2015-2019, considering three types of distance measurements: Euclidean, driving distance, and driving time. Distance to a hospital was not found to be statistically associated with survivability, regardless of measure. These results reinforce previous findings on Baltimore’s anomalous gunshot survivability and indicate broader social forces’ influence on outcomes.

This ends up being a clear example of the error I describe above. To make it simple, here is a comparison between their effects and the effects in my and Gio’s paper (in the format Coef (SE)):

Paper     Euclid          Network       Drive Time
Philly     0.042 (0.021)  0.030 (0.016)    0.022 (0.010)
Baltimore  0.034 (0.022)  0.032 (0.020)    0.013 (0.006)

At least for these coefficients, there is literally nothing anomalous at all compared to the work me and Gio did in Philadelphia.

To translate these coefficients to something meaningful, Gio and I estimate marginal effects – basically a reduction of 2 minutes results in a decrease of 1 percentage point in the probability of death. So if you compare someone who is shot 10 minutes from the hospital and has a 20% chance of death, if you could wave a wand and get them to the ER 2 minutes faster, we would guess their probability of death goes down to 19%. Tiny, but over many such cases makes a difference.

I went through some power analysis simulations in the past for a paper comparing longer drive time distances as well (Sierra-Arévalo et al. 2022). So the (very minor) differences could also be due to omitted variable bias (in logit models, even if not confounded with the other X, can bias towards 0). The Baltimore paper does not include where a person was shot, which was easily the most important factor in my research for the Philly work.

To wrap up – we as researchers cannot really change broader social forces (nor can we likely change the location of level 1 emergency rooms). What we can change however are different methods to get gun shot victims to the ER faster. These include things like scoop-and-run (Winter et al., 2022), or even gun shot detection tech to get people to scenes faster (Piza et al., 2023).

References

Arredondo, M., Pires, S. F., & Goddard, T. (2025). The Spatial Distribution and Determinants of Fatal vs. Nonfatal Shootings in Baltimore, USA: Examining the Impact of Distance to a Trauma Center on Fatality Rates. Deviant Behavior, Online First
Circo, G. M., & Wheeler, A. P. (2021). Trauma Center Drive Time Distances and Fatal Outcomes among Gunshot Wound Victims. Applied Spatial Analysis and Policy, 14(2), 379-393.
Gelman, A., & Stern, H. (2006). The difference between “significant” and “not significant” is not itself statistically significant. The American Statistician, 60(4), 328-331.
Piza, E. L., Hatten, D. N., Carter, J. G., Baughman, J. H., & Mohler, G. O. (2023). Gunshot Detection Technology Time Savings and Spatial Precision: An Exploratory Analysis in Kansas City. Policing: A Journal of Policy and Practice, 17, paac097.
Sierra-Arévalo, Michael, Justin Nix, & Bradley O’Guinn (2022). A National Analysis of Trauma Care Proximity and Firearm Assault Survival among U.S. Police. Police Practice and Research 23(3), 388–396.
Winter, E., Byrne, J. P., Hynes, A. M., Geng, Z., Seamon, M. J., Holena, D. N., … & Cannon, J. W. (2022). Coming in hot: Police transport and prehospital time after firearm injury. Journal of Trauma and Acute Care Surgery, 93(5), 656-663.

1 Comment

by Andy Wheeler on July 28, 2025 • Permalink

Posted in Crime Analysis, Crime Mapping, Criminal Justice, data science

Tagged statistics

Posted by Andy Wheeler on July 28, 2025

https://andrewpwheeler.com/2025/07/28/the-difference-between-models-drive-time-vs-fatality-edition/

Follow up on Reluctant Criminologists critique of build stuff

Jon Brauer and Jake Day recently wrote a response to my build stuff post, Should more criminologists build stuff on their Reluctant Criminologists blog. Go ahead and follow Jon’s and Jake’s thoughtful work. They asked for comment before posting – I mainly wanted to post my response to be more specific about “how much” I think criminologists should build stuff. (Their initial draft said I should “abandon” theoretical research, which is not what I meant (and they took out before publishing), but could see how one could be confused by my statement “emphasis should he flipped” after saying near 0% of work now is building stuff.)

So here is my response to their post:

To start I did not say abandon theoretical research, I said “the emphasis be on doing”. So that is a relative argument, not an absolute. It is fine to do theoretical work, and it is fine to do ex-ante policy evaluations (which should be integrated into the process of building things, seeing if something works well enough to justify its expense I would say is a risky test). I do not have a bright line that I think building stuff is optimal for the field, but it should be much more common than it is now (which is close to 0%). To be specific on my personal opinion, I do think “build stuff” should be 50%+ of research criminologists’ time (relative to writing papers).

I am actually more concerned with the larping idea I gave. You have a large number of papers in criminology that justify their motivation not really as theoretical, but as practical to operations. And they are just not even close. So let’s go with the example of precise empirical distributions of burglaries at the neighborhood level. (It is an area I am familiar with, and there are many “I have a new model for crime in space” papers.) Pretend I did have a forecast, and I said there are going to be 10 burglaries in your neighborhood next month. What exactly are you supposed to do with that information? Forecasting the distribution does not intrinsically make it obvious how to prevent crime (nature does not owe us things we can manipulate). Most academic criminologists would say it is useful for police allocation, which is so vague as to be worthless.

You also do not need a fully specified causal chain to prevent crime. Most of the advancement in crime prevention looks more like CPTED applications than understanding “root causes” (which that phrase I think is a good example of an imprecise theory). I would much rather academics try to build specific CPTED applications than write another regression paper on crime and space (even if it is precise).

For the dark side part, in the counterfactual world in which academics don’t focus on direct applications, it does not mean those applications do not get built. They just get built by people who are not criminologists. It was actually the main reason I wrote the post – folks are building things now that should have more thoughtful input from academic criminologists.

For a specific example, different tech companies are demo’ing products with the ultimate goal of improving police officers mental health. These include flagging if officers go to certain types of calls too often, or using a chatbot as a therapist. Real things I would like criminologists like yourselves being involved in product development, so you can say “How do we know this is actually improving the mental health of officers?”. I vehemently disagree that more academic criminologists being involved will make development of these applications worse.

The final part I want to say is that apps need not intrinsically be focused on anything. I gave examples in policing that I am aware of, because that is my stronger area of expertise, but it can be anything. So let’s go with personal risk assessments. Pre-trial, parole/probation risk assessments look very similar to what Burgess built 100 years ago at this point. So risk stratification is built on the idea that you need to triage resources (some people need more oversight, some less), especially for the parole scenario. Now it is certainly feasible someone comes up with a better technological solution that risk stratification is not needed at all (say better sensors or security systems that obviate the need for the more intensive human oversight). Or a more effective regimen that applies to everyone, say better dynamic risk assessments, so people are funneled faster to more appropriate treatment regimes than just having a parole officer pop in from time to time.

I give this last example because I think it is a an area where focusing on real applications I suspect will be more fruitful long term for theory development. So we have 100 years and thousands of papers on risk assessment, but really only very incremental progress in that area. I believe a stronger focus on actual application – thinking about dynamic measures to accomplish specific goals (like the treatment monitoring and assignment), is likely to be more fruitful than trying to pontificate about some new theory of man that maybe later can be helpful.

We don’t have an atom to reduce observations down to (nor do we have an isolated root node in a causal diagram). We are not going to look hard enough and eventually find Laplace’s Demon. Focusing on a real life application, how people are going to use the information in practice, I think is a better way for everyone to frame their scientific pursuits. It is more likely a particular application changes how we think about the problem all together, and then we mold the way we measure to help accomplish that specific task. Einstein just started with the question “how do we measure how fast things travel when everything is moving”, a very specific question. He did not start out by saying “I want a theory of the universe”.

I am more bullish on real theoretical breakthroughs coming from more mundane and practical questions like “how do we tell if a treatment is working” or “how do we know if an officer is having a mental health crisis” than I am about someone coming up with a grander theory of whatever just from reading peer reviewed papers in their tower.

And here is Jon’s response to that:

Like you, we try to be optimistic, encouraging, and constructive in tone, though at times it requires serious effort to keep cynicism at bay. In general, if we had more Andrew Wheeler’s thoughtfully building things and then evaluating them, then I agree this would be a good thing. Yet, if I don’t trust someone enough to meaningfully observe, record, and analyze the gauges, then I’m certainly not going to trust them to pilot – or to understand well enough to successfully build and improve upon the car/airplane/spaceship. Meanwhile, the normative analysis is that everything is significant/everything works – unless it’s stuff we collectively don’t like. In that context, the cynic in me things we are better off of we simply focus on teaching many (most?) social scientists to observe and analyze better – and may even do less harm despite wasted resources by letting them larp.

Jon and Jake do not have a an estimate in their post on what they think the mix should be building vs theorizing (they say pluralist in the post). I think the near 0 we do now is not good.

Much of this back and forth tends to mirror the current critique of advocacy in science. The Charles Tittle piece they cite, The arrogance of public sociology, could have been written yesterday.

Both the RC group and Tittle’s have what I would consider a perfect enemy of the good argument going on. People can do bad work, people can do good work. I want folks to go out and do good, meaningful work. I have met plenty of criminologists (and the flipside the level of competence of many software engineers) to not have RC’s level of cynicism.

As an individual, I don’t think it makes much sense to worry about the perception of the field as a whole. I cannot control my fellow criminologists, I can only control what I personally do. Tittle in his critique thought public sociology would erode any legitimacy of the field. He maybe was right, but I posit producing mostly irrelevant work will put criminology on the same path.

Leave a comment

by Andy Wheeler on June 18, 2025 • Permalink

Posted in Crime Analysis, Criminal Justice, scholarly

Tagged research-software

Posted by Andy Wheeler on June 18, 2025

https://andrewpwheeler.com/2025/06/18/follow-up-on-reluctant-criminologists-critique-of-build-stuff/

Build Stuff

I have had this thought in my head for a while – criminology research to me is almost all boring. Most of the recent advancement in academia is focused on making science more rigorous – more open methods, more experiments, stronger quasi-experimental designs. These are all good things, but to me still do not fundamentally change the practical implementation of our work.

Criminology research is myopically focused on learning something – I think this should be flipped, and the emphasis be on doing something. We should be building things to improve the crime and justice system.

How criminology research typically goes

Here is a screenshot of the recent articles published in the Journal of Quantitative Criminology. I think this is a pretty good cross-section of high-quality, well-respected research in criminology.

Three of the four articles are clearly ex-ante evaluations of different (pretty normal) policies/behavior by police and their subsequent downstream effects on crime and safety. They are all good papers, and knowing how effective a particular policy works (like stop and frisk, or firearm seizures) are good! But they are the literal example where the term ivory tower comes from – these are things happening in the world, and academics passively observe and say how well they are working. None of the academics in those papers were directly involved in any boots on the ground application – they were things normal operations the police agencies in question were doing on their own.

Imagine someone said “I want to improve the criminal justice system”, and then “to accomplish this, I am going to passively observe what other people do, and tell them if it is effective or not”. This is almost 100% of what academics in criminology do.

The article on illicit supply chains is another one that bothers me – it is sneaky in the respect that many academics would say “ooh that is interesting and should be helpful” given its novelty. I challenge anyone to give a concrete example of how the findings in the article can be directly useful in any law enforcement context. Not hypothetical, “can be useful in targeting someone for investigation”, like literal “this specific group can do specific X to accomplish specific Y”. We have plenty of real problems with illicit supply chains – drug smuggling in and out of the US (recommend the contraband show on Amazon, who knew many manufactures smuggle weed from US out to the UK!). Fentanyl or methamphetamine production from base materials. Retail theft groups and selling online. Plenty of real problems.

Criminology articles tend to be littered with absurdly vague accusations that they can help operations. They almost always cannot.

So we have articles that are passive evaluations of policies other people thought up. I agree this is good, but who exactly comes up with the new stuff to try out? We just have to wait around and hope other people have good ideas and take the time to try them out. And then we have theoretical articles larping as useful in practice (since other academics are the ones reviewing the papers, and no one says “erm, that is nice but makes no sense for practical day to day usage”).

Some may say this is the way science is supposed to work. My response to that is I don’t know dude, go and look at what folks are doing in the engineering or computer science or biology department. They seem to manage both theoretical and practical advancements at the same time just fine and dandy.

Well what have you built Andy?

It is a fair critique if you say “most of your work is boring Andy”. Most of my work is the same “see how a policy works from the ivory tower”, but a few are more “build stuff”. Examples of those include:

making patrol areas (also see fairness constraints in patrol)
prioritization for call-ins (an actual algorithm to identify whom to call in in a focused deterrence intervention, not a hypothetical scenario)
simple rules for flagging outlying cases when monitoring crime series

In the above examples, the one that I know has gotten the most traction are simple rules to identify crime spikes. I know because I have spent time demonstrating that work to various crime analysts across the country, and so many have told me “I use your Poisson Z-score Andy”. (A few have used the patrol area work as well, so I should be in the negative for carbon generation.)

Papers are not what matter though – papers are a distraction. The applications are what matter. The biggest waste currently in academic criminology work is peer reviewed papers. Our priorities as academics are totally backwards. We are evaluated on whether we get a paper published, we should be evaluated on whether we make the world a better place. Papers by themselves do not make the world a better place.

Instead of writing about things other people are doing and whether they work, we should spend more of our time trying to create things that improve the criminal justice system.

Some traditional academics may not agree with this – science is about formulating and testing hypotheses. This need not be in conflict with doing stuff. Have a theory about human nature, what better way to prove the theory than building something to attempt to change things for the better according to your theory. If it works in real life to accomplish things people care about guess what – other people will want to do it. You may even be able to sell it.

Examples of innovations I am excited about

Part of what prompted this was I was talking to a friend, and basically none of the things we were excited about have come from academic criminologists. I think a good exemplar of what I mean here is Anthony Tassone, the head of Truleo. To be clear, this is not a dig but a compliment, following some of Anthony’s posts on social media (LinkedIn, X), he is not a Rhodes Scholar. He is just some dude, building stuff for criminal justice agencies mostly using the recent advancements in LLMs.

For a few other examples of products I am excited about how they can improve criminal justice (I have no affiliations with these beyond I talk to people). Polis for evaluating body worn camera feeds. Dan Tatenko for CaseX is building an automated online crime reporting system that is much simpler to use. The folks at Carbyne (for 911 calls) are also doing some cool stuff. Matt White at Multitude Insights is building a SaaS app to better distribute BOLOs.

The folks at Polis (Brian Lande and Jon Wender) are the only two people in this list that have anything remotely to do with academic criminology. They each have PhDs (Brian in sociology and Jon in criminology). Although they were not tenure track professors, they are former/current police officers with PhDs. Dan at CaseX was a detective not that long ago. The folks at Carbyne I believe are have tech backgrounds. Matt has a military background, but pursued his start up after doing an MBA.

The reason I bring up Anthony Tassone is because when we as criminologists say we are going to passively evaluate what other people are doing, we are saying “we will just let tech people like Anthony make decisions on what real practitioners of criminal justice pursue”. Again not a dig on Anthony – it is a good thing for people to build cool stuff and see if there is a market. My point is that if Anthony can do it, why not academic criminologists?

Rick Smith at Axon is another example. While Axon really got its dominate market due to conducted energy devices and then body worn cameras (so hardware), quite a bit of the current innovation at Axon is software. And Rick did not have a background in hardware engineering either, he just had an idea and built it.

Transferring over into professional software engineering since 2020, let me tell my fellow academics, you too can write software. It is more about having a good idea that actually impacts practice.

Where to next?

Since the day gig (working on fraud-waste-abuse in Medicaid claims) pays the bills, most of my build stuff is now focused on that. The technical skills to learn software engineering are currently not effectively taught in Criminal Justice PhD programs, but they could be. Writing a dissertation is way harder than learning to code.

While my python book has a major focus on data analysis, it is really the same skills to jump to more general software engineering. (I specifically wrote the book to cover more software engineering topics, like writing functions and managing environments, as most of the other python data science books lack that material.)

Skills gap is only part of the issue though. The second is supporting work that pursues building stuff. It is really just norms in the current academe that stop this from occurring now. People value papers, NIJ (at least used to) mostly fund very boring incremental work.

I discussed start ups (people dreaming and building their own stuff) and other larger established orgs (like Axon). Academics are in a prime position to pursue their own start ups, and most Universities have some support for this (see Joel Caplan and Simsi for an example of that path). Especially for software applications, there are few barriers. It is more about time and effort spent pursuing that.

I think the more interesting path is to get more academic criminologists working directly with software companies. I will drop a specific example since I am pretty sure he will not be offended, everyone would be better off if Ian Adams worked directly for one of these companies (the companies, Ian’s take home pay, long term advancement in policing operations). Ian writes good papers – it would be better if Ian worked directly with the companies to make their tools better from the get go.

My friend I was discussing this with gave the example of Bell Labs. Software orgs could easily have professors take part time gigs with them directly, or just go work with them on sabbaticals. Axon should support something like that now.

While this post has been focused on software development, I think it could look similar for collaborating with criminal justice agencies directly. The economics will need to be slightly different (they do not have quite as much expendable capital to support academics, the ROI for private sector I think should be easily positive in the long run). But that I think that would probably be much more effective than the current grant based approach. (Just pay a professor directly to do stuff, instead of asking NIJ to indirectly support evaluation of something the police department has decided to already put into operation.)

Scientific revolutions are not happening in journal articles. They are happening by people building stuff and accomplishing things in the real world with those innovations.

For a few responses to this post, Alex sent me this (saying my characterization of Larry as passively observing is not quite accurate), which is totally reasonable:

Nice post on building/ doing things and thanks for highlighting the paper with Larry. One error however, Larry was directly involved in the doing. He was the chief science officer for the London Met police and has designed their new stop and frisk policy (and targeting areas) based directly on our work. Our work was also highlighted by the Times London as effective crime policy and also by the Chief of the London Met Police as well who said it was one of the best policy relevant papers he’s ever seen. All police are now being by trained on the new legislation on stop and search in procedurally just ways. You may not have known this background but it’s directly relevant to your post.

Larry Sherman (and David Weisburd), and their work on hot spots + direct experiments with police are really exemplars of “doing” vs “learning”. (David Kennedy and his work on focused deterrence is another good example.) In the mid 90s when Larry or David did experiments, they likely were directly involved in a way that I am suggesting – the departments are going and asking Larry “what should we do”.

My personal experience, trying to apply many of the lessons of David’s and Larry’s work (which was started around 30 years ago at this point), is not quite like that. It is more of police departments have already committed to doing something (like hotspots), and want help implementing the project, and maybe some grant helps fund the research. Which is hard and important work, but honestly just looks like effective project management (and departments should just invest in researchers/project managers directly, the external funding model does not make sense long term). For a more on point example of what I mean by doing, see what Rob Guerette did as an embedded criminologist with Miami PD.

Part of the reason I wrote the post, if you think about the progression of policing, we have phases – August Vollmer for professionalization in the early 1900’s. I think you could say folks like Larry and David (and Bill Bratton) brought about a new age of metrics to PDs in the 90s.

There are also technology changes that fundamentally impact PDs. Cars + 911 is one. The most recent one is a new type of oversight via body worn cameras. Folks who are leading this wave of professionalization changes are tech folks (like Rick Smith and Anthony Tassone). I think it is a mistake to just sit on the sidelines and see what these folks come up with – I want academic criminologists to be directly involved in the nitty gritty of the implementations of these systems and making them better.

A second response to this is that building stuff is hard, which I agree and did not mean to imply it was as easy as writing papers (it is not). Here is Anthony Tassone’s response on X:

I know this is hard. This is part of why I mentioned the Bell labs path. Working directly for an already established company is much easier/safer than doing your own startup. Bootstrapping a startup is additionally much different than doing VC go big or go home – which academics on sabbaticals and as a side hustle are potentially in a good position to do this.

Laura Huey did this path, and does not have nice things to say about it:

I have not talked to Laura specifically about this, but I suspect it is her experience running the Canadian Society of Evidence Based Policing. Which I would not suggest starting a non-profit either honestly. Even if you start a for-profit, there is no guarantee you will be in a good position in your current academic position to be well supported.

Again no doubt building useful stuff is harder than writing papers. For a counter to these though, doing my bootstrapped consulting firm is definitely not as stressful as building a large company like Anthony. And working for a tech company directly was a good career move for me (although now I spend most of my day building stuff to limit fraud-waste-abuse in Medicaid claims, not improving policing).

My suggestion that the field should be more focused on building stuff was not because it was easier, it was because if you don’t there is a good chance you are mostly irrelevant.

1 Comment

by Andy Wheeler on June 12, 2025 • Permalink

Posted in Crime Analysis, Criminal Justice, data science, Python, scholarly

Tagged research-software

Posted by Andy Wheeler on June 12, 2025

https://andrewpwheeler.com/2025/06/12/build-stuff/

AMA OLS vs Poisson regression

Crazy busy with Crime De-Coder and day job, so this blog has gone by the wayside for a bit. I am doing more python training for crime analysts, most recently in Austin.

If you want to get a flavor of the training, I have posted a few example videos on YouTube. Here is an example of going over Quarto markdown documents:

I do these custom for each agency. So I log into your system, do actual queries with your RMS to illustrate. Coding is hard to get started, so part of the idea behind the training is to figure out all of the hard stuff (installation, connecting to your RMS, setting up batch jobs), so it is easier for analysts to get started.

This post was a good question I recently received from Lars Lewenhagen at the Swedish police:

In my job I often do evaluations of place-based interventions. Sometimes there is a need to explore the dosage aspect of the intervention. If I want to fit a regression model for this the literature suggests doing a GLM regression predicting the crime counts in the after period with the dosage and crime counts in the before period as covariates. This looks right to me, but the results are often contradictory. Therefore, I contemplated making change in crime counts the dependent variable and doing simple linear regression. I have not seen anyone doing this, so it must be wrong, but why?

And my response was:

Short answer is OLS is probably fine.

Longer answer to tell whether it makes more sense for OLS vs GLM what matters is mostly the functional relationship between the dose response. So for example, say your doses were at 0,1,2,3

A linear model will look like for example

E[Y] = 10 + 3*x
Dose, Y
 0  , 10
 1  , 13
 2  , 16
 3  , 19
E[Y] is the “expected value of Y” (the parameter that is akin to the sample mean). For a Poisson model, it will look like:

log(E[Y]) = 2.2 + 0.3*x
Dose, Y
 0  ,  9.0
 1  , 12.2
 2  , 16.4
 3  , 22.2
So if you plot your mean crime at the different doses, and it is a straight line, then OLS is probably the right model. If you draw the same graph, but use a logged Y axis and it is a straight line, Poisson GLM probably makes more sense.

In practice it is very hard to tell the difference between these two curves in real life (you need to collect dose response data at many points). So just going with OLS is not per se good or bad, it is just a different model and for experiments with only a few dose locations it won’t make much of a difference to describe the experiment itself.

Where the model makes a bigger difference is extrapolating. Go with our above two models, and look at the prediction for dose=10. The differences between the two models make a much larger difference.

I figured this would be a good one for the blog. Most of the academic material will talk about the marginal distribution of the variable being modeled (which is not quite right, as the conditional distribution is what matters). Really for alot of examples I look, linear models are fine, hence why I think the WDD statistic is reasonable (but not always).

For quasi-experiments it is the ratio between treated and control as well, but for a simpler dose-response scenario, you can just plot the means at binned locations of the doses and then see if it is a straight or curved line. In sample it often doesn’t even matter very much, it is all just fitting mean values. Where it is a bigger deal is extrapolation outside of the sample.

Leave a comment

by Andy Wheeler on May 28, 2025 • Permalink

Posted in ask me anything, Crime Analysis

Tagged regression

Posted by Andy Wheeler on May 28, 2025

https://andrewpwheeler.com/2025/05/28/ama-ols-vs-poisson-regression/

Search for:
Recent Posts
Categories
Categories
Site RSS Feeds
- RSS - Posts
- RSS - Comments
Follow Blog via Email

Enter your email address to follow this blog and receive notifications of new posts by email.

Email Address:

Join 400 other subscribers
aoristic big-data cartography census choropleth citeulike color consulting cost-benefit courses crime-mapping crime-trends Crime Analysis Criminal Justice data-manipulation data visualization deep-learning ESRI excel flow-data folium geocoding github google-streetview-api grammar of graphics group-based-trajectory gun-violence healthcare homicide-rates hot spots hypothesis-testing linear programming logistic-regression machine-learning MACRO mapping matplotlib meta network NetworkX officer-involved-shooting open-science paper Papers peer-review Poisson prediction Predictive-Policing preprint presentation Python Python-programability pytorch quasi-experiment r recidivism regression resources scholarly scraping seaborn shootings simulation small-multiples social-media social-networking SPSS stackexchange Stata statistics survey time-series uncertainty wdd web-scraping
Top Posts & Pages
Stack Exchange

All posts in category Crime Analysis

Detectives and Criminal Investigators

Community Health Workers

Writers and Authors

Eligibility Interviewers, Government Programs

Cartographers and Photogrammetrists

Geoscientists, Except Hydrologists and Geographers

Crime De-Coder Work

At the Day Gig

AI Predictions

Running in the REPL

Exécution dans le REPL

References

How criminology research typically goes

Well what have you built Andy?

Examples of innovations I am excited about

Where to next?

Recent Posts

Categories

Site RSS Feeds

Follow Blog via Email

Top Posts & Pages

Stack Exchange