Gathering interest in tech courses

Quick post this morning — I have a survey up gathering input on interest in short, technical courses.

Think 2-3 days, potentially in person/synchronous.

If you have taken a course with Paul Allison at Statistical Horizons, or an ICPSR summer course, those are similar examples. The main difference is that these courses will prepare you for pursuing private sector roles.

These will be aimed at:

  • grad level social science students
  • current professors looking to pursue private sector roles
  • current data analysts looking to get into data science
  • undergrads with some more technical background

The survey lists potential courses (python for data analysis, intro to LLM APIs, SQL + Dashboards, using agent based tools for analysis), the course medium (in person vs video), and price points.

If you are a university or organization interested in hosting such sessions for your students, let me know as well. Happy to chat with you about bringing this to your campus.

Job Advice Resources page

Minor update: I have created a page, Job Advice Resources, to cumulatively list all the materials I have written on advice for social scientists and crime analysts looking to pivot into private sector tech roles.

I still get about two folks a month asking for advice, and I am always happy to chat. I wish PhD granting institutions took this more seriously (it only takes minor changes to better prepare students).

If you are an administrator of a PhD program and actually care about getting your students jobs, also feel free to reach out and I am happy to discuss how I can help.

The race to the bottom with AI tools

What we are seeing in the AI startup space is a perfect example of the “no moat” problem: if your core product is essentially just clever prompt engineering wrapped around someone else’s frontier model, it is trivially easy for a competitor to reverse-engineer your workflow and undercut your price. Over the last few months, this lack of a defensible moat has triggered a rapid race to the bottom in automated peer review, moving from expensive managed services to open-source “bring your own key” (BYOK) scripts.

Here I am going to look at three tools specifically designed to review academic papers: Refine, IsItCredible, and Coarse.

Overview of the Tools

Refine: Refine positions itself as a premium, rigorous option for institutions, boasting testimonials from Ivy League professors and a high price point of $49.99 per review. It uses what it calls “massive parallel compute” to make hundreds of LLM calls to stress-test every line of a document.

IsItCredible: Built on the open-source Reviewer 2 pipeline, IsItCredible offers a standardized, pay-per-use middle ground with core reports starting at $5. It employs a clever “adversarial” architecture where “Red Team” agents try to find flaws and a “Blue Team” verifies them to prevent hallucinations.

Coarse: Coarse represents the logical endpoint of this race as an open-source “Bring Your Own Key” (BYOK) tool that lets you run complex multi-agent reviews locally or via OpenRouter. Because users pay the API costs directly instead of a markup, a comprehensive paper review is significantly cheaper.

The “LLM as a Judge” Problem

The hardest part of all this is evaluation. How do you know if the AI reviewer is actually good?

Refine relies almost entirely on anecdotal evidence. Their own FAQ essentially tells you to just try it and see the difference for yourself, claiming that general-purpose chatbots cannot match their depth even with expert prompting. This “try it yourself” approach is effective for marketing, but it isn’t a hard benchmark.

IsItCredible and Coarse are trying to be more systematic. The IsItCredible team released a paper, Yell at It: Prompt Engineering for Automated Peer Review, where they benchmarked their tool against five alternatives. They claim 15 wins out of 20 pairings. Similarly, Coarse claims to have been “blind-evaluated” against Refine and Reviewer 2, scoring higher on coverage and specificity.

However, we are still largely in the “LLM as a judge” era. These benchmarks often use another LLM to decide which review is better. It is circular logic. Until we have a “Ground Truth” dataset of known mathematical errors or logical fallacies in published papers, we are just measuring which AI writes the most convincing-sounding critique.
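To make the ground-truth idea concrete, here is a minimal Python sketch of how a reviewer could be scored against a paper with known, planted errors instead of asking another LLM which critique sounds better. Everything in it is hypothetical: the `score_review` helper, the error labels, and the two example reviewers are made up for illustration and do not come from any of the three tools.

```python
def score_review(flagged, known_errors):
    """Precision and recall of flagged issues against a set of known errors."""
    flagged, known_errors = set(flagged), set(known_errors)
    hits = flagged & known_errors
    precision = len(hits) / len(flagged) if flagged else 0.0
    recall = len(hits) / len(known_errors) if known_errors else 0.0
    return {"hits": len(hits), "precision": precision, "recall": recall}


# Hypothetical paper with three planted errors
known = {"eq3_sign_flip", "table2_wrong_n", "sec4_causal_claim"}

# Hypothetical output of two reviewers, already mapped to issue labels
reviewer_a = ["eq3_sign_flip", "table2_wrong_n", "typo_abstract"]
reviewer_b = ["sec4_causal_claim"]

for name, flagged in [("A", reviewer_a), ("B", reviewer_b)]:
    print(name, score_review(flagged, known))
```

The hard part in practice is not this arithmetic but building the labeled set of errors and mapping free-text critiques onto it.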

Because evaluation is so difficult, this software category risks becoming a classic market for lemons. It is incredibly difficult to identify substantive differences in quality between these tools without some external, hard benchmark. To truly evaluate if Refine’s expensive managed service is meaningfully better than Coarse’s open-source BYOK run, you have to verify the AI’s claims. But verifying those claims requires spending just as much time reading and reviewing the original paper as you would have spent just doing the review yourself from scratch. Without transparent benchmarks, users cannot easily distinguish high-quality rigorous analysis from convincing hallucinations, driving the market toward the cheapest option by default.

For those building AI tools, this entire space serves as a warning about the race to the bottom. I have previously written about deep research tools as another example of this phenomenon. If your only value proposition is a well-orchestrated prompt chain, open-source alternatives will inevitably compress your margins to zero. Eventually, the native GUI interfaces of the frontier models themselves may just become good enough that your specialized service isn’t even needed.

Meta

Did you like this post? Guess what, it was entirely generated via Google’s API models (specifically the gemini cli). I have saved the chat session and a log of how long it took here. You can see for yourself: I had a broad idea, asked it to review different materials, and then had it generate a post. I then iterated. It took 25 minutes from start to finish in total.

The original post also is not flagged by Pangram as AI generated.

It definitely is not 100% my style (and to be clear this meta section is 100% hand written). The final paragraph about deep research tools I also struggled to get the model to say what I wanted – I wanted it to say “deep research tools are another example where this same situation will occur”. I am keeping the original 100% AI generated post for posterity though for folks to see what is possible with the current tools.

Policing Scholars should join ASEBP

Cross-posted on my Crime De-Coder blog.

I will be giving a talk at the upcoming American Society of Evidence Based Policing (ASEBP) conference (registration link here, May 20th-22nd in DC). My talk is How long to conduct your experiment? Check it out Thursday morning – I specifically asked for one of the short talks; 15 minutes is plenty to get the gist.

ASEBP Conference Flyer, 2026 in DC

I will be sharing a web-app to go with the talk soon (you can see my WDD tool and this blog post for background), but wanted to write a more general post about why researchers (as well as police officers who are interested in professionalization of the field) should join ASEBP.

To start, I have been involved in various ways with ASEBP for several years now, but I do not have any financial ties to ASEBP. I currently volunteer on the committee that reviews conference talks.

ASEBP is clearly the best organization for policing scholars currently in the country. The other main criminological societies (the American Society of Criminology and the Academy of Criminal Justice Sciences) are operating much as they did 30 years ago. Mostly they only exist to run journals and have a yearly conference where anyone can give a talk. They are incredibly insular, and have basically zero input from practitioners.

You can go and just look at the talks for ASC and ACJS – they are basically irrelevant to the vast majority of criminal justice operations (not only in policing, but in the CJ field as a whole). You can go look at the talks for the ASEBP conference and see they have a much clearer focus on realistic topics police departments are interested in, but presented by legitimate researchers and practitioners.

For scholars, I have developed working relationships with departments through multiple police practitioners I have met through ASEBP – and I hope to make more!

ASEBP was started by Renee Mitchell with a clear goal in mind – Renee is really the modern-day version of August Vollmer. ASEBP is intended to be a rigorous (unlike ASC, which allows almost anyone to present) conference and organization (ASEBP has training opportunities as well) to advance the use of evidence in policing operations.

If you think “I am not a policing researcher”, but have anything to do at all with criminal justice, feel free to get in touch. (Crime analysts should definitely join.) I have ideas to expand the organization – nothing equivalent currently exists in other parts of the criminal justice system as well. Being evidence-based is really the core of what Renee and everyone else is building.

If you are going to the conference and want to meet up, feel free to send me an email, andrew.wheeler@crimede-coder.com, and I will find a time to get a coffee while we are in DC.

Year in Review 2025 and AI Predictions

For a brief year in review, total views for the two different websites have decreased in the past year. For this blog, I am going to be a few thousand shy of 100,000 views. (2023 I had over 150k views, and 2024 I had over 140k views.) For the Crime De-Coder site, I am going to only get around 15k views.

Part of it is I posted less. This will be the 21st blog post this year on the personal blog (2023 had 46 and 2024 had 32 posts). The Crime De-Coder site had 12 blog posts, so pretty consistent with the prior year. Both are pretty bursty: if I post something to Hacker News and it makes it to the front page, I can get 1k to 10k views in a day or two. So the higher 2024 stats for the Crime De-Coder site came from a few of those Hacker News bumps that I did not get in 2025.

Some of it could legitimately be traditional Google search being usurped by the gen AI tools. This is the first year I had appreciable referrals from chatgpt, but they are less than 1000. The other tools account for a trivial amount of referrals. If I worried about SEO more, I would have more updating/regular content (as old pages are devalued quite a bit by google, and it seems to be getting more severe over time).

I have upped my use of the free tools quite a bit. ChatGPT knows me pretty well, and I use Claude Desktop almost every day as well.

An IAM policy scroll is more of a nightmare, and I definitely ask more python questions than R, but the cartoon desk is pretty close to spot on. I am close to paying for Anthropic subscription for Claude code credits (currently use pay as I go via Bedrock, and this is the first month I went over $20).

Which pages on the blog are popular I can never be sure of. My most popular post last year was Downloading Police Employment Trends from the FBI Data Explorer, a 2023 post that at random times would get several hundred visits within an hour. (Some bot collecting sites? I do not know.) If it is actual people, you would want to check out my Sworn Dashboard site, where you can look at trends for PDs much more easily than downloading all the data yourself!

One thing that has grown though, I do short form posting on LinkedIn on my crime de-coder page. Impressions total for the year is over 340k (see the graph), and I currently am a few shy of 4400 followers.

LinkedIn is nice because it can be slightly longer form than the other social media sites. I would suggest you follow me there (in addition to signing up for RSS feeds for the two sites). That is the easiest way to follow my work.

I also took over as a moderator of the Crime Analysis Reddit forum. It is better than the IACA forums in my opinion, so I encourage folks to post there for crime analysis questions.

Crime De-Coder Work

Crime De-Coder work has been steady (but not increasing). Similar to last year, I had several consulting gigs conducting crime analysis for premises liability cases (and one other case where I may share my opinions once it is over), and did some small projects with non-profits and police departments.

One big project was a python training in Austin.

The Python Book (which I also translated to Spanish/French) had a trickle of new sales. 2024 had around 100 sales and 2025 had around 50 sales. It is close to 2/3 print sales and 1/3 epub, so folks should definitely offer physical prints if they are still selling books.

Doing trainings basically makes writing the book worth it, but I do hope eventually the book makes its way into grad school curricula. (Only one course so far.) I have pitched to grad schools to have me run a similar bootcamp to what I do for crime analysts, so if interested let me know.

The biggest new thing was Crime De-Coder got an Arnold Grant. I am working with Denver PD on an experiment to evaluate a chronic offender initiative.

At the Day Gig

At my day gig, I was officially promoted to a senior manager and then quickly to a director position. Hence you get posts like what to show in your tech resume and notes on project management.

One of the reasons I am big on python is that it is the dominant programming language in data science. It is hard for me to recruit from my network, as the majority of individuals just know a little R (if you were a hard core R person with packages or well executed public repos, I could more easily believe you would be able to migrate to python to work on my team).

So learn python if you want to be a data scientist is my advice (and see other job market advice at my archived newsletter).

AI Predictions

At the day gig, my work went from 100% traditional supervised machine learning models to more like 50/50 traditional vs generative AI applications. The genAI hype is real, but I think it is worthwhile putting my thoughts to paper.

The biggest question is will AI take all of our jobs? I think a more likely end scenario is the AI tools just become better at helping humans do tasks. The leap from helping a human do something faster vs an AI tool doing it 100% on its own with 0 human input is hard. The models are getting incrementally better, but I think to fully replace people in a substantive way will require another big advancement in fundamental capabilities. Making a human 10x more productive is easier and still will make the AI companies a ton of money.

Sometimes people view the 10x idea and say that will take jobs, just not 100% of jobs. That is a view though that there is only a finite amount of work to be done. That assumption is clearly not true, and being able to do work faster/cheaper just induces demand for more potential work. The example with calculators making more banking jobs, not less, is basically the same example.

One of the critiques of the current systems is they are overvalued, so we are in a bubble. I do not remember where I read it, but one estimate was if everyone in the US spent $1 a day on the different AI tools, that would justify the current valuations for OpenAI, Anthropic, NVIDIA, etc. I think that is totally doable, we spend a few thousand a workday at Gainwell on the foundation models for example for a few projects, and we are just going to continue to roll out more and more. Gainwell is a company with around 6k employees for reference, and our current AI applications touch way less than 1k of those employees. We have plenty of room to grow those applications.
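As a rough check on that $1 a day figure, here is a back-of-envelope sketch in Python. The US population of 340 million and the $3,000 per workday stand-in for "a few thousand" are my assumptions for illustration, not exact figures.

```python
US_POPULATION = 340_000_000  # assumption: rough US population
SPEND_PER_PERSON_PER_DAY = 1.00  # dollars, from the estimate quoted above

annual_spend = US_POPULATION * SPEND_PER_PERSON_PER_DAY * 365
print(f"Implied annual spend: ${annual_spend / 1e9:,.0f} billion")

# Same idea at company scale
DAILY_MODEL_SPEND = 3_000  # assumption: stand-in for "a few thousand a workday"
EMPLOYEES = 6_000  # around 6k employees, from the text

spend_per_employee = DAILY_MODEL_SPEND / EMPLOYEES
print(f"Spend per employee per workday: ${spend_per_employee:.2f}")
```

Under those assumptions the company is already in the neighborhood of the $1 a day mark per employee, with most employees not yet touched by the applications.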

It is super hard though to build systems to help people do things faster. And we are talking like “this thing that used to take 30 minutes now takes 15 minutes”. If you have 100 people doing that thing all the time though, the costs of the models are low enough it is an easy win.
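Here is a small Python sketch of that "easy win" math. Only the 100 people and the 30 to 15 minute improvement come from the text. The tasks per day, hourly labor cost, and model cost per task are hypothetical placeholders you would swap for your own numbers.

```python
def daily_net_savings(n_people, tasks_per_day, minutes_saved,
                      hourly_cost, model_cost_per_task):
    """Labor dollars saved per day minus what the model calls cost per day."""
    n_tasks = n_people * tasks_per_day
    labor_saved = n_tasks * (minutes_saved / 60) * hourly_cost
    model_cost = n_tasks * model_cost_per_task
    return labor_saved - model_cost


net = daily_net_savings(
    n_people=100,              # from the text
    tasks_per_day=4,           # hypothetical
    minutes_saved=15,          # 30 minutes down to 15, from the text
    hourly_cost=40.0,          # hypothetical loaded labor cost in dollars
    model_cost_per_task=0.25,  # hypothetical API cost in dollars
)
print(f"Net savings per day: ${net:,.2f}")
```

Even if the model cost per task were several times higher, the labor side dominates as long as the task is done often enough.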

And this mostly only holds true for knowledge economy work that can all be done via software. There still need to be fundamental improvements to robotics before AI can do physical things. The tailor’s job is safe for the foreseeable future.

The change in the data science landscape to more generative AI applications definitely requires social scientists and analysts to up their game though to learn a new set of tools. I do have another book in the works to address that, so hopefully you will see that early next year.

What to show in your tech resume?

Jason Brinkley on LinkedIn the other day had a comment on the common look of resumes – I disagree with his point in part but it is worth a blog post to say why:

So first, when giving advice I try to be clear about what I think are just my idiosyncratic positions vs advice that I feel is likely to generalize. So when I say, you should apply to many positions, because your probability of landing a single position is small, that is quite general advice. But here, I have personal opinions about what I want to see in a resume, but I do not really know what others want to see. Resumes, when cold applying, probably have to go through at least two layers (HR/recruiter and the hiring manager), who each will need different things.

I do not remember at all which people had different colored resumes or different formats (such as a sidebar). I only care about the content. So what do I want to see in your resume? (I am interviewing for mostly data scientist positions.) I want to see some type of external verification you actually know how to code. Talk is cheap, it is easy to list “I know these 20 python libraries” or “I saved our company 1 million buckaroos”.

So things I personally like seeing in a resume are:

  • code on github that is not a homework assignment (it is OK if unfinished)
  • technical blog posts
  • your thesis! (or other papers you were first/solo author)

Very few people have these things, so if you do and you land in my stack, you are already at the like 95th percentile (if not higher) for resumes I review for jobs.

The reason I want outside verification that you actually know what you are doing is that people are liars. For our tech round, our first question is “write a python hello world program and execute it from the command line” – around half of the people we interview fail this test. These are all people who list that they are experts in machine learning and large language models, have years of experience in python, etc.
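For reference, a passing answer is tiny. A minimal sketch (the file name hello.py and the `greeting` function are just my choices for the example):

```python
# hello.py (the file name is just an example)
def greeting():
    return "Hello, world!"


if __name__ == "__main__":
    print(greeting())
```

Save that to a file and run `python hello.py` from the command line. A bare `print("Hello, world!")` would pass just as well.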

My resume is excessive, but I try to practice what I preach (HTML version, PDF version)

I added some color, but have had recruiters ask me to take it off the resume before. So how many people actually click all those links when I apply to positions? Probably few if any – but that is personally what I want to see.

There are really only two pieces of advice I have seen repeatedly about resumes that I think are reasonable, but it is advice not a hard rule:

  • I have had recruiters ask for specific libraries/technologies at the top of the resume
  • Many people want to hear about results for project experience, not “I used library X”

So while I dislike the glut of people listing 20 libraries, I understand it from the point of a recruiter – they have no clue, so are just trying to match the tech skills as best they can. (The matching at this stage I feel may be worse than random, in that liars are incentivized, hence my insistence on showing actual skills in some capacity.) It is infuriating when you have a recruiter not understand some idiosyncratic piece of tech is totally exchangeable with what you did, or that it is trivial to learn on the job given your prior experience, but that is not going to go away anytime soon.

I’d note at Gainwell we have no ATS or HR filtering like this (the only filtering is for geographic location and citizenship status). I actually would rather see technical blog posts or personal github code than “I saved the company 1 million dollars” in many circumstances, as that is just as likely to be embellished as the technical skills. For less technical hiring managers, though, it is probably a good idea to translate technical specs into plainer business implications.

Recommend reading The Idea Factory, Docker python tips

A friend recently recommended The Idea Factory: Bell Labs and the Great Age of American Innovation by Jon Gertner. It is one of the best books I have read in a while, so I also want to recommend it to the readers of my blog.

I was vaguely familiar with Bell Labs given my interest in stats and computer science. John Tukey gets a few honorable mentions, but Claude Shannon is a central character of the book. What I did not realize is that almost all of modern computing can be traced back to innovations that were developed at Bell Labs. For a sample, these include:

  • the transistor
  • fiber optic cables (I did not even know, fiber is very thin strands of glass)
  • the cellular network with smaller towers
  • satellite communication

And then you get a smattering of different discussions as well, such as the material science that goes into making underwater cables durable and shark resistant.

The backstory was that AT&T in the early 20th century had a monopoly on landline telephones. Similar now to how most states have a single electric provider – they were a private company but blessed by the government to have that monopoly. AT&T intentionally had a massive research arm that they used to improve communications, but also they provided that research back into the public coffers. Shannon was a pure mathematician, he was not under the gun to produce revenue.

Gertner basically goes through a series of characters that were instrumental in developing some of these ideas, and in creating and managing Bell Labs itself. It is a high level recounting, which Gertner draws mostly from historical notebooks. One of the things I really want to understand is how institutions even tackle a project that lasts a decade – things I have been involved in at work that last a year are just dreadful due to transaction costs between so many groups. I can’t even imagine trying to keep on schedule for something so massive. I do not get that level of detail from the book, just more so that someone had an idea, developed a little tinker proof of concept, and then Bell Labs sunk a decade and a small army of engineers into figuring out how to build it in an economical way.

This is not a critique of Gertner (his writing is wonderful, and really gives flavor to the characters). Maybe just sinking an army of engineers on a problem is the only reasonable answer to my question.

Most of the innovation in my field, criminal justice, is coming from the private sector. I wonder (or maybe hope and dream is a better description) if a company, like Axon, could build something like that for our field.


Part of the point for writing blog posts is that I do the same tasks over and over again. Having a nerd journal is convenient to reference.

One of the things that I do not commonly have to do, but that seems to come up about once a year at my gig, is putz around with Docker containers. As a note to myself: when building python apps, to get the correct caching you want to install the libraries first, and then copy the app over.

So if you do this:

FROM python:3.11-slim
# Set the working directory so the CMD below can find main.py
WORKDIR /app
COPY . /app
RUN pip install --no-cache-dir -r /app/requirements.txt
CMD ["python", "main.py"]

Every time you change a single line of code, you need to re-install all of the libraries. This is painful. (For folks who like uv, it does not solve the problem, as you still need to download the libraries every time in this approach.)

A better workflow then is to copy over the single requirements.txt file (or .toml, whatever), install that, and then copy over your application.

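FROM python:3.11-slim
# Set the working directory so the CMD below can find main.py
WORKDIR /app
# Install libraries first; this layer is cached until requirements.txt changes
COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt
# Copy the app last; code edits only invalidate the layers from here down
COPY . /app
CMD ["python", "main.py"]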

So now, only when I change the requirements.txt file will I need to redo that layer.

Now I am a terrible person to ask about dev builds and testing code in this set up. I doubt I am doing things the way I should be. But most of the time I am just building this.

docker build -t py_app .

And then I will have logic in main.py (or swap out with a test.py) that logs whatever I need to the screen. Then you can either do:

docker run --rm py_app

Or if you want to bash into the container, you can do:

docker run -it --rm py_app bash

Then from in the container you can go into the python REPL, edit a file using vim if you need to, etc.
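As an illustration of the kind of main.py that "logs whatever I need to the screen", here is a minimal sketch. It is not my actual script, just a hypothetical smoke test that reports the Python version and the installed libraries, which is handy for confirming the requirements layer built the way you expected.

```python
import logging
import platform
import sys
from importlib.metadata import distributions


def environment_report():
    """Collect basic facts about the Python environment the script runs in."""
    packages = sorted(f"{d.metadata['Name']}=={d.version}" for d in distributions())
    return {
        "python": platform.python_version(),
        "executable": sys.executable,
        "packages": packages,
    }


def main():
    # Log to stdout so the output shows up directly in `docker run`
    logging.basicConfig(stream=sys.stdout, level=logging.INFO,
                        format="%(asctime)s %(levelname)s %(message)s")
    report = environment_report()
    logging.info("Python %s at %s", report["python"], report["executable"])
    logging.info("%d installed packages", len(report["packages"]))
    for pkg in report["packages"]:
        logging.info("  %s", pkg)


if __name__ == "__main__":
    main()
```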

Part of the reason I want data scientists to be full stack is because at work, if I need another team to help me build and test code, it basically adds 3 months at a minimum to my project. Probably one of the most complicated things my team and I have done at the day job is figure out the correct magical incantations to properly build ODBC connections to various databases in Docker containers. If you can learn about boosted models, you can learn how to build Docker containers.

Deep research and open access

Most of the major LLM chatbot vendors are now offering a tool called deep research. These tools basically just scour the web given a question, and return a report. For academics conducting literature reviews, the parallel is obvious. We just tend to limit the review to peer reviewed research.

I started with testing out Google’s Gemini service. Using that, I noticed almost all of the sources cited were public materials. So I did a little test with a few prompts across the different tools. Below are some examples of those:

  • Google Gemini question on measuring stress in police officers (PDF, I cannot share this chat link it appears)
  • OpenAI Effectiveness of Gunshot detection (PDF, link to chat)
  • Perplexity convenience sample (PDF, Perplexity was one conversation)
  • Perplexity survey measures attitudes towards police (PDF, see chat link above)

The report on officer mental health measures was in an area with which I was wholly unfamiliar. The other tests are areas where I am quite familiar, so I could evaluate how well I thought each tool did. OpenAI’s tool is the most irksome to work with: citations work out of the box for Google and Perplexity, but not with ChatGPT. I had to ask it to reformat things several times. Claude’s tool has no test here, as to use its deep research tool you need a paid account.

Offhand each of the tools did a passable job of reviewing the literature and writing reasonable summaries. I could nitpick things in both the Perplexity and the ChatGPT results, but overall they are good tools I would recommend people become familiar with. ChatGPT was more concise and more on-point. Perplexity got the right answer for the convenience sample question (use post-stratification), but also pulled in a large literature on propensity score matching (which is only relevant for X causes Y type questions, not overall distribution of Y). Again this is nit-picking for less than 5 minutes of work.
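For readers unfamiliar with post-stratification, here is a minimal Python sketch of the idea: estimate the outcome within each stratum of the convenience sample, then reweight by known population shares. The strata, the sample values, and the population shares are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical convenience sample: (stratum, 0/1 outcome) pairs.
# Younger respondents are over-represented relative to the population.
sample = [
    ("young", 1), ("young", 1), ("young", 0), ("young", 1),
    ("old", 0), ("old", 1),
]

# Hypothetical known population shares per stratum (e.g. from the census)
population_share = {"young": 0.3, "old": 0.7}


def poststratified_mean(sample, population_share):
    """Weight each stratum's sample mean by its share of the population.

    Assumes every stratum in population_share appears in the sample.
    """
    by_stratum = defaultdict(list)
    for stratum, y in sample:
        by_stratum[stratum].append(y)
    return sum(
        share * sum(by_stratum[stratum]) / len(by_stratum[stratum])
        for stratum, share in population_share.items()
    )


raw_mean = sum(y for _, y in sample) / len(sample)
adjusted_mean = poststratified_mean(sample, population_share)
print(f"raw: {raw_mean:.3f}, post-stratified: {adjusted_mean:.3f}")
```

This targets the overall distribution of Y, which is why it fits the convenience sample question, whereas propensity score matching is built for X causes Y questions.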

Overall these will not magically take over writing your literature review, but are useful (the same way that doing simpler searches in google scholar is useful). The issue with hallucinating citations is mostly solved (see the exception for ChatGPT here). You should consult the original sources and treat deep research reports like on-demand Wikipedia pages, but let’s not kid ourselves – most people will not be that thorough.

For the Gemini report on officer mental health, I went through quickly and broke down the 77 citations across the publication type or whether the sources were in HTML or PDF. (Likely some errors here, I went by the text for the most part.) For the HTML vs PDF, 59 out of 77 (76%) are HTML web-sources. Here is the breakdown for my ad-hoc categories for types of publications:

  • Peer review (open) – 39 (50%)
  • Peer review (just abstract) – 10 (13%, these are all ResearchGate)
  • Open reports – 23 (30%)
  • Web pages – 5 (6%)
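If you want to redo this kind of tally for your own report, a few lines of Python will do it. The counts below are the ones from the list above. The shares are printed to one decimal place, so they may differ slightly from the whole-number percentages I wrote by hand.

```python
# Counts of citation types in the Gemini report, from the list above
counts = {
    "Peer review (open)": 39,
    "Peer review (just abstract)": 10,
    "Open reports": 23,
    "Web pages": 5,
}

total = sum(counts.values())
shares = {label: n / total for label, n in counts.items()}

for label, n in counts.items():
    print(f"{label:<30}{n:>3}  {shares[label]:.1%}")
print(f"{'Total':<30}{total:>3}")
```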

A quick rundown of these categories: peer reviewed should be obvious, but sometimes the different tools cite papers that are not open access. In these cases, Deep Research is just using the abstract to madlib its way through the report. (I count ResearchGate articles here as just abstract. Some of them are actually available, but you need to click a link to get to the PDF in those cases. Google is not indexing those PDFs behind a wall, only the abstract.) Open reports I reserve for think tank or other government groups. Web pages I reserve for blogs or private sector white papers.

I’d note as well that even though it does cite many peer reviewed articles here, many of these are quite low quality (stuff in MDPI, or other outlets that look to me like pay to publish). Basically none of the citations are in major criminology journals! As I am not as familiar with this area this may be reasonable though. I don’t know if this material is often in different policing journals or Criminal Justice and Behavior and just not being picked up at all, or if the literature in those places just does not exist. I have a feeling it is missing a few of the traditional crim journal sources though (and it picks up a few sources in different languages).

The OpenAI report largely hallucinated references in the final report it built (something that Gemini and Perplexity currently do not do). The references it made up were often portmanteaus of different papers. Of the 12 references it provided, 3 were supposedly peer reviewed articles. You can in the ChatGPT chat go and see the actual web-sources it used (actual links, not hallucinated). Of the 32 web links, here is the breakdown:

  • Pubmed 9
  • The Trace 5
  • Kansas City local news station website 4
  • Eric Piza’s wordpress website 3
  • Govtech website 3
  • NIJ 2

There are single links then to two different journals, and one to the Police Chief magazine. I’d note Eric’s site is not that old (first RSS feed started in February 2023), so Eric making a website where he simply shares his peer reviewed work greatly increased his exposure. His webpage in ChatGPT is more influential than NIJ and peer reviewed CJ journals combined.

I did not do the work to go through the Perplexity citations. But in large part they appear to me quite similar to Gemini on their face. They do cite pure PDF documents more often than I expected, but still we are talking about 24% in the Gemini example are PDFs.

The long story short advice here is that you should post your preprints or postprints publicly, preferably in HTML format. For criminologists, you should do this currently on CrimRXiv. In addition to this, just make a free webpage and post overviews of your work.

These tests were just simple prompts as well. I bet you could steer the tool to give better sources with some additional prompting, like “look at this specific journal”. (Design idea if anyone from Perplexity is listening, allow someone to be able to whitelist sources to specific domains.)
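To sketch what that whitelist idea could look like, here is a short Python example that filters a list of source URLs down to approved domains. This is my own illustration, not how any of the vendors work, and the whitelist and example URL paths are made up.

```python
from urllib.parse import urlparse

# Hypothetical whitelist of domains a user trusts
ALLOWED_DOMAINS = {"crimrxiv.com", "nij.ojp.gov"}


def is_allowed(url, allowed=ALLOWED_DOMAINS):
    """True if the URL's host is a whitelisted domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in allowed)


# Made-up source URLs for illustration
sources = [
    "https://www.crimrxiv.com/pub/example",
    "https://nij.ojp.gov/topics/example",
    "https://example-blog.com/post",
]

kept = [url for url in sources if is_allowed(url)]
print(kept)
```

Matching on the parsed hostname, instead of a substring of the whole URL, avoids letting through a lookalike such as notcrimrxiv.com.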


Another random pro-tip for using Gemini chats: they do not print well, and if they have quite a bit of markdown and/or mathematics, they do not convert to a google document very well. What I did in those circumstances was a bit of javascript hacking. Go into your dev console (in Chrome right click on the page and select “Inspect”, then in the panel that opens up go to the “Console” tab). Depending on the chat page you currently have open, you can try entering this javascript:

// Example printing out Google Gemini Chat
var res = document.getElementsByTagName("extended-response-panel")[0];
var report = res.getElementsByTagName("message-content")[0];
var body = document.getElementsByTagName("body")[0];
let escapeHTMLPolicy = trustedTypes.createPolicy("escapeHTML", {
 createHTML: (string) => string
});
body.innerHTML = escapeHTMLPolicy.createHTML(report.innerHTML);
// Now you can go back to page, cannot scroll but
// Ctrl+P prints out nicely

Or this works for me when revisiting the page:

var report = document.getElementById("extended-response-message-content");
var body = document.getElementsByTagName("body")[0];
let escapeHTMLPolicy = trustedTypes.createPolicy("escapeHTML", {
 createHTML: (string) => string
});
body.innerHTML = escapeHTMLPolicy.createHTML(report.innerHTML);

Page scrolling does not work after this, but Ctrl + P to print the page does.

The idea behind this, I want to just get the report content, which ends up being hidden away in a mess of div tags, promoted out to the body of the page. This will likely break in the near future as well, but you just need to figure out the correct way to get the report content.
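Since the tag names and ids are likely to change, here is a minimal sketch of a slightly more defensive version that tries a list of candidate selectors in order. The function names (findReport, promoteReport) and the policy name "printReport" are my own hypothetical choices for illustration, and the two selectors are just the ones from the snippets above, so treat this as a starting point and not something guaranteed to match the current page.

```javascript
// Candidate selectors, tried in order. Both come from the snippets above;
// add new ones here when the page markup changes.
const CANDIDATE_SELECTORS = [
  "#extended-response-message-content",
  "extended-response-panel message-content",
];

// Return the first element under `root` matching any candidate selector,
// or null when none match. `root` only needs a querySelector method.
function findReport(root, selectors = CANDIDATE_SELECTORS) {
  for (const selector of selectors) {
    const el = root.querySelector(selector);
    if (el) return el;
  }
  return null;
}

// Swap the page body for the report content. Returns false (and leaves the
// page alone) when no report is found. Falls back to a plain string when
// the environment does not expose Trusted Types.
// "printReport" is a hypothetical policy name.
function promoteReport(doc, policyName = "printReport") {
  const report = findReport(doc);
  if (!report) return false;
  const tt = typeof trustedTypes !== "undefined" ? trustedTypes : null;
  const html = tt
    ? tt.createPolicy(policyName, { createHTML: (s) => s }).createHTML(report.innerHTML)
    : report.innerHTML;
  doc.body.innerHTML = html;
  return true;
}
```

In the console you would then run promoteReport(document). If it returns false, inspect the page to find where the report content lives now and add that selector to the list.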

Here is an example of using Gemini’s Deep Research to help me make a practice study guide for my son’s calculus course.

Follow up on Reluctant Criminologists critique of build stuff

Jon Brauer and Jake Day recently wrote a response to my build stuff post, Should more criminologists build stuff, on their Reluctant Criminologists blog. Go ahead and follow Jon’s and Jake’s thoughtful work. They asked for comment before posting – I mainly wanted to post my response to be more specific about “how much” I think criminologists should build stuff. (Their initial draft said I should “abandon” theoretical research, which is not what I meant, and they took it out before publishing. But I could see how one could be confused by my statement “emphasis should be flipped” after saying near 0% of work now is building stuff.)

So here is my response to their post:

To start I did not say abandon theoretical research, I said “the emphasis be on doing”. So that is a relative argument, not an absolute. It is fine to do theoretical work, and it is fine to do ex-ante policy evaluations (which should be integrated into the process of building things, seeing if something works well enough to justify its expense I would say is a risky test). I do not have a bright line that I think building stuff is optimal for the field, but it should be much more common than it is now (which is close to 0%). To be specific on my personal opinion, I do think “build stuff” should be 50%+ of research criminologists’ time (relative to writing papers).

I am actually more concerned with the larping idea I gave. You have a large number of papers in criminology that justify their motivation not really as theoretical, but as practical to operations. And they are just not even close. So let’s go with the example of precise empirical distributions of burglaries at the neighborhood level. (It is an area I am familiar with, and there are many “I have a new model for crime in space” papers.) Pretend I did have a forecast, and I said there are going to be 10 burglaries in your neighborhood next month. What exactly are you supposed to do with that information? Forecasting the distribution does not intrinsically make it obvious how to prevent crime (nature does not owe us things we can manipulate). Most academic criminologists would say it is useful for police allocation, which is so vague as to be worthless.

You also do not need a fully specified causal chain to prevent crime. Most of the advancement in crime prevention looks more like CPTED applications than understanding “root causes” (a phrase I think is a good example of an imprecise theory). I would much rather academics try to build specific CPTED applications than write another regression paper on crime and space (even if it is precise).

For the dark side part, in the counterfactual world in which academics don’t focus on direct applications, it does not mean those applications do not get built. They just get built by people who are not criminologists. It was actually the main reason I wrote the post – folks are building things now that should have more thoughtful input from academic criminologists.

For a specific example, different tech companies are demo’ing products with the ultimate goal of improving police officers’ mental health. These include flagging if officers go to certain types of calls too often, or using a chatbot as a therapist. These are real things where I would like criminologists like yourselves to be involved in product development, so you can say “How do we know this is actually improving the mental health of officers?”. I vehemently disagree that more academic criminologists being involved will make development of these applications worse.

The final part I want to say is that apps need not intrinsically be focused on anything. I gave examples in policing that I am aware of, because that is my stronger area of expertise, but it can be anything. So let’s go with personal risk assessments. Pre-trial, parole/probation risk assessments look very similar to what Burgess built 100 years ago at this point. So risk stratification is built on the idea that you need to triage resources (some people need more oversight, some less), especially for the parole scenario. Now it is certainly feasible someone comes up with a better technological solution that risk stratification is not needed at all (say better sensors or security systems that obviate the need for the more intensive human oversight). Or a more effective regimen that applies to everyone, say better dynamic risk assessments, so people are funneled faster to more appropriate treatment regimes than just having a parole officer pop in from time to time.

I give this last example because I think it is an area where focusing on real applications will, I suspect, be more fruitful long term for theory development. So we have 100 years and thousands of papers on risk assessment, but really only very incremental progress in that area. I believe a stronger focus on actual application – thinking about dynamic measures to accomplish specific goals (like the treatment monitoring and assignment), is likely to be more fruitful than trying to pontificate about some new theory of man that maybe later can be helpful.

We don’t have an atom to reduce observations down to (nor do we have an isolated root node in a causal diagram). We are not going to look hard enough and eventually find Laplace’s Demon. Focusing on a real life application, how people are going to use the information in practice, I think is a better way for everyone to frame their scientific pursuits. It is more likely a particular application changes how we think about the problem all together, and then we mold the way we measure to help accomplish that specific task. Einstein just started with the question “how do we measure how fast things travel when everything is moving”, a very specific question. He did not start out by saying “I want a theory of the universe”.

I am more bullish on real theoretical breakthroughs coming from more mundane and practical questions like “how do we tell if a treatment is working” or “how do we know if an officer is having a mental health crisis” than I am about someone coming up with a grander theory of whatever just from reading peer reviewed papers in their tower.

And here is Jon’s response to that:

Like you, we try to be optimistic, encouraging, and constructive in tone, though at times it requires serious effort to keep cynicism at bay. In general, if we had more Andrew Wheeler’s thoughtfully building things and then evaluating them, then I agree this would be a good thing. Yet, if I don’t trust someone enough to meaningfully observe, record, and analyze the gauges, then I’m certainly not going to trust them to pilot – or to understand well enough to successfully build and improve upon the car/airplane/spaceship. Meanwhile, the normative analysis is that everything is significant/everything works – unless it’s stuff we collectively don’t like. In that context, the cynic in me thinks we are better off if we simply focus on teaching many (most?) social scientists to observe and analyze better – and may even do less harm despite wasted resources by letting them larp.

Jon and Jake do not have an estimate in their post on what they think the mix should be between building and theorizing (they say pluralist in the post). I think the near 0 we do now is not good.

Much of this back and forth tends to mirror the current critique of advocacy in science. The Charles Tittle piece they cite, The arrogance of public sociology, could have been written yesterday.

Both the RC group and Tittle have what I would consider a perfect-is-the-enemy-of-the-good argument going on. People can do bad work, people can do good work. I want folks to go out and do good, meaningful work. I have met plenty of criminologists (and, on the flipside, seen the level of competence of many software engineers) to not have RC’s level of cynicism.

As an individual, I don’t think it makes much sense to worry about the perception of the field as a whole. I cannot control my fellow criminologists, I can only control what I personally do. Tittle in his critique thought public sociology would erode any legitimacy of the field. He maybe was right, but I posit producing mostly irrelevant work will put criminology on the same path.

Build Stuff

I have had this thought in my head for a while – criminology research to me is almost all boring. Most of the recent advancement in academia is focused on making science more rigorous – more open methods, more experiments, stronger quasi-experimental designs. These are all good things, but to me still do not fundamentally change the practical implementation of our work.

Criminology research is myopically focused on learning something – I think this should be flipped, and the emphasis be on doing something. We should be building things to improve the crime and justice system.

How criminology research typically goes

Here is a screenshot of the recent articles published in the Journal of Quantitative Criminology. I think this is a pretty good cross-section of high-quality, well-respected research in criminology.

Three of the four articles are clearly ex-ante evaluations of different (pretty normal) policies/behavior by police and their subsequent downstream effects on crime and safety. They are all good papers, and knowing how effectively a particular policy works (like stop and frisk, or firearm seizures) is good! But they are the literal example where the term ivory tower comes from – these are things happening in the world, and academics passively observe and say how well they are working. None of the academics in those papers were directly involved in any boots on the ground application – they were normal operations the police agencies in question were doing on their own.

Imagine someone said “I want to improve the criminal justice system”, and then “to accomplish this, I am going to passively observe what other people do, and tell them if it is effective or not”. This is almost 100% of what academics in criminology do.

The article on illicit supply chains is another one that bothers me – it is sneaky in the respect that many academics would say “ooh that is interesting and should be helpful” given its novelty. I challenge anyone to give a concrete example of how the findings in the article can be directly useful in any law enforcement context. Not hypothetical, “can be useful in targeting someone for investigation”, like literal “this specific group can do specific X to accomplish specific Y”. We have plenty of real problems with illicit supply chains – drug smuggling in and out of the US (recommend the contraband show on Amazon, who knew many manufacturers smuggle weed from the US out to the UK!). Fentanyl or methamphetamine production from base materials. Retail theft groups and selling online. Plenty of real problems.

Criminology articles tend to be littered with absurdly vague claims that they can help operations. They almost always cannot.

So we have articles that are passive evaluations of policies other people thought up. I agree this is good, but who exactly comes up with the new stuff to try out? We just have to wait around and hope other people have good ideas and take the time to try them out. And then we have theoretical articles larping as useful in practice (since other academics are the ones reviewing the papers, and no one says “erm, that is nice but makes no sense for practical day to day usage”).

Some may say this is the way science is supposed to work. My response to that is I don’t know dude, go and look at what folks are doing in the engineering or computer science or biology department. They seem to manage both theoretical and practical advancements at the same time just fine and dandy.

Well what have you built Andy?

It is a fair critique if you say “most of your work is boring Andy”. Most of my work is the same “see how a policy works from the ivory tower”, but a few are more “build stuff”. Examples of those include:

In the above examples, the one that I know has gotten the most traction is the simple rules to identify crime spikes. I know because I have spent time demonstrating that work to various crime analysts across the country, and so many have told me “I use your Poisson Z-score Andy”. (A few have used the patrol area work as well, so I should be in the negative for carbon generation.)

Papers are not what matter though – papers are a distraction. The applications are what matter. The biggest waste currently in academic criminology work is peer reviewed papers. Our priorities as academics are totally backwards. We are evaluated on whether we get a paper published, we should be evaluated on whether we make the world a better place. Papers by themselves do not make the world a better place.

Instead of writing about things other people are doing and whether they work, we should spend more of our time trying to create things that improve the criminal justice system.

Some traditional academics may not agree with this – science is about formulating and testing hypotheses. This need not be in conflict with doing stuff. Have a theory about human nature, what better way to prove the theory than building something to attempt to change things for the better according to your theory. If it works in real life to accomplish things people care about guess what – other people will want to do it. You may even be able to sell it.

Examples of innovations I am excited about

Part of what prompted this was I was talking to a friend, and basically none of the things we were excited about have come from academic criminologists. I think a good exemplar of what I mean here is Anthony Tassone, the head of Truleo. To be clear, this is not a dig but a compliment, following some of Anthony’s posts on social media (LinkedIn, X), he is not a Rhodes Scholar. He is just some dude, building stuff for criminal justice agencies mostly using the recent advancements in LLMs.

Here are a few other examples of products I am excited about for how they can improve criminal justice (I have no affiliations with these beyond I talk to people). Polis for evaluating body worn camera feeds. Dan Tatenko for CaseX is building an automated online crime reporting system that is much simpler to use. The folks at Carbyne (for 911 calls) are also doing some cool stuff. Matt White at Multitude Insights is building a SaaS app to better distribute BOLOs.

The folks at Polis (Brian Lande and Jon Wender) are the only two people in this list that have anything remotely to do with academic criminology. They each have PhDs (Brian in sociology and Jon in criminology). Although they were not tenure track professors, they are former/current police officers with PhDs. Dan at CaseX was a detective not that long ago. The folks at Carbyne I believe have tech backgrounds. Matt has a military background, but pursued his start up after doing an MBA.

The reason I bring up Anthony Tassone is because when we as criminologists say we are going to passively evaluate what other people are doing, we are saying “we will just let tech people like Anthony make decisions on what real practitioners of criminal justice pursue”. Again not a dig on Anthony – it is a good thing for people to build cool stuff and see if there is a market. My point is that if Anthony can do it, why not academic criminologists?

Rick Smith at Axon is another example. While Axon really got its dominant market position due to conducted energy devices and then body worn cameras (so hardware), quite a bit of the current innovation at Axon is software. And Rick did not have a background in hardware engineering either, he just had an idea and built it.

Having transferred over into professional software engineering in 2020, let me tell my fellow academics, you too can write software. It is more about having a good idea that actually impacts practice.

Where to next?

Since the day gig (working on fraud-waste-abuse in Medicaid claims) pays the bills, most of my build stuff is now focused on that. The technical skills to learn software engineering are currently not effectively taught in Criminal Justice PhD programs, but they could be. Writing a dissertation is way harder than learning to code.

While my python book has a major focus on data analysis, it is really the same skills to jump to more general software engineering. (I specifically wrote the book to cover more software engineering topics, like writing functions and managing environments, as most of the other python data science books lack that material.)

Skills gap is only part of the issue though. The second is supporting work that pursues building stuff. It is really just norms in the current academe that stop this from occurring now. People value papers, and NIJ (at least it used to) mostly funds very boring incremental work.

I discussed start ups (people dreaming and building their own stuff) and other larger established orgs (like Axon). Academics are in a prime position to pursue their own start ups, and most Universities have some support for this (see Joel Caplan and Simsi for an example of that path). Especially for software applications, there are few barriers. It is more about time and effort spent pursuing that.

I think the more interesting path is to get more academic criminologists working directly with software companies. I will drop a specific example since I am pretty sure he will not be offended, everyone would be better off if Ian Adams worked directly for one of these companies (the companies, Ian’s take home pay, long term advancement in policing operations). Ian writes good papers – it would be better if Ian worked directly with the companies to make their tools better from the get go.

My friend I was discussing this with gave the example of Bell Labs. Software orgs could easily have professors take part time gigs with them directly, or just go work with them on sabbaticals. Axon should support something like that now.

While this post has been focused on software development, I think it could look similar for collaborating with criminal justice agencies directly. The economics will need to be slightly different (they do not have quite as much expendable capital to support academics, the ROI for private sector I think should be easily positive in the long run). But I think that would probably be much more effective than the current grant based approach. (Just pay a professor directly to do stuff, instead of asking NIJ to indirectly support evaluation of something the police department has decided to already put into operation.)

Scientific revolutions are not happening in journal articles. They are happening by people building stuff and accomplishing things in the real world with those innovations.


For a few responses to this post, Alex sent me this (saying my characterization of Larry as passively observing is not quite accurate), which is totally reasonable:

Nice post on building/doing things and thanks for highlighting the paper with Larry. One error however, Larry was directly involved in the doing. He was the chief science officer for the London Met police and has designed their new stop and frisk policy (and targeting areas) based directly on our work. Our work was also highlighted by the Times London as effective crime policy and also by the Chief of the London Met Police as well who said it was one of the best policy relevant papers he’s ever seen. All police are now being trained on the new legislation on stop and search in procedurally just ways. You may not have known this background but it’s directly relevant to your post.

Larry Sherman (and David Weisburd), and their work on hot spots + direct experiments with police are really exemplars of “doing” vs “learning”. (David Kennedy and his work on focused deterrence is another good example.) In the mid 90s when Larry or David did experiments, they likely were directly involved in a way that I am suggesting – the departments are going and asking Larry “what should we do”.

My personal experience, trying to apply many of the lessons of David’s and Larry’s work (which was started around 30 years ago at this point), is not quite like that. It is more that police departments have already committed to doing something (like hotspots), and want help implementing the project, and maybe some grant helps fund the research. Which is hard and important work, but honestly just looks like effective project management (and departments should just invest in researchers/project managers directly, the external funding model does not make sense long term). For a more on point example of what I mean by doing, see what Rob Guerette did as an embedded criminologist with Miami PD.

Part of the reason I wrote the post, if you think about the progression of policing, we have phases – August Vollmer for professionalization in the early 1900’s. I think you could say folks like Larry and David (and Bill Bratton) brought about a new age of metrics to PDs in the 90s.

There are also technology changes that fundamentally impact PDs. Cars + 911 is one. The most recent one is a new type of oversight via body worn cameras. Folks who are leading this wave of professionalization changes are tech folks (like Rick Smith and Anthony Tassone). I think it is a mistake to just sit on the sidelines and see what these folks come up with – I want academic criminologists to be directly involved in the nitty gritty of the implementations of these systems and making them better.

A second response to this is that building stuff is hard, which I agree and did not mean to imply it was as easy as writing papers (it is not). Here is Anthony Tassone’s response on X:

I know this is hard. This is part of why I mentioned the Bell Labs path. Working directly for an already established company is much easier/safer than doing your own startup. Bootstrapping a startup is additionally much different than doing VC go big or go home – which academics on sabbaticals and as a side hustle are potentially in a good position to do.

Laura Huey did this path, and does not have nice things to say about it:

I have not talked to Laura specifically about this, but I suspect it is her experience running the Canadian Society of Evidence Based Policing. I would not suggest starting a non-profit either, honestly. Even if you start a for-profit, there is no guarantee you will be in a good position in your current academic position to be well supported.

Again no doubt building useful stuff is harder than writing papers. For a counter to these though, doing my bootstrapped consulting firm is definitely not as stressful as building a large company like Anthony. And working for a tech company directly was a good career move for me (although now I spend most of my day building stuff to limit fraud-waste-abuse in Medicaid claims, not improving policing).

My suggestion that the field should be more focused on building stuff was not because it was easier, it was because if you don’t there is a good chance you are mostly irrelevant.