Notes on Valuing the Cost of Crime

AI disclosure – I used AI to write this blog post. I figure having an AI blog post is better than not writing it at all. I will always disclose though if I use AI to heavily write any content on this blog. (I use it for minor copy editing all the time.)

For the tech details, I used Gemini flash 3.5 with medium reasoning in the Antigravity IDE, following the same advice I gave in this blog post. (I have a minor preference for Claude Code when writing blog posts, for those who care.) The outline is the thread I did on X (which I wrote entirely by hand). Using this approach, i.e. giving a detailed outline and prior examples, Pangram says this is only lightly AI assisted.

We often hear eye-popping figures about the “cost of crime.” For example, that a single aggravated assault costs society $100,000, or that a statistical life is worth $10 million. But if you look under the hood of these estimates, they are built on a house of cards: Willingness-to-Pay (WTP) surveys.

WTP estimates wildly inflate the costs of crime. For realistic policy decisions and police budgeting, we should be using concrete measures that are easier to calculate and verify.

The Three Buckets of Crime Costs

To evaluate criminal justice interventions, we can break costs into three broad categories:

  • A) Cost to the individual: Personal hospital bills, lost work, and physical trauma.
  • B) Cost to public sector agencies: Police labor, court proceedings, jail/prison operations, and public healthcare programs like Medicaid.
  • C) Cost to society: Reduced business activity in high-crime areas and the loss of workers to the economy.

Most cost-of-crime estimates do not calculate these countable categories. Instead, they use survey estimates of willingness-to-pay to approximate the costs of crime to individuals. I believe WTP estimates themselves are junk and should not be used to guide operations.

The Scaling Problem of Willingness-to-Pay

If you have heard the phrase “a statistical life costs $10 million,” you are seeing a WTP estimate in action.

The scaling math is straightforward, but the resulting estimates themselves are junk. Researchers ask survey respondents questions like: “Would you pay $100 in increased taxes to fund sidewalk improvements that reduce pedestrian fatalities?” If the safety measures are estimated to reduce pedestrian deaths by 1 in 100,000 annually in a city, the math scales up simply:

$100 × 100,000 = $10,000,000

People are thus deemed “willing to pay” $10 million to reduce one death.
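To make the scaling arithmetic concrete, here is a minimal Python sketch. The function name is made up for illustration, and the alternative survey answers in the loop are hypothetical values, only there to show that the final figure moves one-for-one with whatever number respondents happen to give.

```python
def implied_value_per_life(wtp_per_person, people_per_death_averted):
    """Scale a per-person willingness-to-pay up to one statistical life.

    wtp_per_person: dollars each respondent says they would pay
    people_per_death_averted: people covered for each death avoided,
        e.g. a 1 in 100,000 risk reduction means 100,000 people
    """
    if people_per_death_averted <= 0:
        raise ValueError("people_per_death_averted must be positive")
    return wtp_per_person * people_per_death_averted

# The sidewalk example: $100 per person, risk reduced by 1 in 100,000
print(f"${implied_value_per_life(100, 100_000):,}")

# Hypothetical alternative survey answers: the estimate scales linearly,
# so any noise in the stated amount is multiplied by 100,000
for stated in (50, 100, 200):
    print(stated, f"${implied_value_per_life(stated, 100_000):,}")
```

A $50 difference in a hypothetical survey answer shifts the valuation by $5 million, which is why these estimates are so noisy.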

This methodology yields massive, noisy estimates. You can see these WTP metrics compiled on the RAND Cost of Crime site. The primary limitation is that survey respondents will agree to pay almost any seemingly small amount when they do not actually have to pay it. In one street lighting survey I reviewed, participants were paid $1 to participate and claimed they were willing to pay $200 on average for better streetlights. It is highly doubtful that someone who sells their time for $1 to complete a survey will actually pay $200 in taxes for streetlights. As Andrew Gelman has pointed out, valuing lives based on ability to pay reveals how detached these hypothetical exercises are from real-world resource constraints.

Countable Costs vs. Theoretical Valuations

When we rely on concrete cost estimates that can be verified—such as labor hours and medical bills—the figures are much lower.

For instance, while a WTP estimate for an aggravated assault is close to $100,000, Priscilla Hunt’s study on law enforcement costs estimates the actual police labor cost for an assault is closer to $10,000.

I cannot prove what people are hypothetically willing to pay. But I can show a police chief that reducing ten assaults in a specific sector will save $100,000 in labor and overtime.

This distinction matters for other public costs too. Serious physical assaults can easily generate six-figure medical bills. In New York, more than 70% of gun violence hospitalizations are paid for via Medicaid. While it is reasonable for state or federal governments to weigh these medical costs, a local county or police department does not bear them. It makes no sense for a local police department to justify its budget by claiming it is reducing Medicaid expenses.

Example Cost-Benefit Case Studies

When we restrict our analysis to tangible costs, how do common interventions stack up?

Hotspots Policing

Because crime is highly concentrated, we can identify specific geographic areas that generate massive public costs. I have previously written about locating Million-Dollar Hotspots in Baltimore and Dallas. In my research on redrawing hotspots, I show how spatial concentration makes 24/7 hotspots policing cost-effective based purely on offsetting tangible labor costs.

For code examples of this, check out my crimepy python library (DBSCAN with weights for cost of crime estimates).

ShotSpotter

I am much less bullish on acoustic gunshot detection systems like ShotSpotter due to their high cost, as detailed in my ShotSpotter cost-benefit analysis. I estimate that ShotSpotter saves approximately 1 life for every 100 shooting victims it covers by dispatching emergency services faster. If you value a life at $10 million using WTP, the system easily looks cost-effective. If you use tangible costs, the math changes. ShotSpotter has not shown consistent evidence that it increases case clearances or prevents victimization. In fact, saving a shooting victim via faster response generates higher medical bills than if they had died, highlighting the complex economics of reactive vs. proactive interventions.

Business Improvement Districts (BIDs)

A great example of societal cost-shifting is Business Improvement Districts (BIDs). As shown in John MacDonald and colleagues’ study on BIDs in Los Angeles, BIDs demonstrate that commercial businesses are actually willing to spend their own money to improve safety in their areas through private security, cleaning services, and physical improvements. This is not hypothetical willingness-to-pay; it is a real-world, out-of-pocket expenditure by local merchants who calculate that reducing crime is directly worth their private investment.

Gun Violence Interventions (READI)

When looking at community-based interventions, the cost-benefit models face a different hurdle. Monica Bhatt and her colleagues evaluated Chicago’s READI program in their study on predicting and preventing gun violence. They claim a massive benefit of around $180,000 per participant (translating to a 3:1 benefit-cost ratio).

However, this estimated benefit of $180,000 is derived by combining WTP estimates with lifetime projections of individual offending (specifically, the Cohen & Piquero lifecourse model). As I discussed in my analysis of limits on gun violence interventions, extrapolating high-risk youth crime savings over an entire lifecourse using inflated WTP values creates a benefit estimate that is completely detached from the immediate budget realities of local governments.

The Missing Metric: The Value of an Arrest

This brings us to a major gap in criminology: we do not have good estimates for what it is worth to clear a crime.

Because crime is highly concentrated among a small number of chronic offenders, an arrest is often worth more than preventing a single crime. Apprehending a chronic offender can prevent dozens of future offenses.

This is why tools like automated License Plate Readers (LPR) are interesting. As Ozer’s study on LPR effectiveness shows, they are much cheaper than ShotSpotter and are highly cost-effective even if they only generate a small percentage increase in arrests. However, to truly calculate their ROI, we need a better grasp on the actual monetary value of a clearance.

To build better policy, we need to stop relying on WTP surveys and start measuring the real, tangible savings that police departments and local governments can actually bank.

References

  • Bhatt, M. P., Heller, S. B., et al. (2024). Predicting and preventing gun violence: An experimental evaluation of READI Chicago. The Quarterly Journal of Economics, 139(1), 1-56.

  • Cohen, M. A., & Piquero, A. R. (2009). New evidence on the monetary value of saving a high risk youth. Journal of Quantitative Criminology, 25(1), 25-49.

  • Hunt, P., Saunders, J., & Kilmer, B. (2019). Estimates of law enforcement costs by crime type for benefit-cost analyses. Journal of Benefit-Cost Analysis, 10(1), 95-123.

  • MacDonald, J., Golinelli, D., Stokes, R. J., & Bluthenthal, R. (2010). The effect of business improvement districts on the incidence of violent crimes. Injury Prevention, 16(5), 327-332.

  • Ozer, M. (2016). The impact of automatic number plate recognition (ANPR) technology on crime. Police Journal, 89(2), 117-132.

  • Wheeler, A. P., & Reuter, S. (2021). Redrawing Hot Spots of Crime in Dallas, Texas. Police Quarterly, 24(2), 159-184.

How long to conduct your experiment: Talk at ASEBP

Upcoming at the American Society of Evidence Based Policing Conference, I have a talk Thursday morning (9:45-10:00), How long to conduct your experiment.

The talk goes over some of the simple metrics I have created to help plan how long to conduct your intervention, such as how long to evaluate a hot spots intervention, or a purchase intended to increase arrest rates.

I have prepared a ton of different resources. The main one is a web-based application (a WASM-based app with R as the backend) where you can enter your inputs and generate a graph showing how precise your parameter estimates are.

The help page includes citations and additional materials, but here is a brief rundown:

  • I have the math details in this github repo, see the methodology.pdf. It also includes notes on how I used different LLM tools to produce the webpage and the method materials. Each of the applications allows you to download the R code used to generate the graphs and tables.

  • I have created a series of YouTube videos demonstrating the application (WDD, IRR, Proportion tests)

  • I have posted my slides for the ASEBP talk

See you all in DC at ASEBP in a few weeks!

xAI voice cloning API

xAI has just released an API to clone your voice. It is pretty simple: you read a script to create the voice, and then you call an API to get text to speech in that voice.

Here is the Python code to use once you have cloned your voice.

import os
import requests

voice_id = os.environ['ANDY1_VOICE'] # my demo voice ID
text = '''this is a test demo of my voice. Be excited!
OK, how about a list of things; one, two, three.
Lets see where this takes us.'''

# send the demo text to the text to speech endpoint with the cloned voice
response = requests.post(
    "https://api.x.ai/v1/tts",
    headers={
        "Authorization": f"Bearer {os.environ['XAI_API_KEY']}",
        "Content-Type": "application/json",
    },
    json={
        "text": text,  # the short demo text defined above
        "voice_id": voice_id,
        "language": "en",
    },
)
response.raise_for_status()

# save the returned audio bytes to disk
with open("AndyTest1.mp3", "wb") as f:
    f.write(response.content)

I need to figure out my audio setup a bit better (my mic setup is probably not optimal, and it produces some echo). But it does a good job imitating my boring voice right out of the box!

And here is an example for longer speech from my intro to LLMs book:

# intro to llm book
llm_book = '''
Large language models (LLMs) are transforming how we work. Some of these examples include using LLMs to help write computer code, using LLMs to extract out information from irregular text sources, and creating chat-bots that can interact with various data sources and documents.
Most analysts, however, do not have any experience with these tools. This book is meant to be a general introduction to realistic examples of how individuals can use these tools; either in general software applications, or to help analysts write code to create software itself. Given the rapid pace of advancement in this area, a general introduction to help individuals who work in the knowledge economy understand the capabilities of these tools I believe is in order.
Here is a simple example of using an LLM API (*Application Programming Interface* -- just a standard way to send information and get information back on the web) using the anthropic library in python to extract key information from a free text crime narrative:
'''
# same request as before (reuses the imports and voice_id from above)
response = requests.post(
    "https://api.x.ai/v1/tts",
    headers={
        "Authorization": f"Bearer {os.environ['XAI_API_KEY']}",
        "Content-Type": "application/json",
    },
    json={
        "text": llm_book,
        "voice_id": voice_id,
        "language": "en",
    },
)
response.raise_for_status()

# save the returned audio bytes to disk
with open("AndyTest_LLMIntro.mp3", "wb") as f:
    f.write(response.content)

The LLM intro messed up the *Application Programming Interface* section (start listening at 50 seconds in). But otherwise it is very nice.

For those worried about security, xAI did something smart here: you need to record the input live, reading the prompts they give you. You cannot use pre-recorded audio as the input. So cloning someone else's voice is pretty hard.

Costs are around $4 per million characters in the text to speech API. So narrating my entire book, say, should cost under $10, I believe.
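A quick sketch of that cost arithmetic is below. The function names are made up for illustration, the $4 per million characters is the approximate price quoted above (not an official rate card), and the book's actual character count is not stated here, so the file name in the comment is hypothetical.

```python
def tts_cost_usd(n_characters, usd_per_million_chars=4.0):
    """Approximate text to speech cost for a given number of characters."""
    return n_characters / 1_000_000 * usd_per_million_chars

def max_chars_for_budget(budget_usd, usd_per_million_chars=4.0):
    """How many characters a budget buys at the given rate."""
    return int(budget_usd / usd_per_million_chars * 1_000_000)

# A $10 budget covers this many characters at roughly $4 per million
print(f"{max_chars_for_budget(10):,} characters")

# To price a real manuscript, count its characters first, e.g.
#   n = len(open("book.txt", encoding="utf-8").read())  # hypothetical file
#   print(tts_cost_usd(n))
```

So the "under $10" guess holds as long as the book is under 2.5 million characters.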

Took me a total of less than an hour to set up a voice, create the python code, and write this blog post!

Pangram is good

Many of the initial wave of “AI writing detectors” were quite bad. The biggest issue you need to be concerned about with an AI writing detector is false positives. If you are a professor and want to check students’ writing, it is very bad to falsely accuse a student.

The Pangram product, though, is quite good, and I suggest folks check it out.

The other main competitor on the market, GPTZero, is clearly lower quality (such as saying the Constitution is AI generated).

GPTZero says in their documentation that they are the most accurate AI detector. But accuracy is not really what you care about. Accuracy depends on the underlying rate of AI writing in a corpus, and you cannot know that rate except in the scenario where the corpus is artificially generated. What you care about is specifically the false positive rate and the false negative rate.

Unlike GPTZero, Pangram appears to have very low false positive rates. A simple way to estimate the false positive rate is to just submit writing from before 2022 to the tool and see how much of it gets flagged as AI. ChatGPT came out in late 2022, and the tools to generate writing before that were not good enough for people to use in any serious way. So any writing flagged as AI in the older corpus is a false positive.

Here is an example examining legal briefs.

It is an independent assessment. We cannot really know the capture rate (were there more than 66 briefs generated via LLMs in that sample?). We can know the false positive rate though, and it is 1/800 in this sample with Pangram.

Pangram says it has a 1 in 10,000 false positive rate across a wide array of writing samples. They even report in their own internal tests that GPTZero has a 2% false positive rate. (I am pretty sure GPTZero’s false positive rate is much higher than 2%, hence the Constitution error.)
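With only one flagged document out of 800, the point estimate is noisy, so it helps to put an interval around it. Below is a minimal sketch using a Wilson score interval. The choice of Wilson (rather than, say, an exact binomial interval) is my assumption, not something taken from the legal briefs analysis, and the function name is made up for illustration.

```python
from math import sqrt

def wilson_interval(flagged, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 is roughly 95%)."""
    if n <= 0 or not 0 <= flagged <= n:
        raise ValueError("need n > 0 and 0 <= flagged <= n")
    p = flagged / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, center - half), min(1.0, center + half)

# 1 pre-LLM document flagged as AI out of 800 checked
low, high = wilson_interval(1, 800)
print(f"point estimate: {1 / 800:.4%}")
print(f"95% interval:   {low:.4%} to {high:.4%}")
```

For 1 out of 800 the interval runs from roughly 0.02% to 0.7%, so even the upper end is well below the 2% figure reported for GPTZero.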

Many other checks for false negative rates involve people having various models generate writing and then classifying it. It is hard to know if those are very good benchmarks for estimating the false negative rate. But we can easily estimate the false positive rate, and in that respect Pangram is clearly better than other AI writing detectors on the market.

Should we care if writing is AI?

I have used AI tools to help me write. I promise to be forthcoming if I use AI to help me write any substantive sections of writing (in blog posts, books, social media posts, etc.). Currently I am almost always using the LLMs to copy-edit, often with simply the prompt “check for spelling and grammar issues”.

I do not use it all the time for writing. This post was all written by hand (and then just copy-edited with Gemini CLI).

It is really not that hard to bring your own voice and use AI to aid your writing. Have the LLM read your prior work, then give it a detailed outline, and then iterate. See my transcript on a prior post for an example.

I’d note I have used Pangram to see if my LLM writing is too obviously AI, and it is not. To me, when the writing is clearly AI, this often signals a clear lack of care and effort in the writing. AI writing can be valuable, but it is quite frequently low value slop.

So you get people larping as tech experts.

You can trivially have Claude or whatever software write a Skill file, and then have an LLM write how it is super awesome. This does not make it so.

And you have salespeople write posts that literally make no sense.

This, to be clear, is obviously AI slop.

So these individuals could actually generate useful content if they spent any more than a trivial amount of time. But they don’t, and it shows.

Gathering interest in tech courses

Quick post this morning — I have a survey up gathering input on interest in short, technical courses.

Think 2-3 days, potentially in person/synchronous.

If you have taken a course with Paul Allison at Horizon’s, or an ICPSR summer course, those are similar examples. The main difference is that these courses are meant to prepare you for pursuing private sector roles.

These will be aimed at:

  • grad level social science students
  • current professors looking to pursue private sector roles
  • current data analysts looking to get into data science
  • undergrads with some more technical background

The survey lists potential courses (python for data analysis, intro to LLM APIs, SQL + Dashboards, using agent based tools for analysis), the course medium (in person vs video), and price points.

If you are a university or organization interested in hosting such sessions for your students, let me know as well. Happy to chat to you about bringing this to your campus.

LinkedIn Premium Does Not Boost your Posts

One of my connections mentioned in a post on LinkedIn that since he turned off Premium, his posts have been getting less engagement. Since LinkedIn offers a month for free, and I have been trying to promote my recent book, I figured I would try my free month trial and see how many more views I could get. (Here I am not worried about Premium for applying to new jobs. It is possible it is totally worth it for that, but I was not applying to jobs in this test, so I do not know.)

Long story short, LinkedIn Premium does not appear to promote my material at all above the baseline.

Post Views

In a sample of 30 posts the month before I turned on Premium (turned on 3/24 in the evening, turned off 4/22 in the morning), my posts had an average of 3600 views (with a standard deviation of 7000, median 1400). Post-Premium, I had 23 posts, and the views were on average 2200 (SD 2900, median 900). Here is the full table of posts and links (Premium=1 means it was posted when my Premium subscription was turned on):

| Premium | Views | URL |
| ------:| -----:|:--- |
| 0    | 3659  | https://www.linkedin.com/posts/andrew-wheeler-46134849_llms-have-transformed-the-data-science-industry-activity-7426975341572984832-HGTA |
| 0    | 2526  | https://www.linkedin.com/posts/andrew-wheeler-46134849_no-guarantees-but-i-am-going-to-try-to-start-activity-7428418993553846272-vdjA    |
| 0    | 2290  | https://www.linkedin.com/posts/andrew-wheeler-46134849_much-of-the-hype-around-claude-code-is-having-activity-7428781380391567360-zkXr   |
| 0    | 545   | https://www.linkedin.com/posts/andrew-wheeler-46134849_one-of-the-benefits-of-my-epub-version-of-activity-7429143771109302272-_V6T       |
| 0    | 1454  | https://www.linkedin.com/posts/andrew-wheeler-46134849_claude-code-has-the-ability-to-create-hooks-activity-7429506167527088128-01Um     |
| 0    | 1326  | https://www.linkedin.com/posts/andrew-wheeler-46134849_one-of-the-prompting-flows-i-find-convenient-activity-7429868558794436609-SDgG    |
| 0    | 1278  | https://www.linkedin.com/posts/andrew-wheeler-46134849_one-of-the-main-focuses-in-the-book-is-not-activity-7430230940444057600-pqK-      |
| 0    | 5988  | https://www.linkedin.com/posts/andrew-wheeler-46134849_while-skills-in-claude-code-are-all-the-rage-activity-7430955707358818304-iqjb    |
| 0    | 1726  | https://www.linkedin.com/posts/andrew-wheeler-46134849_one-of-the-mistakes-i-see-with-agent-based-activity-7431318102212100096-rI_W      |
| 0    | 1172  | https://www.linkedin.com/posts/andrew-wheeler-46134849_from-my-experience-as-an-educator-when-presenting-activity-7431680485585580032-8h7B    |
| 0    | 1360  | https://www.linkedin.com/posts/andrew-wheeler-46134849_although-the-llm-tools-are-currently-focused-activity-7432042882770944000-AfAG    |
| 0    | 5304  | https://www.linkedin.com/posts/andrew-wheeler-46134849_when-i-was-a-professor-at-ut-dallas-i-sat-activity-7432405268874817536-qrAI       |
| 0    | 30732 | https://www.linkedin.com/posts/andrew-wheeler-46134849_i-know-a-few-stats-folks-in-my-network-that-activity-7432767666781679617-HXnk |
| 0    | 1003  | https://www.linkedin.com/posts/andrew-wheeler-46134849_claude-code-does-not-have-an-image-model-activity-7433492422111776768-2dqY        |
| 0    | 884   | https://www.linkedin.com/posts/andrew-wheeler-46134849_while-i-have-a-section-in-the-book-devoted-activity-7433854815862013952-FIl-      |
| 0    | 888   | https://www.linkedin.com/posts/andrew-wheeler-46134849_in-the-book-i-have-a-dedicated-chapter-on-activity-7434217215450681344-hk3E       |
| 0    | 1868  | https://www.linkedin.com/posts/andrew-wheeler-46134849_the-llm-book-is-compiled-using-quarto-so-activity-7434579595095580673-RHdx        |
| 0    | 807   | https://www.linkedin.com/posts/andrew-wheeler-46134849_llms-for-mortals-how-to-view-the-epub-activity-7434945455073169409-bNDu           |
| 0    | 1243  | https://www.linkedin.com/posts/andrew-wheeler-46134849_section-on-using-gliner-for-ner-activity-7435304382931513344-NPPC                 |
| 0    | 1745  | https://www.linkedin.com/posts/andrew-wheeler-46134849_my-first-book-data-science-for-crime-analysis-activity-7436029137695416320-aRmr   |
| 0    | 914   | https://www.linkedin.com/posts/andrew-wheeler-46134849_so-the-new-book-large-language-models-for-activity-7436376426100199424-1g8E       |
| 0    | 1593  | https://www.linkedin.com/posts/andrew-wheeler-46134849_agentic-coding-apps-like-claude-code-and-activity-7436738847054512128-qiXz        |
| 0    | 3415  | https://www.linkedin.com/posts/andrew-wheeler-46134849_many-people-are-turned-off-by-ai-writing-activity-7437101213717909504-QjOo        |
| 0    | 928   | https://www.linkedin.com/posts/andrew-wheeler-46134849_pretty-much-every-day-there-is-a-new-prompt-activity-7437463609401794560-50H-     |
| 0    | 2185  | https://www.linkedin.com/posts/andrew-wheeler-46134849_much-of-the-hype-around-skills-is-imo-people-activity-7437826000958337024-t6i3    |
| 0    | 800   | https://www.linkedin.com/posts/andrew-wheeler-46134849_one-of-the-benefits-of-my-llm-for-mortals-activity-7438550758763098112-hQUw |
| 0    | 870   | https://www.linkedin.com/posts/andrew-wheeler-46134849_large-language-models-for-mortals-preview-activity-7438913140639207424-tAli       |
| 0    | 1948  | https://www.linkedin.com/posts/andrew-wheeler-46134849_new-blog-post-using-claude-code-to-help-activity-7441087469388861440-TCKq         |
| 0    | 1160  | https://www.linkedin.com/posts/andrew-wheeler-46134849_given-all-the-rage-with-generative-ai-and-activity-7441449857480933377-uPFw       |
| 0    | 27842 | https://www.linkedin.com/posts/andrew-wheeler-46134849_stop-teaching-r-teach-python-when-i-was-activity-7441812266938826753-DywF         |
| 1    | 526   | https://www.linkedin.com/posts/andrew-wheeler-46134849_forecasting-the-future-is-difficult-especially-activity-7442537064803368960-qsVO  |
| 1    | 13096 | https://www.linkedin.com/posts/andrew-wheeler-46134849_when-using-llms-to-do-structured-data-extraction-activity-7442899426471407617-CpZz |
| 1    | 2394  | https://www.linkedin.com/posts/andrew-wheeler-46134849_ive-spoken-with-many-people-who-are-concerned-activity-7443039100477145090-v_3i   |
| 1    | 646   | https://www.linkedin.com/posts/andrew-wheeler-46134849_the-main-audience-my-book-large-language-activity-7443261810511757312-R2Jc        |
| 1    | 3030  | https://www.linkedin.com/posts/andrew-wheeler-46134849_for-the-folks-that-were-not-happy-with-my-activity-7443401497444409344-q38H       |
| 1    | 437   | https://www.linkedin.com/posts/andrew-wheeler-46134849_one-of-the-current-capabilities-of-googles-activity-7443624184120754176-YHbw      |
| 1    | 5275  | https://www.linkedin.com/posts/andrew-wheeler-46134849_one-error-i-am-seeing-devs-continually-make-activity-7443986571650973696-TZ-K     |
| 1    | 3815  | https://www.linkedin.com/posts/andrew-wheeler-46134849_one-of-the-biggest-issues-with-using-generative-activity-7444348969100664832-pX0v |
| 1    | 738   | https://www.linkedin.com/posts/andrew-wheeler-46134849_reports-of-rags-demise-are-overstated-activity-7444711358421491712-6BJu           |
| 1    | 425   | https://www.linkedin.com/posts/andrew-wheeler-46134849_the-recent-litellm-distribution-attack-highlights-activity-7445073747968999424-NEmn    |
| 1    | 3752  | https://www.linkedin.com/posts/andrew-wheeler-46134849_one-of-the-responses-to-me-writing-the-book-activity-7445436130080186369-7Fvx     |
| 1    | 1670  | https://www.linkedin.com/posts/andrew-wheeler-46134849_professors-that-follow-me-i-am-happy-to-activity-7446160904217407488-CXcZ         |
| 1    | 918   | https://www.linkedin.com/posts/andrew-wheeler-46134849_i-have-used-claude-code-the-longest-probably-activity-7447248077435920385-Ym8A    |
| 1    | 876   | https://www.linkedin.com/posts/andrew-wheeler-46134849_gio-has-a-new-post-out-on-examining-confidence-activity-7448697613513740289-hObZ  |
| 1    | 1959  | https://www.linkedin.com/posts/andrew-wheeler-46134849_the-mythos-technical-blog-post-on-its-cybersecurity-activity-7449059395071709184-UJOC  |
| 1    | 2201  | https://www.linkedin.com/posts/andrew-wheeler-46134849_for-folks-that-use-jupyter-notebooks-one-activity-7449422388691243008-Twxn        |
| 1    | 626   | https://www.linkedin.com/posts/andrew-wheeler-46134849_one-of-the-recommendations-i-have-in-the-activity-7450147165781426177-vceO        |
| 1    | 333   | https://www.linkedin.com/posts/andrew-wheeler-46134849_the-term-agent-is-almost-always-used-as-activity-7450509557677625344-42Ze         |
| 1    | 5787  | https://www.linkedin.com/posts/andrew-wheeler-46134849_agent-based-systems-require-bad-python-code-activity-7450871941458190336-gzSl     |
| 1    | 468   | https://www.linkedin.com/posts/andrew-wheeler-46134849_broadly-there-are-two-types-of-agent-based-activity-7451234329390678016-ZU1h      |
| 1    | 1267  | https://www.linkedin.com/posts/andrew-wheeler-46134849_i-get-periodically-asked-what-is-the-best-activity-7451596716354707456-aFMF       |
| 1    | 346   | https://www.linkedin.com/posts/andrew-wheeler-46134849_the-saying-a-picture-is-worth-a-1000-words-activity-7451959111132299264-Tk7o      |
| 1    | 480   | https://www.linkedin.com/posts/andrew-wheeler-46134849_it-is-important-to-have-independent-benchmark-activity-7452318150701953024-ftcM        |
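The summary statistics for the two periods can be reproduced from the Views column with a few lines of Python. This is a minimal sketch: the view counts are copied from the table above, the helper name is made up, and I am assuming the sample standard deviation (the table does not say which SD was used).

```python
from statistics import mean, median, stdev

# Views copied from the table above
pre_views = [  # Premium = 0
    3659, 2526, 2290, 545, 1454, 1326, 1278, 5988, 1726, 1172,
    1360, 5304, 30732, 1003, 884, 888, 1868, 807, 1243, 1745,
    914, 1593, 3415, 928, 2185, 800, 870, 1948, 1160, 27842,
]
post_views = [  # Premium = 1
    526, 13096, 2394, 646, 3030, 437, 5275, 3815, 738, 425,
    3752, 1670, 918, 876, 1959, 2201, 626, 333, 5787, 468,
    1267, 346, 480,
]

def summarize(views):
    """Count, mean, sample SD, and median of a list of view counts."""
    return {
        "n": len(views),
        "mean": round(mean(views)),
        "sd": round(stdev(views)),
        "median": median(views),
    }

for label, views in [("pre-Premium", pre_views), ("Premium", post_views)]:
    print(label, summarize(views))
```

The means are pulled up by a couple of viral posts, so the medians are the more stable comparison, and both point in the same direction.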

I would have expected a multiplier (e.g. typically 3k views, now you have 6k or 9k views per post). You could nitpick that the timing of the posts differs, and that the pre-Premium posts have some contamination (if LinkedIn promoted my older posts when I activated Premium). But those issues are not large enough to make a difference in my findings relative to what I expected.

The posts are quite comparable in content, mostly focused on my book and LLMs. It is possible my audience is oversaturated with that content, but I think it is just as likely that LinkedIn Premium doesn’t really promote your work to any substantive extent. (I have additionally obtained more followers in this period, so that should bias the results to have more views, not less.) At least here there is no evidence I should continue to pay $20 a month to increase my reach on LinkedIn.

Posts are bursty, and in the end I have very little ability to forecast what will or will not be popular. In the pre-period, my most popular post was on a blog post I did on log-probabilities (30k views). I definitely try to post more technical stuff on LinkedIn than the typical social media influencer, so that limits the reach.

I also had a rage-bait post on professors should teach python and not R with just under 30k views. (That was a bit of social media manipulation – have a controversial opinion that divides people, you get a bunch of thumbs up and a bunch of comments.) I do not have that many potential rage-bait post topics!

In addition to this, I also did the free month of LinkedIn Premium for my business Crime De-Coder page. As with my personal posts, I did not see any increased views, increased followers, etc. on the business page.

Profile Views

Although I have not seen LinkedIn explicitly say Premium boosts your posts (besides actually paying for advertising), I have seen LinkedIn explicitly advertise that Premium profiles get more views:

So how do profile views look? I did get more the week I signed up, but it was trending upward previously, and reverted to the trend after week one anyway. (A few days short, I cannot access the chart week by week since turning off Premium.)

For a bit of background, I had spent most of my time posting on my LinkedIn business Crime De-Coder page, and only posted on my personal page maybe once or twice a month. But since publishing LLMs for Mortals (in February of 2026), I have posted more on my personal page, which you can see increased my profile views before I signed up for Premium.

Likely the past additional profile views are for that rage-bait Python vs R post that was popular, not due to anything Premium did.

This appears to be extremely misleading advertising on LinkedIn’s part. If they are just comparing Premium vs non-Premium profiles, it is likely that Premium users are simply more active. LinkedIn should instead state the explicit “boost” Premium profiles get, such as being ranked higher in searches.

$100 ad credit

With Premium, you get a $100 ad credit per month for boosting posts. I used this to boost my original LLMs for Mortals launch post, which was stale at that point and not accumulating any additional views.

The metrics on the post were as LinkedIn said they would be. Despite having 80+ likes when I first created it, the post only had 3700 views. Spending $100 on the credits got me an additional ~3500 views and supposedly ~50 additional website clicks. (I am confused how this is calculated, as I can see the actual link in the post was clicked fewer than 10 additional times with the campaign.)

I knew going in that adverts on LinkedIn are not a net benefit given my book purchase conversion rates. For what I will call “high trust” referrals, I have something like a 1/100 purchase rate for the book. For other mediums, it is more like 1/1000. As far as I can tell, these seem pretty typical for a higher dollar value book purchase ($50+).

I have debated setting the purchase price for the epub much lower. $50 is in line with current offerings from O’Reilly, and in my informal demand curve tests is where I think it should be. But I don’t think any realistic conversion rate would make LinkedIn advertising make sense for my book.
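To see why, here is a minimal sketch of the break-even arithmetic using the numbers above: $100 of ad spend, roughly 50 reported clicks (or about 10 going by the link counts), a $50 book, and conversion rates of 1/100 or 1/1000. The function names are made up for illustration, and the calculation ignores payment fees and any repeat or downstream consulting business.

```python
def expected_ad_revenue(clicks, conversion_rate, price):
    """Expected sales revenue from a batch of ad clicks."""
    return clicks * conversion_rate * price

def break_even_conversion_rate(ad_spend, clicks, price):
    """Conversion rate at which revenue just covers the ad spend."""
    return ad_spend / (clicks * price)

ad_spend, price = 100, 50

for clicks in (50, 10):  # LinkedIn's reported clicks vs observed link clicks
    for rate in (1 / 100, 1 / 1000):
        revenue = expected_ad_revenue(clicks, rate, price)
        print(f"clicks={clicks}, conversion={rate:.3f}: ${revenue:.2f}")
    needed = break_even_conversion_rate(ad_spend, clicks, price)
    print(f"clicks={clicks}: break-even conversion rate = {needed:.0%}")
```

Even taking the 50 clicks at face value, the campaign would need a 4% purchase rate to break even, several times my best “high trust” rate.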

For reference for influencers though, this gives a rough estimate comparable to LinkedIn’s direct advertising. Basically my average post is worth $100 according to LinkedIn. I only have around 3k followers currently on LinkedIn, so I imagine folks with followings 10x that can likely do direct advertisements to their audiences for more like $1k and up.

Wrap Up

I still think LinkedIn is the best social media site currently to promote my work and business. It is not just about the raw view counts, but also about conversion to people buying my book or reaching out for additional consulting gigs.

I will continue to use LinkedIn for this, but paying for a Premium LinkedIn account does not appear to be worth it for these reasons. Even if the views were increased, it is possible that they are not good connections for these end goals.

There are additional things you get with Premium (can send cold messages to people you are not connected to, supposedly higher priority when applying to jobs). Those are maybe worth the $20 a month for some people. But focusing on what LinkedIn advertises for “boosting” your posts and profile, I did not personally see any evidence that would justify spending even $1 a month for the Premium features.

Job Advice Resources page

Minor update: I have created a page, Job Advice Resources, to cumulatively list all the materials I have written on advice for social scientists and crime analysts looking to pivot into private sector tech roles.

I still get ~2 folks a month asking for advice, and I am always happy to chat. I wish PhD granting institutions took this more seriously (it only takes minor changes to better prepare students).

If you are an administrator of a PhD program and actually care about getting your students jobs, also feel free to reach out and I am happy to discuss how I can help.

The race to the bottom with AI tools

What we are seeing in the AI startup space is a perfect example of the “no moat” problem: if your core product is essentially just clever prompt engineering wrapped around someone else’s frontier model, it is trivially easy for a competitor to reverse-engineer your workflow and undercut your price. Over the last few months, this lack of a defensible moat has triggered a rapid race to the bottom in automated peer review, moving from expensive managed services to open-source “bring your own key” (BYOK) scripts.

Here I am going to look at three tools specifically designed to review academic papers: Refine, IsItCredible, and Coarse.

Overview of the Tools

Refine: Refine positions itself as a premium, rigorous option for institutions, boasting testimonials from Ivy League professors and a high price point of $49.99 per review. It uses what it calls “massive parallel compute” to make hundreds of LLM calls to stress-test every line of a document.

IsItCredible: Built on the open-source Reviewer 2 pipeline, IsItCredible offers a standardized, pay-per-use middle ground with core reports starting at $5. It employs a clever “adversarial” architecture where “Red Team” agents try to find flaws and a “Blue Team” verifies them to prevent hallucinations.

Coarse: Coarse represents the logical endpoint of this race as an open-source “Bring Your Own Key” (BYOK) tool that lets you run complex multi-agent reviews locally or via OpenRouter. Because users pay the API costs directly instead of a markup, a comprehensive paper review is significantly cheaper.

The “LLM as a Judge” Problem

The hardest part of all this is evaluation. How do you know if the AI reviewer is actually good?

Refine relies almost entirely on anecdotal evidence. Their own FAQ essentially tells you to just try it and see the difference for yourself, claiming that general-purpose chatbots cannot match their depth even with expert prompting. This “try it yourself” approach is effective for marketing, but it isn’t a hard benchmark.

IsItCredible and Coarse are trying to be more systematic. The IsItCredible team released a paper, Yell at It: Prompt Engineering for Automated Peer Review, where they benchmarked their tool against five alternatives. They claim 15 wins out of 20 pairings. Similarly, Coarse claims to have been “blind-evaluated” against Refine and Reviewer 2, scoring higher on coverage and specificity.

However, we are still largely in the “LLM as a judge” era. These benchmarks often use another LLM to decide which review is better. It is circular logic. Until we have a “Ground Truth” dataset of known mathematical errors or logical fallacies in published papers, we are just measuring which AI writes the most convincing-sounding critique.
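As a rough illustration of what scoring against such a ground truth could look like, here is a minimal Python sketch. It is not how any of these tools are benchmarked. The error identifiers and the `score_review` function are hypothetical, and it assumes each issue a reviewer flags has already been matched (by a human or a rule) to a known planted error or marked as unmatched.

```python
def score_review(flagged, known_errors):
    """Precision and recall of flagged issues against known planted errors.

    Both arguments are collections of issue identifiers (hypothetical labels).
    """
    flagged, known_errors = set(flagged), set(known_errors)
    true_positives = flagged & known_errors
    precision = len(true_positives) / len(flagged) if flagged else 0.0
    recall = len(true_positives) / len(known_errors) if known_errors else 0.0
    return precision, recall


# Hypothetical ground truth: three errors deliberately planted in a paper.
known = {"eq3_sign_error", "table2_wrong_units", "sec4_causal_overclaim"}

# Hypothetical outputs from two reviewers of the same paper.
reviews = {
    "tool_a": {"eq3_sign_error", "table2_wrong_units", "style_nitpick"},
    "tool_b": {"eq3_sign_error"},
}

for name, flagged in reviews.items():
    precision, recall = score_review(flagged, known)
    print(f"{name}: precision={precision:.2f}, recall={recall:.2f}")
```

The point of the design is that neither number depends on how persuasive the critique reads, only on whether it found errors that are known to be there.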

Because evaluation is so difficult, this software category risks becoming a classic market for lemons. It is incredibly difficult to identify substantive differences in quality between these tools without some external, hard benchmark. To truly evaluate if Refine’s expensive managed service is meaningfully better than Coarse’s open-source BYOK run, you have to verify the AI’s claims. But verifying those claims requires spending just as much time reading and reviewing the original paper as you would have spent just doing the review yourself from scratch. Without transparent benchmarks, users cannot easily distinguish high-quality rigorous analysis from convincing hallucinations, driving the market toward the cheapest option by default.

For those building AI tools, this entire space serves as a warning about the race to the bottom. I have previously written about deep research tools as another example of this phenomenon. If your only value proposition is a well-orchestrated prompt chain, open-source alternatives will inevitably compress your margins to zero. Eventually, the native GUI interfaces of the frontier models themselves may just become good enough that your specialized service isn’t even needed.

Meta

Did you like this post? Guess what, it was entirely generated via Google’s API models (specifically the gemini cli). I have saved the chat session and log for how long it took here. You can see for yourself, I had a broad idea, asked it to review different materials, and then generate a post. I then iterated, taking 25 minutes from start to finish in total.

The original post also is not flagged by Pangram as AI generated.

It definitely is not 100% my style (and to be clear this meta section is 100% hand written). The final paragraph about deep research tools I also struggled to get the model to say what I wanted – I wanted it to say “deep research tools are another example where this same situation will occur”. I am keeping the original 100% AI generated post for posterity though for folks to see what is possible with the current tools.

Policing Scholars should join ASEBP

Cross-posted on my Crime De-Coder blog.

I will be giving a talk at the upcoming American Society of Evidence Based Policing (ASEBP) conference (registration link here, May 20th-22nd in DC). My talk is How long to conduct your experiment? Check it out Thursday morning – I specifically asked for one of the short talks; 15 minutes is plenty to get the gist.

ASEBP Conference Flyer, 2026 in DC

I will be sharing a web-app to go with the talk soon (you can see my WDD tool and this blog post for background), but wanted to write a more general post about why researchers (as well as police officers who are interested in professionalization of the field) should join ASEBP.

To start, I have been involved in various ways with ASEBP for several years now, but I do not have any financial ties to ASEBP. I currently volunteer on the committee that reviews conference talks.

ASEBP is clearly the best organization for policing scholars currently in the country. The other main criminological societies (the American Society of Criminology and the Academy of Criminal Justice Sciences) are operating much as they did 30 years ago. Mostly they only exist to run journals and have a yearly conference where anyone can give a talk. They are incredibly insular, and have basically zero input from practitioners.

You can go and just look at the talks for ASC and ACJS – they are basically irrelevant to the vast majority of criminal justice operations (not only in policing, but in the CJ field as a whole). You can go look at the talks for the ASEBP conference and see they have a much clearer focus on realistic topics police departments are interested in, but presented by legitimate researchers and practitioners.

For scholars, I have developed working relationships with departments through multiple police practitioners I have met through ASEBP – and I hope to make more!

ASEBP was started by Renee Mitchell with a clear goal in mind – Renee is really the modern-day version of August Vollmer. ASEBP is intended to be a rigorous (unlike ASC, which allows almost anyone to present) conference and organization (ASEBP has training opportunities as well) to advance the use of evidence in policing operations.

If you think “I am not a policing researcher”, but have anything to do at all with criminal justice, feel free to get in touch. (Crime analysts should definitely join.) I have ideas to expand the organization – nothing equivalent currently exists in other parts of the criminal justice system either. Being evidence-based is really the core of what Renee and everyone else is building.

If you are going to the conference and want to meet up, feel free to send me an email, andrew.wheeler@crimede-coder.com, and I will find a time to get a coffee while we are in DC.

Interview on LEAP about LLMs for Mortals

I was recently interviewed by Jason Elder on the Law Enforcement Analysts Podcast about my new book, Large Language Models for Mortals: A Practical Guide for Analysts.

Jason does an excellent job with interviewing (and does a quality editing job with audio), so I suggest following the podcast if you are a crime analyst or researcher working with police departments.

We basically cover large swaths of the book, from the basics of APIs and structured info extraction to some high level discussion of RAG and how AI coding tools still need a bit of human oversight and direction. Even if you are not a coder, I think picking up a copy is a good idea to get an understanding about what is possible with the current tools.

Just to catalog the different coupon codes for the book:

  • LLMDEVS to get 50% off of the epub
  • TWOFOR1 to get $30 off when purchasing two books (can be any two books)

I do give the first coupon code for the paperback version of the book in the interview. So take a listen if interested in $20 off the paperback.

You can purchase either epub or paperback from my store worldwide.