I translated my book for $7 using openai

The other day an officer from the French Gendarmerie commented that they use my python for crime analysis book. I asked that individual whether a French version would be useful, and he stated they all speak English. But given my book is written in plain text markdown and compiled using Quarto, it is not that difficult to pipe the text through a tool to translate it to other languages. (Knowing that epubs under the hood are just html, it would not surprise me if there is some epub reader that can use Google Translate.)

So as you can see, I now have four new books available in the Crime De-Coder store:

ebook versions are normally $39.99, and print is $49.99 (both available worldwide). For the next few weeks, you can use promo code translate25 (until 11/15/2025) to purchase epub versions for $19.99.

If you want to see a preview of the books' first two chapters, here are the PDFs:

And here I added a page on my crimede-coder site with testimonials.

As the title says, this in the end cost (less than) $7 to convert to French (and ditto to convert to Spanish).

Here is code demo’ing the conversion. It uses OpenAI’s GPT-5 model, but likely smaller and cheaper models would work just fine if you did not want to fork out $7. It ended up being a quite simple afternoon project (parsing the markdown ended up being the bigger pain).

So the markdown for the book in plain text looks like this:

Because markdown uses blank lines to denote different sections, those end up being a fairly natural break to do the translation at. These GenAI tools cannot repeat back very long sequences, but a paragraph is a good length: long enough to have additional context, but short enough for the machine to not go off the rails when trying to just return the text you input. Then I just have extra logic to not translate code sections (those that start/end with three backticks). I don't even bother to parse out the other sections (like LaTeX or HTML); I just include in the prompt to not modify those.

So I just read in the quarto document, split on blank lines ("\n\n"), then feed the text sections into OpenAI. I did not test this very much, I just used the current default gpt-5 model with medium reasoning. (It is quite possible a non-reasoning smaller model will do just as well. I suspect the open models will do fine.)
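
The full code is linked above, but to give the flavor, here is a minimal sketch of that loop, assuming the openai python client (the prompt wording, helper names, and file names are just illustrative, not the exact ones I used):

import os
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

PROMPT = ("Translate the following markdown from English to French. "
          "Only return the translated text, and do not modify any "
          "code, LaTeX, or HTML markup.\n\n")

def translate_chunk(text):
    # one paragraph per request, so the model does not go off the rails
    resp = client.responses.create(model="gpt-5", input=PROMPT + text)
    return resp.output_text

with open("book.qmd", "r", encoding="utf-8") as f:
    chunks = f.read().split("\n\n")

res, in_code = [], False
for ch in chunks:
    if ch.count("```") % 2 == 1:
        in_code = not in_code  # entering or leaving a code fence
        res.append(ch)
    elif in_code or ch.strip() == "":
        res.append(ch)  # pass code sections through untranslated
    else:
        res.append(translate_chunk(ch))

with open("book_fr.qmd", "w", encoding="utf-8") as f:
    f.write("\n\n".join(res))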

You will ultimately still want someone to spot check the results, and then do some light edits. For example, here is the French version when I am talking about running code in the REPL, first in English:

Running in the REPL

Now, we are going to run an interactive python session, sometimes people call this the REPL, read-eval-print-loop. Simply type python in the command prompt and hit enter. You will then be greeted with this screen, and you will be inside of a python session.

And then in French:

Exécution dans le REPL

Maintenant, nous allons lancer une session Python interactive, que certains appellent le REPL, boucle lire-évaluer-afficher. Tapez simplement python dans l’invite de commande et appuyez sur Entrée. Vous verrez alors cet écran et vous serez dans une session Python.

So the acronym is carried forward, but the description of the acronym is not. (And I went and edited that for the versions on my website.) But look at this section in the intro talking about GIS:

There are situations when paid for tools are appropriate as well. Statistical programs like SPSS and SAS do not store their entire dataset in memory, so can be very convenient for some large data tasks. ESRI’s GIS (Geographic Information System) tools can be more convenient for specific mapping tasks (such as calculating network distances or geocoding) than many of the open source solutions. (And ESRI’s tools you can automate by using python code as well, so it is not mutually exclusive.) But that being said, I can leverage python for nearly 100% of my day to day tasks. This is especially important for public sector crime analysts, as you may not have a budget to purchase closed source programs. Python is 100% free and open source.

And here in French:

Il existe également des situations où les outils payants sont appropriés. Les logiciels statistiques comme SPSS et SAS ne stockent pas l’intégralité de leur jeu de données en mémoire, ils peuvent donc être très pratiques pour certaines tâches impliquant de grands volumes de données. Les outils SIG d’ESRI (Système d’information géographique) peuvent être plus pratiques que de nombreuses solutions open source pour des tâches cartographiques spécifiques (comme le calcul des distances sur un réseau ou le géocodage). (Et les outils d’ESRI peuvent également être automatisés à l’aide de code Python, ce qui n’est pas mutuellement exclusif.) Cela dit, je peux m’appuyer sur Python pour près de 100 % de mes tâches quotidiennes. C’est particulièrement important pour les analystes de la criminalité du secteur public, car vous n’avez peut‑être pas de budget pour acheter des logiciels propriétaires. Python est 100 % gratuit et open source.

So it translated GIS to SIG in French (Système d'information géographique), which seems quite reasonable to me.

I paid an individual to review the Spanish translation (if any readers are interested in giving me a quote for the French version copy-edits, I would appreciate it). She stated it is overall very readable, but just has many minor things. Here is a sample of suggestions:

The total number of edits she suggested was 77 (out of 310 pages).

If you are interested in another language just let me know. I am not sure about translation for the Asian languages, but I imagine it works OK out of the box for most languages derived from Latin. Another benefit of self-publishing: I can just have the French version available now, and if I am able to find someone to help with the copy-edits I will just update the draft after I get their feedback.

Recording your mouse and keyboard with python

The different LLM providers have released computer use tools, Google's Gemini being one of the most recent. The way these work, the tool submits an image of the screen and then carries out tasks given general instructions: actually manipulating the mouse and keyboard, asking for human-in-the-loop input at various stages, etc. They seem cool, but I am lacking clear examples of where I would use them.

They really make sense for very complicated workflows that vary. The things that make the most sense to spend some time trying to automate though are boring things – things that have a specific set of inputs and outputs, where the machine can entirely make those routing decisions based on rules you define at the outset. When I was a crime analyst, I worked with all sorts of janky and old desktop software that I needed to use to generate reports and email the results to key individuals, for example.

Here, as a day project, I created a set of python functions to record your keyboard and mouse inputs, and then replay those inputs. The python code is here; it ends up being pretty simple (mostly written by ChatGPT and slightly modified by me). And here is a YouTube video demonstration showing what you can do.
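
The full code is linked above, but here is a minimal sketch of the same record-and-replay idea, assuming the pynput library (event handling is simplified down to clicks and key presses for illustration):

import time
from pynput import mouse, keyboard

events = []  # list of (timestamp, kind, payload) tuples

def on_click(x, y, button, pressed):
    events.append((time.time(), "click", (x, y, button, pressed)))

def on_press(key):
    events.append((time.time(), "key", key))

# record mouse clicks and key presses for 10 seconds
ml = mouse.Listener(on_click=on_click)
kl = keyboard.Listener(on_press=on_press)
ml.start(); kl.start()
time.sleep(10)
ml.stop(); kl.stop()

# replay the events, keeping the original delays between them
mc, kc = mouse.Controller(), keyboard.Controller()
prior = events[0][0] if events else 0.0
for ts, kind, payload in events:
    time.sleep(ts - prior)
    prior = ts
    if kind == "click":
        x, y, button, pressed = payload
        mc.position = (x, y)
        if pressed:
            mc.press(button)
        else:
            mc.release(button)
    else:
        kc.press(payload)
        kc.release(payload)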

So even if you have software that does not have an easy way to integrate with other code, here you can just record your movements and replay them with python on a schedule to accomplish that monotonous task.

I scraped the Crime solutions site

Before I get to the main gist, I am going to talk about another site. The National Institute of Justice (NIJ) paid RTI over $10 million to develop a forensic technology center of excellence over the past 5 years. While this effort involved more than just a website, the only thing that lives in perpetuity for others to learn from the center of excellence are the resources they provide on the website.

Once funding was pulled, this is what RTI did with those resources:

The website is not even up anymore (it is probably a good domain to snatch up if no one owns it anymore), but you can see what it looked like on the internet archive. It likely had over 1,000 videos and pages of material.

I have many friends at RTI. It is hard for me to articulate how distasteful I find this. I understand RTI is upset with the federal government cuts, but simply leaving the website up is a minimal cost (and likely worth it to RTI just for the SEO links to other RTI work).

Imagine you paid someone $1 million for something. They build it, and then later say "for $1 million more, I can do more". You say ok, then after you have disbursed $500,000 you say "I am not going to spend more". In response, the creator destroys all the material. This is what RTI did, except they had been paid $11 million and were still to be paid another $1 million. Going forward, if anyone from NIJ is listening, government contracts to build external resources should be licensed in a way that prevents this from happening.

And this brings me to the current topic, CrimeSolutions.gov. It is a bit of a different scenario, as NIJ controls this website. But recently they cut funding to the program, which was administered by DSG.

Crime Solutions is a website where they have collected independent ratings of research on criminal justice topics. To date they have something like 800 ratings on the website. I have participated in quite a few, and I think these are high quality.

To prevent someone (for whatever reason) simply turning off the lights, I scraped the site and posted the results to github. It is a PHP site under the hood, but changing everything to run as a static HTML site did not work out too badly.
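
The actual scraper is on github, but the general idea is just a small crawl-and-save loop. A minimal sketch, assuming requests and BeautifulSoup (the base URL and file layout here are illustrative; a real crawl would also need throttling and to handle query strings):

import os
from urllib.parse import urljoin, urlparse
import requests
from bs4 import BeautifulSoup

BASE = "https://crimesolutions.gov/"  # illustrative base URL
os.makedirs("site", exist_ok=True)
seen, todo = set(), [BASE]

while todo:
    url = todo.pop()
    if url in seen:
        continue
    seen.add(url)
    resp = requests.get(url)
    # flatten the path into a static html file name
    path = urlparse(url).path.strip("/") or "index"
    fname = os.path.join("site", path.replace("/", "_") + ".html")
    with open(fname, "w", encoding="utf-8") as f:
        f.write(resp.text)
    # queue up any same-domain links on the page
    soup = BeautifulSoup(resp.text, "html.parser")
    for a in soup.find_all("a", href=True):
        link = urljoin(url, a["href"])
        if link.startswith(BASE):
            todo.append(link)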

So for now, you can view the material at the original website. But if that goes down, you have a close-to-identical functioning site mirrored at https://apwheele.github.io/crime-solutions/index.html. So at least those 800-some reviews will not be lost.

What is the long term solution? I could be a butthead and tomorrow take down my github page (so clone it locally), so me scraping the site is not really a solution as much as a stopgap.

Ultimately we want a long term, public, storage solution that is not controlled by a single actor. The best solution we have now is ArDrive via the folks from Arweave. For a one time upfront purchase, Arweave guarantees the data will last a minimum of 200 years (they fund an endowment to continually pay for upkeep and storage costs). If you want to learn more, stay tuned, as me and Scott Jacques are working on migrating much of the CrimRXiv and CrimConsortium work to this more permanent solution.

Recommend reading The Idea Factory, Docker python tips

A friend recently recommended The Idea Factory: Bell Labs and the Great Age of American Innovation by Jon Gertner. It is one of the best books I have read in a while, so I also want to recommend it to the readers of my blog.

I was vaguely familiar with Bell Labs given my interest in stats and computer science. John Tukey gets a few honorable mentions, but Claude Shannon is a central character of the book. What I did not realize is that almost all of modern computing can be traced back to innovations that were developed at Bell Labs. For a sample, these include:

  • the transistor
  • fiber optic cables (I did not even know fiber is very thin strands of glass)
  • the cellular network with smaller towers
  • satellite communication

And then you get a smattering of different discussions as well, such as the material science that goes into making underwater cables durable and shark resistant.

The backstory was that AT&T in the early 20th century had a monopoly on landline telephones. Similar to how most states now have a single electric provider – they were a private company but blessed by the government to have that monopoly. AT&T intentionally had a massive research arm that they used to improve communications, but they also provided that research back to the public. Shannon was a pure mathematician; he was not under the gun to produce revenue.

Gertner basically goes through a series of characters that were instrumental in developing some of these ideas, and in creating and managing Bell Labs itself. It is a high level recounting by Gertner, mostly from historical notebooks. One of the things I really want to understand is how institutions even tackle a project that lasts a decade – things I have been involved in at work that last a year are just dreadful due to transaction costs between so many groups. I can't even imagine trying to keep on schedule for something so massive. I do not get that level of detail from the book; it is more so that someone had an idea, developed a little tinkering proof of concept, and then Bell Labs sunk a decade and a small army of engineers into figuring out how to build it in an economical way.

This is not a critique of Gertner (his writing is wonderful, and really gives flavor to the characters). Maybe just sinking an army of engineers on a problem is the only reasonable answer to my question.

Most of the innovation in my field, criminal justice, is coming from the private sector. I wonder (or maybe hope and dream is a better description) if a company, like Axon, could build something like that for our field.


Part of the point of writing blog posts is that I do the same tasks over and over again. Having a nerd journal is convenient to reference.

One of the things I do not commonly have to do, but that seems to come up about once a year at my gig, is putzing around with Docker containers. As a note to myself: when building python apps, to get correct layer caching you want to install the libraries first, and then copy the app over.

So if you do this:

FROM python:3.11-slim
COPY . /app
RUN pip install --no-cache-dir -r /app/requirements.txt
WORKDIR /app
CMD ["python", "main.py"]

Every time you change a single line of code, you need to re-install all of the libraries. This is painful. (For folks who like uv, it does not solve the problem, as you still need to download the libraries every time in this approach.)

A better workflow then is to copy over the single requirements.txt file (or .toml, whatever), install that, and then copy over your application.

FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt
COPY . /app
CMD ["python", "main.py"]

So now, only when I change the requirements.txt file will I need to redo that layer.

Now I am a terrible person to ask about dev builds and testing code in this set up. I doubt I am doing things the way I should be. But most of the time I am just building this.

docker build -t py_app .

And then I will have logic in main.py (or swap out with a test.py) that logs whatever I need to the screen. Then you can either do:

docker run --rm py_app

Or if you want to bash into the container, you can do:

docker run -it --rm py_app bash

Then from in the container you can go into the python REPL, edit a file using vim if you need to, etc.

Part of the reason I want data scientists to be full stack is because at work, if I need another team to help me build and test code, it basically adds 3 months at a minimum to my project. Probably one of the most complicated things my team and I have done at the day job is figure out the correct magical incantations to properly build ODBC connections to various databases in Docker containers. If you can learn about boosted models, you can learn how to build Docker containers.

Deep research and open access

Most of the major LLM chatbot vendors are now offering a tool called deep research. These tools basically just scour the web given a question, and return a report. For academics conducting literature reviews, the parallel is obvious. We just tend to limit the review to peer reviewed research.

I started with testing out Google’s Gemini service. Using that, I noticed almost all of the sources cited were public materials. So I did a little test with a few prompts across the different tools. Below are some examples of those:

  • Google Gemini question on measuring stress in police officers (PDF; it appears I cannot share this chat link)
  • OpenAI Effectiveness of Gunshot detection (PDF, link to chat)
  • Perplexity convenience sample (PDF, Perplexity was one conversation)
  • Perplexity survey measures attitudes towards police (PDF, see chat link above)

The report on officer mental health measures was an area I was wholly unfamiliar with. The other tests are areas where I am quite familiar, so I could evaluate how well I thought each tool did. OpenAI's tool is the most irksome to work with: citations work out of the box for Google and Perplexity, but not with ChatGPT, and I had to ask it to reformat things several times. Claude's tool has no test here, as to use its deep research tool you need a paid account.

Offhand each of the tools did a passable job of reviewing the literature and writing reasonable summaries. I could nitpick things in both the Perplexity and the ChatGPT results, but overall they are good tools I would recommend people become familiar with. ChatGPT was more concise and more on-point. Perplexity got the right answer for the convenience sample question (use post-stratification), but also pulled in a large literature on propensity score matching (which is only relevant for X causes Y type questions, not overall distribution of Y). Again this is nit-picking for less than 5 minutes of work.

Overall these will not magically take over writing your literature review, but they are useful (the same way that doing simpler searches in google scholar is useful). The issue with hallucinating citations is mostly solved (see the exception for ChatGPT here). You should consult the original sources and treat deep research reports like on-demand Wikipedia pages, but let's not kid ourselves – most people will not be that thorough.

For the Gemini report on officer mental health, I went through quickly and broke down the 77 citations across the publication type or whether the sources were in HTML or PDF. (Likely some errors here, I went by the text for the most part.) For the HTML vs PDF, 59 out of 77 (76%) are HTML web-sources. Here is the breakdown for my ad-hoc categories for types of publications:

  • Peer review (open) – 39 (50%)
  • Peer review (just abstract) – 10 (13%; these are all ResearchGate)
  • Open reports – 23 (30%)
  • Web pages – 5 (6%)

For a quick rundown of these: peer reviewed should be obvious, but sometimes the different tools cite papers that are not open access. In those cases, they are just using the abstract to madlib how Deep Research fills in its report. (I count the ResearchGate articles here as just abstract; some really are available, but you need to click a link to get to the PDF in those cases, and Google is indexing the abstract, not the PDF behind the wall.) Open reports I reserve for think tanks or other government groups. Web pages I reserve for blogs or private sector white papers.

I'd note as well that even though it does cite many peer reviewed sources here, many of these are quite low quality (stuff in MDPI, or other venues that look to me like pay-to-publish outlets). Basically none of the citations are in major criminology journals! As I am not as familiar with this area, this may be reasonable though; I don't know if this material is often in the different policing journals or Criminal Justice and Behavior and just not being picked up, or if that lit in those places just does not exist. I have a feeling it is missing a few of the traditional crim journal sources though (and it picks up a few sources in different languages).

The OpenAI report largely hallucinated references in the final report it built (something that Gemini and Perplexity currently do not do). The references it made up were often portmanteaus of different papers. Of the 12 references it provided, 3 were supposedly peer reviewed articles. You can in the ChatGPT chat go and see the actual web-sources it used (actual links, not hallucinated). Of the 32 web links, here is the breakdown:

  • Pubmed 9
  • The Trace 5
  • Kansas City local news station website 4
  • Eric Piza’s wordpress website 3
  • Govtech website 3
  • NIJ 2

There are single links then to two different journals, and one to the Police Chief magazine. I'd note Eric's site is not that old (its first RSS feed entry is from February 2023), so Eric making a website where he simply shares his peer reviewed work greatly increased his exposure. His webpage in ChatGPT is more influential than NIJ and the peer reviewed CJ journals combined.

I did not do the work to go through the Perplexity citations. But in large part they appear quite similar to Gemini on their face. They do cite pure PDF documents more often than I expected, but recall that only 24% of the citations in the Gemini example were PDFs.

The long story short advice here is that you should post your preprints or postprints publicly, preferably in HTML format. For criminologists, you should do this currently on CrimRXiv. In addition to this, just make a free webpage and post overviews of your work.

These tests were just simple prompts as well. I bet you could steer the tool to give better sources with some additional prompting, like “look at this specific journal”. (Design idea if anyone from Perplexity is listening, allow someone to be able to whitelist sources to specific domains.)


Other random pro-tip for using Gemini chats. They do not print well, and if they have quite a bit of markdown and/or mathematics, they do not convert to a google document very well. What I did in those circumstances was a bit of javascript hacking. So go into your dev console (in Chrome right click on the page and select "Inspect", then in the new pane that opens go to the "Console" tab). And depending on the chat currently opened in the browser, you can try entering this javascript:

// Example printing out Google Gemini Chat
var res = document.getElementsByTagName("extended-response-panel")[0];
var report = res.getElementsByTagName("message-content")[0];
var body = document.getElementsByTagName("body")[0];
let escapeHTMLPolicy = trustedTypes.createPolicy("escapeHTML", {
 createHTML: (string) => string
});
body.innerHTML = escapeHTMLPolicy.createHTML(report.innerHTML);
// Now you can go back to page, cannot scroll but
// Ctrl+P prints out nicely

Or this works for me when revisiting the page:

var report = document.getElementById("extended-response-message-content");
var body = document.getElementsByTagName("body")[0];
let escapeHTMLPolicy = trustedTypes.createPolicy("escapeHTML", {
 createHTML: (string) => string
});
body.innerHTML = escapeHTMLPolicy.createHTML(report.innerHTML);

Page scrolling does not work after this, but Ctrl + P to print the page does.

The idea behind this: I want just the report content, which ends up hidden away in a mess of div tags, promoted out to the body of the page. This will likely break in the near future as well, but you just need to figure out the correct way to get at the report content.

Here is an example of using Gemini's Deep Research to help me make a practice study guide for my son's calculus course.

Reloading classes in python and shared borders

For some housekeeping, if you are not signed up, make sure to sign up for the RSS feed of my crime de-coder blog. I have not been cross-posting here consistently. For the last few posts:

For ASEBP, conference submissions for 2026 are open. (I will actually be going to this in 2026, submitted a 15 minute talk on planning experiments.)

Today will just be a quick post on two pieces of code I thought might be useful to share. The first is useful for humans: when testing code in functions, you can use the importlib library to reload modules. That is, imagine you have code:

import crimepy as cpy

test = cpy.func1(....)

And then when you run this, you see that func1 has an error. You can edit the source code, and then run:

from importlib import reload

reload(cpy)
test = cpy.func1(....)

This is my preferred approach to testing code. Note you need to import the library as a module and reload that module object. from crimepy import * will not work for this unfortunately.

Recently I was testing out code that took quite a while to run, my Patrol Districting project. This is a class, and I was editing my methods to make maps. The code itself takes around 5 minutes to run through when remaking the entire class. What I found was a simpler approach: I can dump the object out to pickle, then reload the library, then load the pickled object. The way the pickle module works, it pulls the method definitions from the global environment (pickle just saves the dict items under the hood). So the code looked like this:

from crimepy import pmed
...
pmed12 = pmed.pmed(...)
pmed12.map_plot() # causes error

And then instead of using the reload approach as is (which would require me to create an entirely new object), use this approach for debugging:

# save file
import pickle
with open('pmed12.pkl', 'wb') as file:
    # Dump data with highest protocol for best performance
    pickle.dump(pmed12, file)

from importlib import reload
# edit method
reload(pmed)

# reload the object
with open('pmed12.pkl', 'rb') as file:
    # Load the pickled data
    pmed12_new = pickle.load(file)

# retest the method
pmed12_new.map_plot()

Writing code itself is often not the bottleneck – testing is. So figuring out ways to iterate testing faster is often worth the effort (I might have saved a day or two of work if I did this approach sooner when debugging that code).

The second code snippet is useful for the machines; I have been having Claude help me write quite a bit of the crimepy work. Here was one thing it was having trouble with though – calculating the shared border length between two polygons. Basically it went down an overly complicated path to get the exact calculation, whereas here I have an approximation using tiny buffers that works just fine and is much simpler.

def intersection_length(poly1, poly2, smb=1e-15):
    '''
    Length of the shared border between two shapely polygons
    
    poly1 - shapely polygon
    poly2 - shapely polygon
    smb - float, default 1e-15, small distance to buffer
    
    The way this works, I buffer each polygon's boundary by a
    very small amount, take the intersection of those two
    skinny polygons, and divide the perimeter by 2
    so not exact, but close enough for this work
    '''
    # a very skinny polygon along each boundary
    b1 = poly1.boundary.buffer(smb, cap_style='flat')
    b2 = poly2.boundary.buffer(smb, cap_style='flat')
    # these only overlap along the shared border
    pb = b1.intersection(b2)
    if pb.is_empty:
        return 0.0
    elif hasattr(pb, 'length'):
        return (pb.length - 2*smb)/2
    else:
        return 0.0

And then for some tests:

from shapely.geometry import Polygon

poly1 = Polygon([(0, 0), (4, 0), (4, 3), (0, 3)])  # Rectangle
poly2 = Polygon([(2, 1), (6, 1), (6, 4), (2, 4)])  # Overlapping rectangle

intersection_length(poly1,poly2) # should be close to 0

poly3 = Polygon([(0, 0), (2, 0), (2, 2), (0, 2)])
poly4 = Polygon([(2, 0), (4, 0), (4, 2), (2, 2)])

intersection_length(poly3,poly4) # should be close to 2

poly5 = Polygon([(0, 0), (2, 0), (2, 2), (0, 2)])
poly6 = Polygon([(2, 0), (4, 0), (4, 3), (1, 3), (1, 2), (2, 2)])

intersection_length(poly5,poly6) # should be close to 3

poly7 = Polygon([(0, 0), (2, 0), (2, 2), (0, 2)])
poly8 = Polygon([(3, 0), (5, 0), (5, 2), (3, 2)])

intersection_length(poly7,poly8) # should be 0

Real GIS data often has imperfections (polygons that do not perfectly line up). So using the buffer method (and having an option to increase the buffer size) can often help smooth out those issues. It will not be exact, but the inexactness we are talking about will often be well past the 10th decimal place.

The difference between models, drive-time vs fatality edition

Easily one of the most common critiques I make when reviewing peer reviewed papers is the concept that the difference between statistically significant and not statistically significant is not itself statistically significant (Gelman & Stern, 2006).

If you cannot parse that sentence, the idea is simple to illustrate. Imagine you have two models:

Model   Coef   SE    p-value
  A     0.5    0.2    0.01
  B     0.3    0.2    0.13

So often social scientists will say “well, the effect in model B is different” and then post-hoc make up some reason why the effect in Model B is different than Model A. This is a waste of time, as comparing the effects directly, they are quite similar. We have an estimate of their difference (assuming 0 covariance between the effects), as

Effect difference = 0.5 - 0.3 = 0.2
SE of effect difference = sqrt(0.2^2 + 0.2^2) = 0.28
z-stat of difference = 0.2/0.28 = 0.71
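
If you want to check the arithmetic, here is a minimal snippet (scipy is only used for the two-sided p-value):

from math import sqrt
from scipy.stats import norm

diff = 0.5 - 0.3                 # difference in coefficients
se_diff = sqrt(0.2**2 + 0.2**2)  # assumes 0 covariance between effects
z = diff / se_diff               # ~0.71
p = 2 * norm.sf(abs(z))          # two-sided p-value, ~0.48
print(z, p)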

So when you compare the models directly (which is probably what you want to do when you are describing comparisons between your work and prior work), this is a bit of a nothing burger. It does not matter that Model B is not statistically significant, a coefficient of 0.3 is totally consistent with the prior work given the standard errors of both models.

I was reminded of this concept again, as Arredondo et al. (2025) do a replication of my paper with Gio on drive time, driving distance, and gunshot fatalities (Circo & Wheeler, 2021). They find that distance (whether Euclidean or drive time) is not statistically significant in their models. Here is the abstract:

Gunshot fatality rates vary considerably between cities with Baltimore, Maryland experiencing the highest rate in the U.S.. Previous research suggests that proximity to trauma care influences such survival rates. Using binomial logistic regression models, we assessed whether proximity to trauma centers impacted the survivability of gunshot wound victims in Baltimore for the years 2015-2019, considering three types of distance measurements: Euclidean, driving distance, and driving time. Distance to a hospital was not found to be statistically associated with survivability, regardless of measure. These results reinforce previous findings on Baltimore’s anomalous gunshot survivability and indicate broader social forces’ influence on outcomes.

This ends up being a clear example of the error I describe above. To make it simple, here is a comparison between their effects and the effects in my and Gio’s paper (in the format Coef (SE)):

Paper      Euclid         Network        Drive Time
Philly     0.042 (0.021)  0.030 (0.016)  0.022 (0.010)
Baltimore  0.034 (0.022)  0.032 (0.020)  0.013 (0.006)

At least for these coefficients, there is literally nothing anomalous at all compared to the work me and Gio did in Philadelphia.

To translate these coefficients to something meaningful, Gio and I estimate marginal effects – basically a reduction of 2 minutes results in a decrease of 1 percentage point in the probability of death. So if you compare someone who is shot 10 minutes from the hospital and has a 20% chance of death, if you could wave a wand and get them to the ER 2 minutes faster, we would guess their probability of death goes down to 19%. Tiny, but over many such cases makes a difference.
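
As a back of the envelope check, you can get that marginal effect from the logit coefficient with the usual derivative approximation p*(1-p)*beta, here using the Philly drive time coefficient from the table above:

# marginal effect of a 2 minute reduction in drive time
p = 0.20     # baseline probability of death
beta = 0.022 # logit coef per minute of drive time (Philly)
dp = p * (1 - p) * beta * 2
print(dp)    # ~0.007, about 1 percentage point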

I went through some power analysis simulations in the past for a paper comparing longer drive time distances as well (Sierra-Arévalo et al., 2022). So the (very minor) differences could also be due to omitted variable bias (in logit models, omitted variables can bias coefficients towards 0, even if they are not confounded with the other covariates). The Baltimore paper does not include where a person was shot, which was easily the most important factor in my research for the Philly work.

To wrap up – we as researchers cannot really change broader social forces (nor can we likely change the location of level 1 emergency rooms). What we can change however are different methods to get gun shot victims to the ER faster. These include things like scoop-and-run (Winter et al., 2022), or even gun shot detection tech to get people to scenes faster (Piza et al., 2023).


Some notes on project management

I have recently participated in several projects at the day gig that I think went well – these were big projects, with multiple parties, and we came together and got to deployment on a shortened timeline. I think it is worth putting together my notes on why I think they went well.

My personal guiding star for project management is very simple – have a list of things to do, and try to do them as fast as reasonably possible. This is important, as anything that distracts from this ultimate goal is not good. It does not matter if it is agile ceremonies or excessive waterfall requirements gathering. In practice, if you are too focused on the bureaucracy, either of them can get in the way of the core goals – have a list of things to do and do them in a reasonable amount of time.

For a specific example, we used to have bi-weekly sprints for my team. We stopped doing them for these projects that have IMO gone well, as another group took over project management for the multiple groups. The PM just had an excel spreadsheet, with dates. I did not realize how much waste we had by forcing everything into a two week cycle. We spent too much time trying to plan two weeks out, and ultimately were not filling up people's plates. It is just so much easier to say "ok we want to do this in two days, and then this in the next three days" etc. And when shit happens just be like "ok we need to push task X by a week, as it is much harder than anticipated", or "I finished Y earlier, but I think we should add a task Z we did not anticipate".

If sprints make sense for your team go at it – they just did not for my team. They caused friction in a way that was totally unnecessary. Just have a list of things to do, and do them as fast as reasonably possible.

Everything Parallel

So this has avoided the hard part, what to put on the to-do list? Let me discuss another very important high level goal of project management first – you need to do everything in parallel as much as possible.

For a concrete example that continually comes up in my workplace, you have software engineering (writing code) vs software deployment (how that code gets distributed to the right people). I cannot speak to other places (at places with a mono-repo/SaaS it probably looks different), but Gainwell is really like 20+ companies all Frankenstein-ed together through acquisitions and separate big state projects over time (and my data science group is pretty much solution architects for AI/ML projects across the org).

It is more work for everyone in this scenario, trying to do both the code writing and the deployment at the same time. Software devs have to make up some reasonably scoped requirements (which will later change) for the DevOps folks to even get started. The DevOps folks may need to work on Docker images (which will later change). So it is more work to do it in parallel than sequentially, but it drastically reduces the overall deliverable timelines. So e.g. instead of 4 weeks + 4 weeks = 8 weeks to deliver, it is 6 weeks of combined effort.

This may seem like "duh Andy" – but I see it all the time: people not planning far enough out to think this through (which tends to look more like waterfall than agile). If you want to do things in months and not quarters, you need everyone working on things in parallel.

For another example at work, we had a product person want to do extensive requirements gathering before starting on the work. This again can happen in parallel. We have an idea, devs can get started on the core of it, and the product folks can work with the end users in the interim. Again more work, things will change, devs may waste 1 or 2 or 4 weeks building something that changes. Does not matter, you should not wait.

I could give examples of purely writing code as well, e.g. I have one team member write certain parts of the code first, even when that is inconvenient, because that component not being finished is a blocker for another member of the team. Basically it is almost always worth working harder in the short term if it allows you to do things in parallel with multiple people/teams.

Sometimes the "in parallel" is when team members have slack, having them work on more proof of concept things that you think will be needed down the line. For the stuff I work on this can IMO be enjoyable, e.g. "you have some time, let's put a proof of concept together on using Codex + Agents to do some example work". (Parallel is not quite the word for this, it is foreseeing future needs.) But it is similar in nature; I am having someone work on something that will ultimately change in ways that will result in wasted effort, but that is OK, as the head start on trying to do vague things is well worth it.

What things to put on the list

This is the hardest part – you need someone who understands front to back what the software solution will look like, how it interacts with the world around it (users, databases, input/output, etc.) to be able to translate that vision into a tangible list of things to-do.

I am not even sure if I can articulate how to do this in a general enough manner to give useful advice. When I don't know things front to back though, I will tell you what, I often make mistakes going down paths that waste months of work (which I think is sometimes inevitable, as no one had the foresight to know it was a bad path until we got quite a ways down it).

I used to think we should do the extensive, months long, requirements gathering to avoid this. I know a few examples where I talked for months with the business owners, came up with a plan, and then later on realized it was based on some fundamental misunderstanding of the business. And the business team did not have enough understanding of the machine learning model to know it did not make sense.

I think mistakes like these are inevitable though, as requirements gathering is a two way street (it is not reasonable for any of the projects I work on to expect the people requesting things to put together a full, scoped out list). So just doing things and iterating is probably just as fast as waiting for a project to be fully scoped out.

Do them as fast as possible

So onto the second part, of "have a list of things to-do and do them as fast as possible". One of the things with "fast as possible", people will fill up the time available. If you give someone two weeks to do something, most people will not do it faster, they will spend the full two weeks doing that task.

So you need someone technical saying "this should be done in two days". One mistake I see teams make is listing out tasks that will take several weeks to do. This is only OK for very senior people. The majority of dev tasks should be 1/2/3 days of work at max. So you need to take a big project and break it down into smaller components. This seems like micro-managing, but I do not know how else to do it and keep things on track. Being more specific is almost always worth my time as opposed to less specific.

Sometimes this even works at higher levels. For one of the projects that went well, initial estimates were 6+ months. Our new Senior VP of our group said "nope, needs to be 2-3 months". And guess what? We did it (he spent money on some external contractors to do some work, but by god we did it). Sometimes "do them as fast as possible" is a negotiation at the higher levels of the org – you want something by end of Q3? Then we can do A and C, but B will have to wait until later, and the solution will be temporarily deployed as a desktop app instead of a fully served solution.

Again it is more work for our devs, but a shorter timeline with an even smaller MVP that helps others sooner totally makes sense.

AI will not magically save you

Putting this last part in, as I recently had a few conversations about large code conversion projects, where teams wanted to know if they could just use AI to make short work of them. The answer is yes, it makes sense to use AI to help with these tasks, but they expected somewhat of a magic bullet. They each still needed to make a functional CI/CD framework to test isolated code changes, for example. They still needed someone to sit down and say "Joe and Gary and Melinda will work on this project, and have XYZ deliverables in two months". A legacy system that was built over decades is not a weekend project where you just let the machine go brr and churn out a new codebase.

Some of them honestly are groups that just do not want to bite the bullet and do the work. I see projects that are mismanaged (for the criminal justice folks that follow me, on-prem CAD software deployments should not take 12 months). They take that long because the team is mismanaged, mostly people saying “I will do this high level thing in 3 months when I get time”, instead of being like “I will do part A in the next two days and part B in the three days following that”. Or doing things sequentially that should be done in parallel.

To date, genAI has only impacted the software engineering practices of my team at the margins (potentially writing code slightly faster, but probably not). We are currently using genAI in various products though for different end users. (We have deployed many supervised learning models going back years; just more recently we have expanded into using genAI for different tasks in products.)

I do not foresee genAI taking devs jobs in the near future, as there is basically infinite amounts of stuff to work on (everything when you look closely is inefficient in a myriad of ways). Using the genAI tools to write code though looks very much like project management, identifying smaller and more manageable tasks for the machine to work on, then testing those, and moving onto the next steps.

Using DuckDB WASM + Cloudflare R2 to host and query big data (for almost free)

The motivation here, prompted by a recent question Abigail Haddad had on LinkedIn:

For the machines, the context is hosting a dataset of 150 million rows (in another post Abigail stated it was around 72 gigs), where you want the public to be able to make ad-hoc queries on that data. Examples where you may want to do this are public dashboards (think a city's open data site that just puts all the data on R2 and has a front end).

This is the point where traditional SQL databases for websites probably don't make sense. Databases like Supabase Postgres or MySQL can hold that much data, but given the cost of cloud computing and what they are typically used for, it does not make much sense to load in 72 gigs and use them for data analysis type queries.

Hosting the data as static files in an online bucket, like Cloudflare's R2, and then querying the data makes more sense at that size. Here to query the data, I also use a WASM deployed DuckDB. What this means is I don't really have to worry about a server at all – it should scale to however many people want to use the service (I am just serving up HTML). The client's machine handles creating the query and displaying the resulting data via javascript, and Cloudflare basically just pushes data around.

If you want to see it in action, you can check out the github repo, or see the demo deployed on github pages to illustrate generating queries. To check out a query on my Cloudflare R2 bucket, you can run SELECT * FROM 'https://data-crimedecoder.com/books.parquet' LIMIT 10;:
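
You can also test the same bucket outside the browser. Here is a minimal sketch assuming the duckdb python package (which reads parquet over HTTPS via its httpfs extension):

import duckdb

# the same query the WASM demo runs, just from python
df = duckdb.sql(
    "SELECT * FROM 'https://data-crimedecoder.com/books.parquet' LIMIT 10"
).df()
print(df)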

Cloudflare is nice here, since there are no egress charges (with big data you need to worry about that). You do get charged for different read/write operations, but the free tiers seem quite generous (I do not know quite how to map these queries to Class B operations in Cloudflare's parlance, but you get 10 million per month and all my tests only generated a few thousand).

For some notes on this set-up. On Cloudflare, to be able to use DuckDB WASM, I needed to expose the R2 bucket via a custom domain. Using the development url did not work (same issue as here). I also set my CORS Policy to:

[
  {
    "AllowedOrigins": [
      "*"
    ],
    "AllowedMethods": [
      "GET",
      "HEAD"
    ],
    "AllowedHeaders": [
      "*"
    ],
    "ExposeHeaders": [],
    "MaxAgeSeconds": 3000
  }
]

While my Crime De-Coder site is PHP, all the good stuff happens client-side. So you can see some example demos of the GSU book prices data.

One of the annoying things about this though: with S3 you can partition the files and query multiple partitions at once. Here something like SELECT * FROM read_parquet('https://data-crimedecoder.com/parquet/Semester=*/*') LIMIT 10; does not work. You can union the partitions together manually. So I am not sure if there is a way to set up R2 to work the same way as the S3 example (set up an FTP server? let me know in the comments!).
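
Unioning manually looks something like the below, again via the python duckdb package (the partition paths here are illustrative, not my actual layout):

import duckdb

# manually union two partitions, since the glob over R2 does not work
q = """
SELECT * FROM read_parquet('https://data-crimedecoder.com/parquet/Semester=Fall/data.parquet')
UNION ALL
SELECT * FROM read_parquet('https://data-crimedecoder.com/parquet/Semester=Spring/data.parquet')
"""
df = duckdb.sql(q).df()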

For pricing, for the scenario Abigail had of 72 gigs of data, we then have:

  • $10 per year for the domain
  • 0.015*72*12 = $13 for storage of the 72 gigs (at $0.015 per gig per month)

So we have a total cost to run this of $23 per year. And it can scale to a crazy number of users and very large datasets out of the box. (My usecase here is just $10 for the domain, you get 10 gigs for free.)

Since this can be deployed on a static site, there are free options (like github pages). So the page with the SQL query part is essentially free. (I was not sure if there was a way to double dip on the R2 custom domain, such as just putting the HTML in the bucket. It turns out you can just put the HTML in the bucket and it will render like normal.)

While this example only shows generating a table, you can do whatever additional graphics client side. So could make a normal looking dashboard with dropdowns, and those just execute various queries and fill in the graphs/tables.

Follow up on Reluctant Criminologists critique of build stuff

Jon Brauer and Jake Day recently wrote a response to my build stuff post, Should more criminologists build stuff, on their Reluctant Criminologists blog. Go ahead and follow Jon's and Jake's thoughtful work. They asked for comment before posting – I mainly wanted to post my response to be more specific about "how much" I think criminologists should build stuff. (Their initial draft said I should "abandon" theoretical research, which is not what I meant (and which they took out before publishing), but I could see how one could be confused by my statement "emphasis should be flipped" after saying near 0% of work now is building stuff.)

So here is my response to their post:

To start I did not say abandon theoretical research, I said “the emphasis be on doing”. So that is a relative argument, not an absolute. It is fine to do theoretical work, and it is fine to do ex-ante policy evaluations (which should be integrated into the process of building things, seeing if something works well enough to justify its expense I would say is a risky test). I do not have a bright line that I think building stuff is optimal for the field, but it should be much more common than it is now (which is close to 0%). To be specific on my personal opinion, I do think “build stuff” should be 50%+ of research criminologists’ time (relative to writing papers).

I am actually more concerned with the larping idea I gave. You have a large number of papers in criminology that justify their motivation not really as theoretical, but as practical to operations. And they are just not even close. So let’s go with the example of precise empirical distributions of burglaries at the neighborhood level. (It is an area I am familiar with, and there are many “I have a new model for crime in space” papers.) Pretend I did have a forecast, and I said there are going to be 10 burglaries in your neighborhood next month. What exactly are you supposed to do with that information? Forecasting the distribution does not intrinsically make it obvious how to prevent crime (nature does not owe us things we can manipulate). Most academic criminologists would say it is useful for police allocation, which is so vague as to be worthless.

You also do not need a fully specified causal chain to prevent crime. Most of the advancement in crime prevention looks more like CPTED applications than understanding "root causes" (a phrase I think is a good example of an imprecise theory). I would much rather academics try to build specific CPTED applications than write another regression paper on crime and space (even if it is precise).

For the dark side part, in the counterfactual world in which academics don’t focus on direct applications, it does not mean those applications do not get built. They just get built by people who are not criminologists. It was actually the main reason I wrote the post – folks are building things now that should have more thoughtful input from academic criminologists.

For a specific example, different tech companies are demo'ing products with the ultimate goal of improving police officers' mental health. These include flagging if officers go to certain types of calls too often, or using a chatbot as a therapist. These are real things where I would like criminologists like yourselves involved in product development, so you can ask "How do we know this is actually improving the mental health of officers?". I vehemently disagree that more academic criminologists being involved will make development of these applications worse.

The final part I want to say is that apps need not intrinsically be focused on any one area. I gave examples in policing that I am aware of, because that is my stronger area of expertise, but it can be anything. So let's go with personal risk assessments. Pre-trial and parole/probation risk assessments look very similar to what Burgess built 100 years ago at this point. Risk stratification is built on the idea that you need to triage resources (some people need more oversight, some less), especially for the parole scenario. Now it is certainly feasible someone comes up with a better technological solution such that risk stratification is not needed at all (say better sensors or security systems that obviate the need for the more intensive human oversight). Or a more effective regimen that applies to everyone, say better dynamic risk assessments, so people are funneled faster to more appropriate treatment regimes than just having a parole officer pop in from time to time.

I give this last example because I think it is an area where focusing on real applications will, I suspect, be more fruitful long term for theory development. So we have 100 years and thousands of papers on risk assessment, but really only very incremental progress in that area. I believe a stronger focus on actual application – thinking about dynamic measures to accomplish specific goals (like the treatment monitoring and assignment) – is likely to be more fruitful than trying to pontificate about some new theory of man that maybe later can be helpful.

We don't have an atom to reduce observations down to (nor do we have an isolated root node in a causal diagram). We are not going to look hard enough and eventually find Laplace's Demon. Focusing on a real life application, how people are going to use the information in practice, I think is a better way for everyone to frame their scientific pursuits. It is more likely a particular application changes how we think about the problem altogether, and then we mold the way we measure to help accomplish that specific task. Einstein just started with the question "how do we measure how fast things travel when everything is moving", a very specific question. He did not start out by saying "I want a theory of the universe".

I am more bullish on real theoretical breakthroughs coming from more mundane and practical questions like “how do we tell if a treatment is working” or “how do we know if an officer is having a mental health crisis” than I am about someone coming up with a grander theory of whatever just from reading peer reviewed papers in their tower.

And here is Jon’s response to that:

Like you, we try to be optimistic, encouraging, and constructive in tone, though at times it requires serious effort to keep cynicism at bay. In general, if we had more Andrew Wheelers thoughtfully building things and then evaluating them, then I agree this would be a good thing. Yet, if I don't trust someone enough to meaningfully observe, record, and analyze the gauges, then I'm certainly not going to trust them to pilot – or to understand well enough to successfully build and improve upon the car/airplane/spaceship. Meanwhile, the normative analysis is that everything is significant/everything works – unless it's stuff we collectively don't like. In that context, the cynic in me thinks we are better off if we simply focus on teaching many (most?) social scientists to observe and analyze better – and may even do less harm despite wasted resources by letting them larp.

Jon and Jake do not have an estimate in their post on what they think the mix should be of building vs theorizing (they say pluralist in the post). I think the near 0 we do now is not good.

Much of this back and forth tends to mirror the current critique of advocacy in science. The Charles Tittle piece they cite, The arrogance of public sociology, could have been written yesterday.

Both the RC group and Tittle have what I would consider a perfect-is-the-enemy-of-the-good argument going on. People can do bad work, people can do good work. I want folks to go out and do good, meaningful work. I have met plenty of competent criminologists (and, on the flip side, seen the level of competence of many software engineers), so I do not share RC's level of cynicism.

As an individual, I don’t think it makes much sense to worry about the perception of the field as a whole. I cannot control my fellow criminologists, I can only control what I personally do. Tittle in his critique thought public sociology would erode any legitimacy of the field. He maybe was right, but I posit producing mostly irrelevant work will put criminology on the same path.