All posts in category Python

xAI voice cloning API

xAI has just released an API to clone your voice. It is pretty simple, read a script, and then an API where you can have text to speech in that voice.

Here is the python code after you have cloned your voice.

			
import os
import requests
voice_id = os.environ['ANDY1_VOICE'] # my demo voice ID
text = '''this is a test demo of my voice. Be excited! 
OK, how about a list of things; one, two, three. 
Lets see where this takes us.'''
response = requests.post(
    "https://api.x.ai/v1/tts",
    headers={
        "Authorization": f"Bearer {os.environ['XAI_API_KEY']}",
        "Content-Type": "application/json",
    },
    json={
        "text": llm_book,
        "voice_id": voice_id,
        "language": "en",
    },
)
response.raise_for_status()
with open("AndyTest1.mp3", "wb") as f:
    f.write(response.content)

		

I need to figure out my audio set up a bit better (my mic set up is probably not optimal and it produces some echo). But does a good job imitating my boring voice right out of the box!

And here is an example for longer speech from my intro to LLMs book:

			
# intro to llm book
llm_book = '''
Large language models (LLMs) are transforming how we work. Some of these examples include using LLMs to help write computer code, using LLMs to extract out information from irregular text sources, and creating chat-bots that can interact with various data sources and documents.
Most analysts, however, do not have any experience with these tools. This book is meant to be a general introduction to realistic examples of how individuals can use these tools; either in general software applications, or to help analysts write code to create software itself. Given the rapid pace of advancement in this area, a general introduction to help individuals who work in the knowledge economy understand the capabilities of these tools I believe is in order.
Here is a simple example of using an LLM API (*Application Programming Interface* -- just a standard way to send information and get information back on the web) using the anthropic library in python to extract key information from a free text crime narrative:
'''
response = requests.post(
    "https://api.x.ai/v1/tts",
    headers={
        "Authorization": f"Bearer {os.environ['XAI_API_KEY']}",
        "Content-Type": "application/json",
    },
    json={
        "text": llm_book,
        "voice_id": voice_id,
        "language": "en",
    },
)
response.raise_for_status()
with open("AndyTest_LLMIntro.mp3", "wb") as f:
    f.write(response.content)

		

The LLM intro messed up *Application Programming Interface* section (start listening at 50 seconds in). But otherwise it is very nice.

For those worried about security, xAI did something smart here — you need to input text live into the API given their prompts. You cannot have a pre-recording audio input to do this. So cloning someone elses voice is pretty hard.

Costs are around $4 per million characters in the text to speech API. So say narrating my entire book should be under $10 I believe.

Took me a total of less than an hour to set up a voice, create the python code, and write this blog post!

Leave a comment

by Andy Wheeler on May 4, 2026 • Permalink

Posted in data science, Python

Posted by Andy Wheeler on May 4, 2026

https://andrewpwheeler.com/2026/05/04/xai-voice-cloning-api/

Gathering interest in tech courses

Quick post this morning — I have a survey up gathering input on interest in short, technical courses.

Tech course survey interest form

Think 2-3 days, potentially in person/synchronous.

If you have taken a course with Paul Allison at Horizon’s, or an ICPSR summer course, those are similar examples. But, the main difference will be these courses are to prepare you for pursuing private sector roles.

These will be aimed at:

grad level social science students
current professors looking to pursue private sector roles
current data analysts looking to get into data science
undergrads with some more technical background

Survey lists potential courses (python for data analysis, intro to LLM APIs, SQL + Dashboards, using agent based tools for analysis), the course medium (in person vs video), price points.

If you are a university or organization interested in hosting such sessions for your students, let me know as well. Happy to chat to you about bringing this to your campus.

Leave a comment

by Andy Wheeler on April 27, 2026 • Permalink

Posted in Crime Analysis, Crime Mapping, data science, Python, scholarly

Tagged online-teaching, teaching

Posted by Andy Wheeler on April 27, 2026

https://andrewpwheeler.com/2026/04/27/gathering-interest-in-tech-courses/

Job Advice Resources page

Minor update, I have created a page, Job Advice Resources to cumulatively list all the materials I have written on advice for social scientists and crime analysts looking to pivot into private sector tech roles.

I still get maybe ~2 folks a month ask for advice, and I am always happy to chat. I wish PhD granting institutions took this more seriously (it only takes minor changes to better prepare students).

If you are an administrator of a PhD program and actually care about getting your students jobs, also feel free to reach out and I am happy to discuss how I can help.

Leave a comment

by Andy Wheeler on April 22, 2026 • Permalink

Posted in Crime Analysis, Criminal Justice, data science, Personal Productivity, Python, scholarly

Tagged tech-jobs

Posted by Andy Wheeler on April 22, 2026

https://andrewpwheeler.com/2026/04/22/job-advice-resources-page/

Stop Teaching R. Teach Python.

There has been a slight transition in social science teaching since I have been a student and professor over the past ~15+ years. In the aughts, it was still common to teach students in legacy, closed source statistical software (SPSS, SAS, and Stata). When I was a PhD student in criminal justice at SUNY Albany, we had a specific class to learn SPSS, although most of the rest of the quantitative courses used Stata.

The R programming language has likely usurped the use of the closed source languages in social science education after the aughts though. (I do not have hard data, but that is my impression seeing what colleagues are using and what they teach in classes.)

I am familiar with all of the major statistical programs (I have written an R package, and you can see this blog for many examples of SPSS and a few for Stata). If the goal in coursework is to teach your students skills relevant to help them get a job, academics in social science institutions should teach their students Python. The current job market for quantitative work is dominated by Python positions.

To be clear, I am not fundamentally opposed to closed source programming languages (there are scenarios where SPSS/SAS make more sense than Hadoop systems I have seen, also if you are a GIS analyst you should learn ESRI tools). This is purely just an observation given the current private sector job market – focusing primarily on Python makes the most sense for social science students.

As an experiment, I went onto LinkedIn and did a search for “data scientist”. Your results will differ (mine are tailored to the Raleigh area, and also includes more senior positions), but here is a table of the positions that came up on the first page, and a quick summary of the tech stacks they require. While this is not a systematic sample, it gives a reasonable snapshot of current expectations.

| Company             | Job Title                           | Tech Stack                                           | URL                                            |
|---------------------|-------------------------------------|------------------------------------------------------|----------------------------------------------- |
| Google              | Data Scientist (Google Voice 2)     | Python, R, SQL                                       | https://www.linkedin.com/jobs/view/4387751995/ |
| Deloitte            | AI Specialist                       | None specified                                       | https://www.linkedin.com/jobs/view/4376183670/ |
| Ascensus            | Principal Analytics                 | R, Python, SQL, GenAI/LLM                            | https://www.linkedin.com/jobs/view/4380164400/ |
| EY                  | AI Lead Engineer                    | Python, C#, R, GenAI/LLM                             | https://www.linkedin.com/jobs/view/4385954762/ |
| PwC                 | GenAI Python Systems Engineer (2)   | Python, SQL, Cloud Platforms, GenAI/LLM              | https://www.linkedin.com/jobs/view/4373604638/ |
| Affirm              | Senior Machine Learning Engineer    | Python, Spark/Ray                                    | https://www.linkedin.com/jobs/view/4326673670/ |
| Lexis Nexis         | Lead Data Scientist                 | Cloud Platforms, GenAI/LLM                           | https://www.linkedin.com/jobs/view/4316327742/ |
| EY                  | AI Finance                          | SQL, Python, Azure, GenAI/LLM                        | https://www.linkedin.com/jobs/view/4385085950/ |
| Korn Ferry          | Sr. Data Scientist                  | Python, R, Spark, AWS, GenAI/LLM                     | https://www.linkedin.com/jobs/view/4387433496/ |
| Deloitte            | Data Science Manager                | Python, Cloud                                        | https://www.linkedin.com/jobs/view/4304674642/ |
| First Citizens Bank | Senior Quant Model Developer        | Python, SAS, SQL                                     | https://www.linkedin.com/jobs/view/4365378242/ |
| First Citizens Bank | Senior Manager Quant Analysis       | Python, SAS, Tableau                                 | https://www.linkedin.com/jobs/view/4388131284/ |
| Jobot               | ML Solution Architect               | Python, Scala, Spark, AWS, Snowflake                 | https://www.linkedin.com/jobs/view/4384023540/ |
| Affirm              | Analyst II                          | SQL, Python, R, CPLEX/Gurobi, Databricks/Snowflake   | https://www.linkedin.com/jobs/view/4373303038/ |
| Red Hat             | Sr Machine Learning Engineer (vLLM) | Python, GenAI/LLM                                    | https://www.linkedin.com/jobs/view/4354827922/ |
| Alliance Health     | Director AI                         | Python (TensorFlow/PyTorch), Office Products, GenAI  | https://www.linkedin.com/jobs/view/4383011480/ |
| Nubank              | ML Data Engineer                    | Python, Ray/Spark                                    | https://www.linkedin.com/jobs/view/4376815752/ |
| Target RWE          | Senior Quant Data Scientist         | R                                                    | https://www.linkedin.com/jobs/view/4385293724/ |
| Siemens             | Senior Data Analytics               | SQL, Python, R, Tableau/PowerBI                      | https://www.linkedin.com/jobs/view/4377969531/ |
| Red Hat             | Sr Machine Learning Engineer        | Python, GenAI/LLM                                    | https://www.linkedin.com/jobs/view/4302769773/ |
| Lexis Nexis         | Director Data Sciences              | Python, R, GenAI/LLM                                 | https://www.linkedin.com/jobs/view/4387335028/ |
| Cigna               | Data Science Senior Advisor         | Python, SQL                                          | https://www.linkedin.com/jobs/view/4381766145/ |
| Thermo Fisher       | Senior Manager Data Engineering     | Fabric, PowerBI, Python, Databricks, Tableau, SAS    | https://www.linkedin.com/jobs/view/4372684009/ |

Of the positions:

9/25 roles included R, but only one required R exclusively. The other 8 were Python/SQL/R
22/25 included Python
11/25 had a focus on Generative AI or LLMs

Python dominates R in the current job market for data science positions. Professors are doing their students a disservice teaching R, the same way they would be doing a disservice teaching their students to code in Fortran.

Another aspect I noticed for this – analyst type jobs not all that long ago really only expected Excel (and maybe SQL). Now even the majority of the analyst jobs expect Python (even more so than dashboard tools like PowerBI in this sample).

For individuals on the job market, I suggest going and doing your own experiment job search like this on LinkedIn to see the tech skills you need to be able to at least get your foot in the door for an interview. I expected GenAI to be slightly more popular (only 11/25), but there were a few other technologies sprinkled in enough it may be good to become familiar with to widen your potential pool (Cloud and Spark – I am surprised Databricks was not listed more often).

If you’re looking to build Python skills from scratch, I cover this in my book: Data Science for Crime Analysis with Python (can purchase in paperback or epub at my store).

If also interested in learning about generative AI, see my book Large Language Models for Mortals: A Practical Guide for Analysts with Python.

You can use the coupon TWOFOR1 to get $30 off when purchasing multiple books from my store.

1 Comment

by Andy Wheeler on March 22, 2026 • Permalink

Posted in data science, Personal Productivity, Python, R

Tagged teaching

Posted by Andy Wheeler on March 22, 2026

https://andrewpwheeler.com/2026/03/22/stop-teaching-r-teach-python/

Some notes on the unreliability of LLM APIs

Because my book, LLMs for Mortals, was created with Quarto, it runs the code when I compile the book. It uses cached versions when no code changes, but it is guaranteed to be working code for the parts that have a grey input and a following green output, it is valid code that executed and generated the results.

I try to use temperature zero for most of the book, but some of the parts of the book are stochastic. Reasoning models you cannot set the temperature, so some elements of Chapter 3 introducing the models, and basically all of the section in chapter 6 on agents is stochastic. This actually gave me a better appreciation of some of the unreliability of these models, as for some instances it would fail, and others I needed to recompile because the output was poor.

The way jupyter caching works under the hood, it has a separate cache for the epub and the LaTeX document (that is used for the print version). So you technically get a slightly different book when you purchase epub vs paperback. When you have 60+ failure points per chapter (and that gets doubled when compiling to both epub and PDF), you get to glimpse a few of the warts of the API models.

These are also short snippets, so do not have error catching or more robust JSON parsing, so some of these issues I basically programmed away in production systems at work and did not even notice them. I figured my notes may be useful though in general for others trying to rely on these systems with large volume API calls.

OpenAI

All the models were generally reliable, but one of the examples of stochastic outputs in OpenAI gave me fits – I asked OpenAI to analyze a blog post on my Crime De-Coder site and get information from the post. Now this is a bit tricky, as the reasoning model needs to see the data is not available in the post directly, but in an image.

January 24th, at one point though this became totally unreliable in its output. It would often fail to download the additional image, and when it did, it was pretty inconsistent actually giving an accurate answer.

But now I can run below and this just returns fine and dandy near every time. Here is a loop I ran 5 times and it gives the correct answer (around 160 Tuesday at 4 AM).

from openai import OpenAI
import time

client = OpenAI()

prompt = """
Search <https://crimede-coder.com/blogposts/2024/Aoristic>, what is 
the maximum number of commercial burglaries in the chart and on what
day and hour? Do not use shorthand, give an actual number.

If you need to, download additional materials to answer the question.

Be concise in your output.
"""

for _ in range(5):
    # minimal reasoning with responses API
    response = client.responses.create(
        model="gpt-5.2",
        reasoning={'effort': 'low'},
        tools=[{"type": "web_search"}],
        input=prompt,
    )
    time.sleep(20) # to prevent going over my limit
    print(response.output_text)
    print('-------------')

My only guess is there was some downgrade in the model capabilities, and it routed behind the scenes for the reasoning models to some less capable model. (Just on January 24th though!)

Otherwise, the stochastic examples in the book using OpenAI were pretty reliable.

Anthropic

In the structured outputs chapter, I go through examples of parsing JSON vs progressively building on Pydantic outputs. I actually give examples where Pydantic schema’s can cause some filling in of data that you do not want (if the data should be null, and you use k-shot examples, it will often fill in from your last example).

So this chapter really is a ton of advice on prompt engineering for structured outputs. One example I show is using stop sequences when generating JSON and doing text parsing (which is really not necessary and best practice with Pydantic schemas, but I still use this with AWS Bedrock, since it does not support that yet).

This code works fine, what is inconsistent is that on very rare occasion, Anthropic’s API returns the bracket at the end of the call. This subsequently generates an error with this code, as it is invalid JSON with an extra bracket.

Production systems at the day job use AWS, and I wrote the text parsing in a way I would not even see this error (so not sure if it also happens with AWS). And it was quite rare with Anthropic, I just compiled the book enough times to notice this error happen on a few occasions.

Google

In the book I show off using Google Map grounding, since it is a unique capability of Googles – it was very unreliable. Not unreliable in the sense it would return an error and not be available, but unreliable in “I cannot find any google maps data right now”. So this would compile, I would just need to go look at the output and make sure it actually returned something useful.

You can see I switched to the Vertex API for this example – I cannot confidently say if Vertex was more reliable than the Gemini API for this. I experienced issues with both (maybe slightly fewer with Vertex).

The Anthropic error is not so bad – it causes an actual error in the system. The reasoning and LLM outputs something, but it is not good, troubles me more. We are really just piloting agentic systems at the day gig now with a small number of users – they have not gotten really stress tested by a large number of users. I don’t even want to think about how I would monitor maps grounding in production given my experience.

AWS

AWS I only had one example not consistently work – calling the DeepSeek API.

In the prior code calling Anthropic models via Bedrock, and later chapters I have an example of Mistral and different embedding models (Cohere and Amazon’s Titan), were all fine. Just this single example from DeepSeek would randomly not work. By not work the API would return a response, but the content would be empty. So the final print statement is where the error occurred, accessing text that did not exist.

Most of my work, even if DeepSeek is cheaper, I need to consider caching. So Haiku is pretty competitive with the other models. So I do not have much experience in Bedrock with any models besides Anthropic ones.

My biggest gripe with AWS is the IAM permissions are too difficult (and have changed over the past year). I was able to reasonably figure out how to use S3 Vectors and batch inference (which is discussed in the book). I was able to figure out Knowledge Bases, but I just took it out of the book (both too expensive for hobby projects to have the search endpoint). OpenAI’s vector search store is super easy though, so will definately consider that for traditional RAG applications moving forward.

Buy the book!

Use promo code LLMDEVS for 50% off of the epub. Or if you prefer purchase the paperback.

Leave a comment

by Andy Wheeler on February 27, 2026 • Permalink

Posted in data science, Python

Tagged Anthropic, AWS-Bedrock, Gemini, LLM, OpenAI

Posted by Andy Wheeler on February 27, 2026

https://andrewpwheeler.com/2026/02/27/some-notes-on-unreliability-of-llm-apis/

Large Language Models for Mortals book

I have published a new book, Large Language Models for Mortals: A Practical Guide for Analysts with Python. The book is available to purchase in my store, either as a paperback (for $59.99) or an epub (for $49.99).

The book is a tutorial on using python with all the major LLM foundation model providers (OpenAI, Anthropic, Google, and AWS Bedrock). The book goes through the basics of API calls, structured outputs, RAG applications, and tool-calling/MCP/agents. The book also has a chapter on LLM coding tools, with example walk throughs for GitHub Copilot, Claude Code (including how to set it up via AWS Bedrock), and Google’s Antigravity editor. (It also has a few examples of local models, which you can see Chapter 2 I discuss them before going onto the APIs in Chapter 3).

You can review the first 60 some pages (PDF link here if on Iphone).

llms_mortals_preview Download

While many of the examples in the book are criminology focused, such as extracting out crime elements from incident narratives, or summarizing time series charts, the lessons are more general and are relevant to anyone looking to learn the LLM APIs. I say “analyst” in the title, but this is really relevant to:

traditional data scientists looking to expand into LLM applications
PhD students (in all fields) who would like to use LLM applications in their work
analysts looking to process large amounts of unstructured textual data

Basically anyone who wants to build or create LLM applications, this is the book to help you get started.

I wrote this book partially out of fear – the rapid pace of LLM development has really upended my work as a data scientist. It is really becoming the most important set of skills (moreso than traditional predictive machine learning) in just the past year or two. This book is the one I wish I had several years ago, and will give analysts a firm grounding in using LLMs in realistic applications.

Again, the book is available in:

For purchase worldwide. Here are all the sections in the book – whether you are an AWS or Google shop, or want to learn the different database alternatives for RAG, or want more self contained examples of agents with python code examples for OpenAI, Anthropic, or Google, this should be a resource you highly consider purchasing.

To come are several more blog posts in the near future, how I set up Claude Code to help me write (and not sound like a robot). How to use conformal inference and logprobs to set false positive rates for classification with LLM models, and some pain points with compiling a Quarto book with stochastic outputs (and points of varying reliability for each of the models).

But for now, just go and purchase the book!

Below is the table of contents to review – it is over 350 pages for the print version (in letter paper), over 250 python code snippets and over 80 screenshots.

			
Large Language Models for Mortals: A Practical Guide for Analysts with Python
by Andrew Wheeler
TABLE OF CONTENTS
Preface
    Are LLMs worth all the hype?
    Is this book more AI Slop?
    Who this book is for
    Why write this book?
    What this book covers
    What this book is not
    My background
    Materials for the book
    Feedback on the book
    Thank you
Basics of Large Language Models
1 What is a language model?
2 A simple language model in PyTorch
3 Defining the neural network
4 Training the model
5 Testing the model
6 Recapping what we just built
Running Local Models from Hugging Face
1 Installing required libraries
2 Downloading and using Hugging Face models
3 Generating embeddings with sentence transformers
4 Named entity recognition with GLiNER
5 Text Generation
6 Practical limitations of local models
Calling External APIs
1 GUI applications vs API access
2 Major API providers
3 Calling the OpenAI API
4 Controlling the Output via Temperature
5 Reasoning
6 Multi-turn conversations
7 Understanding the internals of responses
8 Embeddings
9 Inputting different file types
10 Different providers, same API
11 Calling the Anthropic API
12 Using extended thinking with Claude
13 Inputting Documents and Citations
14 Calling the Google Gemini API
15 Long Context with Gemini
16 Grounding in Google Maps
17 Audio Diarization
18 Video Understanding
19 Calling the AWS Bedrock API
20 Calculating costs
Structured Output Generation
1 Prompt Engineering
2 OpenAI with JSON parsing
3 Assistant Messages and Stop Sequences
4 Ensuring Schema Matching Using Pydantic
5 Batch Processing For Structured Data Extraction using OpenAI
6 Anthropic Batch API
7 Google Gemini Batch
8 AWS Bedrock Batch Inference
9 Testing
10 Confidence in Classification using LogProbs
11 Alternative inputs and outputs using XML and YAML
12 Structured Workflows with Structured Outputs
Retrieval-Augmented Generation (RAG)
1 Understanding embeddings
2 Generating Embeddings using OpenAI
3 Example Calculating Cosine similarity and L2 distance
4 Building a simple RAG system
5 Re-ranking for improved results
6 Semantic vs Keyword Search
7 In-memory vector stores
8 Persistent vector databases
9 Chunking text from PDFs
10 Semantic Chunking
11 OpenAI Vector Store
12 AWS S3 Vectors
13 Gemini and BigQuery SQL with Vectors
14 Evaluating retrieval quality
15 Do you need RAG at all?
Tool Calling, Model Context Protocol (MCP), and Agents
1 Understanding tool calling
2 Tool calling with OpenAI
3 Multiple tools and complex workflows
4 Tool calling with Gemini
5 Returning images from tools
6 Using the Google Maps tool
7 Tool calling with Anthropic
8 Error handling and model retry
9 Tool Calling with AWS Bedrock
10 Introduction to Model Context Protocol (MCP)
11 Connecting Claude Desktop to MCP servers
12 Examples of Using the Crime Analysis Server in Claude Desktop
13 What are Agents anyway?
14 Using Multiple Tools with the OpenAI Agents SDK
15 Composing and Sequencing Agents with the Google Agents SDK
16 MCP and file searching using the Claude Agents SDK
17 LLM as a Judge
Coding Tools and AI-Assisted Development
1 Keeping it real with vibe coding
2 VS Code and GitHub Install
3 GitHub Copilot
4 Claude Code Setup
5 Configuring API access
6 Using Claude Code to Edit Files
7 Project context with CLAUDE.md
8 Using an MCP Server
9 Custom Commands and Skills
10 Session Management
11 Hooks for Testing
12 Claude Headless Mode
13 Google Antigravity
14 Best practices for AI-assisted coding
Where to next?
1 Staying current
2 What to learn next?
3 Forecasting the near future of foundation models
4 Final thoughts

		

Leave a comment

by Andy Wheeler on February 11, 2026 • Permalink

Posted in Crime Analysis, data science, Python, writing

Tagged book, epub

Posted by Andy Wheeler on February 11, 2026

https://andrewpwheeler.com/2026/02/11/large-language-models-for-mortals-book/

Year in Review 2025 and AI Predictions

For a brief year in review, total views for the two different websites have decreased in the past year. For this blog, I am going to be a few thousand shy of 100,000 views. (2023 I had over 150k views, and 2024 I had over 140k views.) For the Crime De-Coder site, I am going to only get around 15k views.

Part of it is I posted less, this will be the 21st blog post this year on the personal blog (2023 had 46 and 2024 had 32 posts). The Crime De-Coder site had 12 blog posts, so pretty consistent with the prior year. Both are pretty bursty, with large bouts of traffic coming from if I post something to Hacker News I can get 1k to 10k views in a day or two if it makes it to the front page. So the 2024 stats for the crime de-coder was a few of those Hacker News bumps I did not get in 2025.

Some of it could legitimately be traditional Google search being usurped by the gen AI tools. This is the first year I had appreciable referrals from chatgpt, but they are less than 1000. The other tools are trivial amount of referrals. If I worried about SEO more, I would have more updating/regular content (as old pages are devalued quite a bit by google, and it seems to be getting more severe over time).

I have upped my use of the free tools quite a bit. ChatGPT knows me pretty well, and I use Claude Desktop almost every day as well.

An IAM policy scroll is more of a nightmare, and I definitely ask more python questions than R, but the cartoon desk is pretty close to spot on. I am close to paying for Anthropic subscription for Claude code credits (currently use pay as I go via Bedrock, and this is the first month I went over $20).

What pages on the blog are popular I can never be sure of. My most popular post last year was Downloading Police Employment Trends from the FBI Data Explorer. A 2023 post, that had random times where it would have several hundred visits in a short hour span. (Some bot collecting sites? I do not know.) If it is actual people, you would want to check out my Sworn Dashboard site, where you can look at trends for PDs much easier than downloading all the data yourself!

One thing that has grown though, I do short form posting on LinkedIn on my crime de-coder page. Impressions total for the year is over 340k (see the graph), and I currently am a few shy of 4400 followers.

LinkedIn is nice because it can be slightly longer form than the other social media sites. I would suggest you follow me there (in addition to signing up for RSS feeds for the two sites). That is the easiest way to follow my work.

I also took over as a moderator of the Crime Analysis Reddit forum, it is better than the IACA forums in my opinion, so encourage folks to post there for crime analysis questions.

Crime De-Coder Work

Crime De-Coder work has been steady (but not increasing). Similar to last year had several consulting gigs conducting crime analysis for premises liability cases (and one other case I may share my opinions once it is over), and doing some small projects with non-profits and police departments.

One big project was a python training in Austin.

The Python Book (which I also translated to Spanish/French), had a trickle of new sales. 2024 had around 100 sales and 2025 had around 50 sales. It is close to 2/3 print sales and 1/3 epub, so definately folks should have physical prints if you are selling books still.

Doing trainings basically makes writing the book worth it, but I do hope eventually the book makes it way into grad school curriculum’s. (Only one course so far.) I have pitched to grad schools to have me run a similar bootcamp to what I do for crime analysts, so if interested let me know.

The biggest new thing was Crime De-Coder got an Arnold Grant. Working with Denver PD on an experiment to evaluate a chronic offender initiative.

At the Day Gig

At my day gig, I was officially promoted to a senior manager and then quickly to a director position. Hence you get posts like what to show in your tech resume and notes on project management.

One of the reasons I am big on python – it is the dominant programming language in data science. It is hard for me to recruit from my network, as majority of individuals just know a little R (if you were a hard core R person, had packages/well executed public repo’s, I could more easily think you will be able to migrate to python to work on my team).

So learn python if you want to be a data scientist is my advice (and see other job market advice at my archived newsletter).

AI Predictions

At the day gig, my work went from 100% traditional supervised machine learning models to more like 50/50 traditional vs generative AI applications. The genAI hype is real, but I think it is worthwhile putting my thoughts to paper.

The biggest question is will AI take all of our jobs? I think a more likely end scenario is the AI tools just become better at helping humans do tasks. The leap from helping a human do something faster vs an AI tool doing it 100% on its own with 0 human input is hard. The models are getting incrementally better, but I think to fully replace people in a substantive way will require another big advancement in fundamental capabilities. Making a human 10x more productive is easier and still will make the AI companies a ton of money.

Sometimes people view the 10x idea and say that will take jobs, just not 100% of jobs. That is a view though that there is only a finite amount of work to be done. That assumption is clearly not true, and being able to do work faster/cheaper just induces demand for more potential work. The example with calculators making more banking jobs, not less, is basically the same example.

One of the critiques of the current systems is they are overvalued, so we are in a bubble. I do not remember where I read it, but one estimate was if everyone in the US spent $1 a day on the different AI tools, that would justify the current valuations for OpenAI, Anthropic, NVIDIA, etc. I think that is totally doable, we spend a few thousand a workday at Gainwell on the foundation models for example for a few projects, and we are just going to continue to roll out more and more. Gainwell is a company with around 6k employees for reference, and our current AI applications touch way less than 1k of those employees. We have plenty of room to grow those applications.

It is super hard though to build systems to help people do things faster. And we are talking like “this thing that used to take 30 minutes now takes 15 minutes”. If you have 100 people doing that thing all the time though, the costs of the models are low enough it is an easy win.

And this mostly only holds true for knowledge economy work that can be all done via software. There just still needs to be fundamental improvements to robotics to be able to do physical things. The tailor’s job is safe for the foreseeable future.

The change in the data science landscape to more generative AI applications definitely requires social scientists and analysts to up their game though to learn a new set of tools. I do have another book in the works to address that, so hopefully you will see that early next year.

Leave a comment

by Andy Wheeler on December 24, 2025 • Permalink

Posted in Crime Analysis, Criminal Justice, Personal Productivity, Python, R, scholarly

Tagged year-in-review

Posted by Andy Wheeler on December 24, 2025

https://andrewpwheeler.com/2025/12/24/year-in-review-2025-and-ai-predictions/

Advice for crime analyst to break into data science

I recently received a question about a crime analyst looking to break into data science. Figured it would be a good topic for my advice in a blog post. I have written many resources over the years targeting recent PhDs, but the advice for crime analysts is not all that different. You need to pick up some programming, and likely some more advanced tech skills.

For background, the individual had SQL + Excel skills (which many analysts may just have Excel). Vast majority of analyst roles, you should be quite adept at SQL. But just SQL is not sufficient for even an entry level data science role.

For entry data science, you will need to demonstrate competency in at least one programming language. The majority of positions will want you to have python skills. (I wrote an entry level python book exactly for someone in your position.)

You likely will also need to demonstrate competency in some machine learning or using large language models for data science roles. It used to be Andrew Ng’s courses were the best recommendation (I see he has a spin off DeepLearningAI now). So that is second hand though, I have not personally taken them. LLMs are more popular now, so prioritizing learning how to call those APIs, build RAG systems, prompt engineering I think is going to make you slightly more marketable than traditional machine learning.

I have personally never hired anyone in a data science role without a masters. That said, I would not have a problem if you had a good portfolio. (Nice website, Github contributions, etc.)

You should likely start just looking and applying to “analyst” roles now. Don’t worry about if they ask for programming you do not have experience in, just apply. Many roles the posting is clearly wrong or totally unrealistic expectations.

Larger companies, analyst roles can have a better career ladder, so you may just decide to stay in that role. If not, can continue additional learning opportunities to pursue a data science career.

Remote is more difficult than in person, but I would start by identifying companies that are crime analysis adjacent (Lexis Nexis, ESRI, Axon) and start applying to current open analyst positions.

For additional resources I have written over the years:

alt ac newsletter, not running anymore but has most of my advice on those webpages
blog post on different types of tech jobs, approximate salaries
resume advice

The alt-ac newsletter has various programming and job search tips. THe 2023 blog post goes through different positions (if you want, it may be easier to break into project management than data science, you have a good background to get senior analyst positions though), and the 2025 blog post goes over how to have a portfolio of work.

Cover page, data science for crime analysis with python

Leave a comment

by Andy Wheeler on November 21, 2025 • Permalink

Posted in Crime Analysis, Crime Mapping, Criminal Justice, data science, Python

Tagged job-advice

Posted by Andy Wheeler on November 21, 2025

https://andrewpwheeler.com/2025/11/21/advice-for-crime-analyst-to-break-into-data-science/

What to show in your tech resume?

Jason Brinkley on LinkedIn the other day had a comment on the common look of resumes – I disagree with his point in part but it is worth a blog post to say why:

So first, when giving advice I try to be clear about what I think are just my idiosyncratic positions vs advice that I feel is likely to generalize. So when I say, you should apply to many positions, because your probability of landing a single position is small, that is quite general advice. But here, I have personal opinions about what I want to see in a resume, but I do not really know what others want to see. Resumes, when cold applying, probably have to go through at least two layers (HR/recruiter and the hiring manager), who each will need different things.

People who have different colored resumes, or in different formats (sometimes have a sidebar) I do not remember at all. I only care about the content. So what do I want to see in your resume? (I am interviewing for mostly data scientist positions.) I want to see some type of external verification you actually know how to code. Talk is cheap, it is easy to list “I know these 20 python libraries” or “I saved our company 1 million buckaroos”.

So things I personally like seeing in a resume are:

code on github that is not a homework assignment (it is OK if unfinished)
technical blog posts
your thesis! (or other papers you were first/solo author)

Very few people have these things, so if you do and you land in my stack, you are already at the like 95th percentile (if not higher) for resumes I review for jobs.

The reason having outside verification you actually know what you are doing is because people are liars. For our tech round, our first question is “write a python hello world program and execute it from the command line” – around half of the people we interview fail this test. These are all people who list they are experts in machine learning, large language models, years of experience in python, etc.

My resume is excessive, but I try to practice what I preach (HTML version, PDF version)

I added some color, but have had recruiters ask me to take it off the resume before. So how many people actually click all those links when I apply to positions? Probably few if any – but that is personally what I want to see.

There are really only two pieces of advice I have seen repeatedly about resumes that I think are reasonable, but it is advice not a hard rule:

I have had recruiters ask for specific libraries/technologies at the top of the resume
Many people want to hear about results for project experience, not “I used library X”

So while I dislike the glut of people listing 20 libraries, I understand it from the point of a recruiter – they have no clue, so are just trying to match the tech skills as best they can. (The matching at this stage I feel may be worse than random, in that liars are incentivized, hence my insistence on showing actual skills in some capacity.) It is infuriating when you have a recruiter not understand some idiosyncratic piece of tech is totally exchangeable with what you did, or that it is trivial to learn on the job given your prior experience, but that is not going to go away anytime soon.

I’d note at Gainwell we have no ATS or HR filtering like this (the only filtering is for geographic location and citizenship status). I actually would rather see technical blog posts or personal github code than saying “I saved the company 1 million dollars” in many circumstances, as that is just as likely to be embellished as the technical skills. Less technical hiring managers though it is probably a good idea to translate technical specs to more plain business implications though.

2 Comments

by Andy Wheeler on November 1, 2025 • Permalink

Posted in data science, Personal Productivity, Python, scholarly

Tagged resume

Posted by Andy Wheeler on November 1, 2025

https://andrewpwheeler.com/2025/11/01/what-to-show-in-your-tech-resume/

Search for:
Recent Posts
Categories
Categories
Site RSS Feeds
- RSS - Posts
- RSS - Comments
Follow Blog via Email

Enter your email address to follow this blog and receive notifications of new posts by email.

Email Address:

Join 392 other subscribers
aoristic big-data cartography census choropleth citeulike consulting cost-benefit courses crime-mapping crime-trends Crime Analysis Criminal Justice data-manipulation data visualization deep-learning ESRI excel flow-data folium geocoding github google-streetview-api grammar of graphics group-based-trajectory gun-violence healthcare homicide-rates hot spots hypothesis-testing linear programming LLM logistic-regression machine-learning MACRO mapping matplotlib meta network NetworkX officer-involved-shooting open-science paper Papers peer-review Poisson prediction Predictive-Policing preprint presentation Python Python-programability pytorch quasi-experiment r recidivism regression resources scholarly scraping seaborn shootings simulation small-multiples social-media social-networking SPSS stackexchange Stata statistics survey time-series uncertainty wdd web-scraping
Top Posts & Pages
Stack Exchange

All posts in category Python

OpenAI

Anthropic

Google

AWS

Buy the book!

Crime De-Coder Work

At the Day Gig

AI Predictions

Recent Posts

Categories

Site RSS Feeds

Follow Blog via Email

Top Posts & Pages

Stack Exchange