All posts in category data science

How long to conduct your experiment: Talk at ASEBP

Upcoming at the American Society of Evidence Based Policing Conference, I have a talk Thursday morning (9:45-10:00), How long to conduct your experiment.

The talk goes over some of the simple metrics I have created to help plan how long to conduct your intervention. Such as how long to evaluate your hot spots intervention, or purchase to increase arrest rates, etc.

I have prepared a ton of different resources. The main one is a web-based application (a WASM-based app with R as the backend) where you can enter your inputs and generate a graph showing how precise your parameter estimates are:

The help page includes citations and additional materials, but here is a brief rundown:

I have the math details in this github repo, see the methodology.pdf. It also includes notes on how I used different LLM tools to produce the webpage and the method materials. Each of the applications allows you to download the R code used to generate the graphs and tables.
I have created a series of YouTube videos demonstrating the application (WDD, IRR, Proportion tests)
I have posted my slides for the ASEBP talk

See you all in DC at ASEBP in a few weeks!

Leave a comment

by Andy Wheeler on May 12, 2026 • Permalink

Posted in Crime Analysis, Crime Mapping, Criminal Justice, data science, Papers, R

Tagged experiment, presentation, talk

Posted by Andy Wheeler on May 12, 2026

https://andrewpwheeler.com/2026/05/12/how-long-to-conduct-your-experiment-talk-at-asebp/

xAI voice cloning API

xAI has just released an API to clone your voice. It is pretty simple, read a script, and then an API where you can have text to speech in that voice.

Here is the python code after you have cloned your voice.

			
import os
import requests
voice_id = os.environ['ANDY1_VOICE'] # my demo voice ID
text = '''this is a test demo of my voice. Be excited! 
OK, how about a list of things; one, two, three. 
Lets see where this takes us.'''
response = requests.post(
    "https://api.x.ai/v1/tts",
    headers={
        "Authorization": f"Bearer {os.environ['XAI_API_KEY']}",
        "Content-Type": "application/json",
    },
    json={
        "text": llm_book,
        "voice_id": voice_id,
        "language": "en",
    },
)
response.raise_for_status()
with open("AndyTest1.mp3", "wb") as f:
    f.write(response.content)

		

I need to figure out my audio set up a bit better (my mic set up is probably not optimal and it produces some echo). But does a good job imitating my boring voice right out of the box!

And here is an example for longer speech from my intro to LLMs book:

			
# intro to llm book
llm_book = '''
Large language models (LLMs) are transforming how we work. Some of these examples include using LLMs to help write computer code, using LLMs to extract out information from irregular text sources, and creating chat-bots that can interact with various data sources and documents.
Most analysts, however, do not have any experience with these tools. This book is meant to be a general introduction to realistic examples of how individuals can use these tools; either in general software applications, or to help analysts write code to create software itself. Given the rapid pace of advancement in this area, a general introduction to help individuals who work in the knowledge economy understand the capabilities of these tools I believe is in order.
Here is a simple example of using an LLM API (*Application Programming Interface* -- just a standard way to send information and get information back on the web) using the anthropic library in python to extract key information from a free text crime narrative:
'''
response = requests.post(
    "https://api.x.ai/v1/tts",
    headers={
        "Authorization": f"Bearer {os.environ['XAI_API_KEY']}",
        "Content-Type": "application/json",
    },
    json={
        "text": llm_book,
        "voice_id": voice_id,
        "language": "en",
    },
)
response.raise_for_status()
with open("AndyTest_LLMIntro.mp3", "wb") as f:
    f.write(response.content)

		

The LLM intro messed up *Application Programming Interface* section (start listening at 50 seconds in). But otherwise it is very nice.

For those worried about security, xAI did something smart here — you need to input text live into the API given their prompts. You cannot have a pre-recording audio input to do this. So cloning someone elses voice is pretty hard.

Costs are around $4 per million characters in the text to speech API. So say narrating my entire book should be under $10 I believe.

Took me a total of less than an hour to set up a voice, create the python code, and write this blog post!

Leave a comment

by Andy Wheeler on May 4, 2026 • Permalink

Posted in data science, Python

Posted by Andy Wheeler on May 4, 2026

https://andrewpwheeler.com/2026/05/04/xai-voice-cloning-api/

Gathering interest in tech courses

Quick post this morning — I have a survey up gathering input on interest in short, technical courses.

Tech course survey interest form

Think 2-3 days, potentially in person/synchronous.

If you have taken a course with Paul Allison at Horizon’s, or an ICPSR summer course, those are similar examples. But, the main difference will be these courses are to prepare you for pursuing private sector roles.

These will be aimed at:

grad level social science students
current professors looking to pursue private sector roles
current data analysts looking to get into data science
undergrads with some more technical background

Survey lists potential courses (python for data analysis, intro to LLM APIs, SQL + Dashboards, using agent based tools for analysis), the course medium (in person vs video), price points.

If you are a university or organization interested in hosting such sessions for your students, let me know as well. Happy to chat to you about bringing this to your campus.

Leave a comment

by Andy Wheeler on April 27, 2026 • Permalink

Posted in Crime Analysis, Crime Mapping, data science, Python, scholarly

Tagged online-teaching, teaching

Posted by Andy Wheeler on April 27, 2026

https://andrewpwheeler.com/2026/04/27/gathering-interest-in-tech-courses/

LinkedIn Premium Does Not Boost your Posts

One of my connections mentioned in a post on LinkedIn that since he turned off Premium, his posts have been getting less engagement. Since LinkedIn offers a month for free, and I have been trying to promote my recent book, I figured I would try my free month trial and see how many more views I could get. (Here I am not worried about Premium for applying to new jobs, it is possible it is totally worth it for that, I was not applying to jobs in this test so I do not know.)

Long story short, LinkedIn Premium does not appear to promote my material at all above the baseline.

Post Views

In a sample of 30 posts the month before I turned on Premium (turned on 3/24 in the evening, turned off 4/22 in the morning), my posts had an average of 3600 views (with a standard deviation of 7000, median 1400). Post-Premium, I had 23 posts, and the views were on average 2200 (SD 2900, median 900). Here is the full table of posts and links (Premium=1 means it was posted when my Premium subscription was turned on):

| Premium | Views | URL |
| ----:|----- :|:----- |
| 0    | 3659  | https://www.linkedin.com/posts/andrew-wheeler-46134849_llms-have-transformed-the-data-science-industry-activity-7426975341572984832-HGTA |
| 0    | 2526  | https://www.linkedin.com/posts/andrew-wheeler-46134849_no-guarantees-but-i-am-going-to-try-to-start-activity-7428418993553846272-vdjA    |
| 0    | 2290  | https://www.linkedin.com/posts/andrew-wheeler-46134849_much-of-the-hype-around-claude-code-is-having-activity-7428781380391567360-zkXr   |
| 0    | 545   | https://www.linkedin.com/posts/andrew-wheeler-46134849_one-of-the-benefits-of-my-epub-version-of-activity-7429143771109302272-_V6T       |
| 0    | 1454  | https://www.linkedin.com/posts/andrew-wheeler-46134849_claude-code-has-the-ability-to-create-hooks-activity-7429506167527088128-01Um     |
| 0    | 1326  | https://www.linkedin.com/posts/andrew-wheeler-46134849_one-of-the-prompting-flows-i-find-convenient-activity-7429868558794436609-SDgG    |
| 0    | 1278  | https://www.linkedin.com/posts/andrew-wheeler-46134849_one-of-the-main-focuses-in-the-book-is-not-activity-7430230940444057600-pqK-      |
| 0    | 5988  | https://www.linkedin.com/posts/andrew-wheeler-46134849_while-skills-in-claude-code-are-all-the-rage-activity-7430955707358818304-iqjb    |
| 0    | 1726  | https://www.linkedin.com/posts/andrew-wheeler-46134849_one-of-the-mistakes-i-see-with-agent-based-activity-7431318102212100096-rI_W      |
| 0    | 1172  | https://www.linkedin.com/posts/andrew-wheeler-46134849_from-my-experience-as-an-educator-when-presenting-activity-7431680485585580032-8h7B    |
| 0    | 1360  | https://www.linkedin.com/posts/andrew-wheeler-46134849_although-the-llm-tools-are-currently-focused-activity-7432042882770944000-AfAG    |
| 0    | 5304  | https://www.linkedin.com/posts/andrew-wheeler-46134849_when-i-was-a-professor-at-ut-dallas-i-sat-activity-7432405268874817536-qrAI       |
| 0    | 30732 | https://www.linkedin.com/posts/andrew-wheeler-46134849_i-know-a-few-stats-folks-in-my-network-that-activity-7432767666781679617-HXnk |
| 0    | 1003  | https://www.linkedin.com/posts/andrew-wheeler-46134849_claude-code-does-not-have-an-image-model-activity-7433492422111776768-2dqY        |
| 0    | 884   | https://www.linkedin.com/posts/andrew-wheeler-46134849_while-i-have-a-section-in-the-book-devoted-activity-7433854815862013952-FIl-      |
| 0    | 888   | https://www.linkedin.com/posts/andrew-wheeler-46134849_in-the-book-i-have-a-dedicated-chapter-on-activity-7434217215450681344-hk3E       |
| 0    | 1868  | https://www.linkedin.com/posts/andrew-wheeler-46134849_the-llm-book-is-compiled-using-quarto-so-activity-7434579595095580673-RHdx        |
| 0    | 807   | https://www.linkedin.com/posts/andrew-wheeler-46134849_llms-for-mortals-how-to-view-the-epub-activity-7434945455073169409-bNDu           |
| 0    | 1243  | https://www.linkedin.com/posts/andrew-wheeler-46134849_section-on-using-gliner-for-ner-activity-7435304382931513344-NPPC                 |
| 0    | 1745  | https://www.linkedin.com/posts/andrew-wheeler-46134849_my-first-book-data-science-for-crime-analysis-activity-7436029137695416320-aRmr   |
| 0    | 914   | https://www.linkedin.com/posts/andrew-wheeler-46134849_so-the-new-book-large-language-models-for-activity-7436376426100199424-1g8E       |
| 0    | 1593  | https://www.linkedin.com/posts/andrew-wheeler-46134849_agentic-coding-apps-like-claude-code-and-activity-7436738847054512128-qiXz        |
| 0    | 3415  | https://www.linkedin.com/posts/andrew-wheeler-46134849_many-people-are-turned-off-by-ai-writing-activity-7437101213717909504-QjOo        |
| 0    | 928   | https://www.linkedin.com/posts/andrew-wheeler-46134849_pretty-much-every-day-there-is-a-new-prompt-activity-7437463609401794560-50H-     |
| 0    | 2185  | https://www.linkedin.com/posts/andrew-wheeler-46134849_much-of-the-hype-around-skills-is-imo-people-activity-7437826000958337024-t6i3    |
| 0    | 800   | https://www.linkedin.com/posts/andrew-wheeler-46134849_one-of-the-benefits-of-my-llm-for-mortals-activity-7438550758763098112-hQUw |
| 0    | 870   | https://www.linkedin.com/posts/andrew-wheeler-46134849_large-language-models-for-mortals-preview-activity-7438913140639207424-tAli       |
| 0    | 1948  | https://www.linkedin.com/posts/andrew-wheeler-46134849_new-blog-post-using-claude-code-to-help-activity-7441087469388861440-TCKq         |
| 0    | 1160  | https://www.linkedin.com/posts/andrew-wheeler-46134849_given-all-the-rage-with-generative-ai-and-activity-7441449857480933377-uPFw       |
| 0    | 27842 | https://www.linkedin.com/posts/andrew-wheeler-46134849_stop-teaching-r-teach-python-when-i-was-activity-7441812266938826753-DywF         |
| 1    | 526   | https://www.linkedin.com/posts/andrew-wheeler-46134849_forecasting-the-future-is-difficult-especially-activity-7442537064803368960-qsVO  |
| 1    | 13096 | https://www.linkedin.com/posts/andrew-wheeler-46134849_when-using-llms-to-do-structured-data-extraction-activity-7442899426471407617-CpZz |
| 1    | 2394  | https://www.linkedin.com/posts/andrew-wheeler-46134849_ive-spoken-with-many-people-who-are-concerned-activity-7443039100477145090-v_3i   |
| 1    | 646   | https://www.linkedin.com/posts/andrew-wheeler-46134849_the-main-audience-my-book-large-language-activity-7443261810511757312-R2Jc        |
| 1    | 3030  | https://www.linkedin.com/posts/andrew-wheeler-46134849_for-the-folks-that-were-not-happy-with-my-activity-7443401497444409344-q38H       |
| 1    | 437   | https://www.linkedin.com/posts/andrew-wheeler-46134849_one-of-the-current-capabilities-of-googles-activity-7443624184120754176-YHbw      |
| 1    | 5275  | https://www.linkedin.com/posts/andrew-wheeler-46134849_one-error-i-am-seeing-devs-continually-make-activity-7443986571650973696-TZ-K     |
| 1    | 3815  | https://www.linkedin.com/posts/andrew-wheeler-46134849_one-of-the-biggest-issues-with-using-generative-activity-7444348969100664832-pX0v |
| 1    | 738   | https://www.linkedin.com/posts/andrew-wheeler-46134849_reports-of-rags-demise-are-overstated-activity-7444711358421491712-6BJu           |
| 1    | 425   | https://www.linkedin.com/posts/andrew-wheeler-46134849_the-recent-litellm-distribution-attack-highlights-activity-7445073747968999424-NEmn    |
| 1    | 3752  | https://www.linkedin.com/posts/andrew-wheeler-46134849_one-of-the-responses-to-me-writing-the-book-activity-7445436130080186369-7Fvx     |
| 1    | 1670  | https://www.linkedin.com/posts/andrew-wheeler-46134849_professors-that-follow-me-i-am-happy-to-activity-7446160904217407488-CXcZ         |
| 1    | 918   | https://www.linkedin.com/posts/andrew-wheeler-46134849_i-have-used-claude-code-the-longest-probably-activity-7447248077435920385-Ym8A    |
| 1    | 876   | https://www.linkedin.com/posts/andrew-wheeler-46134849_gio-has-a-new-post-out-on-examining-confidence-activity-7448697613513740289-hObZ  |
| 1    | 1959  | https://www.linkedin.com/posts/andrew-wheeler-46134849_the-mythos-technical-blog-post-on-its-cybersecurity-activity-7449059395071709184-UJOC  |
| 1    | 2201  | https://www.linkedin.com/posts/andrew-wheeler-46134849_for-folks-that-use-jupyter-notebooks-one-activity-7449422388691243008-Twxn        |
| 1    | 626   | https://www.linkedin.com/posts/andrew-wheeler-46134849_one-of-the-recommendations-i-have-in-the-activity-7450147165781426177-vceO        |
| 1    | 333   | https://www.linkedin.com/posts/andrew-wheeler-46134849_the-term-agent-is-almost-always-used-as-activity-7450509557677625344-42Ze         |
| 1    | 5787  | https://www.linkedin.com/posts/andrew-wheeler-46134849_agent-based-systems-require-bad-python-code-activity-7450871941458190336-gzSl     |
| 1    | 468   | https://www.linkedin.com/posts/andrew-wheeler-46134849_broadly-there-are-two-types-of-agent-based-activity-7451234329390678016-ZU1h      |
| 1    | 1267  | https://www.linkedin.com/posts/andrew-wheeler-46134849_i-get-periodically-asked-what-is-the-best-activity-7451596716354707456-aFMF       |
| 1    | 346   | https://www.linkedin.com/posts/andrew-wheeler-46134849_the-saying-a-picture-is-worth-a-1000-words-activity-7451959111132299264-Tk7o      |
| 1    | 480   | https://www.linkedin.com/posts/andrew-wheeler-46134849_it-is-important-to-have-independent-benchmark-activity-7452318150701953024-ftcM        |

I would have expected a multiplier (e.g. typically 3k views, now you have 6k or 9k views per post). So you could nitpick that I have differential timing for the posts, and the pre-premium posts have some contamination (if they promoted my older posts when I activated Premium). But those are not large enough to make a difference in my findings relative to what I expected.

The posts are quite comparable in content, mostly focused on my book and LLMs. It is possible my audience is oversaturated with that content, but I think it is just as likely that LinkedIn Premium doesn’t really promote your work to any substantive extent. (I have additionally obtained more followers in this period, so that should bias the results to have more views, not less.) At least here there is no evidence I should continue to pay $20 a month to increase my reach on LinkedIn.

Posts are bursty, and in the end I have very little ability to forecast what will or will not be popular. In the pre-period, my most popular post was on a blog post I did on log-probabilities (30k views). I definitely try to post more technical stuff on LinkedIn than the typical social media influencer, so that limits the reach.

I also had a rage-bait post on professors should teach python and not R with just under 30k views. (That was a bit of social media manipulation – have a controversial opinion that divides people, you get a bunch of thumbs up and a bunch of comments.) I do not have that many potential rage-bait post topics!

In addition to this, I also did the month for free for LinkedIn Premium for my business Crime De-Coder page. The same with my business page, I did not see any increased views, increased followers, etc.

Profile Views

Although I have not seen LinkedIn explicitly say Premium boosts your posts (besides actually paying for advertising), I have seen LinkedIn explicitly advertise that Premium profiles get more views:

So how do profile views look? I did get more the week I signed up, but it was trending upward previously, and reverted to the trend after week one anyway. (A few days short, I cannot access the chart week by week since turning off Premium.)

For a bit of background, I spent most of my time posting on my LinkedIn business Crime De-Coder page, and only posted on my personal page maybe once or twice a month. But since publishing LLMs for Mortals (in February of 2026), I have posted more on my personal. Which you can see increased my profile views before I signed up for Premium.

Likely the past additional profile views are for that rage-bait Python vs R post that was popular, not due to anything Premium did.

This appears to be extremely misleading advertising on LinkedIn’s part. If they just look at Premium vs not, it is likely Premium users are more active. This should just say the explicit “boost” profile views get, like ranked higher in searches.

$100 ad credit

With premium, you get $100 ad credit for posts a month. I used this to boost my original LLM for Mortals launch post, which was stale at that point and not accumulating any additional views.

The metrics on the post were as LinkedIn said they would be. Despite having 80+ likes when I first created it, the post only had 3700 views. Spending $100 on the credits got me an additional ~3500 views and supposedly ~50 additional website clicks. (I am confused how this is calculated, as I can see the actual link in the post was clicked fewer than 10 additional times with the campaign.)

I knew going in that adverts on LinkedIn are not a net benefit given my book purchase conversion rates. What I will call “high trust” referrals, I have something like a 1/100 purchase rate for the book. For other mediums, it is more like 1/1000. As far as I can tell, these seem pretty typical for a higher dollar value book purchase ($50+).

I have debated on setting the purchase price for the epub to much lower. $50 is in line with current offerings from O’Reilly, and in my informal demand curve tests is where I think it should be. But I don’t think any realistic conversion rate would make LinkedIn advertising make sense for my book.

For reference for influencers though, this gives a rough estimate comparable to LinkedIn’s direct advertising. Basically my average post is worth $100 according to LinkedIn. I only have around 3k followers currently on LinkedIn, so I imagine folks with followings 10x that can likely do direct advertisements to their audiences for more like $1k and up.

Wrap Up

I still think LinkedIn is the best social media site currently to promote my work and business. It is not just about the raw view counts, but also about conversion to people buying my book or reaching out for additional consulting gigs.

I will continue to use LinkedIn for this, but paying for a Premium LinkedIn account does not appear to be worth it for these reasons. Even if the views were increased, it is possible that they are not good connections for these end goals.

There are additional things you get with Premium (can send cold messages to people you are not connected to, supposedly higher priority when applying to jobs). Those are maybe worth the $20 a month for some people. But focusing on what LinkedIn advertises for “boosting” your posts and profile, I did not personally see any evidence that would justify spending even $1 a month for the Premium features.

Leave a comment

by Andy Wheeler on April 25, 2026 • Permalink

Posted in data science, Personal Productivity, social networking, writing

Tagged LinkedIn

Posted by Andy Wheeler on April 25, 2026

https://andrewpwheeler.com/2026/04/25/linkedin-premium-does-not-boost-your-posts/

Job Advice Resources page

Minor update, I have created a page, Job Advice Resources to cumulatively list all the materials I have written on advice for social scientists and crime analysts looking to pivot into private sector tech roles.

I still get maybe ~2 folks a month ask for advice, and I am always happy to chat. I wish PhD granting institutions took this more seriously (it only takes minor changes to better prepare students).

If you are an administrator of a PhD program and actually care about getting your students jobs, also feel free to reach out and I am happy to discuss how I can help.

Leave a comment

by Andy Wheeler on April 22, 2026 • Permalink

Posted in Crime Analysis, Criminal Justice, data science, Personal Productivity, Python, scholarly

Tagged tech-jobs

Posted by Andy Wheeler on April 22, 2026

https://andrewpwheeler.com/2026/04/22/job-advice-resources-page/

The race to the bottom with AI tools

What we are seeing in the AI startup space is a perfect example of the “no moat” problem: if your core product is essentially just clever prompt engineering wrapped around someone else’s frontier model, it is trivially easy for a competitor to reverse-engineer your workflow and undercut your price. Over the last few months, this lack of a defensible moat has triggered a rapid race to the bottom in automated peer review, moving from expensive managed services to open-source “bring your own key” (BYOK) scripts.

Here I am going to look at three tools specifically designed to review academic papers: Refine, IsItCredible, and Coarse.

Overview of the Tools

Refine: Refine positions itself as a premium, rigorous option for institutions, boasting testimonials from Ivy League professors and a high price point of $49.99 per review. It uses what it calls “massive parallel compute” to make hundreds of LLM calls to stress-test every line of a document.

IsItCredible: Built on the open-source Reviewer 2 pipeline, IsItCredible offers a standardized, pay-per-use middle ground with core reports starting at $5. It employs a clever “adversarial” architecture where “Red Team” agents try to find flaws and a “Blue Team” verifies them to prevent hallucinations.

Coarse: Coarse represents the logical endpoint of this race as an open-source “Bring Your Own Key” (BYOK) tool that lets you run complex multi-agent reviews locally or via OpenRouter. Because users pay the API costs directly instead of a markup, a comprehensive paper review is significantly cheaper.

The “LLM as a Judge” Problem

The hardest part of all this is evaluation. How do you know if the AI reviewer is actually good?

Refine relies almost entirely on anecdotal evidence. Their own FAQ essentially tells you to just try it and see the difference for yourself, claiming that general-purpose chatbots cannot match their depth even with expert prompting. This “try it yourself” approach is effective for marketing, but it isn’t a hard benchmark.

IsItCredible and Coarse are trying to be more systematic. The IsItCredible team released a paper, Yell at It: Prompt Engineering for Automated Peer Review, where they benchmarked their tool against five alternatives. They claim 15 wins out of 20 pairings. Similarly, Coarse claims to have been “blind-evaluated” against Refine and Reviewer 2, scoring higher on coverage and specificity.

However, we are still largely in the “LLM as a judge” era. These benchmarks often use another LLM to decide which review is better. It is circular logic. Until we have a “Ground Truth” dataset of known mathematical errors or logical fallacies in published papers, we are just measuring which AI writes the most convincing-sounding critique.

Because evaluation is so difficult, this software category risks becoming a classic market for lemons. It is incredibly difficult to identify substantive differences in quality between these tools without some external, hard benchmark. To truly evaluate if Refine’s expensive managed service is meaningfully better than Coarse’s open-source BYOK run, you have to verify the AI’s claims. But verifying those claims requires spending just as much time reading and reviewing the original paper as you would have spent just doing the review yourself from scratch. Without transparent benchmarks, users cannot easily distinguish high-quality rigorous analysis from convincing hallucinations, driving the market toward the cheapest option by default.

For those building AI tools, this entire space serves as a warning about the race to the bottom. I have previously written about deep research tools as another example of this phenomenon. If your only value proposition is a well-orchestrated prompt chain, open-source alternatives will inevitably compress your margins to zero. Eventually, the native GUI interfaces of the frontier models themselves may just become good enough that your specialized service isn’t even needed.

Stop Teaching R. Teach Python.

There has been a slight transition in social science teaching since I have been a student and professor over the past ~15+ years. In the aughts, it was still common to teach students in legacy, closed source statistical software (SPSS, SAS, and Stata). When I was a PhD student in criminal justice at SUNY Albany, we had a specific class to learn SPSS, although most of the rest of the quantitative courses used Stata.

The R programming language has likely usurped the use of the closed source languages in social science education after the aughts though. (I do not have hard data, but that is my impression seeing what colleagues are using and what they teach in classes.)

I am familiar with all of the major statistical programs (I have written an R package, and you can see this blog for many examples of SPSS and a few for Stata). If the goal in coursework is to teach your students skills relevant to help them get a job, academics in social science institutions should teach their students Python. The current job market for quantitative work is dominated by Python positions.

To be clear, I am not fundamentally opposed to closed source programming languages (there are scenarios where SPSS/SAS make more sense than Hadoop systems I have seen, also if you are a GIS analyst you should learn ESRI tools). This is purely just an observation given the current private sector job market – focusing primarily on Python makes the most sense for social science students.

As an experiment, I went onto LinkedIn and did a search for “data scientist”. Your results will differ (mine are tailored to the Raleigh area, and also includes more senior positions), but here is a table of the positions that came up on the first page, and a quick summary of the tech stacks they require. While this is not a systematic sample, it gives a reasonable snapshot of current expectations.

| Company             | Job Title                           | Tech Stack                                           | URL                                            |
|---------------------|-------------------------------------|------------------------------------------------------|----------------------------------------------- |
| Google              | Data Scientist (Google Voice 2)     | Python, R, SQL                                       | https://www.linkedin.com/jobs/view/4387751995/ |
| Deloitte            | AI Specialist                       | None specified                                       | https://www.linkedin.com/jobs/view/4376183670/ |
| Ascensus            | Principal Analytics                 | R, Python, SQL, GenAI/LLM                            | https://www.linkedin.com/jobs/view/4380164400/ |
| EY                  | AI Lead Engineer                    | Python, C#, R, GenAI/LLM                             | https://www.linkedin.com/jobs/view/4385954762/ |
| PwC                 | GenAI Python Systems Engineer (2)   | Python, SQL, Cloud Platforms, GenAI/LLM              | https://www.linkedin.com/jobs/view/4373604638/ |
| Affirm              | Senior Machine Learning Engineer    | Python, Spark/Ray                                    | https://www.linkedin.com/jobs/view/4326673670/ |
| Lexis Nexis         | Lead Data Scientist                 | Cloud Platforms, GenAI/LLM                           | https://www.linkedin.com/jobs/view/4316327742/ |
| EY                  | AI Finance                          | SQL, Python, Azure, GenAI/LLM                        | https://www.linkedin.com/jobs/view/4385085950/ |
| Korn Ferry          | Sr. Data Scientist                  | Python, R, Spark, AWS, GenAI/LLM                     | https://www.linkedin.com/jobs/view/4387433496/ |
| Deloitte            | Data Science Manager                | Python, Cloud                                        | https://www.linkedin.com/jobs/view/4304674642/ |
| First Citizens Bank | Senior Quant Model Developer        | Python, SAS, SQL                                     | https://www.linkedin.com/jobs/view/4365378242/ |
| First Citizens Bank | Senior Manager Quant Analysis       | Python, SAS, Tableau                                 | https://www.linkedin.com/jobs/view/4388131284/ |
| Jobot               | ML Solution Architect               | Python, Scala, Spark, AWS, Snowflake                 | https://www.linkedin.com/jobs/view/4384023540/ |
| Affirm              | Analyst II                          | SQL, Python, R, CPLEX/Gurobi, Databricks/Snowflake   | https://www.linkedin.com/jobs/view/4373303038/ |
| Red Hat             | Sr Machine Learning Engineer (vLLM) | Python, GenAI/LLM                                    | https://www.linkedin.com/jobs/view/4354827922/ |
| Alliance Health     | Director AI                         | Python (TensorFlow/PyTorch), Office Products, GenAI  | https://www.linkedin.com/jobs/view/4383011480/ |
| Nubank              | ML Data Engineer                    | Python, Ray/Spark                                    | https://www.linkedin.com/jobs/view/4376815752/ |
| Target RWE          | Senior Quant Data Scientist         | R                                                    | https://www.linkedin.com/jobs/view/4385293724/ |
| Siemens             | Senior Data Analytics               | SQL, Python, R, Tableau/PowerBI                      | https://www.linkedin.com/jobs/view/4377969531/ |
| Red Hat             | Sr Machine Learning Engineer        | Python, GenAI/LLM                                    | https://www.linkedin.com/jobs/view/4302769773/ |
| Lexis Nexis         | Director Data Sciences              | Python, R, GenAI/LLM                                 | https://www.linkedin.com/jobs/view/4387335028/ |
| Cigna               | Data Science Senior Advisor         | Python, SQL                                          | https://www.linkedin.com/jobs/view/4381766145/ |
| Thermo Fisher       | Senior Manager Data Engineering     | Fabric, PowerBI, Python, Databricks, Tableau, SAS    | https://www.linkedin.com/jobs/view/4372684009/ |

Of the positions:

9/25 roles included R, but only one required R exclusively. The other 8 were Python/SQL/R
22/25 included Python
11/25 had a focus on Generative AI or LLMs

Python dominates R in the current job market for data science positions. Professors are doing their students a disservice teaching R, the same way they would be doing a disservice teaching their students to code in Fortran.

Another aspect I noticed for this – analyst type jobs not all that long ago really only expected Excel (and maybe SQL). Now even the majority of the analyst jobs expect Python (even more so than dashboard tools like PowerBI in this sample).

For individuals on the job market, I suggest going and doing your own experiment job search like this on LinkedIn to see the tech skills you need to be able to at least get your foot in the door for an interview. I expected GenAI to be slightly more popular (only 11/25), but there were a few other technologies sprinkled in enough it may be good to become familiar with to widen your potential pool (Cloud and Spark – I am surprised Databricks was not listed more often).

If you’re looking to build Python skills from scratch, I cover this in my book: Data Science for Crime Analysis with Python (can purchase in paperback or epub at my store).

If also interested in learning about generative AI, see my book Large Language Models for Mortals: A Practical Guide for Analysts with Python.

You can use the coupon TWOFOR1 to get $30 off when purchasing multiple books from my store.

1 Comment

by Andy Wheeler on March 22, 2026 • Permalink

Posted in data science, Personal Productivity, Python, R

Tagged teaching

Posted by Andy Wheeler on March 22, 2026

https://andrewpwheeler.com/2026/03/22/stop-teaching-r-teach-python/

Using Claude Code to help me write

Using LLMs to help you write is understandably a touchy subject for many. There is quite a bit of AI slop coming out now, as it is really easy to just have the LLM tools think for you and write superficially OK but ultimately garbage prose.

My recent book, LLMs for Mortals, I used Sonnet 4.1 to write the initial draft of the book (for around $5). My prior book took around a year, whereas I was able to finish this book in around two months. I definitely did a ton of copy-editing (maybe around 20-30 hours per chapter on average), but I believe around 50% of the book material is the original Sonnet generated prose.

LLMs are a tool – they can be used poorly, but I think they can be used quite well. Pangram, a tool used to detect AI writing, does not flag any of the passages in LLMs for Mortals as AI generated.

This blog post goes over my notes on how I used Claude Code to help me write (although it really is applicable to any of the current coding tools, like Codex or Gemini as well). As a meta-reference, this blog post is 100% written by myself directly, but I will link to a draft written using Claude Code later in the post for a frame of reference.

Copy Editing

First, even if you do not agree with having an LLM write for you directly, there is a use case that should be relatively uncontroversial – having an LLM take a copy-edit pass on your work.

Here is an example I used this for recently, the blog post on Crime De-Coder goes over the benefits of using an API vs local LLMs. In this conversation, you can see my original draft, and the suggestions that Claude’s desktop tool (the free version) gave.

Again this is not really specific to Claude (this would have worked fine in ChatGPT as well). LLMs are good for not only spelling errors, but grammatical issues that spell check will not catch, as well as just more general copy-editing advice on the content.

One point of this – to replicate my setup, you need to write in plain text. Most of the things I write are in some form of markdown (plain markdown for blog posts, and Quarto for longer reports/books/etc). This makes it much easier to use the tools, especially the command line interface (CLI) tools like Claude Code.

Writing New Content

There are two big issues currently with LLM writing:

it is potentially wrong
current LLM writing has a particular style that is itself becoming noticeable

The first bullet, you need to review what it writes. It is much easier to have it write on content you are an expert in, so it is easier to review and spot errors. (It is the same current problem with using the tools to help you write computer code – they are boons for seniors but can write a ton of slop that more neophyte coders have a hard time spotting issues.)

The second bullet, having the style mimic your own, is what I am going to discuss here. It is worth understanding at a high level how generative AI LLMs work – if you ask “answer question X” vs “here is a book, …., answer question X” the LLM will generate a different response. The first part in the former prompt, “here is a book, …” is what is referred to the context. Current models have context windows (how large of a potential input) at around 500,000 words (technically they are around 1 million tokens, one word is often multiple tokens though).

You generally do not want to fill up the context window 100%, but 500,000 words is a very large number – just including text it would be multiple books. Another common prompting technique is what is called k-shot examples. It will typically go like

example input1: ...text... expected_output: ...blah...
example input2: ...diff text... expected_output: ...blah2...
....

This is what you place in the context window, then submit your usual prompt, and have the LLM generate the content. It is giving prior examples to help guide the LLM what you expect the final output to look like. This works the same way with writing – give the LLM prior examples of your writing to help it mimic your future style.

To keep it simple, I have created an example on github to follow along. Basically just have your prior writing (in text!), and then ask Claude Code something like:

review my prior blog posts in folder /blogposts, I am going to have you write a new blog post on topic X given the outline *after* you review the text

Then after your prior work is in the context window, feed the LLM an outline for what you want to write. In this example, I put the outline in an actual text file and said:

In the ClaudeWritingPost folder, review the outline.txt, then create a new md 
file, called ClaudeWritingExample.md, filling in the sections based on the 
outline

Claude Code will then go and review the text file with the outline and write the post. In the github repo I have my original outline for this same post, so you can see side-by-side.

You can technically write custom commands and skills with Claude Code (or the other CLI tools) to save the steps of typing two prompts, but to keep it simple for folks I am just showing the two steps manually. It is really just those two steps – get your prior examples into the context window, and then feed an outline for what else to write.

In the Github repo you can see some additional Claude.md files – these are files that include additional instructions. A common one I say is “do not include emojis”. LLM writing also tends to be verbose and have excessive lists. So I have instructions to avoid those as well.

The written blog post is not bad – I would suggest to go and read it as a proof of concept (I exported the session, can see it cost around fifty cents). Part of the reason I do not typically worry about blog posts is that I often add in things/change things in the process of writing. So you can see my personally written post is longer and has a few more elements.

So when would you use it? Technical writing, like writing tutorials in python, it works very well. Hence I could have it write the first pass on my LLM book and keep 50% of the content. I may use it for blog posts in the future (if I felt compelled to write something every day). But will not take that plunge for now.

For longer pieces, like an entire paper or a book, I suggest to not only make a detailed outline, but to also have the LLM write it in smaller sections. This both helps with reviewing the content, as well as to keep the LLM on track if you make edits/changes as you go. (Longer conversations it is more likely to degrade and make repeated errors.)

An Extra Note About Citations

I am not writing academic papers much anymore, but another fundamental problem with LLM writing is hallucinating citations. If you write in text markdown files, my suggestion is fairly simple – have the papers you want to cite in a bibtex file, and in-line in markdown, only cite papers in the form:

Citation, @item1 says blah [@item1; @item2]. For a specific page quote [@item1 p. 34-35].

The way I write my outlines, it typically is like write a paragraph about X, cite papers a,b,c. So my personal style of progressively filling in an outline works well with LLMs.

So this presumes you already have a list of papers (and are not using the LLM to dynamically write your lit review based on papers you have not read). Next time I actually need to write an academic paper, I may write up an MCP tool to query Semantic Scholar’s API and create a nice bibtex file.

But the solution here is again you need to review the output for accuracy. People without these tools are lazy and cite things they have not read already, so that will continue to happen (the tools just make it easier). Those that figure out how to use the tools appropriately though can be much more productive writing.

1 Comment

by Andy Wheeler on March 20, 2026 • Permalink

Posted in data science, Personal Productivity, writing

Tagged claude-code, LLM

Posted by Andy Wheeler on March 20, 2026

https://andrewpwheeler.com/2026/03/20/using-claude-code-to-help-me-write/

Some notes on the unreliability of LLM APIs

Because my book, LLMs for Mortals, was created with Quarto, it runs the code when I compile the book. It uses cached versions when no code changes, but it is guaranteed to be working code for the parts that have a grey input and a following green output, it is valid code that executed and generated the results.

I try to use temperature zero for most of the book, but some of the parts of the book are stochastic. Reasoning models you cannot set the temperature, so some elements of Chapter 3 introducing the models, and basically all of the section in chapter 6 on agents is stochastic. This actually gave me a better appreciation of some of the unreliability of these models, as for some instances it would fail, and others I needed to recompile because the output was poor.

The way jupyter caching works under the hood, it has a separate cache for the epub and the LaTeX document (that is used for the print version). So you technically get a slightly different book when you purchase epub vs paperback. When you have 60+ failure points per chapter (and that gets doubled when compiling to both epub and PDF), you get to glimpse a few of the warts of the API models.

These are also short snippets, so do not have error catching or more robust JSON parsing, so some of these issues I basically programmed away in production systems at work and did not even notice them. I figured my notes may be useful though in general for others trying to rely on these systems with large volume API calls.

OpenAI

All the models were generally reliable, but one of the examples of stochastic outputs in OpenAI gave me fits – I asked OpenAI to analyze a blog post on my Crime De-Coder site and get information from the post. Now this is a bit tricky, as the reasoning model needs to see the data is not available in the post directly, but in an image.

January 24th, at one point though this became totally unreliable in its output. It would often fail to download the additional image, and when it did, it was pretty inconsistent actually giving an accurate answer.

But now I can run below and this just returns fine and dandy near every time. Here is a loop I ran 5 times and it gives the correct answer (around 160 Tuesday at 4 AM).

from openai import OpenAI
import time

client = OpenAI()

prompt = """
Search <https://crimede-coder.com/blogposts/2024/Aoristic>, what is 
the maximum number of commercial burglaries in the chart and on what
day and hour? Do not use shorthand, give an actual number.

If you need to, download additional materials to answer the question.

Be concise in your output.
"""

for _ in range(5):
    # minimal reasoning with responses API
    response = client.responses.create(
        model="gpt-5.2",
        reasoning={'effort': 'low'},
        tools=[{"type": "web_search"}],
        input=prompt,
    )
    time.sleep(20) # to prevent going over my limit
    print(response.output_text)
    print('-------------')

My only guess is there was some downgrade in the model capabilities, and it routed behind the scenes for the reasoning models to some less capable model. (Just on January 24th though!)

Otherwise, the stochastic examples in the book using OpenAI were pretty reliable.

Anthropic

In the structured outputs chapter, I go through examples of parsing JSON vs progressively building on Pydantic outputs. I actually give examples where Pydantic schema’s can cause some filling in of data that you do not want (if the data should be null, and you use k-shot examples, it will often fill in from your last example).

So this chapter really is a ton of advice on prompt engineering for structured outputs. One example I show is using stop sequences when generating JSON and doing text parsing (which is really not necessary and best practice with Pydantic schemas, but I still use this with AWS Bedrock, since it does not support that yet).

This code works fine, what is inconsistent is that on very rare occasion, Anthropic’s API returns the bracket at the end of the call. This subsequently generates an error with this code, as it is invalid JSON with an extra bracket.

Production systems at the day job use AWS, and I wrote the text parsing in a way I would not even see this error (so not sure if it also happens with AWS). And it was quite rare with Anthropic, I just compiled the book enough times to notice this error happen on a few occasions.

Google

In the book I show off using Google Map grounding, since it is a unique capability of Googles – it was very unreliable. Not unreliable in the sense it would return an error and not be available, but unreliable in “I cannot find any google maps data right now”. So this would compile, I would just need to go look at the output and make sure it actually returned something useful.

You can see I switched to the Vertex API for this example – I cannot confidently say if Vertex was more reliable than the Gemini API for this. I experienced issues with both (maybe slightly fewer with Vertex).

The Anthropic error is not so bad – it causes an actual error in the system. The reasoning and LLM outputs something, but it is not good, troubles me more. We are really just piloting agentic systems at the day gig now with a small number of users – they have not gotten really stress tested by a large number of users. I don’t even want to think about how I would monitor maps grounding in production given my experience.

AWS

AWS I only had one example not consistently work – calling the DeepSeek API.

In the prior code calling Anthropic models via Bedrock, and later chapters I have an example of Mistral and different embedding models (Cohere and Amazon’s Titan), were all fine. Just this single example from DeepSeek would randomly not work. By not work the API would return a response, but the content would be empty. So the final print statement is where the error occurred, accessing text that did not exist.

Most of my work, even if DeepSeek is cheaper, I need to consider caching. So Haiku is pretty competitive with the other models. So I do not have much experience in Bedrock with any models besides Anthropic ones.

My biggest gripe with AWS is the IAM permissions are too difficult (and have changed over the past year). I was able to reasonably figure out how to use S3 Vectors and batch inference (which is discussed in the book). I was able to figure out Knowledge Bases, but I just took it out of the book (both too expensive for hobby projects to have the search endpoint). OpenAI’s vector search store is super easy though, so will definately consider that for traditional RAG applications moving forward.

Buy the book!

Use promo code LLMDEVS for 50% off of the epub. Or if you prefer purchase the paperback.

Leave a comment

by Andy Wheeler on February 27, 2026 • Permalink

Posted in data science, Python

Tagged Anthropic, AWS-Bedrock, Gemini, LLM, OpenAI

Posted by Andy Wheeler on February 27, 2026

https://andrewpwheeler.com/2026/02/27/some-notes-on-unreliability-of-llm-apis/

Search for:
Recent Posts
Categories
Categories
Site RSS Feeds
- RSS - Posts
- RSS - Comments
Follow Blog via Email

Enter your email address to follow this blog and receive notifications of new posts by email.

Email Address:

Join 392 other subscribers
aoristic big-data cartography census choropleth citeulike consulting cost-benefit courses crime-mapping crime-trends Crime Analysis Criminal Justice data-manipulation data visualization deep-learning ESRI excel flow-data folium geocoding github google-streetview-api grammar of graphics group-based-trajectory gun-violence healthcare homicide-rates hot spots hypothesis-testing linear programming LLM logistic-regression machine-learning MACRO mapping matplotlib meta network NetworkX officer-involved-shooting open-science paper Papers peer-review Poisson prediction Predictive-Policing preprint presentation Python Python-programability pytorch quasi-experiment r recidivism regression resources scholarly scraping seaborn shootings simulation small-multiples social-media social-networking SPSS stackexchange Stata statistics survey time-series uncertainty wdd web-scraping
Top Posts & Pages
Stack Exchange

Andrew Wheeler

All posts in category data science

How long to conduct your experiment: Talk at ASEBP

xAI voice cloning API

Gathering interest in tech courses

Job Advice Resources page

The race to the bottom with AI tools

Overview of the Tools

The “LLM as a Judge” Problem

Meta

Stop Teaching R. Teach Python.

Using Claude Code to help me write

Copy Editing

Writing New Content

An Extra Note About Citations

Some notes on the unreliability of LLM APIs

OpenAI

Anthropic

Google

AWS

Buy the book!

Recent Posts

Categories

Site RSS Feeds

Follow Blog via Email

Top Posts & Pages

Stack Exchange