All posts in category Personal Productivity

Pangram is good

Many of the initial wave of “AI writing detectors” were quite bad. The biggest issue you need to be concerned about with an AI writing detector is false positives. If you are a professor and want to check students’ writing, it is very bad to falsely accuse a student.

The Pangram product, though, is quite good, and I suggest folks check it out.

The other main competitor on the market, GPTZero, is clearly lower quality (such as saying the Constitution is AI generated).

GPTZero in their documentation says they are the most accurate AI detector. One of the reasons you don’t really care about accuracy is that you cannot know the underlying rate of AI writing in any corpus except in the scenario where it is artificially generated. And that is the only scenario in which you can know the accuracy for sure. What you care about is specifically the false positive rate and the false negative rate.

Unlike GPTZero, Pangram appears to have very low false positive rates. A simple way to estimate the false positive rate is to just submit writing prior to 2022 to the tool and see how many it flags as AI. ChatGPT came out in late 2022, the tools to generate writing before that were just not even close for people to use in any serious way. So any writing flagged in the older corpus as AI is a false positive.

Here is an example examining legal briefs.

It is an independent assessment. We cannot really know the capture rate (were there more than 66 briefs generated via LLMs in that sample). We can know the false positive rate though. And it is 1/800 in this sample with Pangram.

Pangram says it has a 1 in 10,000 false positive rate across a wide array of writing samples. They even report in their own internal tests that GPTZero has a 2% false positive rate (I am pretty sure GPTZero’s false positive rate is much higher than 2%, hence the Constitution error.)

Many other checks for false negative rates involve people having various models generate writing and then classifying it. It is hard to know if those are very good benchmarks for estimating the false negative rate. But we can easily estimate the false positive rate, and in that respect Pangram is clearly better than other AI writing detectors on the market.

Should we care if writing is AI?

I have used AI tools to help me write. I promise to be forthcoming if I use AI to help me write any substantive sections of writing (in blog posts, books, social media posts, etc.) Currently I am almost always using the LLMs to copy-edit, which is often simply a prompt “check for spelling and grammar issues”.

I do not use it all the time for writing. This post was all written by hand (and then just copy-edited with Gemini CLI).

It is really not that hard to bring your own voice and use AI to aid your writing. Have the LLM read your prior work, then give it a detailed outline, and then iterate. See my transcript on a prior post for an example.

I’d note I have used Pangram to see if my LLM writing is too obviously AI, and it is not. To me, when the writing is clearly AI, this often signals a clear lack of care and effort in the writing. AI writing can be valuable, but it is quite frequently low value slop.

So you get people larping as tech experts.

You can trivially have Claude or whatever software write a Skill file, and then have an LLM write how it is super awesome. This does not make it so.

And you have salespeople write posts that literally make no sense.

This, to be clear, is obviously AI slop.

So these individuals could actually generate useful content if they spent any more than a trivial amount of time. But they don’t, and it shows.

Leave a comment

by Andy Wheeler on April 30, 2026 • Permalink

Posted in Personal Productivity, writing

Tagged LLM, pangram

Posted by Andy Wheeler on April 30, 2026

https://andrewpwheeler.com/2026/04/30/pangram-is-good/

LinkedIn Premium Does Not Boost your Posts

One of my connections mentioned in a post on LinkedIn that since he turned off Premium, his posts have been getting less engagement. Since LinkedIn offers a month for free, and I have been trying to promote my recent book, I figured I would try my free month trial and see how many more views I could get. (Here I am not worried about Premium for applying to new jobs, it is possible it is totally worth it for that, I was not applying to jobs in this test so I do not know.)

Long story short, LinkedIn Premium does not appear to promote my material at all above the baseline.

Post Views

In a sample of 30 posts the month before I turned on Premium (turned on 3/24 in the evening, turned off 4/22 in the morning), my posts had an average of 3600 views (with a standard deviation of 7000, median 1400). Post-Premium, I had 23 posts, and the views were on average 2200 (SD 2900, median 900). Here is the full table of posts and links (Premium=1 means it was posted when my Premium subscription was turned on):

| Premium | Views | URL |
| ----:|----- :|:----- |
| 0    | 3659  | https://www.linkedin.com/posts/andrew-wheeler-46134849_llms-have-transformed-the-data-science-industry-activity-7426975341572984832-HGTA |
| 0    | 2526  | https://www.linkedin.com/posts/andrew-wheeler-46134849_no-guarantees-but-i-am-going-to-try-to-start-activity-7428418993553846272-vdjA    |
| 0    | 2290  | https://www.linkedin.com/posts/andrew-wheeler-46134849_much-of-the-hype-around-claude-code-is-having-activity-7428781380391567360-zkXr   |
| 0    | 545   | https://www.linkedin.com/posts/andrew-wheeler-46134849_one-of-the-benefits-of-my-epub-version-of-activity-7429143771109302272-_V6T       |
| 0    | 1454  | https://www.linkedin.com/posts/andrew-wheeler-46134849_claude-code-has-the-ability-to-create-hooks-activity-7429506167527088128-01Um     |
| 0    | 1326  | https://www.linkedin.com/posts/andrew-wheeler-46134849_one-of-the-prompting-flows-i-find-convenient-activity-7429868558794436609-SDgG    |
| 0    | 1278  | https://www.linkedin.com/posts/andrew-wheeler-46134849_one-of-the-main-focuses-in-the-book-is-not-activity-7430230940444057600-pqK-      |
| 0    | 5988  | https://www.linkedin.com/posts/andrew-wheeler-46134849_while-skills-in-claude-code-are-all-the-rage-activity-7430955707358818304-iqjb    |
| 0    | 1726  | https://www.linkedin.com/posts/andrew-wheeler-46134849_one-of-the-mistakes-i-see-with-agent-based-activity-7431318102212100096-rI_W      |
| 0    | 1172  | https://www.linkedin.com/posts/andrew-wheeler-46134849_from-my-experience-as-an-educator-when-presenting-activity-7431680485585580032-8h7B    |
| 0    | 1360  | https://www.linkedin.com/posts/andrew-wheeler-46134849_although-the-llm-tools-are-currently-focused-activity-7432042882770944000-AfAG    |
| 0    | 5304  | https://www.linkedin.com/posts/andrew-wheeler-46134849_when-i-was-a-professor-at-ut-dallas-i-sat-activity-7432405268874817536-qrAI       |
| 0    | 30732 | https://www.linkedin.com/posts/andrew-wheeler-46134849_i-know-a-few-stats-folks-in-my-network-that-activity-7432767666781679617-HXnk |
| 0    | 1003  | https://www.linkedin.com/posts/andrew-wheeler-46134849_claude-code-does-not-have-an-image-model-activity-7433492422111776768-2dqY        |
| 0    | 884   | https://www.linkedin.com/posts/andrew-wheeler-46134849_while-i-have-a-section-in-the-book-devoted-activity-7433854815862013952-FIl-      |
| 0    | 888   | https://www.linkedin.com/posts/andrew-wheeler-46134849_in-the-book-i-have-a-dedicated-chapter-on-activity-7434217215450681344-hk3E       |
| 0    | 1868  | https://www.linkedin.com/posts/andrew-wheeler-46134849_the-llm-book-is-compiled-using-quarto-so-activity-7434579595095580673-RHdx        |
| 0    | 807   | https://www.linkedin.com/posts/andrew-wheeler-46134849_llms-for-mortals-how-to-view-the-epub-activity-7434945455073169409-bNDu           |
| 0    | 1243  | https://www.linkedin.com/posts/andrew-wheeler-46134849_section-on-using-gliner-for-ner-activity-7435304382931513344-NPPC                 |
| 0    | 1745  | https://www.linkedin.com/posts/andrew-wheeler-46134849_my-first-book-data-science-for-crime-analysis-activity-7436029137695416320-aRmr   |
| 0    | 914   | https://www.linkedin.com/posts/andrew-wheeler-46134849_so-the-new-book-large-language-models-for-activity-7436376426100199424-1g8E       |
| 0    | 1593  | https://www.linkedin.com/posts/andrew-wheeler-46134849_agentic-coding-apps-like-claude-code-and-activity-7436738847054512128-qiXz        |
| 0    | 3415  | https://www.linkedin.com/posts/andrew-wheeler-46134849_many-people-are-turned-off-by-ai-writing-activity-7437101213717909504-QjOo        |
| 0    | 928   | https://www.linkedin.com/posts/andrew-wheeler-46134849_pretty-much-every-day-there-is-a-new-prompt-activity-7437463609401794560-50H-     |
| 0    | 2185  | https://www.linkedin.com/posts/andrew-wheeler-46134849_much-of-the-hype-around-skills-is-imo-people-activity-7437826000958337024-t6i3    |
| 0    | 800   | https://www.linkedin.com/posts/andrew-wheeler-46134849_one-of-the-benefits-of-my-llm-for-mortals-activity-7438550758763098112-hQUw |
| 0    | 870   | https://www.linkedin.com/posts/andrew-wheeler-46134849_large-language-models-for-mortals-preview-activity-7438913140639207424-tAli       |
| 0    | 1948  | https://www.linkedin.com/posts/andrew-wheeler-46134849_new-blog-post-using-claude-code-to-help-activity-7441087469388861440-TCKq         |
| 0    | 1160  | https://www.linkedin.com/posts/andrew-wheeler-46134849_given-all-the-rage-with-generative-ai-and-activity-7441449857480933377-uPFw       |
| 0    | 27842 | https://www.linkedin.com/posts/andrew-wheeler-46134849_stop-teaching-r-teach-python-when-i-was-activity-7441812266938826753-DywF         |
| 1    | 526   | https://www.linkedin.com/posts/andrew-wheeler-46134849_forecasting-the-future-is-difficult-especially-activity-7442537064803368960-qsVO  |
| 1    | 13096 | https://www.linkedin.com/posts/andrew-wheeler-46134849_when-using-llms-to-do-structured-data-extraction-activity-7442899426471407617-CpZz |
| 1    | 2394  | https://www.linkedin.com/posts/andrew-wheeler-46134849_ive-spoken-with-many-people-who-are-concerned-activity-7443039100477145090-v_3i   |
| 1    | 646   | https://www.linkedin.com/posts/andrew-wheeler-46134849_the-main-audience-my-book-large-language-activity-7443261810511757312-R2Jc        |
| 1    | 3030  | https://www.linkedin.com/posts/andrew-wheeler-46134849_for-the-folks-that-were-not-happy-with-my-activity-7443401497444409344-q38H       |
| 1    | 437   | https://www.linkedin.com/posts/andrew-wheeler-46134849_one-of-the-current-capabilities-of-googles-activity-7443624184120754176-YHbw      |
| 1    | 5275  | https://www.linkedin.com/posts/andrew-wheeler-46134849_one-error-i-am-seeing-devs-continually-make-activity-7443986571650973696-TZ-K     |
| 1    | 3815  | https://www.linkedin.com/posts/andrew-wheeler-46134849_one-of-the-biggest-issues-with-using-generative-activity-7444348969100664832-pX0v |
| 1    | 738   | https://www.linkedin.com/posts/andrew-wheeler-46134849_reports-of-rags-demise-are-overstated-activity-7444711358421491712-6BJu           |
| 1    | 425   | https://www.linkedin.com/posts/andrew-wheeler-46134849_the-recent-litellm-distribution-attack-highlights-activity-7445073747968999424-NEmn    |
| 1    | 3752  | https://www.linkedin.com/posts/andrew-wheeler-46134849_one-of-the-responses-to-me-writing-the-book-activity-7445436130080186369-7Fvx     |
| 1    | 1670  | https://www.linkedin.com/posts/andrew-wheeler-46134849_professors-that-follow-me-i-am-happy-to-activity-7446160904217407488-CXcZ         |
| 1    | 918   | https://www.linkedin.com/posts/andrew-wheeler-46134849_i-have-used-claude-code-the-longest-probably-activity-7447248077435920385-Ym8A    |
| 1    | 876   | https://www.linkedin.com/posts/andrew-wheeler-46134849_gio-has-a-new-post-out-on-examining-confidence-activity-7448697613513740289-hObZ  |
| 1    | 1959  | https://www.linkedin.com/posts/andrew-wheeler-46134849_the-mythos-technical-blog-post-on-its-cybersecurity-activity-7449059395071709184-UJOC  |
| 1    | 2201  | https://www.linkedin.com/posts/andrew-wheeler-46134849_for-folks-that-use-jupyter-notebooks-one-activity-7449422388691243008-Twxn        |
| 1    | 626   | https://www.linkedin.com/posts/andrew-wheeler-46134849_one-of-the-recommendations-i-have-in-the-activity-7450147165781426177-vceO        |
| 1    | 333   | https://www.linkedin.com/posts/andrew-wheeler-46134849_the-term-agent-is-almost-always-used-as-activity-7450509557677625344-42Ze         |
| 1    | 5787  | https://www.linkedin.com/posts/andrew-wheeler-46134849_agent-based-systems-require-bad-python-code-activity-7450871941458190336-gzSl     |
| 1    | 468   | https://www.linkedin.com/posts/andrew-wheeler-46134849_broadly-there-are-two-types-of-agent-based-activity-7451234329390678016-ZU1h      |
| 1    | 1267  | https://www.linkedin.com/posts/andrew-wheeler-46134849_i-get-periodically-asked-what-is-the-best-activity-7451596716354707456-aFMF       |
| 1    | 346   | https://www.linkedin.com/posts/andrew-wheeler-46134849_the-saying-a-picture-is-worth-a-1000-words-activity-7451959111132299264-Tk7o      |
| 1    | 480   | https://www.linkedin.com/posts/andrew-wheeler-46134849_it-is-important-to-have-independent-benchmark-activity-7452318150701953024-ftcM        |

I would have expected a multiplier (e.g. typically 3k views, now you have 6k or 9k views per post). So you could nitpick that I have differential timing for the posts, and the pre-premium posts have some contamination (if they promoted my older posts when I activated Premium). But those are not large enough to make a difference in my findings relative to what I expected.

The posts are quite comparable in content, mostly focused on my book and LLMs. It is possible my audience is oversaturated with that content, but I think it is just as likely that LinkedIn Premium doesn’t really promote your work to any substantive extent. (I have additionally obtained more followers in this period, so that should bias the results to have more views, not less.) At least here there is no evidence I should continue to pay $20 a month to increase my reach on LinkedIn.

Posts are bursty, and in the end I have very little ability to forecast what will or will not be popular. In the pre-period, my most popular post was on a blog post I did on log-probabilities (30k views). I definitely try to post more technical stuff on LinkedIn than the typical social media influencer, so that limits the reach.

I also had a rage-bait post on professors should teach python and not R with just under 30k views. (That was a bit of social media manipulation – have a controversial opinion that divides people, you get a bunch of thumbs up and a bunch of comments.) I do not have that many potential rage-bait post topics!

In addition to this, I also did the month for free for LinkedIn Premium for my business Crime De-Coder page. The same with my business page, I did not see any increased views, increased followers, etc.

Profile Views

Although I have not seen LinkedIn explicitly say Premium boosts your posts (besides actually paying for advertising), I have seen LinkedIn explicitly advertise that Premium profiles get more views:

So how do profile views look? I did get more the week I signed up, but it was trending upward previously, and reverted to the trend after week one anyway. (A few days short, I cannot access the chart week by week since turning off Premium.)

For a bit of background, I spent most of my time posting on my LinkedIn business Crime De-Coder page, and only posted on my personal page maybe once or twice a month. But since publishing LLMs for Mortals (in February of 2026), I have posted more on my personal. Which you can see increased my profile views before I signed up for Premium.

Likely the past additional profile views are for that rage-bait Python vs R post that was popular, not due to anything Premium did.

This appears to be extremely misleading advertising on LinkedIn’s part. If they just look at Premium vs not, it is likely Premium users are more active. This should just say the explicit “boost” profile views get, like ranked higher in searches.

$100 ad credit

With premium, you get $100 ad credit for posts a month. I used this to boost my original LLM for Mortals launch post, which was stale at that point and not accumulating any additional views.

The metrics on the post were as LinkedIn said they would be. Despite having 80+ likes when I first created it, the post only had 3700 views. Spending $100 on the credits got me an additional ~3500 views and supposedly ~50 additional website clicks. (I am confused how this is calculated, as I can see the actual link in the post was clicked fewer than 10 additional times with the campaign.)

I knew going in that adverts on LinkedIn are not a net benefit given my book purchase conversion rates. What I will call “high trust” referrals, I have something like a 1/100 purchase rate for the book. For other mediums, it is more like 1/1000. As far as I can tell, these seem pretty typical for a higher dollar value book purchase ($50+).

I have debated on setting the purchase price for the epub to much lower. $50 is in line with current offerings from O’Reilly, and in my informal demand curve tests is where I think it should be. But I don’t think any realistic conversion rate would make LinkedIn advertising make sense for my book.

For reference for influencers though, this gives a rough estimate comparable to LinkedIn’s direct advertising. Basically my average post is worth $100 according to LinkedIn. I only have around 3k followers currently on LinkedIn, so I imagine folks with followings 10x that can likely do direct advertisements to their audiences for more like $1k and up.

Wrap Up

I still think LinkedIn is the best social media site currently to promote my work and business. It is not just about the raw view counts, but also about conversion to people buying my book or reaching out for additional consulting gigs.

I will continue to use LinkedIn for this, but paying for a Premium LinkedIn account does not appear to be worth it for these reasons. Even if the views were increased, it is possible that they are not good connections for these end goals.

There are additional things you get with Premium (can send cold messages to people you are not connected to, supposedly higher priority when applying to jobs). Those are maybe worth the $20 a month for some people. But focusing on what LinkedIn advertises for “boosting” your posts and profile, I did not personally see any evidence that would justify spending even $1 a month for the Premium features.

Leave a comment

by Andy Wheeler on April 25, 2026 • Permalink

Posted in data science, Personal Productivity, social networking, writing

Tagged LinkedIn

Posted by Andy Wheeler on April 25, 2026

https://andrewpwheeler.com/2026/04/25/linkedin-premium-does-not-boost-your-posts/

Job Advice Resources page

Minor update, I have created a page, Job Advice Resources to cumulatively list all the materials I have written on advice for social scientists and crime analysts looking to pivot into private sector tech roles.

I still get maybe ~2 folks a month ask for advice, and I am always happy to chat. I wish PhD granting institutions took this more seriously (it only takes minor changes to better prepare students).

If you are an administrator of a PhD program and actually care about getting your students jobs, also feel free to reach out and I am happy to discuss how I can help.

Leave a comment

by Andy Wheeler on April 22, 2026 • Permalink

Posted in Crime Analysis, Criminal Justice, data science, Personal Productivity, Python, scholarly

Tagged tech-jobs

Posted by Andy Wheeler on April 22, 2026

https://andrewpwheeler.com/2026/04/22/job-advice-resources-page/

Stop Teaching R. Teach Python.

There has been a slight transition in social science teaching since I have been a student and professor over the past ~15+ years. In the aughts, it was still common to teach students in legacy, closed source statistical software (SPSS, SAS, and Stata). When I was a PhD student in criminal justice at SUNY Albany, we had a specific class to learn SPSS, although most of the rest of the quantitative courses used Stata.

The R programming language has likely usurped the use of the closed source languages in social science education after the aughts though. (I do not have hard data, but that is my impression seeing what colleagues are using and what they teach in classes.)

I am familiar with all of the major statistical programs (I have written an R package, and you can see this blog for many examples of SPSS and a few for Stata). If the goal in coursework is to teach your students skills relevant to help them get a job, academics in social science institutions should teach their students Python. The current job market for quantitative work is dominated by Python positions.

To be clear, I am not fundamentally opposed to closed source programming languages (there are scenarios where SPSS/SAS make more sense than Hadoop systems I have seen, also if you are a GIS analyst you should learn ESRI tools). This is purely just an observation given the current private sector job market – focusing primarily on Python makes the most sense for social science students.

As an experiment, I went onto LinkedIn and did a search for “data scientist”. Your results will differ (mine are tailored to the Raleigh area, and also includes more senior positions), but here is a table of the positions that came up on the first page, and a quick summary of the tech stacks they require. While this is not a systematic sample, it gives a reasonable snapshot of current expectations.

| Company             | Job Title                           | Tech Stack                                           | URL                                            |
|---------------------|-------------------------------------|------------------------------------------------------|----------------------------------------------- |
| Google              | Data Scientist (Google Voice 2)     | Python, R, SQL                                       | https://www.linkedin.com/jobs/view/4387751995/ |
| Deloitte            | AI Specialist                       | None specified                                       | https://www.linkedin.com/jobs/view/4376183670/ |
| Ascensus            | Principal Analytics                 | R, Python, SQL, GenAI/LLM                            | https://www.linkedin.com/jobs/view/4380164400/ |
| EY                  | AI Lead Engineer                    | Python, C#, R, GenAI/LLM                             | https://www.linkedin.com/jobs/view/4385954762/ |
| PwC                 | GenAI Python Systems Engineer (2)   | Python, SQL, Cloud Platforms, GenAI/LLM              | https://www.linkedin.com/jobs/view/4373604638/ |
| Affirm              | Senior Machine Learning Engineer    | Python, Spark/Ray                                    | https://www.linkedin.com/jobs/view/4326673670/ |
| Lexis Nexis         | Lead Data Scientist                 | Cloud Platforms, GenAI/LLM                           | https://www.linkedin.com/jobs/view/4316327742/ |
| EY                  | AI Finance                          | SQL, Python, Azure, GenAI/LLM                        | https://www.linkedin.com/jobs/view/4385085950/ |
| Korn Ferry          | Sr. Data Scientist                  | Python, R, Spark, AWS, GenAI/LLM                     | https://www.linkedin.com/jobs/view/4387433496/ |
| Deloitte            | Data Science Manager                | Python, Cloud                                        | https://www.linkedin.com/jobs/view/4304674642/ |
| First Citizens Bank | Senior Quant Model Developer        | Python, SAS, SQL                                     | https://www.linkedin.com/jobs/view/4365378242/ |
| First Citizens Bank | Senior Manager Quant Analysis       | Python, SAS, Tableau                                 | https://www.linkedin.com/jobs/view/4388131284/ |
| Jobot               | ML Solution Architect               | Python, Scala, Spark, AWS, Snowflake                 | https://www.linkedin.com/jobs/view/4384023540/ |
| Affirm              | Analyst II                          | SQL, Python, R, CPLEX/Gurobi, Databricks/Snowflake   | https://www.linkedin.com/jobs/view/4373303038/ |
| Red Hat             | Sr Machine Learning Engineer (vLLM) | Python, GenAI/LLM                                    | https://www.linkedin.com/jobs/view/4354827922/ |
| Alliance Health     | Director AI                         | Python (TensorFlow/PyTorch), Office Products, GenAI  | https://www.linkedin.com/jobs/view/4383011480/ |
| Nubank              | ML Data Engineer                    | Python, Ray/Spark                                    | https://www.linkedin.com/jobs/view/4376815752/ |
| Target RWE          | Senior Quant Data Scientist         | R                                                    | https://www.linkedin.com/jobs/view/4385293724/ |
| Siemens             | Senior Data Analytics               | SQL, Python, R, Tableau/PowerBI                      | https://www.linkedin.com/jobs/view/4377969531/ |
| Red Hat             | Sr Machine Learning Engineer        | Python, GenAI/LLM                                    | https://www.linkedin.com/jobs/view/4302769773/ |
| Lexis Nexis         | Director Data Sciences              | Python, R, GenAI/LLM                                 | https://www.linkedin.com/jobs/view/4387335028/ |
| Cigna               | Data Science Senior Advisor         | Python, SQL                                          | https://www.linkedin.com/jobs/view/4381766145/ |
| Thermo Fisher       | Senior Manager Data Engineering     | Fabric, PowerBI, Python, Databricks, Tableau, SAS    | https://www.linkedin.com/jobs/view/4372684009/ |

Of the positions:

9/25 roles included R, but only one required R exclusively. The other 8 were Python/SQL/R
22/25 included Python
11/25 had a focus on Generative AI or LLMs

Python dominates R in the current job market for data science positions. Professors are doing their students a disservice teaching R, the same way they would be doing a disservice teaching their students to code in Fortran.

Another aspect I noticed for this – analyst type jobs not all that long ago really only expected Excel (and maybe SQL). Now even the majority of the analyst jobs expect Python (even more so than dashboard tools like PowerBI in this sample).

For individuals on the job market, I suggest going and doing your own experiment job search like this on LinkedIn to see the tech skills you need to be able to at least get your foot in the door for an interview. I expected GenAI to be slightly more popular (only 11/25), but there were a few other technologies sprinkled in enough it may be good to become familiar with to widen your potential pool (Cloud and Spark – I am surprised Databricks was not listed more often).

If you’re looking to build Python skills from scratch, I cover this in my book: Data Science for Crime Analysis with Python (can purchase in paperback or epub at my store).

If also interested in learning about generative AI, see my book Large Language Models for Mortals: A Practical Guide for Analysts with Python.

You can use the coupon TWOFOR1 to get $30 off when purchasing multiple books from my store.

1 Comment

by Andy Wheeler on March 22, 2026 • Permalink

Posted in data science, Personal Productivity, Python, R

Tagged teaching

Posted by Andy Wheeler on March 22, 2026

https://andrewpwheeler.com/2026/03/22/stop-teaching-r-teach-python/

Using Claude Code to help me write

Using LLMs to help you write is understandably a touchy subject for many. There is quite a bit of AI slop coming out now, as it is really easy to just have the LLM tools think for you and write superficially OK but ultimately garbage prose.

My recent book, LLMs for Mortals, I used Sonnet 4.1 to write the initial draft of the book (for around $5). My prior book took around a year, whereas I was able to finish this book in around two months. I definitely did a ton of copy-editing (maybe around 20-30 hours per chapter on average), but I believe around 50% of the book material is the original Sonnet generated prose.

LLMs are a tool – they can be used poorly, but I think they can be used quite well. Pangram, a tool used to detect AI writing, does not flag any of the passages in LLMs for Mortals as AI generated.

This blog post goes over my notes on how I used Claude Code to help me write (although it really is applicable to any of the current coding tools, like Codex or Gemini as well). As a meta-reference, this blog post is 100% written by myself directly, but I will link to a draft written using Claude Code later in the post for a frame of reference.

Copy Editing

First, even if you do not agree with having an LLM write for you directly, there is a use case that should be relatively uncontroversial – having an LLM take a copy-edit pass on your work.

Here is an example I used this for recently, the blog post on Crime De-Coder goes over the benefits of using an API vs local LLMs. In this conversation, you can see my original draft, and the suggestions that Claude’s desktop tool (the free version) gave.

Again this is not really specific to Claude (this would have worked fine in ChatGPT as well). LLMs are good for not only spelling errors, but grammatical issues that spell check will not catch, as well as just more general copy-editing advice on the content.

One point of this – to replicate my setup, you need to write in plain text. Most of the things I write are in some form of markdown (plain markdown for blog posts, and Quarto for longer reports/books/etc). This makes it much easier to use the tools, especially the command line interface (CLI) tools like Claude Code.

Writing New Content

There are two big issues currently with LLM writing:

it is potentially wrong
current LLM writing has a particular style that is itself becoming noticeable

The first bullet, you need to review what it writes. It is much easier to have it write on content you are an expert in, so it is easier to review and spot errors. (It is the same current problem with using the tools to help you write computer code – they are boons for seniors but can write a ton of slop that more neophyte coders have a hard time spotting issues.)

The second bullet, having the style mimic your own, is what I am going to discuss here. It is worth understanding at a high level how generative AI LLMs work – if you ask “answer question X” vs “here is a book, …., answer question X” the LLM will generate a different response. The first part in the former prompt, “here is a book, …” is what is referred to the context. Current models have context windows (how large of a potential input) at around 500,000 words (technically they are around 1 million tokens, one word is often multiple tokens though).

You generally do not want to fill up the context window 100%, but 500,000 words is a very large number – just including text it would be multiple books. Another common prompting technique is what is called k-shot examples. It will typically go like

example input1: ...text... expected_output: ...blah...
example input2: ...diff text... expected_output: ...blah2...
....

This is what you place in the context window, then submit your usual prompt, and have the LLM generate the content. It is giving prior examples to help guide the LLM what you expect the final output to look like. This works the same way with writing – give the LLM prior examples of your writing to help it mimic your future style.

To keep it simple, I have created an example on github to follow along. Basically just have your prior writing (in text!), and then ask Claude Code something like:

review my prior blog posts in folder /blogposts, I am going to have you write a new blog post on topic X given the outline *after* you review the text

Then after your prior work is in the context window, feed the LLM an outline for what you want to write. In this example, I put the outline in an actual text file and said:

In the ClaudeWritingPost folder, review the outline.txt, then create a new md 
file, called ClaudeWritingExample.md, filling in the sections based on the 
outline

Claude Code will then go and review the text file with the outline and write the post. In the github repo I have my original outline for this same post, so you can see side-by-side.

You can technically write custom commands and skills with Claude Code (or the other CLI tools) to save the steps of typing two prompts, but to keep it simple for folks I am just showing the two steps manually. It is really just those two steps – get your prior examples into the context window, and then feed an outline for what else to write.

In the Github repo you can see some additional Claude.md files – these are files that include additional instructions. A common one I say is “do not include emojis”. LLM writing also tends to be verbose and have excessive lists. So I have instructions to avoid those as well.

The written blog post is not bad – I would suggest to go and read it as a proof of concept (I exported the session, can see it cost around fifty cents). Part of the reason I do not typically worry about blog posts is that I often add in things/change things in the process of writing. So you can see my personally written post is longer and has a few more elements.

So when would you use it? Technical writing, like writing tutorials in python, it works very well. Hence I could have it write the first pass on my LLM book and keep 50% of the content. I may use it for blog posts in the future (if I felt compelled to write something every day). But will not take that plunge for now.

For longer pieces, like an entire paper or a book, I suggest to not only make a detailed outline, but to also have the LLM write it in smaller sections. This both helps with reviewing the content, as well as to keep the LLM on track if you make edits/changes as you go. (Longer conversations it is more likely to degrade and make repeated errors.)

An Extra Note About Citations

I am not writing academic papers much anymore, but another fundamental problem with LLM writing is hallucinating citations. If you write in text markdown files, my suggestion is fairly simple – have the papers you want to cite in a bibtex file, and in-line in markdown, only cite papers in the form:

Citation, @item1 says blah [@item1; @item2]. For a specific page quote [@item1 p. 34-35].

The way I write my outlines, it typically is like write a paragraph about X, cite papers a,b,c. So my personal style of progressively filling in an outline works well with LLMs.

So this presumes you already have a list of papers (and are not using the LLM to dynamically write your lit review based on papers you have not read). Next time I actually need to write an academic paper, I may write up an MCP tool to query Semantic Scholar’s API and create a nice bibtex file.

But the solution here is again you need to review the output for accuracy. People without these tools are lazy and cite things they have not read already, so that will continue to happen (the tools just make it easier). Those that figure out how to use the tools appropriately though can be much more productive writing.

1 Comment

by Andy Wheeler on March 20, 2026 • Permalink

Posted in data science, Personal Productivity, writing

Tagged claude-code, LLM

Posted by Andy Wheeler on March 20, 2026

https://andrewpwheeler.com/2026/03/20/using-claude-code-to-help-me-write/

Year in Review 2025 and AI Predictions

For a brief year in review, total views for the two different websites have decreased in the past year. For this blog, I am going to be a few thousand shy of 100,000 views. (2023 I had over 150k views, and 2024 I had over 140k views.) For the Crime De-Coder site, I am going to only get around 15k views.

Part of it is I posted less, this will be the 21st blog post this year on the personal blog (2023 had 46 and 2024 had 32 posts). The Crime De-Coder site had 12 blog posts, so pretty consistent with the prior year. Both are pretty bursty, with large bouts of traffic coming from if I post something to Hacker News I can get 1k to 10k views in a day or two if it makes it to the front page. So the 2024 stats for the crime de-coder was a few of those Hacker News bumps I did not get in 2025.

Some of it could legitimately be traditional Google search being usurped by the gen AI tools. This is the first year I had appreciable referrals from chatgpt, but they are less than 1000. The other tools are trivial amount of referrals. If I worried about SEO more, I would have more updating/regular content (as old pages are devalued quite a bit by google, and it seems to be getting more severe over time).

I have upped my use of the free tools quite a bit. ChatGPT knows me pretty well, and I use Claude Desktop almost every day as well.

An IAM policy scroll is more of a nightmare, and I definitely ask more python questions than R, but the cartoon desk is pretty close to spot on. I am close to paying for Anthropic subscription for Claude code credits (currently use pay as I go via Bedrock, and this is the first month I went over $20).

What pages on the blog are popular I can never be sure of. My most popular post last year was Downloading Police Employment Trends from the FBI Data Explorer. A 2023 post, that had random times where it would have several hundred visits in a short hour span. (Some bot collecting sites? I do not know.) If it is actual people, you would want to check out my Sworn Dashboard site, where you can look at trends for PDs much easier than downloading all the data yourself!

One thing that has grown though, I do short form posting on LinkedIn on my crime de-coder page. Impressions total for the year is over 340k (see the graph), and I currently am a few shy of 4400 followers.

LinkedIn is nice because it can be slightly longer form than the other social media sites. I would suggest you follow me there (in addition to signing up for RSS feeds for the two sites). That is the easiest way to follow my work.

I also took over as a moderator of the Crime Analysis Reddit forum, it is better than the IACA forums in my opinion, so encourage folks to post there for crime analysis questions.

Crime De-Coder Work

Crime De-Coder work has been steady (but not increasing). Similar to last year had several consulting gigs conducting crime analysis for premises liability cases (and one other case I may share my opinions once it is over), and doing some small projects with non-profits and police departments.

One big project was a python training in Austin.

The Python Book (which I also translated to Spanish/French), had a trickle of new sales. 2024 had around 100 sales and 2025 had around 50 sales. It is close to 2/3 print sales and 1/3 epub, so definately folks should have physical prints if you are selling books still.

Doing trainings basically makes writing the book worth it, but I do hope eventually the book makes it way into grad school curriculum’s. (Only one course so far.) I have pitched to grad schools to have me run a similar bootcamp to what I do for crime analysts, so if interested let me know.

The biggest new thing was Crime De-Coder got an Arnold Grant. Working with Denver PD on an experiment to evaluate a chronic offender initiative.

At the Day Gig

At my day gig, I was officially promoted to a senior manager and then quickly to a director position. Hence you get posts like what to show in your tech resume and notes on project management.

One of the reasons I am big on python – it is the dominant programming language in data science. It is hard for me to recruit from my network, as majority of individuals just know a little R (if you were a hard core R person, had packages/well executed public repo’s, I could more easily think you will be able to migrate to python to work on my team).

So learn python if you want to be a data scientist is my advice (and see other job market advice at my archived newsletter).

AI Predictions

At the day gig, my work went from 100% traditional supervised machine learning models to more like 50/50 traditional vs generative AI applications. The genAI hype is real, but I think it is worthwhile putting my thoughts to paper.

The biggest question is will AI take all of our jobs? I think a more likely end scenario is the AI tools just become better at helping humans do tasks. The leap from helping a human do something faster vs an AI tool doing it 100% on its own with 0 human input is hard. The models are getting incrementally better, but I think to fully replace people in a substantive way will require another big advancement in fundamental capabilities. Making a human 10x more productive is easier and still will make the AI companies a ton of money.

Sometimes people view the 10x idea and say that will take jobs, just not 100% of jobs. That is a view though that there is only a finite amount of work to be done. That assumption is clearly not true, and being able to do work faster/cheaper just induces demand for more potential work. The example with calculators making more banking jobs, not less, is basically the same example.

One of the critiques of the current systems is they are overvalued, so we are in a bubble. I do not remember where I read it, but one estimate was if everyone in the US spent $1 a day on the different AI tools, that would justify the current valuations for OpenAI, Anthropic, NVIDIA, etc. I think that is totally doable, we spend a few thousand a workday at Gainwell on the foundation models for example for a few projects, and we are just going to continue to roll out more and more. Gainwell is a company with around 6k employees for reference, and our current AI applications touch way less than 1k of those employees. We have plenty of room to grow those applications.

It is super hard though to build systems to help people do things faster. And we are talking like “this thing that used to take 30 minutes now takes 15 minutes”. If you have 100 people doing that thing all the time though, the costs of the models are low enough it is an easy win.

And this mostly only holds true for knowledge economy work that can be all done via software. There just still needs to be fundamental improvements to robotics to be able to do physical things. The tailor’s job is safe for the foreseeable future.

The change in the data science landscape to more generative AI applications definitely requires social scientists and analysts to up their game though to learn a new set of tools. I do have another book in the works to address that, so hopefully you will see that early next year.

Leave a comment

by Andy Wheeler on December 24, 2025 • Permalink

Posted in Crime Analysis, Criminal Justice, Personal Productivity, Python, R, scholarly

Tagged year-in-review

Posted by Andy Wheeler on December 24, 2025

https://andrewpwheeler.com/2025/12/24/year-in-review-2025-and-ai-predictions/

What to show in your tech resume?

Jason Brinkley on LinkedIn the other day had a comment on the common look of resumes – I disagree with his point in part but it is worth a blog post to say why:

So first, when giving advice I try to be clear about what I think are just my idiosyncratic positions vs advice that I feel is likely to generalize. So when I say, you should apply to many positions, because your probability of landing a single position is small, that is quite general advice. But here, I have personal opinions about what I want to see in a resume, but I do not really know what others want to see. Resumes, when cold applying, probably have to go through at least two layers (HR/recruiter and the hiring manager), who each will need different things.

People who have different colored resumes, or in different formats (sometimes have a sidebar) I do not remember at all. I only care about the content. So what do I want to see in your resume? (I am interviewing for mostly data scientist positions.) I want to see some type of external verification you actually know how to code. Talk is cheap, it is easy to list “I know these 20 python libraries” or “I saved our company 1 million buckaroos”.

So things I personally like seeing in a resume are:

code on github that is not a homework assignment (it is OK if unfinished)
technical blog posts
your thesis! (or other papers you were first/solo author)

Very few people have these things, so if you do and you land in my stack, you are already at the like 95th percentile (if not higher) for resumes I review for jobs.

The reason having outside verification you actually know what you are doing is because people are liars. For our tech round, our first question is “write a python hello world program and execute it from the command line” – around half of the people we interview fail this test. These are all people who list they are experts in machine learning, large language models, years of experience in python, etc.

My resume is excessive, but I try to practice what I preach (HTML version, PDF version)

I added some color, but have had recruiters ask me to take it off the resume before. So how many people actually click all those links when I apply to positions? Probably few if any – but that is personally what I want to see.

There are really only two pieces of advice I have seen repeatedly about resumes that I think are reasonable, but it is advice not a hard rule:

I have had recruiters ask for specific libraries/technologies at the top of the resume
Many people want to hear about results for project experience, not “I used library X”

So while I dislike the glut of people listing 20 libraries, I understand it from the point of a recruiter – they have no clue, so are just trying to match the tech skills as best they can. (The matching at this stage I feel may be worse than random, in that liars are incentivized, hence my insistence on showing actual skills in some capacity.) It is infuriating when you have a recruiter not understand some idiosyncratic piece of tech is totally exchangeable with what you did, or that it is trivial to learn on the job given your prior experience, but that is not going to go away anytime soon.

I’d note at Gainwell we have no ATS or HR filtering like this (the only filtering is for geographic location and citizenship status). I actually would rather see technical blog posts or personal github code than saying “I saved the company 1 million dollars” in many circumstances, as that is just as likely to be embellished as the technical skills. Less technical hiring managers though it is probably a good idea to translate technical specs to more plain business implications though.

2 Comments

by Andy Wheeler on November 1, 2025 • Permalink

Posted in data science, Personal Productivity, Python, scholarly

Tagged resume

Posted by Andy Wheeler on November 1, 2025

https://andrewpwheeler.com/2025/11/01/what-to-show-in-your-tech-resume/

Deep research and open access

Most of the major LLM chatbot vendors are now offering a tool called deep research. These tools basically just scour the web given a question, and return a report. For academics conducting literature reviews, the parallel is obvious. We just tend to limit the review to peer reviewed research.

I started with testing out Google’s Gemini service. Using that, I noticed almost all of the sources cited were public materials. So I did a little test with a few prompts across the different tools. Below are some examples of those:

Google Gemini question on measuring stress in police officers (PDF, I cannot share this chat link it appears)
OpenAI Effectiveness of Gunshot detection (PDF, link to chat)
Perplexity convenience sample (PDF, Perplexity was one conversation)
Perplexity survey measures attitudes towards police (PDF, see chat link above)

The report on officer mental health measures was an area I was wholly unfamiliar. The other tests are areas where I am quite familiar, so I could evaluate how well I thought each tool did. OpenAI’s tool is the most irksome to work with, citations work out of the box for Google and Perplexity, but not with ChatGPT. I had to ask it to reformat things several times. Claude’s tool has no test here, as to use its deep research tool you need a paid account.

Offhand each of the tools did a passable job of reviewing the literature and writing reasonable summaries. I could nitpick things in both the Perplexity and the ChatGPT results, but overall they are good tools I would recommend people become familiar with. ChatGPT was more concise and more on-point. Perplexity got the right answer for the convenience sample question (use post-stratification), but also pulled in a large literature on propensity score matching (which is only relevant for X causes Y type questions, not overall distribution of Y). Again this is nit-picking for less than 5 minutes of work.

Overall these will not magically take over writing your literature review, but are useful (the same way that doing simpler searches in google scholar is useful). The issue with hallucinating citations is mostly solved (see the exception for ChatGPT here). You should consult the original sources and treat deep research reports like on-demand Wikipedia pages, but lets not kid ourselves – most people will not be that thorough.

For the Gemini report on officer mental health, I went through quickly and broke down the 77 citations across the publication type or whether the sources were in HTML or PDF. (Likely some errors here, I went by the text for the most part.) For the HTML vs PDF, 59 out of 77 (76%) are HTML web-sources. Here is the breakdown for my ad-hoc categories for types of publications:

Peer Review (open) – 39 (50%)
Peer review (just abstract) 10 (13% – these are all ResearchGate)
Open Reports 23 (30%)
Web pages 5 (6%)

For a quick rundown of these. Peer reviewed should be obvious, but sometimes the different tools cite papers that are not open access. In these cases, they are just using the abstract to madlib how Deep Research fills in its report. (I consider ResearchGate articles here as just abstract, they are a mix of really available, but you need to click a link to get to the PDF in those cases. Google is not indexing those PDFs behind a wall, but the abstract.) Open reports I reserve for think tank or other government groups. Web pages I reserve for blogs or private sector white papers.

I’d note as well that even though it does cite many peer review here, many of these are quite low quality (stuff in MDPI, or other what look to me pay to publish locations). Basically none of the citations are in major criminology journals! As I am not as familiar with this area this may be reasonable though, I don’t know if this material is often in different policing journals or Criminal Justice and Behavior and just not being picked up at all, or if that lit in those places just does not exist. I have a feeling it is missing a few of the traditional crim journal sources though (and picks up a few sources in different languages).

The OpenAI report largely hallucinated references in the final report it built (something that Gemini and Perplexity currently do not do). The references it made up were often portmanteaus of different papers. Of the 12 references it provided, 3 were supposedly peer reviewed articles. You can in the ChatGPT chat go and see the actual web-sources it used (actual links, not hallucinated). Of the 32 web links, here is the breakdown:

Pubmed 9
The Trace 5
Kansas City local news station website 4
Eric Piza’s wordpress website 3
Govtech website 3
NIJ 2

There are single links then to two different journals, and one to the Police Chief magazine. I’d note Eric’s site is not that old (first RSS feed started in February 2023), so Eric making a website where he simple shares his peer reviewed work greatly increased his exposure. His webpage in ChatGPT is more influential than NIJ and peer reviewed CJ journals combined.

I did not do the work to go through the Perplexity citations. But in large part they appear to me quite similar to Gemini on their face. They do cite pure PDF documents more often than I expected, but still we are talking about 24% in the Gemini example are PDFs.

The long story short advice here is that you should post your preprints or postprints publicly, preferably in HTML format. For criminologists, you should do this currently on CrimRXiv. In addition to this, just make a free webpage and post overviews of your work.

These tests were just simple prompts as well. I bet you could steer the tool to give better sources with some additional prompting, like “look at this specific journal”. (Design idea if anyone from Perplexity is listening, allow someone to be able to whitelist sources to specific domains.)

Other random pro-tip for using Gemini chats. They do not print well, and if they have quite a bit of markdown and/or mathematics, they do not convert to a google document very well. What I did in those circumstances was to do a bit of javascript hacking. So go into your dev console (in Chrome right click on the page and select “Inspect”, then in the new page that opens up go to the “Console” tab). And depending on the chat browser currently opened, can try entering this javascript:

// Example printing out Google Gemini Chat
var res = document.getElementsByTagName("extended-response-panel")[0];
var report = res.getElementsByTagName("message-content")[0];
var body = document.getElementsByTagName("body")[0];
let escapeHTMLPolicy = trustedTypes.createPolicy("escapeHTML", {
 createHTML: (string) => string
});
body.innerHTML = escapeHTMLPolicy.createHTML(report.innerHTML);
// Now you can go back to page, cannot scroll but
// Ctrl+P prints out nicely

Or this works for me when revisiting the page:

var report = document.getElementById("extended-response-message-content");
var body = document.getElementsByTagName("body")[0];
let escapeHTMLPolicy = trustedTypes.createPolicy("escapeHTML", {
 createHTML: (string) => string
});
body.innerHTML = escapeHTMLPolicy.createHTML(report.innerHTML);

This page scrolling does not work, but Ctrl + P to print the page does.

The idea behind this, I want to just get the report content, which ends up being hidden away in a mess of div tags, promoted out to the body of the page. This will likely break in the near future as well, but you just need to figure out the correct way to get the report content.

Here is an example of using Gemini’s Deep Research to help me make a practice study guide for my sons calculus course as an example.

Leave a comment

by Andy Wheeler on August 28, 2025 • Permalink

Posted in Criminal Justice, Papers, Personal Productivity, scholarly

Tagged LLM, open-science

Posted by Andy Wheeler on August 28, 2025

https://andrewpwheeler.com/2025/08/28/deep-research-and-open-access/

Some notes on project management

Have recently participated in several projects that I think went well at the day gig – these were big projects, multiple parties, and we came together and got to deployment on a shortened timeline. I think it is worth putting together my notes on why I think this went well.

My personal guiding star for project management is very simple – have a list of things to do, and try to do them as fast as reasonably possible. This is important, as anything that distracts from this ultimate goal is not good. Does not matter if it is agile ceremonies or waterfall excessive requirement gathering. In practice if you are too focused on the bureaucracy either of them can get in the way of the core goals – have a list of things to do and do them in a reasonable amount of time.

For a specific example, we used to have bi-weekly sprints for my team. We stopped doing them for these projects that have IMO gone well, as another group took over project management for the multiple groups. The PM just had an excel spreadsheet, with dates. I did not realize how much waste we had by forcing everything to a two week cycle. We spent too much time trying to plan two weeks out, and ultimately not filling up peoples plates. It is just so much easier to say “ok we want to do this in two days, and then this in the next three days” etc. And when shit happens just be like “ok we need to push X task by a week, as it is much harder than anticipated”, or “I finished Y earlier, but I think we should add a task Z we did not anticipate”.

If sprints make sense for your team go at it – they just did not for my team. They caused friction in a way that was totally unnecessary. Just have a list of things to do, and do them as fast as reasonably possible.

Everything Parallel

So this has avoided the hard part, what to put on the to-do list? Let me discuss another very important high level goal of project management first – you need to do everything in parallel as much as possible.

For a concrete example that continually comes up in my workplace, you have software engineering (writing code) vs software deployment (how that code gets distributed to the right people). I cannot speak to other places (places with a mono-repo/SaaS it probably looks different), but Gainwell is really like 20+ companies all Frankenstein-ed together through acquisitions and separate big state projects over time (and my data science group is pretty much solution architects for AI/ML projects across the org).

It is more work for everyone in this scenario trying to do both writing code and deployment at the same time. Software devs have to make up some reasonably scoped requirements (which will later change) for the DevOps folks to even get started. The DevOps folks may need to work on Docker images (which will later change). So it is more work to do it in parallel than it is sequential, but drastically reduces the overall deliverable timelines. So e.g. instead of 4 weeks + 4 weeks = 8 weeks to deliver, it is 6 weeks of combined effort.

This may seem like “duh Andy” – but I see it all the time people not planning out far enough though to think this through (which tends to look more like waterfall than agile). If you want to do things in months and not quarters, you need everyone working on things in parallel.

For another example at work, we had a product person want to do extensive requirements gathering before starting on the work. This again can happen in parallel. We have an idea, devs can get started on the core of it, and the product folks can work with the end users in the interim. Again more work, things will change, devs may waste 1 or 2 or 4 weeks building something that changes. Does not matter, you should not wait.

I could give examples of purely “write code as well”, e.g. I have one team member write certain parts of the code first, which are inconvenient, because that component not being finished is a blocker for another member of the team. Basically it is almost always worth working harder in the short term if it allows you to do things in parallel with multiple people/teams.

Sometimes the “in parallel” is when team members have slack, have them work on more proof of concept things that you think will be needed down the line. For the stuff I work on this can IMO be enjoyable, e.g. “you have some time, lets put a proof of concept together on using Codex + Agents to do some example work”. (Parallel is not quite the word for this, it is forseeing future needs.) But it is similar in nature, I am having someone work on something that will ultimately change in ways in the future that will result in wasted effort, but that is OK, as the head start on trying to do vague things is well worth it.

What things to put on the list

This is the hardest part – you need someone who understands front to back what the software solution will look like, how it interacts with the world around it (users, databases, input/output, etc.) to be able to translate that vision into a tangible list of things to-do.

I am not even sure if I can articulate how to do this in a general enough manner to even give useful advice. When I don’t know things front to back though I will tell you what, I often make mistakes going down paths that often waste months of work (which I think is sometimes inevitable, no one had the foresight to know it was a bad path until we got quite a ways down it).

I used to think we should do the extensive, months long, requirements gathering to avoid this. I know a few examples where I talked for months with the business owners, came up with a plan, and then later on realized it was based on some fundamental misunderstanding of the business. And the business team did not have enough understanding of the machine learning model to know it did not make sense.

I think mistakes like these are inevitable though, as requirements gathering is a two way street (it is not reasonable for any of the projects I work on to expect the people requesting things to put together a full, scoped out list). So just doing things and iterating is probably just as fast as waiting for a project to be fully scoped out.

Do them as fast as possible

So onto the second part, of “have a list of things to-do and do them as fast as possible”. One of the things with “fast as possible”, people will fill out their time. If you give someone two weeks to do something, most people will not do it faster, they will spend the full two weeks doing that task.

So you need someone technical saying “this should be done in two days”. One mistake I see teams make, is listing out projects that will take several weeks to-do. This is only OK for very senior people. Majority of devs tasks should be 1/2/3 days of work at max. So you need to take a big project and break it down into smaller components. This seems like micro-managing, but I do not know how else to do it and keep things on track. Being more specific is almost always worth my time as opposed to less specific.

Sometimes this even works at higher levels, one of the projects that went well, initial estimates were 6+ months. Our new Senior VP of our group said “nope, needs to be 2-3 months”. And guess what? We did it (he spent money on some external contractors to do some work, but by god we did it). Sometimes do them as fast as possible is a negotiation at the higher levels of the org – well you want something by end of Q3, well we can do A and C, but B will have to wait until later then, and the solution will be temporarily deployed as a desktop app instead of a fully served solution.

Again more work for our devs, but shorter timeline to help others have an even smaller MVP totally makes sense.

AI will not magically save you

Putting this last part in, as I had a few conversations recently about large code conversion projects teams wanted to know if they could just use AI to make short work of it. The answer is yes it makes sense to use AI to help with these tasks, but they expected somewhat of a magic bullet. They each still needed to make a functional CICD framework to test isolated code changes for example. They still needed someone to sit down and say “Joe and Gary and Melinda will work on this project, and have XYZ deliverables in two months”. A legacy system that was built over decades is not a weekend project to just let the machine go brr and churn out a new codebase.

Some of them honestly are groups that just do not want to bite the bullet and do the work. I see projects that are mismanaged (for the criminal justice folks that follow me, on-prem CAD software deployments should not take 12 months). They take that long because the team is mismanaged, mostly people saying “I will do this high level thing in 3 months when I get time”, instead of being like “I will do part A in the next two days and part B in the three days following that”. Or doing things sequentially that should be done in parallel.

To date, genAI has only impacted the software engineering practices of my team at the margins (potentially writing code slightly faster, but probably not). We are currently using genAI in various products though for different end users. (We have deployed many supervised learning models going back years, just more recently have expanded into using genAI for different tasks though in products.)

I do not foresee genAI taking devs jobs in the near future, as there is basically infinite amounts of stuff to work on (everything when you look closely is inefficient in a myriad of ways). Using the genAI tools to write code though looks very much like project management, identifying smaller and more manageable tasks for the machine to work on, then testing those, and moving onto the next steps.

1 Comment

by Andy Wheeler on July 20, 2025 • Permalink

Posted in data science, Personal Productivity

Tagged project-management, software-engineering

Posted by Andy Wheeler on July 20, 2025

https://andrewpwheeler.com/2025/07/20/some-notes-on-project-management/

LinkedIn is the best social media site

The end goals I want for a social media site are:

promote my work
see other peoples work

Social media for other people may have other uses. I do comment and have minor interactions on the social media sites, but I do not use them primarily for that. So my context is more business oriented (I do not have Facebook, and have not considered it). I participate some on Reddit as well, but that is pretty sparingly.

LinkedIn is the best for both relative to X and BlueSky currently. So I encourage folks with my same interests to migrate to LinkedIn.

So I started Crime De-Coder around 2 years ago. I first created a website, and then second started a LinkedIn page.

When I first created the business page, I invited most of my criminal justice contacts to follow the page. I had maybe 500 followers just based on that first wave of invites. At first I posted once or twice a week, and it was very steady growth, and grew to over 1500 followers in maybe just a month or two.

Now, LinkedIn has a reputation for more spammy lifecoach self promotion (for lack of a better description). I intentionally try to post somewhat technical material, but keep it brief and understandable. It is mostly things I am working on that I think will be of interest to crime analysts or the general academic community. Here is one of my recent posts on structured outputs:

Current follower count on LinkedIn for my business page (which in retrospect may have been a mistake, I think they promote business pages less than personal pages), is 3230, and I have fairly consistent growth of a few new followers per day.

I first started posting once a week, and with additional growth expanded to once every other day and at one point once a day. I have cut back recently (mostly just due to time). I did get more engagement, around 1000+ views per day when I was posting every day.

Probably the most important part though of advertising Crime De-Coder is the types of views I am getting. My followers are not just academic colleagues I was previously friends with, it is a decent outside my first degree network of police officers and other non-profit related folks. I have landed several contracts where I know those individuals reached out to me based on my LinkedIn posting. It could be higher, as my personal Crime De-Coder website ranks very poorly on Bing search, but my LinkedIn posts come up fairly high.

When I was first on Twitter I did have a few academic collaborations that I am not sure would have happened without it (a paper with Manne Gerell, and a paper with Gio Circo, although I had met Gio in real life before that). I do not remember getting any actual consulting work though.

I mentioned it is not only better for me for advertising my work, but also consuming other material. I did a quick experiment, just opened the home page and scrolled the first 3 non-advertisement posts on LinkedIn, X, and BlueSky. For LinkedIn

This is likely a person I do not want anything to do with, but their comment I agree with. Whenever I use Service Now at my day job I want to rage quit (just send a Teams chat or email and be done with it, LLMs can do smarter routing anymore). The next two are people are I am directly connected with. Some snark by Nick Selby (which I can understand the sentiment, albeit disagree with, I will not bother to comment though). And something posted by Mindy Duong I likely would be interested in:

Then another advert, and then a post by Chief Patterson of Raleigh, whom I am not directly connected with, but was liked by Tamara Herold and Jamie Vaske (whom I am connected with).

So annoying for the adverts, but the suggested (which the feeds are weird now, they are not chronological) are not bad. I would prefer if LinkedIn had a “general” and “my friends” sections, but overall I am happier with the content I see on LinkedIn than I am the other sites.

X & BlueSky

I first created a personal then Twitter account in 2018. Nadine Connell suggested it, and it was nice then. When I first joined I think it was Cory Haberman tweeted and said to follow my work, and I had a few hundred followers that first day. Then over the next two years, just posting blog posts and papers for the most part, I grew to over 1500 followers IIRC. I also consumed quite a bit of content from criminal justice colleagues. It was much more academic focused, but it was a very good source of recent research, CJ relevant news and content.

I then eventually deleted the Twitter account, due to a colleague being upset I liked a tweet. To be clear, the colleague was upset but it wasn’t a very big deal, I just did not want to deal with it.

I started a Crime De-Coder X account last year. I made an account to watch the Trump interview, and just decided to roll with it. I tried really hard to make X work – I posted daily, the same stuff I had been sharing on LinkedIn, just shorter form. After 4 months, I have 139 followers (again, when I joined Twitter in 2018 I had more than that on day 1). And some of those followers are porn accounts or bots. Majority of my posts get <=1 like and 0 reposts. It just hasn’t resulted in getting my work out there the same way in 2018 or on LinkedIn now.

So in terms of sharing work, the more recent X has been a bust. In terms of viewing other work, my X feed is dominated by short form video content (a mimic of TikTok) I don’t really care about. This is after extensively blocking/muting/saying I don’t like a lot of content. I promise I tried really hard to make X work.

So when I open up the Twitter home feed, it is two videos by Musk:

Then a thread by Per-Olof (whom I follow), and then another short video Death App joke:

So I thought this was satire, but clicking that fellows posts I think he may actually be involved in promoting that app. I don’t know, but I don’t want any part of it.

BlueSky I have not been on as long, but given how easy it was to get started on Twitter and X, I am not going to worry about posting so much. I have 43 followers, and posts similar to X have basically been zero interaction for the most part. The content feed is different than X, but is still not something I care that much about.

We have Jeff Asher and his football takes:

I am connected with Jeff on LinkedIn, in which he only posts his technical material. So if you want to hear Jeff’s takes on football and UT-Austin stuff then go ahead and follow him on BlueSky. Then we have a promotional post by a psychologist (this person I likely would be interested in following his work, this particular post though is not very interesting). And a not funny Onion like post?

Then Gavin Hales, whom I follow, and typically shares good content. And another post I leave with no comment.

My BlueSky feed is mostly dominated by folks in the UK currently. It could be good, but it currently just does not have the uptake to make it worth it like I had with Twitter in 2018. It may be the case given my different goals, to advertise my consulting business, Twitter in 2018 would not be good either though.

So for folks who subscribe to this blog, I highly suggest to give LinkedIn a try for your social media consumption and sharing.

1 Comment

by Andy Wheeler on February 20, 2025 • Permalink

Posted in data science, Personal Productivity, scholarly, social networking, writing

Tagged LinkedIn, social-media

Posted by Andy Wheeler on February 20, 2025

https://andrewpwheeler.com/2025/02/20/linkedin-is-the-best-social-media-site/

Search for:
Recent Posts
Categories
Categories
Site RSS Feeds
- RSS - Posts
- RSS - Comments
Follow Blog via Email

Enter your email address to follow this blog and receive notifications of new posts by email.

Email Address:

Join 392 other subscribers
aoristic big-data cartography census choropleth citeulike consulting cost-benefit courses crime-mapping crime-trends Crime Analysis Criminal Justice data-manipulation data visualization deep-learning ESRI excel flow-data folium geocoding github google-streetview-api grammar of graphics group-based-trajectory gun-violence healthcare homicide-rates hot spots hypothesis-testing linear programming LLM logistic-regression machine-learning MACRO mapping matplotlib meta network NetworkX officer-involved-shooting open-science paper Papers peer-review Poisson prediction Predictive-Policing preprint presentation Python Python-programability pytorch quasi-experiment r recidivism regression resources scholarly scraping seaborn shootings simulation small-multiples social-media social-networking SPSS stackexchange Stata statistics survey time-series uncertainty wdd web-scraping
Top Posts & Pages
Stack Exchange

All posts in category Personal Productivity

Should we care if writing is AI?

Post Views

Profile Views

$100 ad credit

Wrap Up

Copy Editing

Writing New Content

An Extra Note About Citations

Crime De-Coder Work

At the Day Gig

AI Predictions

Everything Parallel

What things to put on the list

Do them as fast as possible

AI will not magically save you

LinkedIn

X & BlueSky

Recent Posts

Categories

Site RSS Feeds

Follow Blog via Email

Top Posts & Pages

Stack Exchange