Pangram is good

Many of the initial wave of “AI writing detectors” were quite bad. The biggest issue to worry about with an AI writing detector is false positives. If you are a professor checking students’ writing, falsely accusing a student is very bad.

The Pangram product, though, is quite good, and I suggest folks check it out.

The other main competitor on the market, GPTZero, is clearly lower quality (it has, for example, flagged the Constitution as AI-generated).

GPTZero’s documentation says it is the most accurate AI detector. But accuracy is mostly beside the point: you cannot know the underlying rate of AI writing in any corpus unless the corpus was artificially constructed, and that is the only scenario in which you can measure accuracy at all. What you actually care about are the false positive rate and the false negative rate.
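The point can be made concrete with a small sketch. Overall accuracy is a mix of the two error rates weighted by the (usually unknowable) share of AI text in the corpus, so the same detector produces very different “accuracy” numbers depending on the base rate. The error rates below are illustrative placeholders, not measured figures for Pangram or GPTZero.

```python
# Hypothetical detector with a fixed 1% false positive rate and a
# fixed 10% false negative rate (illustrative numbers only).
FPR, FNR = 0.01, 0.10

def accuracy(prevalence, fpr=FPR, fnr=FNR):
    """Overall accuracy as a function of the share of AI-written text.

    Human text is classified correctly with probability (1 - fpr);
    AI text is classified correctly with probability (1 - fnr).
    """
    return (1 - prevalence) * (1 - fpr) + prevalence * (1 - fnr)

for p in (0.0, 0.1, 0.5, 0.9):
    print(f"AI share {p:.0%}: accuracy {accuracy(p):.1%}")
```

Same detector, same error rates, yet “accuracy” drifts from 99% down to about 91% as the AI share rises. That is why a single accuracy claim tells you almost nothing without the base rate, while FPR and FNR are properties of the detector itself.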

Unlike GPTZero, Pangram appears to have very low false positive rates. A simple way to estimate the false positive rate is to submit writing from before 2022 to the tool and see how much it flags as AI. ChatGPT came out in late 2022, and the text-generation tools available before that were not even close to usable in any serious way. So any writing in the older corpus flagged as AI is a false positive.

Here is an example examining legal briefs.

It is an independent assessment. We cannot really know the capture rate (were there more than 66 briefs generated via LLMs in that sample?). We can know the false positive rate, though, and it is 1/800 in this sample with Pangram.
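One flag in 800 is only a point estimate, and with counts that small the uncertainty matters. A quick sketch of the range consistent with the data, using the standard Wilson score interval for a binomial proportion (nothing Pangram-specific about this):

```python
import math

def wilson_interval(flagged, n, z=1.96):
    """95% Wilson score confidence interval for a binomial proportion."""
    p = flagged / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

# 1 false positive out of 800 pre-screened briefs
lo, hi = wilson_interval(1, 800)
print(f"Observed FPR: {1/800:.3%}, 95% CI: [{lo:.3%}, {hi:.3%}]")
```

The interval runs from roughly 0.02% up to about 0.7%, so a single sample of 800 cannot distinguish a 1-in-10,000 rate from a 1-in-200 rate; it can, however, comfortably rule out something like GPTZero’s reported 2%.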

Pangram says it has a 1 in 10,000 false positive rate across a wide array of writing samples. They even report in their own internal tests that GPTZero has a 2% false positive rate (I am pretty sure GPTZero’s false positive rate is much higher than 2%, hence the Constitution error).

Many other checks for false negative rates involve people having various models generate writing and then classifying it. It is hard to know if those are very good benchmarks for estimating the false negative rate. But we can easily estimate the false positive rate, and in that respect Pangram is clearly better than other AI writing detectors on the market.

Should we care if writing is AI?

I have used AI tools to help me write. I promise to be forthcoming if I use AI to help write any substantive sections (in blog posts, books, social media posts, etc.). Currently I almost always use LLMs to copy-edit, which often amounts to a prompt like “check for spelling and grammar issues.”

I do not use it all the time for writing. This post was all written by hand (and then just copy-edited with Gemini CLI).

It is really not that hard to bring your own voice and use AI to aid your writing. Have the LLM read your prior work, then give it a detailed outline, and then iterate. See my transcript on a prior post for an example.

I’d note that I have used Pangram to check whether my LLM-assisted writing reads as obviously AI, and it does not. To me, when writing is clearly AI, it often signals a clear lack of care and effort. AI writing can be valuable, but it is quite frequently low-value slop.

So you get people larping as tech experts.

You can trivially have Claude or whatever software write a Skill file, and then have an LLM write up how super awesome it is. That does not make it so.

And you have salespeople write posts that literally make no sense.

This, to be clear, is obviously AI slop.

These individuals could actually generate useful content if they spent more than a trivial amount of time. But they don’t, and it shows.
