# Using the exact reference distribution for small sample Benford tests

I recently came across another potential application related to my testing small samples for randomness in day-of-week patterns. Testing digit frequencies for Benford’s law basically works on the same principles as my day-of-week bins. So here I will show an example in R.

First, download my functions here and save them to your local machine. The only library dependency for this code to work is the `partitions` library, so install that. Now for the code, you can source my functions and load the `partitions` library. The second part of the code shows how to generate the null distribution for Benford’s digits — the idea is that lower digits will have a higher probability of occurring.

``````#Example using small sample tests for Benfords law
library(partitions)
source("C:\\Users\\axw161530\\Dropbox\\PublicCode_Git\\ExactDist\\Exact_Dist.R")

f <- 1:9
p_fd <- log10(1 + (1/f)) #first digit probabilities``````

And so if you do `cbind(f,p_fd)` at the console it prints out:

``````      f       p_fd
[1,] 1 0.30103000
[2,] 2 0.17609126
[3,] 3 0.12493874
[4,] 4 0.09691001
[5,] 5 0.07918125
[6,] 6 0.06694679
[7,] 7 0.05799195
[8,] 8 0.05115252
[9,] 9 0.04575749``````

So we expect over 30% of the first digits to be 1’s, just under 18% to be 2’s, etc. To show how we can use this for small samples, I take an example of fraudulent checks from Mark Nigrini’s book, Digital Analysis using Benford’s law (I can’t find a google books link to this older one — it is the 2000 one published by Global Audit Press I took the numbers from).

``````#check data from Nigrini page 84
checks <- c(1927.48,
27902.31,
86241.90,
72117.46,
81321.75,
97473.96,
93249.11,
89658.17,
87776.89,
92105.83,
79949.16,
87602.93,
96879.27,
91806.47,
84991.67,
90831.83,
93766.67,
88338.72,
94639.49,
83709.28,
96412.21,
88432.86,
71552.16)

#extracting the first digits
fd <- substr(format(checks,trim=TRUE),1,1)
tot <- table(factor(fd, levels=paste(f)))``````

Now Nigrini says this is too small of a sample to perform actual statistical tests, so he just looks at it at face value. If you print out the `tot` object you will see that we have mostly upper values in the series, three 7’s, nine 8’s, and nine 9’s.

``````> tot

1 2 3 4 5 6 7 8 9
1 1 0 0 0 0 3 9 9 ``````

Now, my work on small samples for day-of-week crime sprees showed that given reasonably expected offender behavior, you only needed as few as 8 crimes to have pretty good power to test for randomness in the series. So given that I would expect the check series of 23 is not totally impossible to detect significant deviations from the null Benford’s distribution. But first we need to figure out if I can actually generate the exact distribution for 23 digits in 9 bins in memory.

``````m <- length(tot)
n <- sum(tot)
choose(m+n-1,m-1)``````

Which prints out just under 8 million, ` 7888725`. So we should be able to hold that in memory.

So now comes the actual test, and as my comment says this takes a few minutes for R to figure out — so feel free to go get a coffee.

``````#Takes a few minutes
resG <- SmallSampTest(d=tot,p=p_fd,type="G")
resG``````

Here I use the likelihood ratio G test instead of the more usual Chi-Square test, as I found that always had more power in the day-of-the-week paper. From our print out of `resG` we subsequently get:

``````> resG
Small Sample Test Object
Test Type is G
Statistic is 73.4505062784174
p-value is:  2.319579e-14
Data are:  1 1 0 0 0 0 3 9 9
Null probabilities are:  0.3 0.18 0.12 0.097 0.079 0.067 0.058 0.051 0.046
Total permutations are:  7888725 ``````

So since the p-value is incredibly small, we would reject the null that the first digit distribution of these checks follows Benford’s law. So on its face we can reject the null with this dataset, but it would take more investigation in general to give recommendations of how many observations in practice you would need before you can reasonably even use the small sample distribution. I have code that allows you to test the power given an alternative distribution in those functions, so for a quick and quite conservative test I see the power if the alternative distribution were uniform instead of Benford’s with our 23 observations. The idea is if people make up numbers uniformly instead of according to Benford’s law, what is the probability we would catch them with 23 observations. I label this as conservative, because I doubt people even do a good job of making them uniform — most number fudging cases will be much more egregious I imagine.

``````#power under null of equal probability
p_alt <- rep(1/9,9)
PowUni <- PowAlt(SST=resG,p_alt=p_alt) #again takes a few minutes
PowUni``````

So based on that we get a power estimate of:

``````> PowUni
Power for Small Sample Test
Test statistic is: G
Power is: 0.5276224
Null is: 0.3 0.18 0.12 0.097 0.079 0.067 0.058 0.051 0.046
Alt is: 0.11 0.11 0.11 0.11 0.11 0.11 0.11 0.11 0.11
Alpha is: 0.05
Number of Bins: 9
Number of Observations: 23 ``````

So the power is only 0.5 in this example. Since you want to aim for power of ~0.80 or higher, this shows that you are not likely to uncover more subtle patterns of manipulation with this few of observations. The power would go up for more realistic deviations though — so I don’t think this idea is totally dead in the water.

If you are like me and find it annoying to wait for a few minutes to get the results, a quick and dirty way to make the test go faster is to collapse bins that have zero observations. So our first digits have zero observations in the 3,4,5 and 6 bins. So I collapse those and do the test with only 6 bins, which makes the results return in around ~10 seconds instead of a few minutes. Note that collapsing bins is better suited for the G or the Chi-Square test, because the KS test or Kuiper’s test the order of the observations matter.

``````#Smaller subset
UpProb <- c(p_fd[c(1,2,7,8,9)],sum(p_fd[c(3,4,5,6)]))

resSmall``````

When you do this for the G and the Chi-square test you will get the same test statistics, but in this example the p-value is larger (but is still quite small).

``````> resSmall
Small Sample Test Object
Test Type is G
Statistic is 73.4505062784174
p-value is:  1.527991e-15
Data are:  1 1 3 9 9 0
Null probabilities are:  0.3 0.18 0.058 0.051 0.046 0.37
Total permutations are:  98280  ``````

I can’t say for sure the behavior of the tests when collapsing categories, but I think it is reasonable offhand, especially if you have some substantive reason to collapse them before looking at the data.

In practice, they way I expect this would work is not just for testing one individual, but as a way to prioritize audits of many individuals. Say you had a large company, and you wanted to check the invoices for 1,000’s of managers, but each manager may only have 20 some invoices. You would compute this test for each manager then, and then subsequently rank them by the p-values (or do some correction like the false-discovery-rate) for further scrutiny. That takes a bit more work to code that up efficiently than what I have here though. Like I may pre-compute the exact CDF’s for each test statistic, aggregate them alittle so they fit in memory, and then check the test against the relevant CDF.

But feel free to bug me if you want to use this idea in your own work and want some help implementing it.

For some additional examples, here is some code to get the second digit expected probabilities:

``````#second digit probabilities
s <- 0:9
x <- expand.grid(f*10,s)
x\$end <- log10(1 + (1/(x[,1]+x[,2])))
p_sd <- aggregate(end ~ Var2, data=x, sum)
p_sd``````

which are expected to be much more uniform than the first digits:

``````> p_sd
Var2        end
1     0 0.11967927
2     1 0.11389010
3     2 0.10882150
4     3 0.10432956
5     4 0.10030820
6     5 0.09667724
7     6 0.09337474
8     7 0.09035199
9     8 0.08757005
10    9 0.08499735``````

And we can subsequently also test the check sample for deviation from Benford’s law in the second digits. Here I show an example of using the exact distribution for the Kolmogorov-Smirnov test. (There may be reasonable arguments for using Kuiper’s test with digits as well, but for both the KS and the Kuiper’s V you need to make sure the bins are in the correct order to conduct those tests.) To speed up the computation I only test the first 18 checks.

``````#second digits test for sample checks, but with smaller subset
sd <- substr(format(checks[1:18],trim=TRUE),2,2)
tot_sd <- table(factor(sd, levels=paste(s)))
resK_2 <- SmallSampTest(d=tot_sd,p=p_sd,type="KS")
resK_2``````

And the test results are:

``````> resK_2
Small Sample Test Object
Test Type is KS
Statistic is 0.222222222222222
p-value is:  0.7603276
Data are:  1 2 2 2 1 0 2 4 1 3
Null probabilities are:  0.12 0.11 0.11 0.1 0.1 0.097 0.093 0.09 0.088 0.085
Total permutations are:  4686825 ``````

So for the second digits of our checks we would fail to reject the null that the data follow Benford’s distribution. To test the full 23 checks would generate over 28 million permutations – will update based on how long that takes.

The final example I will give is with another small example dataset — the last 12 purchases on my credit card.

``````#My last 12 purchases on credit card
purch <- c( 72.00,
328.36,
11.57,
90.80,
21.47,
7.31,
9.99,
2.78,
10.17,
2.96,
27.92,
14.49)
#artificial numbers, 72.00 is parking at DFW, 9.99 is Netflix   ``````

In reality, digits can deviate from Benford’s law for reasons that do not have to do with fraud — such as artificial constraints to the system. But, when I test this set of values, I fail to reject the null that they follow Benford’s distribution,

``````fdP <- substr(format(purch,trim=TRUE),1,1)
totP <- table(factor(fdP, levels=paste(f)))

resG_P <- SmallSampTest(d=totP,p=p_fd,type="G")
resG_P``````

So for a quick test the first digits of my credit card purchases do approximately follow Benford’s law.

``````> resG_P
Small Sample Test Object
Test Type is G
Statistic is 12.5740089945434
p-value is:  0.1469451
Data are:  3 4 1 0 0 0 2 0 2
Null probabilities are:  0.3 0.18 0.12 0.097 0.079 0.067 0.058 0.051 0.046
Total permutations are:  125970  ``````

1. #### Borja Esteve Altava

/  July 12, 2019

Hi Andrew. Would be possible to use Benford’s law tests to check if there is cherry picking rather than data forge? Thanks for your work, great post!

• #### apwheele

/  July 12, 2019

It is possible. It can be applied to whatever data, you just need to be reasonably confident in the null distribution of digits. (I can’t give much better advice than that without more specifics!)

• #### Borja Esteve Altava

/  July 12, 2019

Thanks, I am looking for irregularities regarding this very small-size sequence of numbers: c(105, 771, 209, 766, 363, 531, 7000, 71, 109). The p-value for the 1st digit is <0.05 but not much, and the power is not enough to be very confident neither. Not sure it can be solved.

• #### apwheele

/  July 12, 2019

The p-value is what it is — it really hinges on whether the null hypothesis is reasonable (which cannot be verified using the same data — it has to be verified some other way).

The power depends on the exact nature of how folks fudge the numbers. Say with your cherry picking example it results in a higher proportion of larger first digits. So just as a guess something like:

#########################
#power under null, 7/8/9 have higher %
p_alt <- c(1,1,1,1,1,1,2,2,2)
p_alt <- p_alt/sum(p_alt)
PowUni <- PowAlt(SST=resG,p_alt=p_alt)
PowUni
########################

With your set of numbers that is a power of almost 0.6 — not bad for only 9 numbers! You could probably get a better null if you posit a particular way in which individuals fudge numbers — cherry picking just suggests a particular filter.

Power is not so much about saying whether one specific test is reasonable. It is about defining more generally if I do this procedure a bunch in the future, how can I set it up to reasonably detect deviations in any one case.

• #### Borja Esteve Altava

/  July 12, 2019

Thank you, I will give it more thought. Your reply was very useful!

2. #### Tom Berger

/  December 11, 2020

Hi Andrew, this was a very interesting post. I only just found your blog so I’m a bit late to the party, but I wanted to report my results on the same cheque sequence you used above. I used a clever approach to Benfords law from Jetson Luis-Leder where they simultaneously analyse multiple digit places to provide increased statistical power.
https://jee.caltech.edu/documents/1255/Ensminger_and_Leder-Luis.pdf
Simultaneously testing the first 6 digits and then using a two way chi square on this I got a p value of 1.68e-17, not that far away from your chi square p-value : 1.527991e-15
I thought you might find this interesting and read their paper which correlates their digit tests to a forensic audit of the World Bank. Regards, Tom Berger

• #### apwheele

/  December 11, 2020

Thank you for the example Tom! Yes it can be extended to different digits. Part of doing it that way though is a single transaction above is not independent, so not 100% sure how it will work out in practice like that. It is an interesting wrinkle though to suggest!

3. #### Tom Berger

/  December 11, 2020

Thanks for your comment Andrew. Tell me, wouldn’t the inter-dependencies be dealt with using the sumultaneous analyses if all the digit places have a probs for each digit value? Let me quote their actual procedure to make it clear, let me know what you think– “For each digit place
(1st digit, 2nd digit, etc), the frequency of each digit (1, 2, 3, 4, 6, 7, 8, 9) is compared with the expected frequencies given in Table 1. This is in contrast to existing studies, which analyze a single digit place with a single chi square test. Because the Benford distribution gives different frequencies by digit place, the two-way chi square test is the appropriate test rather than testing individual digit places. Furthermore, it corrects for multiple hypothesis testing issues that arise from individual digit place analysis.” What do you think? Cheers, Tom

• #### apwheele

/  December 11, 2020

What I was thinking is that say you have a single transaction with 4 digits, that is not really 4 different pieces of information as stated above. I imagine most fraud won’t be fudging a single digit, but fudging an entire transaction.

Just thinking out loud though, it may be that testing all 4 digits in the above scenario makes sense, since fudging should come out in all the digits, not just the lead.