Reloading classes in python and shared borders

For some housekeeping, if you are not signed up already, make sure to sign up for the RSS feed of my Crime De-Coder blog. I have not been cross posting here consistently. For the last few posts:

For ASEBP, conference submissions for 2026 are open. (I will actually be going to this in 2026, submitted a 15 minute talk on planning experiments.)

Today will just be a quick post on two pieces of code I thought might be useful to share. The first is useful for humans: when testing code in functions, you can use the importlib library to reload a module after you edit it. That is, imagine you have code:

import crimepy as cpy

test = cpy.func1(....)

And then when you run this, you see that func1 has an error. You can edit the source code, and then run:

from importlib import reload

reload(cpy)
test = cpy.func1(....)

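This is my preferred approach to testing code. Note you need to import the library itself and reload that module object. Using from crimepy import * will unfortunately not work, since the names copied into your session still point at the old function objects after the reload.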
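Here is a minimal sketch of that difference. The module demo_lib is a hypothetical stand-in written to a temp folder (it is not part of crimepy), and turning off bytecode caching is just an assumption to keep the quick demo from picking up a stale cached file.

```python
import importlib
import sys
import tempfile
from pathlib import Path

sys.dont_write_bytecode = True  # keep the demo free of stale .pyc files

# Hypothetical stand-in library, written to a temporary folder
tmp_dir = tempfile.mkdtemp()
mod_path = Path(tmp_dir) / "demo_lib.py"
mod_path.write_text("def func1():\n    return 'old'\n")
sys.path.insert(0, tmp_dir)

import demo_lib                               # import the module itself
from demo_lib import func1 as copied_func1    # a copied name, for contrast

before = demo_lib.func1()

# "Edit" the source on disk, then reload the module object
mod_path.write_text("def func1():\n    return 'new version'\n")
importlib.invalidate_caches()
importlib.reload(demo_lib)

after = demo_lib.func1()   # looked up on the reloaded module
stale = copied_func1()     # still bound to the old function object

print(before, "|", after, "|", stale)
```

The attribute lookup demo_lib.func1 happens at call time, so it sees the reloaded definition, while the name copied in by the from-import does not.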

Recently I was testing out code that took quite a while to run, my Patrol Districting. This is a class, and I was editing my methods to make maps. The code takes around 5 minutes to run when remaking the entire class. What I found was a simpler approach: I can dump the object to a pickle file, then reload the library, then load the pickled object. The way the pickle module works, it stores a reference to the class plus the instance's attributes (pickle just saves the dict items under the hood), so on loading it pulls the method definitions from whatever version of the module is currently imported. So the code looked like this:

from crimepy import pmed
...
pmed12 = pmed.pmed(...)
pmed12.map_plot() # causes error

And then instead of using the reload method as is (which would require me to create an entirely new object), use this approach for de-bugging:

# save the object to a file
import pickle
with open('pmed12.pkl', 'wb') as file:
    # Dump data with the highest protocol for best performance
    pickle.dump(pmed12, file, protocol=pickle.HIGHEST_PROTOCOL)

from importlib import reload
# edit method
reload(pmed)

# reload the object
with open('pmed12.pkl', 'rb') as file:
    # Load the pickled data
    pmed12_new = pickle.load(file)

# retest the method
pmed12_new.map_plot()

Writing code itself is often not the bottleneck – testing is. So figuring out ways to iterate testing faster is often worth the effort (I might have saved a day or two of work if I did this approach sooner when debugging that code).
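To see the pickle trick work end to end without a 5 minute wait, here is a small self-contained sketch. The demo_model module and its Model class are hypothetical stand-ins for the slow-to-build class, and I pickle to bytes in memory instead of to a file.

```python
import importlib
import pickle
import sys
import tempfile
from pathlib import Path

sys.dont_write_bytecode = True  # keep the demo free of stale .pyc files

# Hypothetical module with a buggy method, written to a temp folder
tmp_dir = tempfile.mkdtemp()
mod_path = Path(tmp_dir) / "demo_model.py"
buggy_source = (
    "class Model:\n"
    "    def __init__(self, values):\n"
    "        self.values = values\n"
    "    def summary(self):\n"
    "        return sum(self.values) / 0\n"   # the bug to fix
)
mod_path.write_text(buggy_source)
sys.path.insert(0, tmp_dir)

import demo_model

obj = demo_model.Model([1, 2, 3])   # imagine this took minutes to build
blob = pickle.dumps(obj)            # dump BEFORE reloading the module

# Fix the method in the source file, then reload the module
fixed_source = buggy_source.replace("sum(self.values) / 0", "sum(self.values)")
mod_path.write_text(fixed_source)
importlib.invalidate_caches()
importlib.reload(demo_model)

# Unpickling looks up demo_model.Model again, so it gets the fixed class
obj_new = pickle.loads(blob)
fixed_result = obj_new.summary()
old_is_stale = type(obj) is not demo_model.Model

print(fixed_result, old_is_stale)
```

The order matters: after the reload the original object belongs to the old class definition, so dump it before reloading.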

The second code snippet is useful for the machines; I have been having Claude help me write quite a bit of the crimepy work. Here was one though it was having trouble with – calculating the shared border length between two polygons. Basically it went down an overly complicated path to get the exact calculation, whereas here I have an approximation using tiny buffers that works just fine and is much simpler.

def intersection_length(poly1,poly2,smb=1e-15):
    '''
    Length of the intersection between two shapely polygons
    
    poly1 - shapely polygon
    poly2 - shapely polygon
    smb - float, default 1e-15, small distance to buffer
    
    The way this works, I compute a very small buffer for
    whatever polygon is simpler (based on length)
    then take the intersection and divide by 2
    so not exact, but close enough for this work
    '''
    # buffer the less complicated edge of the two
    if poly1.length > poly2.length:
        p2, p1 = poly1, poly2
    else:
        p1, p2 = poly1, poly2
    # This basically returns a very skinny polygon
    pb = p1.buffer(smb,cap_style='flat').intersection(p2)
    if pb.is_empty:
        return 0.0
    elif hasattr(pb, 'length'):
        return (pb.length-2*smb)/2
    else:
        return 0.0

And then for some tests:

from shapely.geometry import Polygon

poly1 = Polygon([(0, 0), (4, 0), (4, 3), (0, 3)])  # Rectangle
poly2 = Polygon([(2, 1), (6, 1), (6, 4), (2, 4)])  # Overlapping rectangle

intersection_length(poly1,poly2) # interiors overlap, so not a shared border; expect roughly 4 (half the overlap's perimeter), not 0

poly3 = Polygon([(0, 0), (2, 0), (2, 2), (0, 2)])
poly4 = Polygon([(2, 0), (4, 0), (4, 2), (2, 2)])

intersection_length(poly3,poly4) # should be close to 2

poly5 = Polygon([(0, 0), (2, 0), (2, 2), (0, 2)])
poly6 = Polygon([(2, 0), (4, 0), (4, 3), (1, 3), (1, 2), (2, 2)])

intersection_length(poly5,poly6) # should be close to 3

poly7 = Polygon([(0, 0), (2, 0), (2, 2), (0, 2)])
poly8 = Polygon([(3, 0), (5, 0), (5, 2), (3, 2)])

intersection_length(poly7,poly8) # should be 0

Real GIS data often has imperfections (polygons that do not perfectly line up). So using the buffer method (and having an option to increase the buffer size) can often help smooth out those issues. It will not be exact, but the inexactness we are talking about will often be to well past the 10th decimal place.
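For why the return line in intersection_length subtracts 2*smb and divides by 2, here is the arithmetic on an idealized skinny rectangle. This is a sketch under the assumption that the sliver is a perfect rectangle; the real buffered shape has slightly different corners, and the helper names are made up for illustration.

```python
import math

def skinny_rectangle_perimeter(shared_length, smb):
    # Idealized sliver: two long sides along the border plus two short ends
    return 2 * shared_length + 2 * smb

def recover_shared_length(perimeter, smb):
    # Same arithmetic as the return line in intersection_length
    return (perimeter - 2 * smb) / 2

smb = 0.001  # a larger buffer than the default, only to make the numbers visible
perim = skinny_rectangle_perimeter(2.0, smb)
approx = recover_shared_length(perim, smb)
print(perim, approx)
```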

The difference between models, drive-time vs fatality edition

Easily one of the most common critiques I make when reviewing peer reviewed papers is the concept that the difference between statistically significant and not statistically significant is not itself statistically significant (Gelman & Stern, 2006).

If you cannot parse that sentence, the idea is simple to illustrate. Imagine you have two models:

Model     Coef  (SE)  p-value
  A        0.5  0.2     0.01
  B        0.3  0.2     0.13

So often social scientists will say “well, the effect in model B is different” and then post-hoc make up some reason why the effect in Model B is different than Model A. This is a waste of time, as comparing the effects directly, they are quite similar. We have an estimate of their difference (assuming 0 covariance between the effects), as

Effect difference = 0.5 - 0.3 = 0.2
SE of effect difference = sqrt(0.2^2 + 0.2^2) = 0.28

So when you compare the models directly (which is probably what you want to do when you are describing comparisons between your work and prior work), this is a bit of a nothing burger: the difference of 0.2 is less than one standard error (0.28) away from zero. It does not matter that Model B is not statistically significant, a coefficient of 0.3 is totally consistent with the prior work given the standard errors of both models.
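The comparison above is easy to script. A minimal sketch, assuming zero covariance between the two estimates and a normal approximation for the test statistic (the function name compare_coefs is just for illustration):

```python
import math

def compare_coefs(b1, se1, b2, se2):
    """Difference between two coefficients, assuming independent estimates."""
    diff = b1 - b2
    se_diff = math.sqrt(se1**2 + se2**2)
    z = diff / se_diff
    # two-sided p-value from the standard normal distribution
    p = math.erfc(abs(z) / math.sqrt(2))
    return diff, se_diff, z, p

diff, se_diff, z, p = compare_coefs(0.5, 0.2, 0.3, 0.2)
print(f"diff={diff:.2f} se={se_diff:.2f} z={z:.2f} p={p:.2f}")
```

A z-score of around 0.7 is nowhere near any conventional threshold.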

Reminded again about this concept, as Arredondo et al. (2025) do a replication of my paper with Gio on drive time fatalities and driving distance (Circo & Wheeler, 2021). They find that distance (whether Euclidean or drive time) is not statistically significant in their models. Here is the abstract:

Gunshot fatality rates vary considerably between cities with Baltimore, Maryland experiencing the highest rate in the U.S.. Previous research suggests that proximity to trauma care influences such survival rates. Using binomial logistic regression models, we assessed whether proximity to trauma centers impacted the survivability of gunshot wound victims in Baltimore for the years 2015-2019, considering three types of distance measurements: Euclidean, driving distance, and driving time. Distance to a hospital was not found to be statistically associated with survivability, regardless of measure. These results reinforce previous findings on Baltimore’s anomalous gunshot survivability and indicate broader social forces’ influence on outcomes.

This ends up being a clear example of the error I describe above. To make it simple, here is a comparison between their effects and the effects in my and Gio’s paper (in the format Coef (SE)):

Paper      Euclid         Network        Drive Time
Philly     0.042 (0.021)  0.030 (0.016)  0.022 (0.010)
Baltimore  0.034 (0.022)  0.032 (0.020)  0.013 (0.006)

At least for these coefficients, there is literally nothing anomalous at all compared to the work Gio and I did in Philadelphia.
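Applying the same difference test to each column of the table, here is a rough sketch. It assumes the two studies are independent (different cities and samples) and uses a normal approximation; the coefficients and standard errors are copied from the table.

```python
import math

# (coef, se) pairs copied from the table above
philly = {"Euclid": (0.042, 0.021), "Network": (0.030, 0.016), "Drive Time": (0.022, 0.010)}
baltimore = {"Euclid": (0.034, 0.022), "Network": (0.032, 0.020), "Drive Time": (0.013, 0.006)}

z_scores = {}
for measure, (b_ph, se_ph) in philly.items():
    b_ba, se_ba = baltimore[measure]
    diff = b_ph - b_ba
    se_diff = math.sqrt(se_ph**2 + se_ba**2)  # assumes zero covariance
    z_scores[measure] = diff / se_diff
    print(f"{measure:<11} diff={diff:+.3f} se={se_diff:.3f} z={z_scores[measure]:+.2f}")
```

Every difference is less than one standard error from zero.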

To translate these coefficients to something meaningful, Gio and I estimate marginal effects – basically a reduction of 2 minutes results in a decrease of 1 percentage point in the probability of death. So if you compare someone who is shot 10 minutes from the hospital and has a 20% chance of death, if you could wave a wand and get them to the ER 2 minutes faster, we would guess their probability of death goes down to 19%. Tiny, but over many such cases makes a difference.

I went through some power analysis simulations in the past for a paper comparing longer drive time distances as well (Sierra-Arévalo et al. 2022). The (very minor) differences could also be due to omitted variable bias (in logit models, leaving out a relevant variable can bias the remaining coefficients towards 0, even if it is not confounded with the other X). The Baltimore paper does not include where on the body a person was shot, which was easily the most important factor in my research for the Philly work.
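The bias towards 0 from an omitted variable in logit models can be shown with plain arithmetic, no simulation needed. In this sketch all the numbers are hypothetical: x has a true log-odds effect of 1, and the omitted variable z is +1 or -1 with equal probability and is independent of x.

```python
import math

def expit(v):
    return 1.0 / (1.0 + math.exp(-v))

def logit(p):
    return math.log(p / (1.0 - p))

true_b = 1.0  # hypothetical effect of x, holding z fixed
gamma = 2.0   # hypothetical effect of the omitted variable z

def marginal_prob(x):
    # Average the outcome probability over z = +1 and z = -1
    return 0.5 * expit(true_b * x + gamma) + 0.5 * expit(true_b * x - gamma)

# Log-odds ratio for x you would recover if z were left out of the model
marginal_b = logit(marginal_prob(1)) - logit(marginal_prob(0))
print(f"true effect: {true_b:.2f}, effect with z omitted: {marginal_b:.2f}")
```

Even with z unrelated to x, the recovered effect is less than half of the true one in this made-up example.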

To wrap up – we as researchers cannot really change broader social forces (nor can we likely change the location of level 1 emergency rooms). What we can change however are different methods to get gun shot victims to the ER faster. These include things like scoop-and-run (Winter et al., 2022), or even gun shot detection tech to get people to scenes faster (Piza et al., 2023).


Some notes on project management

Have recently participated in several projects that I think went well at the day gig – these were big projects, multiple parties, and we came together and got to deployment on a shortened timeline. I think it is worth putting together my notes on why I think this went well.

My personal guiding star for project management is very simple – have a list of things to do, and try to do them as fast as reasonably possible. This is important, as anything that distracts from this ultimate goal is not good. Does not matter if it is agile ceremonies or waterfall excessive requirement gathering. In practice if you are too focused on the bureaucracy either of them can get in the way of the core goals – have a list of things to do and do them in a reasonable amount of time.

For a specific example, we used to have bi-weekly sprints for my team. We stopped doing them for these projects that have IMO gone well, as another group took over project management for the multiple groups. The PM just had an excel spreadsheet, with dates. I did not realize how much waste we had by forcing everything to a two week cycle. We spent too much time trying to plan two weeks out, and ultimately not filling up people’s plates. It is just so much easier to say “ok we want to do this in two days, and then this in the next three days” etc. And when shit happens just be like “ok we need to push X task by a week, as it is much harder than anticipated”, or “I finished Y earlier, but I think we should add a task Z we did not anticipate”.

If sprints make sense for your team go at it – they just did not for my team. They caused friction in a way that was totally unnecessary. Just have a list of things to do, and do them as fast as reasonably possible.

Everything Parallel

So this has avoided the hard part, what to put on the to-do list? Let me discuss another very important high level goal of project management first – you need to do everything in parallel as much as possible.

For a concrete example that continually comes up in my workplace, you have software engineering (writing code) vs software deployment (how that code gets distributed to the right people). I cannot speak to other places (places with a mono-repo/SaaS it probably looks different), but Gainwell is really like 20+ companies all Frankenstein-ed together through acquisitions and separate big state projects over time (and my data science group is pretty much solution architects for AI/ML projects across the org).

It is more work for everyone in this scenario trying to do both writing code and deployment at the same time. Software devs have to make up some reasonably scoped requirements (which will later change) for the DevOps folks to even get started. The DevOps folks may need to work on Docker images (which will later change). So it is more work to do it in parallel than it is sequential, but drastically reduces the overall deliverable timelines. So e.g. instead of 4 weeks + 4 weeks = 8 weeks to deliver, it is 6 weeks of combined effort.

This may seem like “duh Andy” – but I see it all the time people not planning out far enough though to think this through (which tends to look more like waterfall than agile). If you want to do things in months and not quarters, you need everyone working on things in parallel.

For another example at work, we had a product person want to do extensive requirements gathering before starting on the work. This again can happen in parallel. We have an idea, devs can get started on the core of it, and the product folks can work with the end users in the interim. Again more work, things will change, devs may waste 1 or 2 or 4 weeks building something that changes. Does not matter, you should not wait.

I could give examples that are purely “write code” as well, e.g. I have one team member write certain parts of the code first, which is inconvenient, because that component not being finished is a blocker for another member of the team. Basically it is almost always worth working harder in the short term if it allows you to do things in parallel with multiple people/teams.

Sometimes the “in parallel” is when team members have slack, have them work on more proof of concept things that you think will be needed down the line. For the stuff I work on this can IMO be enjoyable, e.g. “you have some time, let’s put a proof of concept together on using Codex + Agents to do some example work”. (Parallel is not quite the word for this, it is foreseeing future needs.) But it is similar in nature, I am having someone work on something that will ultimately change in ways in the future that will result in wasted effort, but that is OK, as the head start on trying to do vague things is well worth it.

What things to put on the list

This is the hardest part – you need someone who understands front to back what the software solution will look like, how it interacts with the world around it (users, databases, input/output, etc.) to be able to translate that vision into a tangible list of things to-do.

I am not even sure if I can articulate how to do this in a general enough manner to even give useful advice. When I don’t know things front to back though I will tell you what, I often make mistakes going down paths that often waste months of work (which I think is sometimes inevitable, no one had the foresight to know it was a bad path until we got quite a ways down it).

I used to think we should do the extensive, months long, requirements gathering to avoid this. I know a few examples where I talked for months with the business owners, came up with a plan, and then later on realized it was based on some fundamental misunderstanding of the business. And the business team did not have enough understanding of the machine learning model to know it did not make sense.

I think mistakes like these are inevitable though, as requirements gathering is a two way street (it is not reasonable for any of the projects I work on to expect the people requesting things to put together a full, scoped out list). So just doing things and iterating is probably just as fast as waiting for a project to be fully scoped out.

Do them as fast as possible

So onto the second part, of “have a list of things to-do and do them as fast as possible”. One of the things with “fast as possible”, people will fill out their time. If you give someone two weeks to do something, most people will not do it faster, they will spend the full two weeks doing that task.

So you need someone technical saying “this should be done in two days”. One mistake I see teams make is listing out projects that will take several weeks to do. This is only OK for very senior people. The majority of dev tasks should be 1/2/3 days of work at max. So you need to take a big project and break it down into smaller components. This seems like micro-managing, but I do not know how else to do it and keep things on track. Being more specific is almost always worth my time as opposed to less specific.

Sometimes this even works at higher levels, one of the projects that went well, initial estimates were 6+ months. Our new Senior VP of our group said “nope, needs to be 2-3 months”. And guess what? We did it (he spent money on some external contractors to do some work, but by god we did it). Sometimes do them as fast as possible is a negotiation at the higher levels of the org – well you want something by end of Q3, well we can do A and C, but B will have to wait until later then, and the solution will be temporarily deployed as a desktop app instead of a fully served solution.

Again more work for our devs, but a shorter timeline with an even smaller MVP to help others totally makes sense.

AI will not magically save you

Putting this last part in, as I had a few conversations recently about large code conversion projects, where teams wanted to know if they could just use AI to make short work of it. The answer is yes, it makes sense to use AI to help with these tasks, but they expected somewhat of a magic bullet. They each still needed to make a functional CI/CD framework to test isolated code changes, for example. They still needed someone to sit down and say “Joe and Gary and Melinda will work on this project, and have XYZ deliverables in two months”. A legacy system that was built over decades is not a weekend project to just let the machine go brr and churn out a new codebase.

Some of them honestly are groups that just do not want to bite the bullet and do the work. I see projects that are mismanaged (for the criminal justice folks that follow me, on-prem CAD software deployments should not take 12 months). They take that long because the team is mismanaged, mostly people saying “I will do this high level thing in 3 months when I get time”, instead of being like “I will do part A in the next two days and part B in the three days following that”. Or doing things sequentially that should be done in parallel.

To date, genAI has only impacted the software engineering practices of my team at the margins (potentially writing code slightly faster, but probably not). We are currently using genAI in various products though for different end users. (We have deployed many supervised learning models going back years, just more recently have expanded into using genAI for different tasks though in products.)

I do not foresee genAI taking devs’ jobs in the near future, as there are basically infinite amounts of stuff to work on (everything when you look closely is inefficient in a myriad of ways). Using the genAI tools to write code though looks very much like project management, identifying smaller and more manageable tasks for the machine to work on, then testing those, and moving onto the next steps.

Using DuckDB WASM + Cloudflare R2 to host and query big data (for almost free)

The motivation here, prompted by a recent question Abigail Haddad had on LinkedIn:

For the machines, the context is hosting a dataset of 150 million rows (in another post Abigail stated it was around 72 gigs). And you want the public to be able to make ad-hoc queries on that data. Examples where you may want to do this are public dashboards (think a city’s open data site, which just puts all the data on R2 and has a front end).

This is the point where traditional SQL databases for websites probably don’t make sense. Databases like Supabase Postgres or MySQL can hold that much data, but given the cost of cloud computing and what they are typically used for, it does not make much sense to load 72 gigs into them and use them for data analysis type queries.

Hosting the data as static files though in an online bucket, like Cloudflare’s R2, and then querying the data makes more sense for that size. Here to query the data, I also use a WASM deployed DuckDB. What this means is I don’t really have to worry about a server at all – it should scale to however many people want to use the service (I am just serving up HTML). The client’s machine handles creating the query and displaying the resulting data via javascript, and Cloudflare basically just pushes data around.

If you want to see it in action, you can check out the github repo, or see the demo deployed on github pages to illustrate generating queries. To check out a query on my Cloudflare R2 bucket, you can run SELECT * FROM 'https://data-crimedecoder.com/books.parquet' LIMIT 10;:

Cloudflare is nice here, since there are no egress charges (big data you need to worry about that). You do get charged for different read/write operations, but the free tiers seem quite generous (I do not know quite how to map these queries to Class B operations in Cloudflare’s parlance, but you get 10 million per month and all my tests only generated a few thousand).

For some notes on this set-up. On Cloudflare, to be able to use DuckDB WASM, I needed to expose the R2 bucket via a custom domain. Using the development url did not work (same issue as here). I also set my CORS Policy to:

[
  {
    "AllowedOrigins": [
      "*"
    ],
    "AllowedMethods": [
      "GET",
      "HEAD"
    ],
    "AllowedHeaders": [
      "*"
    ],
    "ExposeHeaders": [],
    "MaxAgeSeconds": 3000
  }
]

While my Crime De-Coder site is PHP, all the good stuff happens client-side. So you can see some example demos of the GSU book prices data.

One of the annoying things about this though, with S3 you can partition the files and query multiple partitions at once. Here something like SELECT * FROM read_parquet('https://data-crimedecoder.com/parquet/Semester=*/*') LIMIT 10; does not work. You can union the partitions together manually. So I am not sure if there is a way to set up R2 to work the same way as the S3 example (set up an FTP server? let me know in the comments!).
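One way to do the manual union is to generate the query text with the partition URLs spelled out, since DuckDB's read_parquet accepts a list of files. This is only a sketch that builds the SQL string: the semester values and the data.parquet file name are hypothetical, and I have not verified this exact query against the R2 bucket.

```python
def union_partitions_query(base_url, partition_col, values, filename, limit=10):
    """Build a DuckDB query that reads several partition files explicitly."""
    urls = [f"{base_url}/{partition_col}={v}/{filename}" for v in values]
    url_list = ", ".join(f"'{u}'" for u in urls)
    return f"SELECT * FROM read_parquet([{url_list}]) LIMIT {limit};"

query = union_partitions_query(
    base_url="https://data-crimedecoder.com/parquet",
    partition_col="Semester",
    values=["Fall2024", "Spring2025"],   # hypothetical partition values
    filename="data.parquet",             # hypothetical file name
)
print(query)
```

The client-side page needs to know the list of partitions ahead of time, which is the annoying part compared to a glob pattern.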

For pricing, for the scenario Abigail had of 72 gigs of data, we then have:

  • $10 per year for the domain
  • 0.015*72*12 = $13 for storage of the 72 gigs

So we have a total cost to run this of $23 per year. And it can scale to a crazy number of users and very large datasets out of the box. (My usecase here is just $10 for the domain, you get 10 gigs for free.)
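The pricing arithmetic as a small function, if you want to plug in your own data size. The rates are the ones quoted above ($0.015 per gig per month, $10 per year for the domain); the free_gigs argument is my own addition to cover the 10 gigs free case, and actual Cloudflare pricing may differ from these numbers.

```python
def annual_cost(gigs, price_per_gb_month=0.015, domain_per_year=10.0, free_gigs=0.0):
    """Rough yearly cost: domain plus storage, ignoring operation charges."""
    billable = max(gigs - free_gigs, 0.0)
    storage = price_per_gb_month * billable * 12
    return domain_per_year + storage

full = annual_cost(72)                  # the 72 gig scenario, as computed above
small = annual_cost(10, free_gigs=10)   # a dataset that fits in the free storage
print(f"72 gigs: ${full:.2f} per year, 10 gigs: ${small:.2f} per year")
```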

Since this can be deployed on a static site, there are free options (like github pages). So the page with the SQL query part is essentially free. (You can also double dip on the R2 custom domain: just put the HTML in the bucket and it will render like normal.)

While this example only shows generating a table, you can do whatever additional graphics client side. So could make a normal looking dashboard with dropdowns, and those just execute various queries and fill in the graphs/tables.

Build Stuff

I have had this thought in my head for a while – criminology research to me is almost all boring. Most of the recent advancement in academia is focused on making science more rigorous – more open methods, more experiments, stronger quasi-experimental designs. These are all good things, but to me still do not fundamentally change the practical implementation of our work.

Criminology research is myopically focused on learning something – I think this should be flipped, and the emphasis be on doing something. We should be building things to improve the crime and justice system.

How criminology research typically goes

Here is a screenshot of the recent articles published in the Journal of Quantitative Criminology. I think this is a pretty good cross-section of high-quality, well-respected research in criminology.

Three of the four articles are clearly ex-post evaluations of different (pretty normal) policies/behavior by police and their subsequent downstream effects on crime and safety. They are all good papers, and knowing how well a particular policy works (like stop and frisk, or firearm seizures) is good! But they are the literal example of where the term ivory tower comes from – these are things happening in the world, and academics passively observe and say how well they are working. None of the academics in those papers were directly involved in any boots on the ground application – these were normal operations the police agencies in question were doing on their own.

Imagine someone said “I want to improve the criminal justice system”, and then “to accomplish this, I am going to passively observe what other people do, and tell them if it is effective or not”. This is almost 100% of what academics in criminology do.

The article on illicit supply chains is another one that bothers me – it is sneaky in the respect that many academics would say “ooh that is interesting and should be helpful” given its novelty. I challenge anyone to give a concrete example of how the findings in the article can be directly useful in any law enforcement context. Not hypothetical, “can be useful in targeting someone for investigation”, like literal “this specific group can do specific X to accomplish specific Y”. We have plenty of real problems with illicit supply chains – drug smuggling in and out of the US (recommend the contraband show on Amazon, who knew many manufacturers smuggle weed from US out to the UK!). Fentanyl or methamphetamine production from base materials. Retail theft groups and selling online. Plenty of real problems.

Criminology articles tend to be littered with absurdly vague claims that they can help operations. They almost always cannot.

So we have articles that are passive evaluations of policies other people thought up. I agree this is good, but who exactly comes up with the new stuff to try out? We just have to wait around and hope other people have good ideas and take the time to try them out. And then we have theoretical articles larping as useful in practice (since other academics are the ones reviewing the papers, and no one says “erm, that is nice but makes no sense for practical day to day usage”).

Some may say this is the way science is supposed to work. My response to that is I don’t know dude, go and look at what folks are doing in the engineering or computer science or biology department. They seem to manage both theoretical and practical advancements at the same time just fine and dandy.

Well what have you built Andy?

It is a fair critique if you say “most of your work is boring Andy”. Most of my work is the same “see how a policy works from the ivory tower”, but a few are more “build stuff”. Examples of those include:

In the above examples, the one that I know has gotten the most traction are simple rules to identify crime spikes. I know because I have spent time demonstrating that work to various crime analysts across the country, and so many have told me “I use your Poisson Z-score Andy”. (A few have used the patrol area work as well, so I should be in the negative for carbon generation.)

Papers are not what matter though – papers are a distraction. The applications are what matter. The biggest waste currently in academic criminology work is peer reviewed papers. Our priorities as academics are totally backwards. We are evaluated on whether we get a paper published, we should be evaluated on whether we make the world a better place. Papers by themselves do not make the world a better place.

Instead of writing about things other people are doing and whether they work, we should spend more of our time trying to create things that improve the criminal justice system.

Some traditional academics may not agree with this – science is about formulating and testing hypotheses. This need not be in conflict with doing stuff. Have a theory about human nature, what better way to prove the theory than building something to attempt to change things for the better according to your theory. If it works in real life to accomplish things people care about guess what – other people will want to do it. You may even be able to sell it.

Examples of innovations I am excited about

Part of what prompted this was I was talking to a friend, and basically none of the things we were excited about have come from academic criminologists. I think a good exemplar of what I mean here is Anthony Tassone, the head of Truleo. To be clear, this is not a dig but a compliment, following some of Anthony’s posts on social media (LinkedIn, X), he is not a Rhodes Scholar. He is just some dude, building stuff for criminal justice agencies mostly using the recent advancements in LLMs.

For a few other examples of products I am excited about how they can improve criminal justice (I have no affiliations with these beyond I talk to people). Polis for evaluating body worn camera feeds. Dan Tatenko for CaseX is building an automated online crime reporting system that is much simpler to use. The folks at Carbyne (for 911 calls) are also doing some cool stuff. Matt White at Multitude Insights is building a SaaS app to better distribute BOLOs.

The folks at Polis (Brian Lande and Jon Wender) are the only two people in this list that have anything remotely to do with academic criminology. They each have PhDs (Brian in sociology and Jon in criminology). Although they were not tenure track professors, they are former/current police officers with PhDs. Dan at CaseX was a detective not that long ago. The folks at Carbyne I believe have tech backgrounds. Matt has a military background, but pursued his start up after doing an MBA.

The reason I bring up Anthony Tassone is because when we as criminologists say we are going to passively evaluate what other people are doing, we are saying “we will just let tech people like Anthony make decisions on what real practitioners of criminal justice pursue”. Again not a dig on Anthony – it is a good thing for people to build cool stuff and see if there is a market. My point is that if Anthony can do it, why not academic criminologists?

Rick Smith at Axon is another example. While Axon really got its dominant market position due to conducted energy devices and then body worn cameras (so hardware), quite a bit of the current innovation at Axon is software. And Rick did not have a background in hardware engineering either, he just had an idea and built it.

Transferring over into professional software engineering since 2020, let me tell my fellow academics, you too can write software. It is more about having a good idea that actually impacts practice.

Where to next?

Since the day gig (working on fraud-waste-abuse in Medicaid claims) pays the bills, most of my build stuff is now focused on that. The technical skills to learn software engineering are currently not effectively taught in Criminal Justice PhD programs, but they could be. Writing a dissertation is way harder than learning to code.

While my python book has a major focus on data analysis, it is really the same skills to jump to more general software engineering. (I specifically wrote the book to cover more software engineering topics, like writing functions and managing environments, as most of the other python data science books lack that material.)

Skills gap is only part of the issue though. The second is supporting work that pursues building stuff. It is really just norms in the current academe that stop this from occurring now. People value papers, and NIJ (at least used to) mostly funds very boring incremental work.

I discussed start ups (people dreaming and building their own stuff) and other larger established orgs (like Axon). Academics are in a prime position to pursue their own start ups, and most Universities have some support for this (see Joel Caplan and Simsi for an example of that path). Especially for software applications, there are few barriers. It is more about time and effort spent pursuing that.

I think the more interesting path is to get more academic criminologists working directly with software companies. I will drop a specific example since I am pretty sure he will not be offended, everyone would be better off if Ian Adams worked directly for one of these companies (the companies, Ian’s take home pay, long term advancement in policing operations). Ian writes good papers – it would be better if Ian worked directly with the companies to make their tools better from the get go.

My friend I was discussing this with gave the example of Bell Labs. Software orgs could easily have professors take part time gigs with them directly, or just go work with them on sabbaticals. Axon should support something like that now.

While this post has been focused on software development, I think it could look similar for collaborating with criminal justice agencies directly. The economics will need to be slightly different (they do not have quite as much expendable capital to support academics, the ROI for private sector I think should be easily positive in the long run). But I think that would probably be much more effective than the current grant based approach. (Just pay a professor directly to do stuff, instead of asking NIJ to indirectly support evaluation of something the police department has decided to already put into operation.)

Scientific revolutions are not happening in journal articles. They are happening by people building stuff and accomplishing things in the real world with those innovations.


For a few responses to this post, Alex sent me this (saying my characterization of Larry as passively observing is not quite accurate), which is totally reasonable:

Nice post on building/ doing things and thanks for highlighting the paper with Larry. One error however, Larry was directly involved in the doing. He was the chief science officer for the London Met police and has designed their new stop and frisk policy (and targeting areas) based directly on our work. Our work was also highlighted by the Times London as effective crime policy and also by the Chief of the London Met Police as well who said it was one of the best policy relevant papers he’s ever seen. All police are now being trained on the new legislation on stop and search in procedurally just ways. You may not have known this background but it’s directly relevant to your post.

Larry Sherman (and David Weisburd), and their work on hot spots + direct experiments with police are really exemplars of “doing” vs “learning”. (David Kennedy and his work on focused deterrence is another good example.) In the mid 90s when Larry or David did experiments, they likely were directly involved in a way that I am suggesting – the departments are going and asking Larry “what should we do”.

My personal experience, trying to apply many of the lessons of David’s and Larry’s work (which was started around 30 years ago at this point), is not quite like that. It is more of police departments have already committed to doing something (like hotspots), and want help implementing the project, and maybe some grant helps fund the research. Which is hard and important work, but honestly just looks like effective project management (and departments should just invest in researchers/project managers directly, the external funding model does not make sense long term). For a more on point example of what I mean by doing, see what Rob Guerette did as an embedded criminologist with Miami PD.

Part of the reason I wrote the post, if you think about the progression of policing, we have phases – August Vollmer for professionalization in the early 1900’s. I think you could say folks like Larry and David (and Bill Bratton) brought about a new age of metrics to PDs in the 90s.

There are also technology changes that fundamentally impact PDs. Cars + 911 is one. The most recent one is a new type of oversight via body worn cameras. Folks who are leading this wave of professionalization changes are tech folks (like Rick Smith and Anthony Tassone). I think it is a mistake to just sit on the sidelines and see what these folks come up with – I want academic criminologists to be directly involved in the nitty gritty of the implementations of these systems and making them better.

A second response to this is that building stuff is hard, which I agree and did not mean to imply it was as easy as writing papers (it is not). Here is Anthony Tassone’s response on X:

I know this is hard. This is part of why I mentioned the Bell Labs path. Working directly for an already established company is much easier/safer than doing your own startup. Bootstrapping a startup is additionally much different than doing VC go big or go home – and bootstrapping is something academics on sabbatical, or working on it as a side hustle, are potentially in a good position to do.

Laura Huey did this path, and does not have nice things to say about it:

I have not talked to Laura specifically about this, but I suspect it is her experience running the Canadian Society of Evidence Based Policing. I would not suggest starting a non-profit either, honestly. Even if you start a for-profit, there is no guarantee you will be well supported in your current academic position.

Again no doubt building useful stuff is harder than writing papers. For a counter to these though, doing my bootstrapped consulting firm is definitely not as stressful as building a large company like Anthony. And working for a tech company directly was a good career move for me (although now I spend most of my day building stuff to limit fraud-waste-abuse in Medicaid claims, not improving policing).

My suggestion that the field should be more focused on building stuff was not because it was easier, it was because if you don’t there is a good chance you are mostly irrelevant.

LinkedIn is the best social media site

The end goals I want for a social media site are:

  • promote my work
  • see other people’s work

Social media for other people may have other uses. I do comment and have minor interactions on the social media sites, but I do not use them primarily for that. So my context is more business oriented (I do not have Facebook, and have not considered it). I participate some on Reddit as well, but that is pretty sparingly.

LinkedIn is the best for both relative to X and BlueSky currently. So I encourage folks with my same interests to migrate to LinkedIn.

LinkedIn

So I started Crime De-Coder around 2 years ago. I first created a website, and then second started a LinkedIn page.

When I first created the business page, I invited most of my criminal justice contacts to follow the page. I had maybe 500 followers just based on that first wave of invites. At first I posted once or twice a week, and it was very steady growth, and grew to over 1500 followers in maybe just a month or two.

Now, LinkedIn has a reputation for more spammy lifecoach self promotion (for lack of a better description). I intentionally try to post somewhat technical material, but keep it brief and understandable. It is mostly things I am working on that I think will be of interest to crime analysts or the general academic community. Here is one of my recent posts on structured outputs:

Current follower count on LinkedIn for my business page (which in retrospect may have been a mistake, I think they promote business pages less than personal pages), is 3230, and I have fairly consistent growth of a few new followers per day.

I first started posting once a week, and with additional growth expanded to once every other day and at one point once a day. I have cut back recently (mostly just due to time). I did get more engagement, around 1000+ views per day when I was posting every day.

Probably the most important part though of advertising Crime De-Coder is the types of views I am getting. My followers are not just academic colleagues I was previously friends with, a decent number are outside my first degree network, police officers and other non-profit related folks. I have landed several contracts where I know those individuals reached out to me based on my LinkedIn posting. It could be higher, as my personal Crime De-Coder website ranks very poorly on Bing search, but my LinkedIn posts come up fairly high.

When I was first on Twitter I did have a few academic collaborations that I am not sure would have happened without it (a paper with Manne Gerell, and a paper with Gio Circo, although I had met Gio in real life before that). I do not remember getting any actual consulting work though.

I mentioned it is not only better for me for advertising my work, but also consuming other material. I did a quick experiment, just opened the home page and scrolled the first 3 non-advertisement posts on LinkedIn, X, and BlueSky. For LinkedIn

This is likely a person I do not want anything to do with, but their comment I agree with. Whenever I use Service Now at my day job I want to rage quit (just send a Teams chat or email and be done with it, LLMs can do smarter routing anymore). The next two are people I am directly connected with. Some snark by Nick Selby (which I can understand the sentiment, albeit disagree with, I will not bother to comment though). And something posted by Mindy Duong I likely would be interested in:

Then another advert, and then a post by Chief Patterson of Raleigh, whom I am not directly connected with, but was liked by Tamara Herold and Jamie Vaske (whom I am connected with).

So the adverts are annoying, but the suggested posts (the feeds are weird now, they are not chronological) are not bad. I would prefer if LinkedIn had “general” and “my friends” sections, but overall I am happier with the content I see on LinkedIn than I am with the other sites.

X & BlueSky

I first created a personal account on Twitter (as it was then) in 2018. Nadine Connell suggested it, and it was nice then. When I first joined I think it was Cory Haberman who tweeted and said to follow my work, and I had a few hundred followers that first day. Then over the next two years, just posting blog posts and papers for the most part, I grew to over 1500 followers IIRC. I also consumed quite a bit of content from criminal justice colleagues. It was much more academic focused, but it was a very good source of recent research, CJ relevant news and content.

I then eventually deleted the Twitter account, due to a colleague being upset I liked a tweet. To be clear, the colleague was upset but it wasn’t a very big deal, I just did not want to deal with it.

I started a Crime De-Coder X account last year. I made an account to watch the Trump interview, and just decided to roll with it. I tried really hard to make X work – I posted daily, the same stuff I had been sharing on LinkedIn, just shorter form. After 4 months, I have 139 followers (again, when I joined Twitter in 2018 I had more than that on day 1). And some of those followers are porn accounts or bots. The majority of my posts get <=1 like and 0 reposts. It just hasn’t resulted in getting my work out there the same way as in 2018 or on LinkedIn now.

So in terms of sharing work, the more recent X has been a bust. In terms of viewing other work, my X feed is dominated by short form video content (a mimic of TikTok) I don’t really care about. This is after extensively blocking/muting/saying I don’t like a lot of content. I promise I tried really hard to make X work.

So when I open up the Twitter home feed, it is two videos by Musk:

Then a thread by Per-Olof (whom I follow), and then another short video Death App joke:

So I thought this was satire, but clicking that fellow’s posts I think he may actually be involved in promoting that app. I don’t know, but I don’t want any part of it.

BlueSky I have not been on as long, but given how easy it was to get started on Twitter and X, I am not going to worry about posting so much. I have 43 followers, and posts similar to X have basically been zero interaction for the most part. The content feed is different than X, but is still not something I care that much about.

We have Jeff Asher and his football takes:

I am connected with Jeff on LinkedIn, in which he only posts his technical material. So if you want to hear Jeff’s takes on football and UT-Austin stuff then go ahead and follow him on BlueSky. Then we have a promotional post by a psychologist (this person I likely would be interested in following his work, this particular post though is not very interesting). And a not funny Onion like post?

Then Gavin Hales, whom I follow, and typically shares good content. And another post I leave with no comment.

My BlueSky feed is mostly dominated by folks in the UK currently. It could be good, but it currently just does not have the uptake to make it worth it like I had with Twitter in 2018. It may be the case given my different goals, to advertise my consulting business, Twitter in 2018 would not be good either though.

So for folks who subscribe to this blog, I highly suggest giving LinkedIn a try for your social media consumption and sharing.

How much do students pay for textbooks at GSU?

Given I am a big proponent of open data, replicable scientific results, and open access publishing, I struck up a friendship with Scott Jacques at Georgia State University. One of the projects we pursued was pretty simple, but could potentially save students a ton of money. If you have checked out your university’s online library system recently, you may have noticed they have digital books (mostly from academic presses) that you can just read. No limits like the local library, they are just available to all students.

So the idea Scott had was to identify books students are paying for, and then see if the library can negotiate with the publisher to have it for all students. This shifts the cost from the student to the university, but the licensing fees for the books are not that large (think less than $1000). This can save money especially if it is a class with many students, so say a $30 book with 100 students, that is $3000 students are ponying up in toto.
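To make the arithmetic concrete, here is a minimal sketch of the comparison. The function names are hypothetical, and the $1000 license fee is just the upper bound mentioned above, not a quoted price.

```python
import math

def student_savings(book_price, enrollment, license_fee):
    """Net savings if the library licenses the book instead of students buying it."""
    student_spend = book_price * enrollment
    return student_spend - license_fee

def break_even_enrollment(book_price, license_fee):
    """Smallest class size at which total student spending covers the license fee."""
    return math.ceil(license_fee / book_price)

# the $30 book with 100 students, against an assumed $1000 license
print(student_savings(30, 100, 1000))    # 2000
print(break_even_enrollment(30, 1000))   # 34
```

So under those assumed numbers, any course with more than a few dozen students already comes out ahead.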

To do this we would need course enrollments and the books they are having students buy. Of course, this is data that does exist, but I knew going in that it was just not going to happen that someone just nicely gave us a spreadsheet of data. So I set about to scrape the data, you can see that work on Github if you care to.

The github repo in the data folder has fall 2024 and spring 2025 Excel spreadsheets if you want to see the data. I also have a filterable dashboard on my crime de-coder site.

You can filter for specific colleges, look up individual books, etc. (This is a preliminary dashboard that has a few kinks, if you get too sick of the filtering acting wonky I would suggest just downloading the Excel spreadsheets.)

One of the aspects though of doing this analysis, the types of academic publishers me and Scott set out to identify are pretty small fish. The largest happen to be academic textbook publishers (like Pearson and McGraw Hill). The biggest, coming in at over $300,000 that students spend in a year, is a Pearson text on Algebra.

You may wonder why so many students are buying an algebra book. It is assigned across the Pre-calculus courses. GSU is a predominantly low income serving institution, with the majority of students on Pell grants. Those students at least will get their textbooks reimbursed via the Pell grants (at least before the grant money runs out).

Being a former professor, these course bundles in my area (criminal justice) were comically poor quality. I concede the math ones could be higher quality, I have not purchased this one specifically, but this offers two solutions. One, Universities should directly contract with Pearson to buy licensing for the materials at a discount. The bookstore prices are often slightly higher than just buying from other sources (Pearson or Amazon) directly. (Students on Pell Grants need to buy from the bookstore though to be reimbursed.)

A second option is simply to pay someone to create open access materials to swap out. Universities often have an option for taking a sabbatical to write a text book. I am pretty sure GSU could throw 30k at an adjunct and they would write just as high (if not higher) quality material. For basic material like that, the current LLM tools could help speed the process by quite a bit.

For these types of textbooks, professors use them because they are convenient, so if a lower cost option were available that met the same needs, I am pretty sure you could convince the math department to have those materials as the standard. If we go to page two of the dashboard though, we see some new types of books pop up:

You may wonder, what is Conley Smith Publishing? It happens to be an idiosyncratic self publishing platform. Look, I have a self published book as well, but having 800 business students a semester buy your self published $100 “using Excel” book, that is just a racket. And it is a racket that, when I give that example to friends, almost everyone has experienced in their college career.

There is no solution to the latter, professors ripping off their students. It is not illegal as far as I’m aware. I am just guessing at the margins, that business prof is maybe making a $30k bonus a semester forcing their students to buy their textbook. Unlike the academic textbook scenario, this individual will not swap out their materials, even if the alternative materials are higher quality.

To solve the issue will take senior administration in universities caring that professors are gouging their (mostly low income) students and put a stop to it.

This is not a unique problem to GSU, this is a problem at all universities. Universities could aim to make course materials low/no-cost, and use that as advertisement. This should be particularly effective advertisement for low income serving universities.

If you are interested in a similar analysis for your own university, feel free to get in touch with either myself or Scott. We would like to expand our cost saving projects beyond GSU.

Bitcoin, Ethereum, and Axon

In 2022, I did a post on the cointegration between Bitcoin, Ethereum, Gold, and the S&P 500. I have had a few conversations about Bitcoin recently with friends and family, so figured it would be worth updating that post.

Also had a discussion with a friend about Axon last week, and when talking about stock I said “what is it at $200” and his reply was “It is close to $700”. So throwing them in the mix as well.

Here is the same indexed between 0/1 chart, so you can see all the different investments appear to be pretty correlated. Since mid 2022 all have been on a fairly consistent upward trajectory.

Now the way this chart works is y = (x - min(x))/(max(x) - min(x)), where x is the closing price (sampled every Friday). This hides the variation; plotting the closing prices on the logged scale better shows how volatile the different stocks are. So the S&P is a steady march, Gold has been quite consistent, the others not so much.

And a final way to show the data is to index to a start point. Here my initial post was in February 2022, so I start from there and plot Closing/Closing2_11_2022. So a value of 2 means it doubled from its start point, 3 tripled etc.
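The two rescalings described above can be sketched in pandas like this. The prices and column names are made up for illustration, they are not the actual market data behind the charts.

```python
import pandas as pd

# hypothetical weekly (Friday) closing prices, not real market data
close = pd.DataFrame(
    {"A": [100.0, 120.0, 90.0, 150.0], "B": [10.0, 8.0, 12.0, 30.0]},
    index=pd.to_datetime(["2022-02-11", "2022-02-18", "2022-02-25", "2022-03-04"]),
)

# min-max scaling: each column runs from 0 (its low) to 1 (its high)
scaled = (close - close.min()) / (close.max() - close.min())

# index to the start point: 2 means doubled since the first row, 3 tripled
indexed = close / close.iloc[0]

print(scaled)
print(indexed)
```

The min-max version puts every series on the same 0 to 1 range regardless of how much it moved, which is why it hides the volatility, whereas the start-point index keeps the relative growth comparable across series.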

I was prompted to look at Ethereum back then due to the popularity of NFTs. I decided not to invest, and if looking as of last Friday, I would be at about the exact same position as I would have been with the S&P. But I would have been under for almost two years. And it would have been almost the exact same story for Bitcoin over the same period, only with the more recent increase would I have really beat the more tame investments from Bitcoin. Axon though!

With the talks about US government investment in Bitcoin (see Bryan’s Clear Value Tax video on the subject), given that traditional index funds do not cover them, I will have to consider it more seriously. (I have had 401k’s that included Gold, I presume they have not expanded to include crypto though.) They are definitely more speculative than investing in index funds, but that can be a good thing if you know that going in.

Given the NFT fad passed with Eth, I wondered if it peaked (it was already falling from a local high of 3k in February 2022 when I initially wrote that post). But just a few hours before writing this post, BlackRock and Fidelity purchased a big chunk, so Eth should likely continue to climb at least in the short term and is not entirely pegged to the popularity of NFTs.

The original post I wrote about cointegration analysis, which is really only useful for very short term. Thinking about more long term investments, Bitcoin is harder to peg. The Clear Value Tax video shows Powell comparing Bitcoin to Gold, which I think is probably a good way to view it. So I think you can legitimately view it as a hedge against more traditional investments at this point (and ditto for Ethereum – does CoinBase have an index fund for coins?).

Now when evaluating specific companies, whether something is a good investment is more about whether you think the company itself is on a good trajectory. As disclosure I don’t have any direct financial ties to Axon, nor have I invested in them directly beyond if my 401k has them in the portfolio. I think Axon’s rise is legit and not a fad.

So Axon currently has the dominant market share for body worn cameras and conducted energy devices. Body worn cameras I think are likely to expand out into other areas beyond police and corrections officers. There are some news articles for store workers, I suspect medical workers and teachers though are bigger expanding markets in the future. Motorola stock is not doing too shabby over this time period as well, so they may be able to capture some of that same market as well.

I am not as in the know about drones, but I presume their hardware experience for BWC and Taser make them well positioned to expand out drone capabilities.

I am not sure what prompted the most recent rise mid 2024 for Axon. I wondered if it lined up with Draft One (the generative AI to help write reports), but it appears slightly after that. Their software products I am not as bullish about offhand. I think Draft One has very good potential, although see a recent article by Ian Adams and colleagues showing it does not decrease time writing reports.

But given they have such strong market share in other areas (like BWC) they have already established sales relationships. And if they can figure out BWC they have the technical capabilities to figure out the data software stuff, like RMS as well. Basically all the different “analytic” companies have no moat – Axon could hire me (or various other data scientists) to build those same analytic software programs directly for Axon’s different current software products.

Identifying excess rounding

Given the hubbub about Blue Cross and paying for anesthesiologists, there was an interesting paper making the rounds, Comparison of Anesthesia Times and Billing Patterns by Anesthesia Practitioners. (Shared via Crémieux.)

Most medical claims are billed via what are called CPT codes (Current Procedural Terminology). So if you go to the ER, you will get billed for a code 99281 to 99285. The final digit encodes different levels of complexity for the case, with 5’s being more complex. It was news to me that anesthesiologists actually bill for time directly, but the above linked paper showed (pretty plainly) that there is strong evidence they round up to every 5 minutes.

Now the paper just selected the anesthesiologists that have the highest proportion of billed times ending in 5’s. Here I will show a better way to flag specific problematic anesthesiologists (using repeated binomial tests and false discovery rate corrections).

Here I simulate 1,000 doctors, and select 40 of them to be bad, and those 40 round all of their claims up to the nearest 5 minute mark. Whereas the other docs are just billing the time as is. And they have varying total number of claims, between 100 and 600 (scipy’s uniform.rvs(100,500) takes a start and a width, so it draws from 100 to 600).

import numpy as np
from scipy.stats import gamma,uniform
import matplotlib.pyplot as plt
import pandas as pd
from scipy.stats import binomtest,false_discovery_control

np.random.seed(10)

docs = 1000
# pick a random set 
bad_docs = np.random.choice(np.arange(docs),40,replace=False)
doci = []
claims = []

for i in range(docs):
    totn = int(np.ceil(uniform.rvs(100,500))) # number of claims, loc=100 and scale=500 draws from 100 to 600
    doci += [i]*totn
    g = gamma.rvs(6,scale=12,size=totn)
    if i in bad_docs:
        g = np.ceil(g/5)*5
    else:
        g = g.round()
    claims += g.tolist()

dat = pd.DataFrame(zip(doci,claims),columns=['Doc','Time'])

# Histogram
fig, ax = plt.subplots(figsize=(8,4))
dat['Time'].hist(bins=range(201),alpha=0.8,color='k',ax=ax)
plt.savefig('Histogram.png',dpi=500,bbox_inches='tight')

You can see my gamma distribution is not as heavy tailed as the JAMA paper, but qualitatively has the same spike traits. Based on this, you can see the spike is relative to the other density, and so that shows in the JAMA paper there are more anesthesiologists rounding at 60 minutes than there are at the other 5 minute intervals.

In this particular example, it would be trivial to spot the bad docs, since they round 100% of the time, and you would only expect around 20% (since billing in minute intervals).

dat['Round'] = dat['Time'] % 5 == 0
dat['N'] = 1
gs = dat.groupby('Doc',as_index=False)[['N','Round']].sum()
gs['Perc'] = gs['Round']/gs['N']
gs.sort_values(by='Perc',ascending=False,ignore_index=True,inplace=True)
gs

# Upper Quantiles
np.quantile(gs['Perc'],[0.75,0.8,0.85,0.9,0.95,0.99])

But you can see a problem with using a top 5% quantile cut off here. Since I only have 4% bad doctors, using that hard cut-off will result in a few false positive flags. My suggested approach is to create a statistical test (Chi-Square, binomial, KS, whatever makes sense for individual doctors). Run the test for each doctor, get the p-value, and then run a false discovery rate correction on the p-values.

The above where doctors round 100% of the time is too easy, so here I simulate 40 doctors who will round up 10% to 30% of the time. I also have fewer cases (more cases and more rounding make it much easier to spot).

# Redoing sim, but a smaller percentage of the claims are rounded
# reset the lists so the first simulation's claims are not carried over
doci = []
claims = []
for i in range(docs):
    totn = int(np.ceil(uniform.rvs(100,500))) # number of claims
    doci += [i]*totn
    g = gamma.rvs(6,scale=12,size=totn)
    if i in bad_docs:
        randperc = int(np.round(totn*uniform.rvs(0.1,0.2)))
        badn = np.random.choice(np.arange(totn),randperc,replace=False)
        g[badn] = np.ceil(g[badn]/5)*5
        g = g.round()
    else:
        g = g.round()
    claims += g.tolist()

dat = pd.DataFrame(zip(doci,claims),columns=['Doc','Time'])
dat['Round'] = dat['Time'] % 5 == 0
dat['N'] = 1
gs = dat.groupby('Doc',as_index=False)[['N','Round']].sum()
gs['Perc'] = gs['Round']/gs['N']
gs.sort_values(by='Perc',ascending=False,ignore_index=True,inplace=True)

Now we can apply the binomial test to each doctor, then adjust for false discovery rate.

# Calculating binom test
def bt(x):
    k,n = x.iloc[0],x.iloc[1]
    b = binomtest(k,n,p=0.2,alternative='greater')
    return b.pvalue

gs['p'] = gs[['Round','N']].apply(bt,axis=1)
gs['q'] = false_discovery_control(gs['p'],method='by')

# Captures 28 out of 40 bad docs, no false positives
gs['BadDocs'] = gs['Doc'].isin(bad_docs)
gs[gs['q'] < 0.05]

If you check out the doctors via gs.head(50), you can see that a few of the bad-docs were adjusted to have q-values of 1, but they ended up being low N and in the range you would expect.
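For intuition on where those q-values come from, here is a sketch of the Benjamini-Yekutieli adjustment written out by hand with numpy. The function name by_adjust and the toy p-values are hypothetical, and the result should agree with scipy’s false_discovery_control(..., method='by').

```python
import numpy as np

def by_adjust(pvals):
    """Benjamini-Yekutieli adjusted p-values (q-values), written out by hand."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)               # smallest p-value gets rank 1
    ranks = np.arange(1, m + 1)
    harmonic = np.sum(1.0 / ranks)      # extra BY penalty over plain Benjamini-Hochberg
    scaled = p[order] * m / ranks * harmonic
    # enforce monotonicity: q at rank i is the minimum over ranks >= i
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.clip(scaled, 0, 1)    # put back in the original order
    return q

toy_p = [0.0001, 0.004, 0.03, 0.2, 0.6]  # made up p-values for five doctors
toy_q = by_adjust(toy_p)
print(toy_q.round(4))
```

The penalty grows with the number of tests, so a doctor with few claims and a modest excess of rounded times can easily end up with a q-value of 1 even though the raw p-value looked small.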

While anesthesiologist billing is different, this same approach would be fine for CPT codes that have the 1-5 modifier (you might use a leave one out strategy and a Chi-Square test). If anesthesiologists know they will be scrutinized for exact 5 minute marks, they will likely adjust and still round up, just not to regular numbers. If that is the case, the trailing digit of the billed minutes should be expected to be uniform on the 0-9 digits (sometimes people use a Benford like test for the trailing digits, this is the same idea). That will be harder to fake.
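A minimal sketch of that trailing digit check, using a Chi-Square test against a uniform distribution on the digits 0-9. The simulated doctor here is hypothetical: 30% of their claims get pushed up to the next minute value ending in 2 or 7, which would sail past a test that only looks at multiples of 5.

```python
import numpy as np
from scipy.stats import chisquare

def last_digit_test(times):
    """Chi-Square test that the trailing digit of billed minutes is uniform on 0-9."""
    digits = np.asarray(times).astype(int) % 10
    counts = np.bincount(digits, minlength=10)
    # chisquare defaults to equal expected counts, i.e. uniform over the 10 digits
    return chisquare(counts)

rng = np.random.default_rng(10)
honest = np.round(rng.gamma(6, scale=12, size=400))

# hypothetical evasive rounding: bump 120 of the 400 claims up to end in 2 or 7
padded = honest.copy()
idx = rng.choice(400, size=120, replace=False)
padded[idx] = np.ceil((padded[idx] - 2) / 5) * 5 + 2

res_honest = last_digit_test(honest)
res_padded = last_digit_test(padded)
print(res_honest.pvalue, res_padded.pvalue)
```

The p-values from this test can go through the same false discovery rate correction as the binomial test, so the overall workflow per doctor does not change.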

I don’t have an issue with Blue Cross saying they will only pay for pre-allotted times. But even without making an explicit policy like that, they can identify bad actors and pursue investigations into those problematic anesthesiologists.

Year in Review 2024

Past year in review posts I have made focused on showing blog stats. Writing this in early December, but total views will likely be down this year – I am projecting around 140,000 views in total for this site. But I have over 25k views for the Crime De-Coder site, so it is pretty much the same compared to 2023 combining the two sites.

I do not have a succinct elevator speech to tell people what I am working on. With the Crime De-Coder consulting gig, it can be quite eclectic. That Tukey quote, that being a statistician you get to play in everyone’s backyard, is true. Here is a rundown of the paid work I conducted in the past year.

Evidence Based CompStat: Work with Renee Mitchell and the American Society of Evidence Based Policing on what I call Evidence Based CompStat. This mostly amounts to working directly with police departments (it is more project management than crime analysis) to help them get started with implementing evidence based practices. Reach out if that sounds like something your department would be interested in!

Estimating DV Violence: Work supported by the Council on CJ. I forget exactly the timing of events. This was an idea I had for a different topic (to figure out why stores and official reports of thefts were so misaligned). Alex approached me to help with measuring national level domestic violence trends, and I pitched this idea (use local NIBRS data and NCVS to get better local estimates).

Premises Liability: I don’t typically talk about ongoing cases, but you can see a rundown of some of the work I have done in the past. It is mostly using the same stats I used as a crime analyst, but in reference to civil litigation cases.

Patrol Workload Analysis: I would break workload analysis for PDs down into two categories, advanced stats and CALEA reports. I had one PD interested in the simpler CALEA reporting requirement (which I can do for quite a bit cheaper than the other main consulting firm that offers these services).

Kansas City Python Training: Went out to Kansas City for a few days to train their analysts up in using python for Focused Deterrence. If you think the agenda in the pic below looks cool get in touch, I would love to do more of these sessions with PDs. I make it custom for the PD based on your needs, so if you want “python and ArcGIS”, or “predictive models” or whatever, I will modify the material to go over those advanced applications. I have also been pitching the same idea (short courses) for PhD programs. (So many posers in private sector data science, I want more social science PhDs with stronger tech skills!)

Paterson Opioid Outreach: Statistical consulting with Eric Piza and Kevin Wolff on a street outreach intervention intended to reduce opioid overdose in Paterson, New Jersey. I don’t have a paper to share for that at the moment, but I used some of the same synthetic control in python code I developed.

Bookstore prices: Work with Scott Jacques, supported by some internal GSU money. Involves scraping course and bookstore data to identify the courses that students spend the most on textbooks. Ultimate goal in mind is to either purchase those books as unlimited epubs (to save the students money), or encourage professors to adopt better open source materials. It is a crazy amount of money students pour into textbooks. Several courses at GSU students cumulatively spend over $100k on course materials per semester. (And since GSU has a large proportion of Pell grant recipients, it means the federal government subsidizes over half of that cost.)

General Statistical Consulting: I do smaller stat consulting contracts on occasion as well. I have an ongoing contract to help with Pam Metzger’s group at the SMU Deason center. Did some small work for AH Datalytics on behind the scenes algorithms to identify anomalous reporting for the real time crime index. I have several times in my career consulted on totally different domains as well, this year had a contract on calculating regression spline curves for some external brain measures.

Data Science Book: And last (that I remember), I published Data Science for Crime Analysis with Python. I still have not reached the 100 sales at which I would consider it a success, so if you have not bought a copy go do that right now. (Coupon code APWBLOG will get you $10 off for the next few weeks, either the epub or the paperback.)

Sometimes this makes it seem like I am more successful than I am. I have stopped counting the smaller cold pitches I make (I should be more aggressive with folks, but most of this work is people reaching out to me). But in terms of larger grant proposals or RFPs in the past year, I have submitted quite a few (7 in total) and have landed none of them to date! For example, I submitted a big one to NIJ in their follow up survey solicitation, to support the place based surveys work that Gio and I won the NIJ competition on, and it was turned down. So it goes.

In addition to the paid work, I still on occasion publish peer reviewed articles. (I need to be careful with my time though.) I published a paper with Kim Rossmo on measuring the buffer zone in journey to crime data. I also published the work on measuring domestic violence supported by the Council on CJ with Alex Piquero.

I took the day gig in Data Science at the end of 2019. Citations are often used as a measure of a scholar's influence on the field, but they are crazy slow.

I had 208 citations by the end of 2019, and I now have over 1300. Of the roughly 1100 post academia, only a very small number are from articles I wrote after I left (fewer than 40 total citations). A handful are for the NIJ recidivism competition paper (with Gio), and a few are for this Covid and shootings paper in Buffalo. The rest of the papers that have a post 2019 publishing date were entirely written before I left academia.

Always happy to chat with folks on teaming up on papers, but it is hard to take the time to work on a paper for free if I have other paid work at the moment. One of the things I need to do to grow the business is to get some more regular work. So if you have a group (academic, think tank, public sector) that is interested in part time help (or fractional, I guess, is what the cool kids are calling it these days), I would love to chat and see if I could help your group out.