YouTube interview with Manny San Pedro on Crime Analysis and Data Science

I recently did an interview with Manny San Pedro on his YouTube channel, All About Analysis. We discuss various data science projects I conducted while either working as an analyst, or in a researcher/collaborator capacity with different police departments.

Here is an annotated breakdown of the discussion, as well as links to various resources I discuss in the interview. This is not a replacement for listening to the video, but it is a convenient set of notes that links to more material on each item I discuss.

0:00 – 1:40, Intro

For a rundown of my career, I went to do my PhD in Albany (08-15). During that time period I worked as a crime analyst at Troy, NY, as well as a research analyst for my advisor (Rob Worden) at the Finn Institute. My research focused on quantitative projects with police departments (predictive modeling and operations research). In 2019 I went to the private sector, and I now work as an end-to-end data scientist in the healthcare sector working with insurance claims.

You can check out my academic and my data science CV on my about page.

I discuss the workshop I did at the IACA conference in 2017 on temporal analysis in Excel.

Long story short, don’t use percent change, use other metrics and line graphs.
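To illustrate why percent change can mislead, here is a minimal sketch comparing it to one possible alternative metric. The alternative is an assumption on my part for illustration (the notes above do not name a specific metric): a Poisson z-score that treats the prior count as a fixed baseline and uses the fact that the square root of a Poisson count has a standard deviation of roughly 0.5. The counts are hypothetical.

```python
import math


def percent_change(prior: int, current: int) -> float:
    """Percent change from prior to current; undefined (NaN) when prior is 0."""
    if prior == 0:
        return float("nan")
    return 100.0 * (current - prior) / prior


def poisson_z(prior: int, current: int) -> float:
    """Approximate z-score of the current count against a fixed baseline.

    Assumption: counts are Poisson, so sqrt(count) has a standard
    deviation of about 0.5 regardless of the baseline level.
    """
    return 2.0 * (math.sqrt(current) - math.sqrt(prior))


# Hypothetical (prior, current) counts for a low-volume and a high-volume crime
examples = [(2, 4), (100, 120)]
for prior, current in examples:
    pc = percent_change(prior, current)
    z = poisson_z(prior, current)
    print(f"{prior} -> {current}: {pc:+.0f}% change, z = {z:+.2f}")
```

Going from 2 to 4 is a 100% increase and going from 100 to 120 is only a 20% increase, yet under these assumptions both z-scores are below 2, and the low-count change is the less unusual of the two.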

7:30 – 13:10, Patrol Beat Optimization

I have the paper and code available to replicate my work with Carrollton PD on patrol beat optimization with workload equality constraints.

For analysts looking to teach themselves linear programming, I suggest Hillier’s book. I also give examples of linear programming on this blog.

It is different from statistical analysis, but I believe it has as much applicability to crime analysis as more typical statistical analysis.

13:10 – 14:15, Million Dollar Hotspots

There are hotspots of crime that are so concentrated that the expected labor cost reduction from having officers assigned full time likely offsets the cost of the position. E.g. if you spend a million dollars in labor addressing crime at that location, and having a full time officer reduces crime by 20% (about $200,000 in avoided labor, if labor costs fall in proportion to crime), the return on investment for hotspots breaks even with paying the officer’s salary.
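The break-even logic above can be sketched in a few lines. This is only an illustration under stated assumptions: labor costs are assumed to fall in proportion to crime, and the $200,000 officer cost is the figure implied by the example (a million dollars times 20%), not an actual salary number. The function names are hypothetical.

```python
def net_savings(hotspot_labor_cost: float,
                crime_reduction: float,
                officer_cost: float) -> float:
    """Labor saved by the crime reduction, minus the cost of the officer.

    Assumes labor spent at the hotspot scales proportionally with crime.
    """
    return hotspot_labor_cost * crime_reduction - officer_cost


def breakeven_reduction(hotspot_labor_cost: float, officer_cost: float) -> float:
    """Proportional crime reduction needed to pay for the officer."""
    return officer_cost / hotspot_labor_cost


# Figures from the example: $1M in labor, 20% reduction, implied $200k officer cost
labor = 1_000_000.0
officer = 200_000.0

print(f"Net savings at 20% reduction: ${net_savings(labor, 0.20, officer):,.0f}")
print(f"Break-even reduction: {breakeven_reduction(labor, officer):.0%}")
```

Any hotspot where the plausible reduction is above the break-even proportion would more than pay for the position under these assumptions.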

I call these Million dollar hotspots.

14:15 – 28:25, Prioritizing individuals in a group violence intervention

Here I discuss my work on social network algorithms to prioritize individuals to spread the message in a focused deterrence intervention. This is the opposite of how many people view “spreading” in a network: I identify something good I want to spread, and seed the network in a way that optimizes that spread.
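As a rough illustration of seeding a network to maximize spread, here is a minimal greedy sketch. It is a simplification I am assuming for illustration, not necessarily the algorithm from the work discussed in the interview: the message is assumed to reach a seed and its direct neighbors only, and the edge list and the `greedy_seeds` function are hypothetical.

```python
def greedy_seeds(edges, budget):
    """Pick up to `budget` people so that as many people as possible are
    either a seed or directly connected to one (greedy coverage)."""
    # Build an undirected adjacency map from the edge list
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    seeds, reached = [], set()
    for _ in range(budget):
        best, best_gain = None, 0
        # Sorted iteration makes tie-breaking deterministic
        for node in sorted(neighbors):
            if node in seeds:
                continue
            gain = len(({node} | neighbors[node]) - reached)
            if gain > best_gain:
                best, best_gain = node, gain
        if best is None:  # nobody left who adds new coverage
            break
        seeds.append(best)
        reached |= {best} | neighbors[best]
    return seeds, reached


# Hypothetical co-offending ties between nine individuals
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("D", "E"),
         ("E", "F"), ("E", "G"), ("H", "I")]

seeds, reached = greedy_seeds(edges, budget=3)
print("Deliver the message to:", seeds)
print("People reached:", len(reached))
```

Note that simply picking the three most connected individuals would not be the same thing: the greedy step only counts people who have not already been reached, which is what pushes the third pick to the otherwise isolated pair.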

I also have a primer on SNA, which discusses how crime analysts typically define nodes and edges using administrative data.

Listen to the interview, as I discuss more general advice: in SNA, how you define the network depends on what you want to accomplish in the end. So I discuss how you may want to define edges via victimization to prevent retaliatory violence (I think that would make sense for violence interrupters to be proactive, for example).

I also give an example of how it may make sense to base detective case allocation on SNA, since detectives may have background with an individual’s network (e.g. have a rapport with a family based on prior cases worked).

28:25 – 33:15, Be proactive as an analyst and learn to code

Here Manny asked how analysts can prevent their role from being turned into a more administrative role (just getting requests and running simple reports). I think the solution to this (not just in crime analysis, but also being an analyst in the private sector) is to be proactive. You shouldn’t wait for someone to ask you for specific information; you need to be defining your own role and conducting analysis on your own.

He also asked about crime analysis being under-used in policing. I think being stronger at computer coding opens up so many opportunities that learning Python, R, and SQL is the area where I would like to see stronger skills across the industry. This is also a good career investment, as it translates to private sector roles.

33:15 – 37:00, How ChatGPT can be used by crime analysts

I discuss how ChatGPT may be used by crime analysts to summarize qualitative incident data and help inform . (Check out this post by Andreas Varotsis for an example.)

To be clear, I think this is possible, but I don’t think the tech is quite up to that standard yet. Also do not submit LEO sensitive data to OpenAI!

Also always feel free to reach out if you want to nerd out on similar crime analysis questions!

The serenity prayer and being a senior developer

The serenity prayer, for those who don’t know it, is:

God, grant me the serenity to accept the things I cannot change, courage to change the things I can, and wisdom to know the difference.

I think this is an important concept that distinguishes good senior developers from junior developers (or data scientists, or crime analysts, the title doesn’t really matter).

Many very green junior developers tend to err on the ‘I cannot change anything’ side. Or put another way, they are told ‘we are going to do XYZ’, and instead of saying ‘we don’t need to do Y, we can just do XZ’ they just go with the flow and do what others tell them to do. For a more concrete example, in close to every project at my workplace that uses Hadoop, it is probably unnecessary. So often groups come in and say ‘we need to go from DatabaseX -> Hadoop -> Machine Learning Model -> DatabaseY’. So people go on this path, even though you could just chunk up the data in a more memory-safe way and cut out Hadoop entirely.
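To make the ‘chunk up the data’ alternative concrete, here is a minimal sketch of scoring rows in fixed-size chunks so that only one chunk is in memory at a time. Everything named here is a hypothetical stand-in: `fake_source` plays the role of a cursor over DatabaseX, and `fake_model` plays the role of the machine learning model.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def chunked(rows: Iterable, size: int) -> Iterator[List]:
    """Yield successive lists of at most `size` rows from any iterable."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def score_in_chunks(rows: Iterable, model: Callable, size: int) -> Iterator[dict]:
    """Apply `model` one chunk at a time and stream out the predictions."""
    for chunk in chunked(rows, size):
        yield from model(chunk)


def fake_source(n: int) -> Iterator[dict]:
    """Hypothetical stand-in for streaming rows out of DatabaseX."""
    for i in range(n):
        yield {"id": i, "amount": i * 10.0}


def fake_model(chunk: List[dict]) -> List[dict]:
    """Hypothetical stand-in for a model's batch predict step."""
    return [{"id": r["id"], "flag": r["amount"] > 50} for r in chunk]


# Collected into a list only to keep the demo short
results = list(score_in_chunks(fake_source(10), fake_model, size=4))
print(len(results), "rows scored")
```

In a real pipeline each scored chunk would be written to DatabaseY as it is produced rather than collected into a list, so memory use is bounded by the chunk size instead of the table size.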

Another common data science one I come across is ‘the business wants a ranking of priority claims that places them into bins of 1/2/3’. Instead of making a proper utility-derived decision rule, the data scientist gives the business what they ask for, using ad-hoc and clearly suboptimal rules to make the bins. It is similar to the XY problem; juniors just need to recognize they have agency to go back to the business partners and say ‘we should actually do it like this instead’.
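As a sketch of what a utility-derived decision rule might look like, the example below ranks claims by expected net value instead of binning on predicted probability alone. The claims, payoffs, and review cost are all hypothetical, and this simple rule (work a claim when probability times payoff exceeds the cost of working it) is an assumption for illustration, not a description of any particular production system.

```python
def expected_net_value(prob: float, payoff: float, review_cost: float) -> float:
    """Expected payoff from working a claim, minus the cost of working it."""
    return prob * payoff - review_cost


def prioritize(claims, review_cost):
    """Return claim ids worth working, highest expected net value first."""
    scored = [(expected_net_value(c["prob"], c["payoff"], review_cost), c["id"])
              for c in claims]
    worthwhile = [(value, cid) for value, cid in scored if value > 0]
    return [cid for value, cid in sorted(worthwhile, reverse=True)]


# Hypothetical claims: model probability and dollar payoff if the claim pans out
claims = [
    {"id": "c1", "prob": 0.90, "payoff": 100.0},
    {"id": "c2", "prob": 0.10, "payoff": 5000.0},
    {"id": "c3", "prob": 0.50, "payoff": 60.0},
]
REVIEW_COST = 40.0  # hypothetical cost of an analyst working one claim

print(prioritize(claims, REVIEW_COST))
```

Binning on probability alone would put c1 in the top bin and c2 in the bottom one, even though c2 has by far the largest expected value, and c3 would land in a middle bin despite costing more to work than it is expected to return.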

For a crime analysis example, when I worked at Troy PD and implemented these weekly metrics, the Chief at the time asked me to remove the error bars on the weekly forecasts. I simply explained to him that I used those to tell if a recent uptick was anomalous (if it is inside the bars, it is what we would expect), and he said OK, I understand now why you do that. On occasion I do things I don’t prefer because a higher-up asks, but in data science roles you should push back to nudge people (who often do not have as much expertise as you) to the right metrics. It takes courage, as the prayer goes.

I used the qualifier good senior developer earlier in the post, as I know senior people who fall into the trap of just going with the flow too much as well. But another failure mode for seniors is failing to ‘accept the things I cannot change’. I have come across this less often, but there are a few people who are very zealous about different tools/methods (Kubernetes, everything needs to be CICD, agile) even when those cannot reasonably be made to fit the particular situation. Many of these methods could be fine if they could be applied easily to the project at hand, but if it takes 2 years to develop your Kubernetes or CICD pipeline, whereas I can log into a virtual machine, do a one-time setup, and be done in a much shorter period of time, you should probably rethink your approach.

Often the developers don’t realize it will take 2 years (or that there are fundamental problems with the approach that make it infeasible). That is why good seniors have the wisdom to know the difference between things they can change and things they cannot.


I am going to be annoying and plug my consulting firm, CRIME De-Coder LLC for a bit here on the blog. So please check my work and get in touch if you or your agency/business have any needs for statistical analysis, process automation, program analysis, predictive analytics, etc.