Too relaxed? Naive Bayes does not improve recidivism forecasting in the NIJ challenge

So the paper Improving Recidivism Forecasting With a Relaxed Naïve Bayes Classifier (Lee et al., 2023), recently published in Crime & Delinquency, has incorrect results. Note I am not sandbagging the authors: I reviewed this paper for JQC and for the Journal of Criminal Justice, so I have already given the authors this same feedback (multiple times!). The authors however did not correct their results; they just journal shopped and published the wrong findings.

I have replication code here to review. (Note I initially made a mistake in my replication code: I calculated p(y|x) by accident when I should have calculated p(x|y). See this older code I shared in my prior reviews. I was still correct in my assertion that Lee’s results were wrong though.)

So the main thing that made me go to this effort: the authors report unbelievable results. They report Brier scores for females (Round 1) of 0.104 and for males of 0.159 – these scores blow the competition out of the water. The leaderboard was 0.15 for females and 0.19 for males. Note that I don’t list those to the third decimal – you needed to go down to the third decimal to see the difference between the teams. Lee also reports unbelievably low Brier scores for the alternative logit and random forest models – their results just on their face are not believable.
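For readers who have not worked with the metric, here is a minimal sketch of how a Brier score is calculated: it is the mean squared difference between the predicted probability and the 0/1 outcome, so lower is better. The function name and the toy numbers are made up for illustration, and are not taken from the replication code or the NIJ data.

```python
def brier_score(probs, outcomes):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    if len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must be the same length")
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)

# Made-up predictions and outcomes, purely for illustration (not NIJ data)
probs = [0.1, 0.8, 0.3, 0.6]
outcomes = [0, 1, 0, 1]
print(brier_score(probs, outcomes))

# Predicting the base rate p for everyone scores p*(1 - p),
# which is a useful sanity check on any reported Brier score
base = sum(outcomes) / len(outcomes)
print(brier_score([base] * len(outcomes), outcomes))
```

The base rate check is why very low scores should raise eyebrows: a model has to sort people much better than the overall recidivism rate does to get far below p*(1 - p).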

If the authors really believe their results this kind of sucks for them they did not participate in the NIJ challenge, they would have won more than $150,000! But I am pretty sure they are miscalculating their Brier scores somewhere. My replication code shows them in the same ballpark as everyone else, but they would not have made the leaderboard. Here are my estimates of what their Brier scores should be reported as (the Brier column below in the two tables):

Folks can go and look at their paper and their set of spreadsheets in the supplemental material – I have posted not many more than 50 lines of (non-comment) python code that replicates their regression model coefficients and shows their Brier scores are wrong though. (And subsequently any points Lee et al. 2023 make about fairness are thus wrong as well.)

NIJ probably released papers at some point, but if you want to see other folks’ discussion, there is Circo & Wheeler (2022) (for mine and Gio’s results for team MCHawks), and Mohler & Porter (2021) for team PASDA.

I may put in the slate sometime to discuss naive Bayes (and other categorical encoding schemes). It is not a bad idea for data with many categories, but for this NIJ data there just isn’t that much to squeeze out of the data. So any future work will be unlikely to dramatically improve upon the competition results (it is difficult to overfit this data). Again given my analysis here, I am pretty sure a valid data analysis (not peeking) at best will “beat” the competition results in the 3rd decimal place (if they can improve at all).

Now part of the authors’ argument is that this method (relaxed naive Bayes) results in simpler interpretations. Typically people interpret “simple” models in terms of the end results, e.g. having a simple checklist of integer weights. The more I deal with predictive models though, the more I think this is maybe misguided. You could also interpret “simple” in terms of the code used for how someone derived the weights (and evaluated the final metrics). This is important when auditing code that others have written, as you will ultimately take the code and apply it to your data.

I think this “simpler to estimate the same results” is probably more important for scientists and outside groups wanting to verify the integrity of any particular machine learning model than “simple end result weights”. Otherwise scientists can make up results and say my method is better. Which is simpler I suppose, but misses the boat a bit in terms of why we want simple models to begin with.

References

Youtube interview with Manny San Pedro on Crime Analysis and Data Science

I recently did an interview with Manny San Pedro on his YouTube channel, All About Analysis. We discuss various data science projects I conducted while either working as an analyst, or in a researcher/collaborator capacity with different police departments:

Here is an annotated breakdown of the discussion, as well as links to various resources I discuss in the interview. This is not a replacement for listening to the video, but is an easier set of notes to link to more material on what particular item I am discussing.

0:00 – 1:40, Intro

For a rundown of my career: I went to do my PhD in Albany (08-15). During that time period I worked as a crime analyst at Troy, NY, as well as a research analyst for my advisor (Rob Worden) at the Finn Institute. My research focused on quant projects with police departments (predictive modeling and operations research). In 2019 I went to the private sector, and now work as an end-to-end data scientist in the healthcare sector working with insurance claims.

You can check out my academic and my data science CV on my about page.

I discuss the workshop I did at the IACA conference in 2017 on temporal analysis in Excel.

Long story short, don’t use percent change, use other metrics and line graphs.

7:30 – 13:10, Patrol Beat Optimization

I have the paper and code available to replicate my work with Carrollton PD on patrol beat optimization with workload equality constraints.

For analysts looking to teach themselves linear programming, I suggest Hillier’s book. I also give examples on linear programming on this blog.

It is different than statistical analysis, but I believe has as much applicability to crime analysis as your more typical statistical analysis.

13:10 – 14:15, Million Dollar Hotspots

There are hotspots of crime that are so concentrated, the expected labor cost reduction from having officers assigned full time likely offsets the cost of the position. E.g. if you spend a million dollars in labor addressing crime at that location, and having a full time officer reduces crime by 20%, the return on investment for hotspots breaks even with paying the officer’s salary.

I call these Million dollar hotspots.
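The break-even arithmetic can be sketched in a few lines. The million dollars and the 20% reduction come from the example above; the officer cost, the assumption that labor savings scale proportionally with the crime reduction, and the function name are all hypothetical and only there to make the arithmetic concrete.

```python
def hotspot_net_return(labor_cost, crime_reduction, officer_cost):
    """Expected labor savings from the crime reduction, minus the officer cost.

    Assumes labor savings are proportional to the reduction in crime."""
    expected_savings = labor_cost * crime_reduction
    return expected_savings - officer_cost

labor_cost = 1_000_000   # labor spent addressing crime at the location
crime_reduction = 0.20   # proportional reduction from a full time officer
officer_cost = 200_000   # hypothetical fully loaded cost of one officer

# Zero means the position pays for itself, positive means it more than pays
print(hotspot_net_return(labor_cost, crime_reduction, officer_cost))
```

With these inputs the position just breaks even; a more concentrated hotspot or a larger reduction pushes the return positive.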

14:15 – 28:25, Prioritizing individuals in a group violence intervention

Here I discuss my work on social network algorithms to prioritize individuals to spread the message in a focused deterrence intervention. This is the opposite of how many people view “spreading” in a network: I identify something good I want to spread, and seed the network in a way to optimize that spread:

I also have a primer on SNA, which discusses how crime analysts typically define nodes and edges using administrative data.

Listen to the interview as I discuss more general advice – in SNA, how you define the network depends on what you want to accomplish in the end. So I discuss how you may want to define edges via victimization to prevent retaliatory violence (I think that would make sense for violence interrupters to be proactive for example).

I also give an example of how it may make sense to base detective case allocation on SNA – detectives have background with an individual’s network (e.g. have a rapport with a family based on prior cases worked).

28:25 – 33:15, Be proactive as an analyst and learn to code

Here Manny asked the question of how analysts can prevent their role from being turned into a more administrative role (just get requests and run simple reports). I think the solution to this (not just in crime analysis, but also being an analyst in the private sector) is to be proactive. You shouldn’t wait for someone to ask you for specific information, you need to be defining your own role and conducting analysis on your own.

He also asked about crime analysis being under-used in policing. I think being stronger at computer coding opens up so many opportunities that learning python, R, and SQL is the area where I would like to see stronger skills across the industry. And this is a good career investment, as it translates to private sector roles.

33:15 – 37:00, How ChatGPT can be used by crime analysts

I discuss how ChatGPT may be used by crime analysts to summarize qualitative incident data and help inform . (Check out this example by Andreas Varotsis.)

To be clear, I think this is possible, but I don’t think the tech is quite up to that standard yet. Also do not submit LEO sensitive data to OpenAI!

Also always feel free to reach out if you want to nerd out on similar crime analysis questions!

Make more money

So I enjoy Ramit Sethi’s Netflix series on money management – fundamentally it is about money coming in and money going out and the ability to balance a budget. On occasion I see other budget coaches focus on trivial expenses (the money going out) whereas for me (and I suspect the majority of folks reading this blog with higher degrees and technical backgrounds) you should almost always be focused on finding a higher paying job.

Let’s go with a common example people use as unnecessary discretionary spending – getting a $10 drink at Starbucks every day. If you do this, over the course of a 365 day year, you will have spent an additional $3,650. If you read my blog about coding and statistics and that expense bothers you, you are probably not making as much money as you should be.

Ramit regularly talks about asking for raises – I am guessing that for most people reading this blog, a raise would be well over that Starbucks expense. But part of the motivation to write this post is in reference to formerly being a professor. I think many criminal justice (CJ) professors are underemployed, and should consider better paying jobs. I am regularly starting to see public sector jobs in CJ that have substantially better pay than being a professor. This morning someone shared a position for an entry level crime analyst at the Reno Police Department with a pay range from $84,000 to $102,000:

The low end of that starting pay range is competitive with the majority of starting assistant professor salaries in CJ. You can go check out what the CJ professors at Reno make (which is pretty par for the course for CJ departments in the US) in comparison. If I had stayed as a CJ professor, even with moving from Dallas to other universities and trying to negotiate raises, I would be lucky to be making over $100k at this point in time. Again, that Reno position is an entry level crime analyst – asking for a BA + 2 years of experience or a Masters degree.

In comparison, entry level private sector data science jobs in the DFW area in 2019 were often starting at a $105k salary (based on personal experience). You can check out BLS data to examine average salaries in data science if you want to look at your particular metro area (it is good to see the total number in that category in an area as well).

While academic CJ salaries can sometimes be very high (over $200k), these are quite rare. There are a few things going against professor jobs, and CJ ones in particular, that depress CJ professor wages overall. Social scientists in general make less than STEM fields, and CJ departments are almost entirely in state schools that tend to have wage compression. Getting an offer at Harvard or Duke is probably not in the cards if you have a CJ degree.

In addition to this, with the increase in the number of PhDs being granted, competition is stiff. There are many qualified PhDs, making it very difficult to negotiate your salary as an early career professor – the university could hire 5 people who are just as qualified in your stead who aren’t asking for that raise.

So even if you are lucky enough to have negotiating power to ask for a raise as a CJ professor (which most people don’t have), you often could make more money by getting a public sector CJ job anyway. If you have quant skills, you can definitely make more money in the private sector.

At this point, most people go back to the idea that being a professor is the ultimate job in terms of freedom. Yes, you can pursue whatever research line you want, but you still need to teach courses, supervise students, and occasionally do service to the university. These responsibilities all by themselves are a job (the entry level crime analyst at Reno will work less overall than the assistant professor who needs to hustle to make tenure).

To me the trade off in freedom is worth it because you get to work directly with individuals who actually care what you do – you lose freedom because you need to make things within the constraints of the real world that real people will use. To me being able to work directly on real problems and implement my work in real life is a positive, not a negative.

Final point to make in this blog, because of the stiff competition for professor positions, I often see people suggesting there are too many PhDs. I don’t think this is the case though, you can apply the skills you learned in getting your CJ PhD to those public and private sector jobs. I think CJ PhD programs just need small tweaks to better prepare students for those roles, in addition to just letting people know different types of positions are available.

It is pretty much at the point that alt-academic jobs are better careers than the majority of CJ academic professor positions. If you had the choice to be an assistant professor in CJ at University of Nevada Reno, or be a crime analyst at Reno PD, the crime analyst is the better choice.

Javascript apps and ASEBP update

So for a quick update, my most recent post on ASEBP is This One Simple Trick Will Improve Attitudes Toward Police. (Note you need an ASEBP membership to read.) There are several recent studies by different groups showing follow up to victims, even if you won’t solve the crime in the end, improves overall attitudes towards police. Simple thing for PDs to do. See the reference list at the end of the post for various studies.

Besides that, no blog posts here recently as I have been working on my CRIME De-Coder site, in particular developing a few additional javascript demos. My most recent one is a social network app applying my dominant set algorithm (to prioritize call-ins in a group violence/focused deterrence intervention) (Wheeler et al., 2019).

The javascript apps are very nice, as they are all client side – my website just serves the text files, and your local browser does all the hard work. I don’t need to worry about dealing with LEO sensitive data in that scenario either.

I am still learning a ton of website development (will have some surveys deployed using PHP + google sheets here soonish on CRIME De-Coder). I am debating whether it is worth writing up blog posts on it here. The javascript network application is almost a 1:1 translation of my python code. Vectorized stuff I don’t know much about doing in javascript, but the network algorithm stuff is mostly just dictionaries, sets, and loops. If interested, you can just right click on the browser when the page is open and inspect the source.
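To give a flavor of what “just dictionaries, sets, and loops” looks like, here is a minimal sketch of a generic greedy dominating set heuristic in python. To be clear, this is not the algorithm from Wheeler et al. (2019) or the code behind the app; the function name and the toy network are hypothetical, and the greedy rule (always pick the person who reaches the most people not yet reached) is a textbook simplification.

```python
def greedy_dominating_set(graph):
    """Pick nodes one at a time, each time taking the node that reaches the
    most people not yet covered (itself plus its neighbors)."""
    uncovered = set(graph)
    chosen = []
    while uncovered:
        # sorted() makes ties break alphabetically, so results are repeatable
        best = max(sorted(graph), key=lambda n: len(({n} | graph[n]) & uncovered))
        chosen.append(best)
        uncovered -= {best} | graph[best]
    return chosen

# Hypothetical toy network, node -> set of neighbors (undirected)
graph = {
    'a': {'b', 'c'},
    'b': {'a'},
    'c': {'a', 'd'},
    'd': {'c', 'e'},
    'e': {'d'},
}
print(greedy_dominating_set(graph))
```

Everything here is a plain object, set, or loop, which is why this style of code ports to javascript with few changes.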

References

  • Clark, B., Ariel, B., & Harinam, V. (2022). How Should the Police Let Victims Down? The Impact of Reassurance Call-Backs by Local Police Officers to Victims of Vehicle and Cycle Crimes: A Block Randomized Controlled Trial. Police Quarterly, Online First.
  • Curtis-Ham, S., & Cantal, C. (2022). Locks, lights, and lines of sight: an RCT evaluating the impact of a CPTED intervention on repeat burglary victimisation. Journal of Experimental Criminology, Online First.
  • Henning, K., et al. (2023). The Impact of Online Crime Reporting on Community Trust. Police Chief Online, April 12, 2023.
  • Wheeler, A. P., McLean, S. J., Becker, K. J., & Worden, R. E. (2019). Choosing representatives to deliver the message in a group violence intervention. Justice Evaluation Journal, 2(2), 93-117.

Criminology not on the brink

I enjoy reading Jukka Savolainen’s hot takes, most recently Give Criminology a Chance: Notes from a discipline on the brink. I think Jukka is wrong on a few points, but if you are a criminologist who goes to ASC conferences, definitely go and read it! To be specific, in addition to the title here are two penultimate paragraphs in full that I mostly disagree with:

I arrived in Atlanta with a pessimistic view of academic criminology. During my 30 years in the field, the scholarship has become increasingly political and intolerant of evidence that contradicts the progressive narrative. The past few years have been particularly discouraging for those who care about scientific rigor and truth. Despite these reservations, I approached the ASC meeting with an open mind.

The situation is far from hopeless. True, criminology possesses precious little viewpoint diversity. Much of the scholarship is more interested in pursuing a political agenda than objective truth. The ASC’s outward stance as a politically neutral arbiter of scientific evidence is at odds with its recent history as an activist organization.

Although his take on a generic American Society of Criminology experience is accurate and not misleading, I am not so sure about the assessment of the trend over time, e.g. “increasingly political and intolerant”. Nor do I think criminology has too “little viewpoint diversity”.

The latter statement is to be frank absurd. For those who haven’t been to an ASC conference, there are no restrictions to who can become a member of the American Society of Criminology. The yearly conference is essentially open as well – you have to submit an abstract for review, but I have never heard of an abstract being turned down (let me know if you are aware of an example!) So you really get the kaleidoscope (as Jukka articulated). Policing scholars, abolitionists, quantitative, qualitative, ghost criminology – criminologists are a heterogeneous bunch.

About the only way to steelman the statement “precious little viewpoint diversity” is to say something more like certain opinions in the field are rewarded/punished, such as being in advanced positions at ASC, or limiting what gets published in the ASC journals (Criminology or Criminology and Public Policy). Or maybe that the average mix of the field slants one way or another (say between pro criminal justice or critical criminal justice).

I have not been around 30 years like Jukka, and I suppose I lost my card carrying criminologist privileges when I went to the private sector, but I haven’t seen any clear change in the nature of the field, the ASC conference, or what has been published, in the last ~10 years I have been in a reasonable position to make that judgment. I think Jukka (or anyone) would be hard pressed to quantify his perception – but I am certainly open to real evidence if I am wrong here (again just my opinion based on fewer years of experience than Jukka).

As a side story, I have heard many of my friends who do work in policing state that they have been criticized for that by colleagues, and subsequently argue our field is “biased against cops”. I don’t doubt my friends’ personal experiences, but I have personally never been criticized for working with the police. I have been criticized by fellow policing scholars as “downloading datasets” and “not being a real policing scholar”. I know qualitative criminologists who think the field is biased against them (based on rates of qualitative publishing). I know quantitative criminologists who have given examples of bias in the field against more rigorous empirical methods. I know Europeans who think the field is biased towards Americans. I bet the ghost criminologists think the living are biased against the (un)dead.

I think saying “Much of the scholarship is more interested in pursuing a political agenda than objective truth” is a tinge strong, but sure it happens (I am not quite sure how to map “much” to a numeric value so the statement can be confirmed or refuted). I would say being critical of some work, but then uncritically sharing equally unrigorous work that confirms your pre-conceived notions is an example of this! So if you think one or more is “much”, then I guess I don’t disagree with Jukka here – to be clear though I think the majority of criminologists I have met are interested in pursuing the truth (even if I disagree with the methods they use).

So onto the last sentence of Jukka’s I disagree with, “The ASC’s outward stance as a politically neutral arbiter of scientific evidence is at odds with its recent history as an activist organization.” But I disagree with this because I personally have a non-normative take on science – I don’t think science is wholly defined by being a neutral arbiter of truth, and doing science in the real world literally involves things that are “activist”.

I believe if you asked most people with PhDs what defines science, they would say that science is defined via the scientific method. I personally think that is wrong though. I think about the only thing we share as scientists is being critique-y assholes. The way I do my work is so different from many other criminologists (both quantitative and qualitative), let alone researchers in other scientific fields (like theoretical physics or history), that I think saying we “share a common research method” is a bit of a stretch.

When my son was younger and had science fairs, they were broken into two different types of submissions: traditional science experiments, like measuring a plant’s growth in sunlight vs without, or engineering “build things”. The academic work I am most proud of is in the engineering “build things” camp. These modest contributions in various algorithms – a few have been implemented in major software, and I know some crime analysis units using that work as well – really have nothing to do with the scientific method. Me deriving standard errors for control charts for crime trends is only finding truth in a very tautological way – I think they are useful though.

There is no bright line between my work and “activism” – I don’t think that is a bad thing though and it was the point of the work. You could probably say Janet Lauritsen is an activist for more useful national level CJ statistics. Jukka appears to me to be making a normative judgment that Janet’s activism is more rigorously motivated than Vitale’s – which I agree with, but doesn’t say much if anything about the field of criminology as a whole or recent changes in the field. (If anything it is evidence against Jukka’s opinion, I posit Janet is clearly more influential in the field than Vitale.)


To end with the note “on the brink” – it may be unfair to Jukka (sometimes you don’t get to pick your titles in magazine articles). Part of the way I view being an academic and critiquing work I imagine people find irksome – it involves taking real words people say, trying to reasonably map them to statements that can be confirmed or refuted (often people say things that are quite fuzzy), and then articulating why those statements are maybe right/maybe wrong. It can seem pedantic, but I am a Popper kind-of-guy, and being able to confirm or refute statements I think is the only way we can get closer to objective truth.

To do this with “on the brink” takes more leaps than statements such as “increasingly political and intolerant”. “Criminology” is the general study of criminal behavior – which I am pretty confident will continue on as long as people commit crimes with or without the ASC yearly conference. We can probably limit the “on the brink” statement to something more specific like the American Society of Criminology on the brink. I don’t know about the ASC financials, but I am going to guess Jukka meant by this statement more of a proclamation about the legitimacy of the organization to outside groups.

I am not so sure this is the point of ASC though – it derives its value by being a social club for people who do criminology research. At least that is my impression of going to ASC conferences from my decade as a criminologist. Part of Jukka’s point is that things are getting worse more recently – you can’t lose something you never had to begin with though.

This one simple change will dramatically improve reproducibility in journals

So Eric Stewart is back in the news, and it appears a new investigation has prompted him to resign from Florida State. For a background on the story, I suggest reading Justin Pickett’s EconWatch article. In short, Justin did analysis of his own papers he co-authored with Stewart to show what is likely data fabrication. Various involved parties had superficial responses at first, but after some prodding many of Stewart’s papers were subsequently retracted.

So there is quite a bit of human messiness in the responses to accusations of error/fraud, but I just want to focus on one thing. In many of these instances, the flow goes something like:

  1. individual points out clear numeric flaws in a paper
  2. original author says “I need time to investigate”
  3. multiple months later, original author has still not responded
  4. parties move on (no resolution) OR conflict (people push for retraction)

My solution here is a step that mostly fixes the time lag in steps 2/3. Authors who submit quantitative results should be required to submit statistical software log files along with their article to the journal from the start.

So there is a push in social sciences to submit fully reproducible results, where an outside party can replicate 100% of the analysis. This is difficult – I work full time as a software engineer – it requires coding skills most scientists don’t have, as well as outside firms to devote resources to the validation. (Offhand, if you hired me to do this, I would probably charge something like $5k to $10k I am guessing given the scope of most journal articles in social sciences.)

An additional problem with this in criminology research, we are often working with sensitive data that cannot easily be shared.

I agree a fully 100% reproducible analysis would be great – let’s not make the perfect the enemy of the good though. What I am suggesting is that authors should directly submit the log files that they used to produce tables/regression results.

Many authors currently are running code interactively in Stata/R/SPSS/whatever, and copy-pasting the results into tables. So in response to 1) above (the finding of a data error), many parties assume it is a data transcription error, and allow the original authors leeway to go and “investigate”. If journals have the log files, it is trivial to see if a data error is a transcription error, and they can then move into a more thorough forensic investigation stage if the logs don’t immediately resolve any discrepancies.


If you are asking “Andy, I don’t know how to save a log file from my statistical analysis”, here is how below. It is a very simple thing – a single action or line of code.

This is under the assumption people are doing interactive style analysis. (It is trivial to save a log file if you have created a script that is 100% reproducible, e.g. in R it would then just be something like Rscript Analysis.R > logfile.txt.) So my advice is to save a log file when doing interactive partly code/partly GUI type work.

In Stata, at the beginning of your session use the command:

log using "logfile.txt", text replace

In R, at the beginning of your session:

sink("logfile.txt")
# ...your code here...
# then before you exit the R session
sink()

In SPSS, at the end of your session:

OUTPUT EXPORT /PDF DOCUMENTFILE="local_path\logfile.pdf".

Or you can go to the output file and use the GUI to export the results.

In python, if you are doing an interactive REPL session, you can do something like:

python > logfile.txt
...inside REPL here...

Or if you are using Jupyter notebooks, you can just save the notebook as an HTML file.
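One downside of redirecting the whole python session to a file is that you no longer see the output on the screen. A minimal sketch of an alternative is a small “tee” object that writes to both the screen and a log file. The Tee class is a hypothetical helper, not part of the standard library, and like sink in R it only captures printed output, not the commands you typed.

```python
import sys

class Tee:
    """Write everything sent to stdout to both the screen and a log file."""
    def __init__(self, path):
        self.file = open(path, 'w', encoding='utf-8')
        self.stdout = sys.stdout  # keep a handle on the original stdout
    def write(self, text):
        self.stdout.write(text)
        self.file.write(text)
        return len(text)
    def flush(self):
        self.stdout.flush()
        self.file.flush()
    def close(self):
        # restore the original stdout and close the log file
        sys.stdout = self.stdout
        self.file.close()

# At the beginning of your session
tee = Tee('logfile.txt')
sys.stdout = tee
print('regression table output would show up here')
# Then before you exit the session
tee.close()
```

Anything printed between the two steps, including results echoed by the REPL, ends up in logfile.txt as well as on the screen.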

If interested in learning how to code in more detail for regression analysis, I have PhD course notes on R/SPSS/Stata.


This solution is additional work from the authors’ perspective, but a very tiny amount. I am not asking for 100% reproducible code front to back, I just want a log file that shows the tables. These log files will not show sensitive data (just summaries), so can be shared.

This solution is not perfect. These log files can be edited. Requiring these files will also not prevent someone from doctoring data outside of the program and then running real analysis on faked data.

It ups the level of effort for faking results though by a large amount compared to the current status quo. Currently it just requires authors to doctor results in one location; this at a minimum requires two locations (and keeping the two sources equivalent is additional work). Often the outputs themselves have additional statistical summaries though, so it will be clearer if someone doctored the results than it would be from a simpler table in a peer reviewed article.

This does not 100% solve the reproducibility crisis in social sciences. It does however solve the problem of “I identified errors in your work” and “Well I need 15 months to go and check my work”. Initial checks for transcription vs more serious errors with the log files can be done by the journal or any reasonable outsider in at most a few hours of work.

ASEBP blog posts, and auto screenshotting websites

I wanted to give an update here on the Criminal Justician series of blogs I have posted on the American Society of Evidence Based Policing (ASEBP) website. These include:

  • Denver’s STAR Program and Disorder Crime Reductions
    • Assessing whether Denver’s STAR alternative mental health responders can be expected to decrease a large number of low-level disorder crimes.
  • Violent crime interventions that are worth it
    • Two well-vetted methods – hot spots policing and focused deterrence – are worth the cost for police to implement to reduce violent crime.
  • Evidence Based Oversight on Police Use of Force
    • Collecting data in conjunction with clear administrative policies has strong evidence it overall reduces officer use of force.
  • We don’t know what causes widespread crime trends
    • While we can identify whether crime is rising or falling, retrospectively identifying what caused those ups and downs is much more difficult.
  • I think scoop and run is a good idea
    • Keeping your options open is typically better than restricting them. Police should have the option to take gun shot wound victims directly to the emergency room when appropriate.
  • One (well done) intervention is likely better than many
    • Piling on multiple interventions at once makes it impossible to tell if a single component is working, and is likely to have diminishing returns.

Going forward I will do a snippet on here, and refer folks to the ASEBP website. You need to sign up to be able to read that content – but it is an organization that is worth joining (besides for just reading my takes on science around policing topics).


So my CRIME De-Coder LLC has a focus on the merger of data science and policing. But I have a bit of wider potential application. Besides statistical analysis in different subject areas, one application I think will be of wider interest to public and private sector agencies is my experience in process automation. These often look like boring things – automating generating a report, sending an email, updating a dashboard, etc. But they can take substantial human labor, and automating also has the added benefit of making a process more robust.

As an example, I needed to submit my website as a PDF file to obtain a copyright. To do this, you need to take screenshots of your website and all its subsequent pages. Googling on this for selenium and python, the majority of the current solutions are out of date (due to changes in the Chrome driver in selenium over time). So here is the solution I scripted up the morning I wanted to submit the copyright – it took about 2 hours total in debugging. Note that this produces real screenshots of the website, not the print to pdf (which looks different).

It is short enough for me to just post the entire script here in a blog post:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
import time
from PIL import Image
import os

home = 'https://crimede-coder.com/'

url_list = [home,
            home + 'about',
            home + 'blog',
            home + 'contact',
            home + 'services/ProgramAnalysis',
            home + 'services/PredictiveAnalytics',
            home + 'services/ProcessAutomation',
            home + 'services/WorkloadAnalysis',
            home + 'services/CrimeAnalysisTraining',
            home + 'services/CivilLitigation',
            home + 'blogposts/2023/ServicesComparisons']

res_png = []

def save_screenshot(driver, url, path, width):
    driver.get(url)
    # Ref: https://stackoverflow.com/a/52572919/
    original_size = driver.get_window_size()
    #required_width = driver.execute_script('return document.body.parentNode.scrollWidth')
    required_width = width
    required_height = driver.execute_script('return document.body.parentNode.scrollHeight')
    driver.set_window_size(required_width,required_height)
    #driver.save_screenshot(path)  # has scrollbar
    driver.find_element(By.TAG_NAME, 'body').screenshot(path)  # avoids scrollbar
    driver.set_window_size(original_size['width'], original_size['height'])

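options = Options()
# options.headless = True is deprecated/removed in newer Selenium 4 releases,
# passing the Chrome flag directly works instead (needs a recent Chrome)
options.add_argument('--headless=new')
driver = webdriver.Chrome(options=options)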

for url in url_list:
    driver.get(url)
    if url == home:
        name = "index.png"
    else:
        res_url = url.replace(home,"").replace("/","_")
        name = res_url + ".png"
    time.sleep(1)
    res_png.append(name)
    save_screenshot(driver,url,name,width=1400)

driver.quit()

# Now appending to PDF file
images = [Image.open(f).convert('RGB') for f in res_png if f[-3:] == 'png']
i1 = images.pop(0)
i1.save(r'Website.pdf', save_all=True, append_images=images)

# Now removing old PNG files
for f in res_png:
    os.remove(f)

One of the reasons I want to expand knowledge of coding practices into policing (as well as other public sector fields) is that something this simple doesn’t make sense for me to package up and try to monetize. The IP involved in a 2 hour script is not worth that much. I realize most police departments won’t be able to take the code above and actually use it, so it is better for your agency to simply do a small contract with me to help you automate the boring stuff.

I believe this is in large part a better path forward for many public sector agencies, as opposed to buying very expensive Software-as-a-Service solutions. It is better to have a consultant to provide a custom solution for your specific agency, than to spend money on some big tool and hope your specific problems fit their mold.

Crime De-Coder LLC Website

So I have created CRIME De-Coder LLC, a firm to do my consulting work with police departments. Check out my website, crimede-coder.com.

Feedback is welcome. In particular check out the services pages, and my first blog post on what distinguishes my services from most firms. Providing computer code to generate the end product is “teaching a man to fish”, whereas most firms just drop a final report and leave.

And of course feel free to reach out to consult@crimede-coder.com if you are interested in pursuing a project. Going forward I plan on making a new post around once a month, so sign up in your feed reader or using a service like IFTTT.


Setting up a stand alone website is not that hard in the end. Currently it is a static site with some custom javascript (hosted on Hostinger). I should do a PHP server for the new blog posts and RSS feed eventually, but for now this is fine. For those interested in doing the same, I suggest getting the Jon Duckett books (HTML/Javascript/PHP) for an overview of the tech, and then checking out Dani Kross’s YouTube tutorials (for random things like editing the htaccess file).

I am not doing a newsletter for the blog-posts, as I am concerned it will get my email on random block lists. But if there is demand for it in the future I will figure out some other service I guess to do that.

I wanted a more bare-metal setup (not a hosted wordpress like this site), as in the future I will likely do demos of dashboards, host some pyscript, make a sign in for paid content, etc. I just wanted flexibility from the start. So stay tuned for more content from CRIME De-Coder!

Getting access to paywalled newspaper and journal articles

So recently several individuals have asked about obtaining articles they do not have access to that I cite in my blog posts. (Here or on the American Society of Evidence Based Policing.) This is perfectly fine, but I want to share a few tricks I have learned on accessing paywalled newspaper articles and journal articles over the years.

I currently only pay for a physical Sunday newspaper for the Raleigh News & Observer (and get the online content for free because of that). Besides that I have never paid for a newspaper article or a journal article.

Newspaper paywalls

There are two techniques for dealing with newspaper paywalls. First, some newspapers give you a set number of free articles per month. To skirt this, you can open up the article in a private/incognito window in your preferred browser (or open up the article in another browser entirely, e.g. you use Chrome most of the time, but have Firefox just for this on occasion).

Second, if that does not work, and you have the exact address, you can check the WayBack machine. For example, here is a search for a WaPo article I linked to in my last post. This works for very recent articles, so if you can stand being a few days behind, it is often listed on the WayBack machine.
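If you would rather check from a script than through the search page, here is a minimal sketch in python. It assumes the Internet Archive availability endpoint (`https://archive.org/wayback/available`) and the JSON layout noted in the comments, so check the current Internet Archive docs before relying on it. The function names and the example.com address are just placeholders of mine.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint for the Wayback Machine availability lookup
API = "https://archive.org/wayback/available"

def availability_url(page_url):
    # Build the query string, escaping the article address
    return API + "?" + urlencode({"url": page_url})

def closest_snapshot(payload):
    # Assumed response layout:
    # {"archived_snapshots": {"closest": {"available": true, "url": "..."}}}
    closest = payload.get("archived_snapshots", {}).get("closest", {})
    if closest.get("available"):
        return closest.get("url")
    return None  # no archived copy reported

def lookup(page_url, timeout=10):
    # Does the actual network request, returns a snapshot URL or None
    with urlopen(availability_url(page_url), timeout=timeout) as resp:
        return closest_snapshot(json.load(resp))
```

Calling `lookup('https://example.com/')` should give back the address of the closest archived copy, or `None` if the page has not been archived yet.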

Journal paywalls

A single piece of advice here: use Google Scholar. Here for example is searching for the first Braga POP Criminology article in the last post. Google Scholar will tell you if a free pre or post-print URL exists somewhere. See the PDF link on the right here. (You can click around to “All 8 Versions” below the article as well, and that will sometimes lead to other open links.)

Quite a few papers have PDFs available, and don’t worry if it is a pre-print, they rarely change in substance when going into print.1

For my personal papers, I have a google spreadsheet that lists all of the pre-print URLs (as well as the replication materials for those publications).

If those do not work, you can see if your local library has access to the journal, but that is not as likely. And I still have a Uni affiliation that I can use for this (the library and getting some software cheap are the main benefits!). But if you are at that point and need access to a paper I cite, feel free to email and ask for a copy (it is not that much work).

Most academics are happy to know you want to read their work, and so it is nice to be asked to forward a copy of their paper. So feel free to email other academics as well to ask for copies (and slip in a note for them to post their post-prints to let more people have access).

The Criminal Justician and ASEBP

If you like my blog topics, please consider joining the American Society of Evidence Based Policing. To be clear I do not get paid for referrals, I just think it is a worthwhile organization doing good work. I have started a blog series (that you need a membership to read), and post once a month. The current articles I have written are:

So if you want to read more of my work on criminal justice topics, please join the ASEBP. And it is of course a good networking resource and training center you should be interested in as well.


  1. You can also sign up for email alerts on Google Scholar for papers if you find yourself reading a particular author quite often.↩︎

Scorpion was probably not doing hot spots policing

So the Wall Street Journal had a recent article describing how crackdowns in hot spots of crime may not be the best policing tactic, Tyre Nichols Case Prompts Questions About Police Tactics in Crime Hot Spots. This is actually an OK article, but to be clear “hot spots” policing isn’t really defined by police tactics, hot spots are just a method to identify small areas with the most crime in a city. Identifying the hot spots does not explicitly determine the policing (or non-policing) tactic that one should use to reduce crime in that area. The Washington Post had a recent article in a similar vein critiquing the work of Tamara Herold in Breonna Taylor’s death. The WaPo article even prompted a response by a group of well known criminologists arguing that it was inappropriate to blame Herold’s strategy.

So hot spots have always had a mix of different policing tactics that go with them. The most common strategies I would say are problem oriented policing (Braga et al., 1999), increased street or traffic stops (MacDonald et al., 2016; Sherman & Rogan, 1995), or simply patrolling/hanging out in the area (Groff et al., 2015; Koper, 1995). The WSJ article talks about Joel Caplan’s RTM group (which I think does good work), and they are really just doing a version of problem oriented policing. (POP has always had a component of working in tandem with the community and different public/private sector agencies.)

One of the reasons I wanted to write this post though, is that often in my career I see a disconnect between purportedly hot spots policing (or similar tactics, such as DDACTS) on paper and what is actually happening on the ground. So using the Memphis Open Crime Data, I identified the top 100 street segments in terms of violent crime (code on github to replicate). As I suspected, the place where Nichols was pulled over is not a hot spot of crime, making the connection between the Scorpion unit’s behavior and hot spots policing tactics a bit suspect.
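To give a sense of how simple the ranking step is, here is a rough sketch in python. This is not the code in the github repository. The field names (`segment_id`, `category`) and the list of violent categories are made up for illustration, and the real Memphis data would first need its incidents matched to street segments.

```python
from collections import Counter

# Hypothetical category labels, the real data will have its own coding
VIOLENT = ("HOMICIDE", "ROBBERY", "AGGRAVATED ASSAULT")

def top_segments(incidents, n=100, violent=VIOLENT):
    # Count violent incidents per street segment, return the n highest
    counts = Counter(row["segment_id"] for row in incidents
                     if row["category"] in violent)
    return counts.most_common(n)

def is_hot_spot(segment_id, incidents, n=100):
    # True if the segment is among the top n violent crime segments
    return segment_id in {seg for seg, _ in top_segments(incidents, n)}

# Tiny fake example
incidents = [
    {"segment_id": "A", "category": "ROBBERY"},
    {"segment_id": "A", "category": "HOMICIDE"},
    {"segment_id": "B", "category": "ROBBERY"},
    {"segment_id": "C", "category": "THEFT"},
]
print(top_segments(incidents, n=2))
print(is_hot_spot("C", incidents, n=2))
```

With the full incident data, checking a stop location is then just a matter of seeing whether its street segment lands in the top 100 list.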

If the embedded google map does not work, here is a screen shot to show how none of the top 100 street midpoints are around the location where Nichols was initially stopped:

It happens to be the case that officers often have misperceptions of where hot spots are (Macbeth & Ariel, 2019; Ratcliffe & McCullagh, 2001). And if left without oversight, there tends to be a mismatch between where police proactivity is occurring and where the most serious crime is spatially concentrated (Wheeler et al., 2018). That is why a system to feed back information to officers on whether they are making high quality stops is so important (Worden et al., 2018).

To be clear, this is not me making excuses for researchers or crime analysts to not know what is actually occurring in their jurisdictions, and to potentially ignore the secondary harms that can come with intensive policing. But in my experience, taking the time to do hot spots policing right, which at its most basic is actually identifying hot spots using data, is a good sign that police departments take the tactics they use seriously and think seriously about mitigating some of these secondary harms. Hot spots policing does not intrinsically result in unequal outcomes. Those can be avoided via tactics that mitigate harm (such as problem oriented policing), or by constructing a hot spots policy that promotes racial equity in outcomes from the start (Wheeler, 2020).

References

  • Braga, A.A., Weisburd, D.L., Waring, E.J., Mazerolle, L.G., Spelman, W., & Gajewski, F. (1999). Problem‐oriented policing in violent crime places: A randomized controlled experiment. Criminology, 37(3), 541-580.
  • Groff, E. R., Ratcliffe, J. H., Haberman, C. P., Sorg, E. T., Joyce, N. M., & Taylor, R. B. (2015). Does what police do at hot spots matter? The Philadelphia policing tactics experiment. Criminology, 53(1), 23-53.
  • Koper, C.S. (1995). Just enough police presence: Reducing crime and disorderly behavior by optimizing patrol time in crime hot spots. Justice Quarterly, 12(4), 649-672.
  • Macbeth, E., & Ariel, B. (2019). Place-based statistical versus clinical predictions of crime hot spots and harm locations in Northern Ireland. Justice Quarterly, 36(1), 93-126.
  • MacDonald, J., Fagan, J., & Geller, A. (2016). The effects of local police surges on crime and arrests in New York City. PLoS one, 11(6), e0157223.
  • Ratcliffe, J.H., & McCullagh, M.J. (2001). Chasing ghosts? Police perception of high crime areas. British Journal of Criminology, 41(2), 330-341.
  • Sherman, L.W., & Rogan, D.P. (1995). Effects of gun seizures on gun violence: “Hot spots” patrol in Kansas City. Justice Quarterly, 12(4), 673-693.
  • Wheeler, A.P. (2020). Allocating police resources while limiting racial inequality. Justice Quarterly, 37(5), 842-868.
  • Wheeler, A. P., Steenbeek, W., & Andresen, M. A. (2018). Testing for similarity in area‐based spatial patterns: Alternative methods to Andresen’s spatial point pattern test. Transactions in GIS, 22(3), 760-774.
  • Worden, R.E., McLean, S.J., Wheeler, A.P., Reynolds, D.L., Dole, C., & Cochran, H. (2018). Smart Stops: An Inquiry into Proactive Policing. Summary Report to the National Institute of Justice, Award No. 2013-MU-CX-0012.