ASEBP blog posts, and auto screenshotting websites

I wanted to give an update here on the Criminal Justician series of blogs I have posted on the American Society of Evidence Based Policing (ASEBP) website. These include:

  • Denver’s STAR Program and Disorder Crime Reductions
    • Assessing whether Denver’s STAR alternative mental health responders can be expected to decrease a large number of low-level disorder crimes.
  • Violent crime interventions that are worth it
    • Two well-vetted methods – hot spots policing and focused deterrence – are worth the cost for police to implement to reduce violent crime.
  • Evidence Based Oversight on Police Use of Force
    • There is strong evidence that collecting data, in conjunction with clear administrative policies, reduces officer use of force overall.
  • We don’t know what causes widespread crime trends
    • While we can identify whether crime is rising or falling, retrospectively identifying what caused those ups and downs is much more difficult.
  • I think scoop and run is a good idea
    • Keeping your options open is typically better than restricting them. Police should have the option to take gunshot wound victims directly to the emergency room when appropriate.
  • One (well done) intervention is likely better than many
    • Piling on multiple interventions at once makes it impossible to tell if a single component is working, and is likely to have diminishing returns.

Going forward I will post a snippet here and refer folks to the ASEBP website. You need to sign up to read that content – but it is an organization worth joining (beyond just reading my takes on the science around policing topics).


So my CRIME De-Coder LLC has a focus on the merger of data science and policing, but the work has wider potential application. Besides statistical analysis in different subject areas, one service I think will be of broader interest to public and private sector agencies is my experience in process automation. These often look like boring things – automating report generation, sending an email, updating a dashboard, etc. But they can take substantial human labor, and automating them has the added benefit of making the process more robust.

As an example, I needed to submit my website as a PDF file to obtain a copyright. To do this, you need to take screenshots of your website and all of its sub-pages. Googling this for selenium and python, the majority of the current solutions are out of date (due to changes in the Chrome driver in selenium over time). So here is the solution I scripted up the morning I wanted to submit the copyright – it took about 2 hours total, including debugging. Note that this produces real screenshots of the website, not a print-to-PDF rendering (which looks different).

It is short enough for me to just post the entire script here in a blog post:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
import time
from PIL import Image
import os

home = 'https://crimede-coder.com/'

url_list = [home,
            home + 'about',
            home + 'blog',
            home + 'contact',
            home + 'services/ProgramAnalysis',
            home + 'services/PredictiveAnalytics',
            home + 'services/ProcessAutomation',
            home + 'services/WorkloadAnalysis',
            home + 'services/CrimeAnalysisTraining',
            home + 'services/CivilLitigation',
            home + 'blogposts/2023/ServicesComparisons']

res_png = []  # keep track of the generated png files

def save_screenshot(driver, url, path, width):
    driver.get(url)
    # Ref: https://stackoverflow.com/a/52572919/
    original_size = driver.get_window_size()
    #required_width = driver.execute_script('return document.body.parentNode.scrollWidth')
    required_width = width
    required_height = driver.execute_script('return document.body.parentNode.scrollHeight')
    driver.set_window_size(required_width,required_height)
    #driver.save_screenshot(path)  # has scrollbar
    driver.find_element(By.TAG_NAME, 'body').screenshot(path)  # avoids scrollbar
    driver.set_window_size(original_size['width'], original_size['height'])

options = Options()
# newer selenium versions dropped the options.headless attribute,
# so pass the headless flag to Chrome directly
options.add_argument("--headless=new")
driver = webdriver.Chrome(options=options)

for url in url_list:
    driver.get(url)  # pre-load the page (save_screenshot navigates again below)
    # name the png after the sub-page, with the home page saved as index.png
    if url == home:
        name = "index.png"
    else:
        res_url = url.replace(home,"").replace("/","_")
        name = res_url + ".png"
    time.sleep(1)  # give the page a moment to finish loading
    res_png.append(name)
    save_screenshot(driver,url,name,width=1400)

driver.quit()

# Now appending to PDF file
images = [Image.open(f).convert('RGB') for f in res_png if f[-3:] == 'png']
i1 = images.pop(0)
i1.save(r'Website.pdf', save_all=True, append_images=images)

# Now removing old PNG files
for f in res_png:
    os.remove(f)

One of the reasons I want to expand knowledge of coding practices into policing (as well as other public sector fields) is that something this simple does not make sense for me to package up and try to monetize. The IP involved in a 2 hour script is not worth that much. I realize most police departments won’t be able to take the code above and actually use it – it is better for your agency to simply do a small contract with me to help you automate the boring stuff.

I believe this is in large part a better path forward for many public sector agencies, as opposed to buying very expensive Software-as-a-Service solutions. It is better to have a consultant provide a custom solution for your specific agency than to spend money on some big tool and hope your specific problems fit its mold.

Creating automated reports using python and Jupyter notebooks

Deborah Osborne had a chat with Chris Bruce the other day about general crime analysis, and they discussed the regular reading of reports. I did this as well when I worked at Troy, New York as the lone analyst. I would come in and skim the ~50 reports for the prior day.

The chief and mayor wanted a breakdown of particularly noteworthy events, so I would place my own notes in a spreadsheet and then make a daily report. My setup was not fully automated, but close – I had a pretty detailed HTML template, and once my daily data was entered, I would run an SPSS script to fill in the HTML. I also made a simple pin map in BatchGeo (the one part that was not automated) and inserted it into the report.
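
If you wanted to replicate that template-filling step in python, a minimal sketch might look like this (the template text, column names, and notes.csv file are hypothetical stand-ins, not my actual Troy setup):

from string import Template
import pandas as pd

# hypothetical daily notes spreadsheet with Date, Address, Narrative columns
notes = pd.read_csv('notes.csv')

# a stripped down stand-in for the detailed HTML template I used
report_template = Template("""
<html><body>
<h1>Daily Report for $report_date</h1>
$event_table
</body></html>
""")

html = report_template.substitute(
    report_date=notes['Date'].max(),
    event_table=notes.to_html(index=False))

with open('daily_report.html', 'w') as f:
    f.write(html)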

I had two other regular reports I would work on. One was a weekly command staff report about overall trends and recent upticks, the other a monthly report for the Compstat meeting going over similar stats. (I also had various other products to release – detective assignments/workload, sending aggregate stats to the Albany Crime Analysis Center.)

If I had to do these again knowing what I know now, I would automate nearly 100% of this work in python. For the reports, I would use jupyter notebooks (I actually do not like coding in these very much, I much prefer plain text IDEs, but as I will show, they are good for generating nice looking reports).

Making Reports in Jupyter Notebooks

I have provided the notes to fully automate a simple report here on Github. To replicate, first you need to download the Dallas PD open data and create a local sqlite database (I can’t upload that large of a file to github).

So before you start, if you download the .py files, you can run something like this at the command prompt:

cd D:\Dropbox\Dropbox\PublicCode_Git\Blog_Code\Python\jupyter_reports
python 00_CreateDB.py

Just replace the cd path with wherever you saved the files on your local machine (and this assumes you have Anaconda installed). Then once that is done, you can replicate the report locally, but it is really meant as a pedagogical tool – you can see how I wrote the jupyter notebook to query the local database. For your own case you would write the SQL code and connection for wherever your local crime data is stored.
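
If you are building a local database like that yourself instead of running my 00_CreateDB.py script, the core pattern is just pandas plus sqlite3 (the file and table names below are placeholders, not the actual Dallas export):

import sqlite3
import pandas as pd

# placeholder file names – point these at your own data export
incidents = pd.read_csv('dallas_incidents.csv')

# dump the table into a local sqlite file the notebook can query
con = sqlite3.connect('crime_data.sqlite')
incidents.to_sql('incidents', con, if_exists='replace', index=False)
con.close()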

Here is an example of how you can use string substitution in python to create a query that is date aware for when the code is run:
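
(The exact code is in the notebook on github; this is a sketch of the idea, and the incidents table and report_date column are made-up names for illustration.)

from datetime import date, timedelta

# figure out the reporting window relative to whenever the script is run
today = date.today()
week_ago = today - timedelta(days=7)

# string substitution makes the query date aware
query = f"""
SELECT offense_type, COUNT(*) AS total
FROM incidents
WHERE report_date >= '{week_ago}' AND report_date < '{today}'
GROUP BY offense_type
"""

print(query)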

One of the hardest parts of making standardized reports is that you can typically get the data formatted how you want, but getting the little pieces to look exactly how you want is hard. So here is a default pivot table exported in a Jupyter notebook for some year to date statistics (note the limitations of this, which is why I prefer graphs and do not show a percent change).
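
A default table like that only takes a few lines of pandas, something along these lines (the data here is made up for illustration):

import pandas as pd

# small made-up stand-in for the incident level data
df = pd.DataFrame({'offense_type': ['Burglary','Burglary','Theft','Theft','Theft'],
                   'year': [2022, 2023, 2022, 2023, 2023]})
df['count'] = 1

# default pivot of year to date counts, offenses down the rows and years across
ytd = pd.pivot_table(df, index='offense_type', columns='year',
                     values='count', aggfunc='sum', fill_value=0)
ytd  # the last expression in a notebook cell renders the default table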

And here is code I wrote to change font sizes and insert a title. A bit of work, but once you figure it out you are golden.
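
I will not reproduce the exact notebook code here, but one way to do that styling is with the pandas Styler (the caption text and font sizes below are just examples):

import pandas as pd

# ytd is the year to date pivot table from the prior snippet
ytd = pd.DataFrame({2022: [2, 1], 2023: [1, 2]},
                   index=['Burglary', 'Theft'])

# Styler lets you add a title (caption) and bump up the font sizes
styled = (ytd.style
             .set_caption('Year to Date Offense Counts')
             .set_table_styles([
                 {'selector': 'caption', 'props': [('font-size', '150%'), ('font-weight', 'bold')]},
                 {'selector': 'th', 'props': [('font-size', '120%')]},
                 {'selector': 'td', 'props': [('font-size', '120%')]}]))
styled  # again, displaying the styler object renders the formatted table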

You can go look at the notebook itself, but I also have an example of generating a weekly error bar chart (much preferred over the Compstat YTD tables):
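
The notebook has the full version, but a simplified sketch of that style of chart in matplotlib looks like the below (the weekly counts are simulated, and the expected range is a simple Poisson style band around the historical mean, not necessarily the exact interval used in the notebook):

import numpy as np
import matplotlib.pyplot as plt

# simulated weekly incident counts for illustration
weeks = np.arange(1, 27)
counts = np.random.default_rng(10).poisson(lam=40, size=weeks.size)

# simple Poisson style band around the historical mean
mean = counts.mean()
low, high = mean - 2*np.sqrt(mean), mean + 2*np.sqrt(mean)

fig, ax = plt.subplots(figsize=(8, 4))
ax.axhspan(low, high, color='lightgrey', label='Expected range')
ax.axhline(mean, color='grey', linestyle='--')
ax.plot(weeks, counts, marker='o', color='k', label='Weekly counts')
ax.set_xlabel('Week')
ax.set_ylabel('Incidents')
ax.legend()
plt.show()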

Final note, to compile the notebook without showing any code (the police chief does not want to see your python code!), it looks like this from the command line:

jupyter nbconvert --execute example_report.ipynb --no-input --to html

I have further notes on the github page about automating this fully via bat files for windows, renaming files so they update to the current date, etc.

Why Automate?

I know some analysts are reading this and thinking to themselves – I can generate these reports in 30 minutes using Excel and Powerpoint, so why spend the time to learn something new and make it 100% automated? There are a few reasons. One is pure time savings in the end. Say for the weekly report you spend 1 hour, and it takes you three work days (24 hours) to fully automate. You will recover your time in 24 weeks.

Time savings is not the only component though. Fully automating means the workflow is 100% reproducible, and makes it easier to transfer that workflow to other analysts in the future. This is an important consideration when scaling – if you need to spend a few hours once a week forever, you can only take on generating so many reports. It is a better use of your time to make a one-time large investment in automating something than to loan out a portion of your time forever.

A final point is that the skills you develop when generating automated reports are more similar to data science roles in the private sector – so consider it an investment in your career as well. The types of machine learning pipelines I create at my current role would not be possible if I could not fully automate. I would only be able to do 2 or maybe 3 projects forever and just maintain them. Because I fully automate my data pipelines, I can hand that job off to a DevOps engineer, and only worry about fixing things when they break. (Or a more junior data scientist can take over that project entirely.)