LinkedIn is the best social media site

The end goals I want for a social media site are:

  • promote my work
  • see other people’s work

Social media for other people may have other uses. I do comment and have minor interactions on the social media sites, but I do not use them primarily for that. So my context is more business oriented (I do not have Facebook, and have not considered it). I participate some on Reddit as well, but that is pretty sparingly.

LinkedIn is the best for both relative to X and BlueSky currently. So I encourage folks with my same interests to migrate to LinkedIn.

LinkedIn

So I started Crime De-Coder around 2 years ago. I first created a website, and then second started a LinkedIn page.

When I first created the business page, I invited most of my criminal justice contacts to follow the page. I had maybe 500 followers just based on that first wave of invites. At first I posted once or twice a week, and it was very steady growth, and grew to over 1500 followers in maybe just a month or two.

Now, LinkedIn has a reputation for more spammy lifecoach self promotion (for lack of a better description). I intentionally try to post somewhat technical material, but keep it brief and understandable. It is mostly things I am working on that I think will be of interest to crime analysts or the general academic community. Here is one of my recent posts on structured outputs:

Current follower count on LinkedIn for my business page (which in retrospect may have been a mistake, I think they promote business pages less than personal pages), is 3230, and I have fairly consistent growth of a few new followers per day.

I first started posting once a week, and with additional growth expanded to once every other day and at one point once a day. I have cut back recently (mostly just due to time). I did get more engagement, around 1000+ views per day when I was posting every day.

Probably the most important part of advertising Crime De-Coder, though, is the types of views I am getting. My followers are not just academic colleagues I was previously friends with; a decent number are outside my first degree network, police officers and other non-profit related folks. I have landed several contracts where I know those individuals reached out to me based on my LinkedIn posting. It could be higher, as my personal Crime De-Coder website ranks very poorly on Bing search, but my LinkedIn posts come up fairly high.

When I was first on Twitter I did have a few academic collaborations that I am not sure would have happened without it (a paper with Manne Gerell, and a paper with Gio Circo, although I had met Gio in real life before that). I do not remember getting any actual consulting work though.

I mentioned it is not only better for me for advertising my work, but also consuming other material. I did a quick experiment, just opened the home page and scrolled the first 3 non-advertisement posts on LinkedIn, X, and BlueSky. For LinkedIn

This is likely a person I do not want anything to do with, but their comment I agree with. Whenever I use Service Now at my day job I want to rage quit (just send a Teams chat or email and be done with it, LLMs can do smarter routing anymore). The next two are people I am directly connected with. Some snark by Nick Selby (which I can understand the sentiment, albeit disagree with, I will not bother to comment though). And something posted by Mindy Duong I likely would be interested in:

Then another advert, and then a post by Chief Patterson of Raleigh, whom I am not directly connected with, but was liked by Tamara Herold and Jamie Vaske (whom I am connected with).

So the adverts are annoying, but the suggested posts (the feeds are weird now, they are not chronological) are not bad. I would prefer if LinkedIn had “general” and “my friends” sections, but overall I am happier with the content I see on LinkedIn than I am with the other sites.

X & BlueSky

I first created a personal account on what was then Twitter in 2018. Nadine Connell suggested it, and it was nice then. When I first joined I think it was Cory Haberman who tweeted and said to follow my work, and I had a few hundred followers that first day. Then over the next two years, just posting blog posts and papers for the most part, I grew to over 1500 followers IIRC. I also consumed quite a bit of content from criminal justice colleagues. It was much more academic focused, but it was a very good source of recent research, CJ relevant news and content.

I then eventually deleted the Twitter account, due to a colleague being upset I liked a tweet. To be clear, the colleague was upset but it wasn’t a very big deal, I just did not want to deal with it.

I started a Crime De-Coder X account last year. I made an account to watch the Trump interview, and just decided to roll with it. I tried really hard to make X work – I posted daily, the same stuff I had been sharing on LinkedIn, just shorter form. After 4 months, I have 139 followers (again, when I joined Twitter in 2018 I had more than that on day 1). And some of those followers are porn accounts or bots. The majority of my posts get <=1 like and 0 reposts. It just hasn’t resulted in getting my work out there the same way Twitter did in 2018 or LinkedIn does now.

So in terms of sharing work, the more recent X has been a bust. In terms of viewing other work, my X feed is dominated by short form video content (a mimic of TikTok) I don’t really care about. This is after extensively blocking/muting/saying I don’t like a lot of content. I promise I tried really hard to make X work.

So when I open up the Twitter home feed, it is two videos by Musk:

Then a thread by Per-Olof (whom I follow), and then another short video Death App joke:

So I thought this was satire, but clicking that fellow’s posts I think he may actually be involved in promoting that app. I don’t know, but I don’t want any part of it.

BlueSky I have not been on as long, but given how easy it was to get started on Twitter and X, I am not going to worry about posting so much. I have 43 followers, and posts similar to X have basically been zero interaction for the most part. The content feed is different than X, but is still not something I care that much about.

We have Jeff Asher and his football takes:

I am connected with Jeff on LinkedIn, in which he only posts his technical material. So if you want to hear Jeff’s takes on football and UT-Austin stuff then go ahead and follow him on BlueSky. Then we have a promotional post by a psychologist (this person I likely would be interested in following his work, this particular post though is not very interesting). And a not funny Onion like post?

Then Gavin Hales, whom I follow, and typically shares good content. And another post I leave with no comment.

My BlueSky feed is mostly dominated by folks in the UK currently. It could be good, but it currently just does not have the uptake to make it worth it like I had with Twitter in 2018. It may be the case given my different goals, to advertise my consulting business, Twitter in 2018 would not be good either though.

So for folks who subscribe to this blog, I highly suggest to give LinkedIn a try for your social media consumption and sharing.

Aoristic analysis, ebooks vs paperback, website footer design, and social media

For a few minor updates, I have created a new Twitter/X account to advertise Crime De-Coder. I do not know if there is some setting that people ignore all unverified accounts, but would appreciate the follow and reshare if you are still on the platform.

I also have an account on LinkedIn, and sometimes comment on the Crime Analysis Reddit.

I try to share cool data visualizations and technical posts. I know LinkedIn in particular can be quite vapid self-help guru type advice, which I avoid. I know being more technical limits the audience but that is ok. So appreciate the follow if you are on those platforms and resharing the work.

Ebooks vs Paperbacks

Part of the reason to start the X account back up is to just try more advertising. I have sold not quite 80 copies of my book to date (including pre-sales). My baseline goal was 100.

For the non-pre-sales, I have sold 35% ebooks and 65% paperbacks. So spending some time to distribute your book in paperback seems to me to be worth it.

Again, I feel that for most academics who publish technical books, self-publishing is a very good idea. So read the above linked post about some of the logistics of self-publishing.

Aoristic analysis in python

On the CRIME De-Coder blog, check out my post on Aoristic analysis. It has links to python code on github for those who just want the end result. It has several methods though to do hour of day and hour by day of week breakdowns. With the ability to do it by categories in data. And post hoc generate a few graphs. I like the line graphs the best:

But I can understand why people like the more common heatmap.
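For readers new to the idea, aoristic analysis spreads each incident's weight across the window of time in which it could have occurred, instead of assigning it to a single timestamp. Below is a minimal sketch of that weighting for hour of day. It is not the code from the linked post; the function name `aoristic_hour_weights` and the example times are hypothetical, and it assumes the weight is spread uniformly over the window.

```python
from datetime import datetime, timedelta

def aoristic_hour_weights(begin, end):
    """Return 24 weights (one per hour of day) that sum to 1.

    Assumption: the event is equally likely at any moment between
    begin and end, so each hour gets weight proportional to overlap.
    """
    weights = [0.0] * 24
    total = (end - begin).total_seconds()
    if total <= 0:
        # No usable window, so put all of the weight on the begin hour
        weights[begin.hour] = 1.0
        return weights
    cur = begin
    while cur < end:
        # Top of the next clock hour after the current position
        next_hour = cur.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
        seg_end = min(next_hour, end)
        weights[cur.hour] += (seg_end - cur).total_seconds() / total
        cur = seg_end
    return weights

# Hypothetical incident that could have happened between 10:30pm and 1:30am
w = aoristic_hour_weights(datetime(2024, 1, 1, 22, 30), datetime(2024, 1, 2, 1, 30))
for hour, weight in enumerate(w):
    if weight > 0:
        print(f"hour {hour:02d}: {weight:.3f}")
```

Summing these per-incident weights over all incidents gives the hour of day profile; keying on day of week plus hour in the same way would give the hour by day of week version.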

Website Design

I have a few minor website design updates. The homepage is more svelte. My wife suggested that it should be easier to see what I do right when you are on the homepage, so I put the jumbotron box at the bottom and the services tiles (with no pictures) at the top.

It does not look bad on mobile either (I only recently figured out that Chrome’s DevTools has a button to turn on mobile view, very helpful!)

Final part is that I made a footer for my pages:

I am not real happy with this. One of the things you notice when you start doing web-design is everyone’s web-page looks the same. There are some basic templates for WordPress or Wix (and probably other CMS generators). Here is Supabase’s footer for example:

And now that I have shown you, you will see quite a few websites have that design. So I did the svg links to social media, but I may change that. (And go with no footer again, there is not a real obvious need for it.) So open to suggestions!

I intentionally made many of the decisions for the way the Crime De-Coder site looks not only to make it usable but to make it at least somewhat different. Avoid super long scrolls, sticky header (that still works quite well on phones). The header is quite dense with many sub-pages (I like it though).

I think a lot of the data dashboards public sector agencies are putting out now do not look very nice. Many are just iframed Tableau dashboards. If you want help with those data visualizations embedded in a more organic way in your site, that is something Crime De-Coder can help with.

LinkedIn posting and link promotion: impression vs reality

For folks who are interested in following my work, my advice is either email or RSS. On this site you should see ‘follow blog via email’ and the RSS link on the right hand side. I sometimes post a note here on crimede-coder stuff, but not always, so just do the same (RSS, or use an if-this-then-that service to turn RSS into email) on that site if you want to keep abreast of all my posts.

Another way to follow my work though is on LinkedIn. So feel free to connect with me or follow my content:

I post short form blogs/reactions on occasion (plus share my other posts/work). Social media promoting your work is often cringy, but I try to post informative and technical content (and not totally vapid self-help stuff). And I write things for people to view them, so I think it is important to promote my work.

Recently I have heard a few influencers mention that they think embedding links directly in LinkedIn posts de-promotes their work. See this discussion on HackerNews, or this person’s advice for two examples.

I formed a few opinions based on my regular postings over the past year+, but impressions of things over extended periods can often be wrong. So I actually downloaded the data to see! In terms of the thing about links and being de-promoted, I don’t see that in my data at all – this is a table of impressions broken down by the domain I linked to (for domains with at least 2 posts over the prior year):

I did notice however two different domains – youtube and newsobserver (the Raleigh newspaper) tend to not have much engagement. So it may be certain domains are not as promoted. It is of course possible that particular content was not popular (I thought my crim observations on the Mark Rober glitterbombs would be more popular, but maybe not). But I think this is a large enough sample to at least give a good hint that they are not promoted in the same way my other links are. My no URL posts have slightly less engagement than my posts to this blog or the crimede-coder site, so overall the idea that links are penalized doesn’t appear to me to be true without more conditional statements.

Data is important, as again I think impressions can be bad for things that repeatedly happen over a long period of time. So offhand I thought I had less engagement on Tue/Thu, so I stopped posting on those days. What does the data say?

| Day | Avg Impressions | Number Posts |
----------------------------------------
| Sun |         1,860   |         32   |
| Mon |         1,370   |         44   |
| Tue |         1,220   |         35   |
| Wed |         1,273   |         41   |
| Thu |         1,170   |         34   |
| Fri |         1,039   |         39   |
| Sat |         1,602   |         38   |

The data says Sun/Sat have higher impressions, and weekdays are lower. If anything Friday is the low day, not Tuesday/Thursday.
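The day of week breakdown above is just a grouped average. Here is a minimal sketch of that calculation using only the standard library. The rows in `posts` are made-up stand-ins for the LinkedIn export (they are not my actual numbers), and the `YYYY-MM-DD` date format is an assumption, so adjust the format string to whatever the export actually uses.

```python
from collections import defaultdict
from datetime import datetime

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

# Hypothetical (created date, impressions) rows, not real export data
posts = [
    ("2024-07-20", 1500),  # Saturday
    ("2024-07-21", 1900),  # Sunday
    ("2024-07-22", 1400),  # Monday
    ("2024-07-26", 1000),  # Friday
    ("2024-07-27", 1700),  # Saturday
]

def avg_by_weekday(rows, date_format="%Y-%m-%d"):
    """Return {day: (average impressions, number of posts)}."""
    totals = defaultdict(lambda: [0, 0])  # day -> [sum of impressions, post count]
    for date_str, impressions in rows:
        day = DAYS[datetime.strptime(date_str, date_format).weekday()]
        totals[day][0] += impressions
        totals[day][1] += 1
    return {day: (s / n, n) for day, (s, n) in totals.items()}

summary = avg_by_weekday(posts)
for day in ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"]:
    if day in summary:
        mean, n = summary[day]
        print(f"| {day} | {mean:>10,.0f} | {n:>4} |")
```

Keeping the post count next to the average matters here, since a day with only a handful of posts can swing a lot on one popular post.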

I have had other examples of practitioners argue with me in crime analysis or academic circles in my career that strike me as similar. In that perceptions (that people strongly believed in), did not align with actual data. So I just don’t think this idea of ‘taking the average of impressions over posts over the past year’ is something that you can really know just based on passive observation. Your perceptions are likely to be dominated by a few examples, which may be off the mark. Ditto for knowing how much crime happens at a particular location, or knowing how much different things impact survival rates for gunshots.

It is definitely possible that my small page experience (currently at a few over 2700 followers on LinkedIn) is not the same as the large influencers. But without looking at actual data, I don’t trust people’s instincts on aggregate metrics all that much.

Another meta LinkedIn tip (I received from Rob Fornango) is to post tall images, so when people are scrolling your content stays on the screen longer. Here is an example post from Rob’s

It is hard for me to test this though, the links on LinkedIn sometimes expand the link to bigger images and sometimes not (and sometimes I edit the image it displays as well). And I think after a while they turn them into tiny images as well. Someone tell the folks on LinkedIn to allow us to use markdown!

So I mean I could spend a full time job tinkering, but looking at the data I have at hand I don’t plan on changing much. Just posting links to my work, and having an occasional comment as well if I think it will be of interest to more people than myself. Content over micro optimization that is (since the algorithm could change tomorrow anyway).

One of the things I have debated on is buying adverts to promote my python book. I think they are just on the cusp of a net loss though given clickthrough rates and margins on my book. So for example, LinkedIn estimates if I spend $140 to promote a post, I will get 23-99 clicks. My buy rate on the site is around 5%, so that would generate 1-5 book sales. My margins are not that high on a sale, so I would not make money on that.
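To make that back-of-the-envelope math explicit, here is a small sketch using the figures quoted above ($140 spend, 23-99 clicks, roughly a 5% buy rate). The function name `ad_break_even` is hypothetical, and it assumes every buyer purchases exactly one copy and that the buy rate for ad traffic matches my normal site traffic, which may well not hold.

```python
def ad_break_even(spend, clicks_low, clicks_high, buy_rate):
    """Expected sales range and the margin per sale needed to recoup the spend."""
    sales_low = clicks_low * buy_rate
    sales_high = clicks_high * buy_rate
    # Margin per sale required just to pay back the advert
    margin_best_case = spend / sales_high   # if clicks land at the high end
    margin_worst_case = spend / sales_low   # if clicks land at the low end
    return sales_low, sales_high, margin_best_case, margin_worst_case

sales_low, sales_high, best, worst = ad_break_even(140, 23, 99, 0.05)
print(f"Expected sales: {sales_low:.2f} to {sales_high:.2f}")
print(f"Margin per sale needed to break even: ${best:.2f} to ${worst:.2f}")
```

So even at the top of LinkedIn's click estimate, each sale would need to clear roughly $28 in margin just to pay for the advert.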

I have been wondering if I posted direct adverts on Reddit for the book to the learn python forum how that would go. But I think it would be much of the same as LinkedIn (too low of clickthrough to make it worth it). But if I do those tests in the future will write up a blog post on my experience!


On LinkedIn I can only find how to download my stats for the company crimede-coder page, not my personal page. Here is the script I used to convert the LinkedIn short urls back to the original domains I linked, plus the analysis:

'''
python Code to parse the domains from my
crimede-coder linkedin posts
run on 7/24/2024, so only has posts
from that date through the prior year
'''

import requests
import traceback
import pandas as pd
import time
from urllib.parse import urlparse

errors = {}

def get_link(url):
    time.sleep(2)
    try:
        res = requests.get(url)
    except Exception:
        er = traceback.format_exc()
        print(f'Error message is \n\n{er}')
        return ''
    if res.ok:
        it = res.text.split()
        it = [i for i in it if i[:4] == 'href']
        rl = it[3]
    else:
        print(f'Not ok, {url}, response: {res.reason}')
        errors[url] = res
        return ''
    return rl[6:].replace('/">','').replace('">','')

# more often than not, linkedin converts the link in the post
# to a lnkd.in short url
def get_refer(txt):
    rs = txt.split()
    rs = [i for i in rs if i[:8] == 'https://']
    if rs:
        url = rs[0]  # if more than one link, only grabs the first
        if url[:15] == 'https://lnkd.in':
            return get_link(url)
        else:
            return url
    else:
        return ''


# this is data exported from LinkedIn on my Crime De-Coder page only goes back one year
df = pd.read_excel('crime-de-coder_content_1721834275879.xls',sheet_name='All posts',header=1)

# only need to keep a few columns
keep_cols = ['Post title','Post link','Created date','Impressions','Clicks','Likes','Comments','Reposts']
df = df[keep_cols].copy()

df['url'] = df['Post title'].apply(get_refer)

def domain(url):
    if url == '':
        return 'NO URL'
    else:
        pu = urlparse(url)
        return pu.netloc

df['domain'] = df['url'].apply(domain)

# caching out file, so do not need to reget url info
df.to_csv('ParseInfo.csv',index=False)

# Can aggregate to domain
agg_stats = df.groupby('domain',as_index=False)['Impressions'].describe()
agg_stats.sort_values(by=['count','mean'],ascending=False,ignore_index=True,inplace=True)
count_cols = list(agg_stats)[1:]
agg_stats[count_cols] = agg_stats[count_cols].fillna(0).astype(int)

# This is a nice way to print/view the results in terminal
print('\n\n' + agg_stats.head(22).to_markdown() + '\n\n')
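One caveat on `get_link` in the script: it picks the fourth `href` token by position, so it depends on the exact layout of the page LinkedIn returns for a short url. A somewhat less brittle variation is to parse the anchors and take the first one that points off LinkedIn. This is only a sketch of that alternative. The names `HrefCollector` and `first_external_link` are hypothetical, and the markup in `sample_html` is invented for illustration and is not the actual page LinkedIn serves.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

class HrefCollector(HTMLParser):
    """Collect the href attribute of every <a> tag, in document order."""
    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)

def first_external_link(html, skip_domains=("linkedin.com", "lnkd.in")):
    """Return the first absolute link whose host is not in skip_domains, else ''."""
    parser = HrefCollector()
    parser.feed(html)
    for href in parser.hrefs:
        host = urlparse(href).netloc.lower()
        is_skipped = any(host == d or host.endswith("." + d) for d in skip_domains)
        if host and not is_skipped:
            return href
    return ""

# Made-up page layout, only to exercise the function
sample_html = """
<html><body>
<a href="https://www.linkedin.com/legal/user-agreement">Terms</a>
<a href="/help">Help</a>
<a href="https://example.com/some-post">Continue</a>
</body></html>
"""
print(first_external_link(sample_html))
```

The returned url can then go through the same `domain` function as in the script to get the `netloc`.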

Soft launching tech recruiting

I am soft-launching a tech recruiting service. I have had conversations with people on all sides of the equation on a regular basis, so I might as well make it a formal thing I do.

If you are an agency looking to fill a role, get in touch. If you are looking for a role, get in touch at https://crimede-coder.com/contact or send an email directly to andrew.wheeler@crimede-coder.com.

Why am I doing this?

About once a month I have a discussion with a friend or second-degree friend who is a current professor and asks me about making the jump to the private sector. You can read my post, Make More Money, on how I think many criminal justice professors are grossly underpaid. For PhD students you can see my advice at Flipping a CJ PhD to an Alt-Academic career.

If you are a current student or professor and want to chat, reach out and let me know you are interested. I am just going to start keeping a list of folks to help match them to current opportunities.

I have discussions with people who are trying to hire for jobs regularly as well. This includes police departments that are upping their game to hire more advanced roles, think tanks who want to hire early career individuals, and some tech companies in the CJ space who need to fill data science roles.

These are good jobs, and we have good people, so why are these agencies and businesses having a hard time filling these roles? Part of it is advertisement – these agencies don’t do a good job of getting the word out to the right audience. A second part is people have way off-base salary expectations (this is more common for academic positions, post docs I am looking at you). Part of the salary discussion is right sizing the role and expectations – you can’t ask for 10+ years experience and have a 90k salary for someone with an advanced degree – doesn’t really matter what job title you are hiring for.

I can help with both of those obviously – domain knowledge and my network can help your agency right size and fill that role.

Finally, I get cold messaged by recruiters multiple times a month. The straw to finally put this all on paper is I routinely encounter gross incompetence from recruiters. They do not understand the role the business is hiring for, they do not have expertise to evaluate potential candidates, and by cold emailing they clearly do not have a good network to pull potential candidates from.

If you are an agency or company and you think my network of scholars can help fill your role, get in touch. I only get paid when you fill the position, so it is no cost to try to use my recruiting services. Again, I will help go over the role with you and say whether it is feasible to fill that position as is, or whether it should be tweaked.

Below is my more detailed advice for job seekers. Again reach out if you are a job seeker, even if we have not met we can chat, I will see you do good work, and I will put you on my list of potential applicants to pull from in the future.

Tech Job Applying Advice, P1 Tech Roles

Here is an in-depth piece of advice I gave a friend recently – I think this will be useful in general to individuals in the social sciences who are interested in making the jump to the private sector.

First is understanding what jobs are available. This blog has a focus on quantitative work, but even if you do qualitative work there are tech opportunities. Also some jobs only need basic quant skills (Business Analyst) that any PhD will have (if you know how to use Excel and PowerPoint you have the necessary tech skills to be a business analyst).

Job labels and responsibilities are fuzzy, but here is a rundown of different tech roles and some descriptions:

  • Data Scientist
    • Role, fitting models and automating processes (writing code to shift data around)
    • need to have more advanced coding/machine learning background, e.g. have examples in python/R/SQL and know machine learning concepts
  • Business Analyst
    • anyone with a PhD can do this, Excel/Powerpoint
    • domain knowledge is helpful (which can be learned)
  • Program Manager/Project Manager
    • Help manage teams, roles are similar to “managing grants”, “supervising students”
    • often overlap with various project management strategies (agile, scrum).
    • These names are all stupid though, it is just supervising and doing “non-tech” things to help teams
  • Product Owner
    • Leads longer term development of a product, e.g. we should build X & Y in the next 3-6 months
    • Mix of tech or non-tech background (typically grow into this role from other prior roles)
    • If no tech need strong domain knowledge
    • Sometimes need to “sell” product, internally or externally
  • Director
    • Leads larger team of data scientists/programmers
    • Discusses with C-level, budgets/hiring/revenue projections
    • Often internal from Data Scientist or less often Product Owner/Business Analyst
    • but is possible to be direct into role with good domain knowledge

Salaries vary, it will generally be:

Business Analyst < Project Manager < {Data Scientist,Product Owner} < Director

But not always – tech highly values writing code, so it is not crazy for a supervisory role (Director) to make less than a senior Data Scientist.

Within Business Analyst you can have Junior/Senior (JR/SR) roles (for PhDs you should come in as Senior). Data scientist can have JR/SR/{Lead,Principal} (PhD should come in as Senior). JR needs supervision, SR can be by themselves and be OK, Lead is expected to mentor and supervise JRs.

Very generic salary ranges for typical cities (you should not take a job lower than the low end on these, with enough work you can find jobs higher, but will be hard in most markets):

  • Business Analyst: 70k – 120k
  • JR Data Scientist: 100k – 130k
  • SR Data Scientist: 130k – 180k
  • Program Manager: 100k – 150k
  • Product Owner: 120k – 160k
  • Director: 150k – 250k

Note I am not going to go and update this post (so this is September 2023), just follow up with me or do your own research to figure out typical salary ranges when this gets out of date in a year from now.

So now that you are somewhat familiar with roles, you need to find roles to apply to. There are two strategies; 1) find open roles online, 2) find specific companies. Big piece of advice here is YOU SHOULD BE APPLYING TO ROLES RIGHT NOW. Too many people think “I am not good enough”. YOU ARE GOOD ENOUGH TO APPLY TO 100s OF POSITIONS RIGHT NOW BASED ON YOUR PHD. Stop second guessing yourself and apply to jobs!

Tech Job Applying Advice, P2 Finding Positions

So one job strategy is to go to online job boards, such as LinkedIn, and apply for positions. For example, if I go search “Project Manager” in the Raleigh-Durham area, I get something like two dozen jobs pop up. You may be (wrongly) thinking I don’t qualify for this job, but let’s look specifically at a job at NTT Data for a project manager, here are a few things they list:

  • Working collaboratively with product partners and chapter leaders to enable delivery of the squad’s mission through well-executed sprints
  • Accelerating overall squad performance, efficiency and value delivered by engaging within and across squads to find opportunities to improve agile maturity and metrics, and providing coaching, training and resources
  • Maintaining and updating squad performance metrics (e.g., burn-down charts) and artifacts to ensure accurate and clear feedback to the squad members and transparency to other partners
  • Managing, coordinating logistics for and participating in agile events (e.g., backlog prioritization, sprint planning, daily meetings, retrospectives and as appropriate, scrum of scrum masters)

This is all corporate gobbledygook for “managing a team to make sure people are doing their work on time” (and all the other bullet points are just more junk to say the same thing). You know who does that? Professors who supervise multiple students and manage grants.

For those with more quant programming skills, you have more potential opportunities (you can apply to data scientist jobs that require coding). But even if you do not have those skills, there are still plenty of opportunities.

Note that many of these jobs list “need to have” and “want to have”. You should still apply even if you do not meet all of the “need to have”. Very often these requirements will be made up and not actually “need to have” (it is common for job adverts to have obvious copy-paste mistakes or impossible need to haves). That NTT Data one has a “Certified Scrum Master (CSM) required” – if you see a bunch of jobs and that is what is getting you cut guess what? You can go and take a scrum master course in two days and check off that box. And have ChatGPT rewrite your cover letter asking it to sprinkle in agile buzzwords in the professor supervisory experience – people will never know that you just winged it when supervising students instead of using someone else’s made up project management philosophy.

So I cannot say that your probability of landing any particular job is high, it may only be 1%. But unlike in academia, you can go on LinkedIn, and if you live in an urban area, likely find 100+ jobs that you could apply for right now (that pay more than a starting assistant professor in criminal justice).
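The reason volume matters can be shown with a quick calculation. This sketch assumes every application has the same 1% chance and that applications are independent of one another, which is a simplification (the function name `prob_at_least_one` is just for illustration).

```python
def prob_at_least_one(p_single, n_applications):
    """Chance of at least one offer across n independent applications."""
    return 1 - (1 - p_single) ** n_applications

for n in (10, 50, 100, 200):
    print(f"{n:>3} applications: {prob_at_least_one(0.01, n):.1%}")
```

Under those assumptions, 100 applications at 1% each works out to better than even odds of landing at least one job.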

So apply to many jobs, and most people I talk to with this strategy will be able to land something in 6-12 months. For resume/cover letter advice, here is my data science CV, and here is an example cover letter. For CV make it more focused on clear outcomes you have accomplished, instead of just papers say something like “won grant for 1 million dollars”, “supervised 5 students to PhD completion”, “did an RCT that reduced crime by 10%”. But you do not need to worry about making it only fit on 1 page (it can be multiple pages). Make it clear you have a PhD, people appreciate that, and people appreciate if you have a book published with a legit publisher as well (lay people find that more obviously impressive than peer reviewed publishing, because most people don’t know anything about peer reviewed publishing).

Do not bother tinkering to make different materials for every job (if the job requires cover letter, make generic and just swap out a few key words/company name). A cover letter will not make or break your job search, so don’t bother to customize it (I do not know how often they are even read).

Tech Job Applying Advice, P3 Finding Companies

The second strategy is to find companies you are interested in. Do you do work on drug abuse and victimization? There are probably healthcare companies you will be interested in. Do you do work tangentially related to fraud? There are positions at banks who need machine learning skills. Are you interested in illegal markets? I bet various social media platforms need help with solutions to prevent selling illegal contraband.

This goes as well for think tanks (many cities have local think tanks that do good work, think beyond just RAND). These and civil service jobs (e.g. working for children and family services as an analyst) typically do not pay as high as private sector, but are still often substantially better than entry level assistant professor salaries (you can get think-tank or civil service gigs in the 80-120k range).

After you have found a company that you are interested in, you can go and look at open positions and apply to them (same as above). But an additional strategy at this point is to identify potential people you want to work with, and cold email/message them on social media.

It is similar to the above advice – many people will not answer your cold emails. It may be only 1/10 answer those emails. But an email is easy – there is no harm. Do not overthink it, send an email that is “Hey I think you do cool things, I do cool things too and would like to work together. Can we talk?” People will respond to something like that more often than you think. And if they don’t, it is their loss.

Here the biggest issue is a stigma associated with particular companies – people think Meta is some big evil company and they don’t want to work for them. And people think being an academic has some special significance/greater purpose.

If you go and build something for Meta that helps reduce illegal contraband selling by some minuscule fraction, you will have prevented a very large number of crimes. I build models that incrementally do a better job of identifying health care claims that are mis-billed. These models consistently generate millions of dollars of revenue for my company (and save several state Medicaid systems many millions more).

The world is a better place with me building stuff like that for the private sector. No doubt in my mind I have generated more value for society in the past 3 years than I would have in my entire career as an academic. These tech companies touch so many people, even small improvements can have big impacts.

Sorry to burst some academic bubbles, but that paper you are writing does not matter. It only matters to the extent you can get someone outside the ivory tower to alter their behavior in response to that paper. You can just cut out the academic middle man and work for companies that want to do that work of making the world a better place, instead of just writing about it. And make more money while you are at it.

Youtube interview with Manny San Pedro on Crime Analysis and Data Science

I recently did an interview with Manny San Pedro on his YouTube channel, All About Analysis. We discuss various data science projects I conducted while either working as an analyst, or in a researcher/collaborator capacity with different police departments:

Here is an annotated breakdown of the discussion, as well as links to various resources I discuss in the interview. This is not a replacement for listening to the video, but it is an easier set of notes for linking to more material on each item I discuss.

0:00 – 1:40, Intro

For a rundown of my career: I did my PhD in Albany (08-15). During that time period I worked as a crime analyst at Troy, NY, as well as a research analyst for my advisor (Rob Worden) at the Finn Institute. My research focused on quant projects with police departments (predictive modeling and operations research). In 2019 I went to the private sector, and now work as an end-to-end data scientist in the healthcare sector working with insurance claims.

You can check out my academic and my data science CV on my about page.

I discuss the workshop I did at the IACA conference in 2017 on temporal analysis in Excel.

Long story short, don’t use percent change, use other metrics and line graphs.
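As a minimal sketch of why percent change can mislead on low counts: the counts below are made up, and the alternative metric (a normal-approximation z-score for the difference of two Poisson counts) is an assumption chosen for illustration, not necessarily the metric covered in the workshop.

```python
import math

def percent_change(pre, post):
    """Percent change from pre to post (undefined when pre is 0)."""
    return 100.0 * (post - pre) / pre

def poisson_z(pre, post):
    """Approximate z-score for the difference of two Poisson counts.

    Under the null of equal rates, post - pre has variance of roughly
    post + pre. This metric is an illustrative assumption.
    """
    return (post - pre) / math.sqrt(post + pre)

# Hypothetical counts: both pairs are a 100% increase
for pre, post in [(2, 4), (200, 400)]:
    print(pre, post, percent_change(pre, post), round(poisson_z(pre, post), 2))
```

Both pairs show the same percent change, but only the larger counts produce a z-score far outside what chance variation would typically generate.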

7:30 – 13:10, Patrol Beat Optimization

I have the paper and code available to replicate my work with Carrollton PD on patrol beat optimization with workload equality constraints.

For analysts looking to teach themselves linear programming, I suggest Hillier’s book. I also give examples on linear programming on this blog.

It is different than statistical analysis, but I believe it has as much applicability to crime analysis as your more typical statistical analysis.

13:10 – 14:15, Million Dollar Hotspots

There are hotspots of crime that are so concentrated that the expected labor cost reduction from having an officer assigned full time likely offsets the cost of the position. E.g. if you spend a million dollars in labor addressing crime at that location, and having a full time officer reduces crime by 20%, the return on investment for the hotspot breaks even with paying the officer’s salary.

I call these Million dollar hotspots.
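The break-even arithmetic can be sketched in a few lines. The million dollars in labor and the 20% reduction come from the example above; the $200,000 fully loaded cost of an officer is a hypothetical figure, and the function names are just for illustration.

```python
def hotspot_net_savings(annual_labor_cost, crime_reduction, officer_cost):
    """Expected labor savings from a dedicated officer, net of that officer's cost."""
    expected_savings = annual_labor_cost * crime_reduction
    return expected_savings - officer_cost

def break_even_reduction(annual_labor_cost, officer_cost):
    """Proportion of crime that must be prevented for the position to pay for itself."""
    return officer_cost / annual_labor_cost

# $1 million in labor and a 20% reduction are from the text;
# the $200,000 officer cost is a hypothetical assumption.
net = hotspot_net_savings(1_000_000, 0.20, 200_000)
needed = break_even_reduction(1_000_000, 200_000)
print(net, needed)
```

A location with less labor spent on it needs a proportionally larger crime reduction to justify the same position, which is why only the most concentrated hotspots qualify.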

14:15 – 28:25, Prioritizing individuals in a group violence intervention

Here I discuss my work on social network algorithms to prioritize individuals to spread the message in a focussed deterrence intervention. This is the opposite of how many people view “spreading” in a network, I identify something good I want to spread, and seed the network in a way to optimize that spread:
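To give a flavor of what seeding a network can look like, here is a minimal sketch using a simple greedy coverage heuristic. This is an assumption for illustration and not necessarily the algorithm from the paper; the network and the function name `greedy_seeds` are hypothetical.

```python
def greedy_seeds(graph, n_seeds):
    """Pick seeds so that they plus their direct ties reach as many people as possible."""
    covered, seeds = set(), []
    for _ in range(n_seeds):
        # People newly reached if each remaining candidate were added
        gain = {node: len(({node} | graph[node]) - covered)
                for node in graph if node not in seeds}
        if not gain:
            break
        best = max(sorted(gain), key=gain.get)  # sorted() makes ties deterministic
        if gain[best] == 0:
            break
        seeds.append(best)
        covered |= {best} | graph[best]
    return seeds, covered

# Hypothetical undirected network stored as adjacency sets
graph = {
    "A": {"B", "C", "D"},
    "B": {"A"},
    "C": {"A"},
    "D": {"A", "E"},
    "E": {"D", "F"},
    "F": {"E", "G"},
    "G": {"F"},
}
seeds, covered = greedy_seeds(graph, 2)
print(seeds, sorted(covered))
```

The second seed is not simply the person with the second most ties, it is the person who reaches the most people the first seed does not already reach.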

I also have a primer on SNA, which discusses how crime analysts typically define nodes and edges using administrative data.
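As a rough sketch of defining nodes and edges from administrative data, the snippet below treats people as nodes and links two people whenever they appear on the same incident. The record layout, the IDs, and the function name `co_incident_edges` are hypothetical; real records management systems will differ.

```python
from collections import Counter
from itertools import combinations

def co_incident_edges(records):
    """Build weighted edges between people listed on the same incident."""
    people_by_incident = {}
    for incident_id, person_id in records:
        people_by_incident.setdefault(incident_id, set()).add(person_id)
    edges = Counter()
    for people in people_by_incident.values():
        # Every pair of people on the same incident gets one more tie
        for a, b in combinations(sorted(people), 2):
            edges[(a, b)] += 1
    return edges

# Hypothetical (incident, person) rows from an incident table
records = [
    ("inc1", "P1"), ("inc1", "P2"),
    ("inc2", "P1"), ("inc2", "P2"), ("inc2", "P3"),
    ("inc3", "P4"),
]
edges = co_incident_edges(records)
print(edges)
```

Swapping which rows go into `records` (co-arrests, victim and offender pairs, field interviews) changes what an edge means, which is the design decision that matters most.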

Listen to the interview as I discuss more general advice – in SNA it matters what you want to accomplish in the end as to how you would define the network. So I discuss how you may want to define edges via victimization to prevent retaliatory violence (I think that would make sense for violence interrupters to be proactive for example).

I also give an example of how it may make sense to base detective case allocation on SNA – detectives have background with an individual’s network (e.g. have a rapport with a family based on prior cases worked).

28:25 – 33:15, Be proactive as an analyst and learn to code

Here Manny asked the question of how analysts can prevent their role from being turned into a more administrative role (just get requests and run simple reports). I think the solution to this (not just in crime analysis, but also being an analyst in the private sector) is to be proactive. You shouldn’t wait for someone to ask you for specific information, you need to be defining your own role and conducting analysis on your own.

He also asked about crime analysis being under-used in policing. Being stronger at computer coding opens up so many opportunities that learning Python, R, and SQL is the area where I would like to see stronger skills across the industry. And this is a good career investment as it translates to private sector roles.

33:15 – 37:00, How ChatGPT can be used by crime analysts

I discuss how ChatGPT may be used by crime analysts to summarize qualitative incident data and help inform . (Check out this example by Andreas Varotsis.)

To be clear, I think this is possible, but I don’t think the tech is quite up to that standard yet. Also do not submit LEO sensitive data to OpenAI!

Also always feel free to reach out if you want to nerd out on similar crime analysis questions!

Text analysis, alt competition sites, and ASC

A bit of a potpourri blog post today. First, I am not much of a natural language processing wiz. But based on the work of Peter Baumgartner at RTI (assigning reduced form codes based on text descriptions), I was pointed to the simpletransformers library. It is very easy to download complicated NLP architectures (like RoBERTa with 100 million+ parameters) and retrain them to your idiosyncratic data.

Much of the issue working with text data is the cleaning, and with these extensive architectures they are not so necessary. See for example this blog post on classifying different toxic comments. Out of the box the multi-label classification gets an AUC score pretty damn close to the winning entry in the Kaggle contest this data was developed for. No text munging necessary.

Playing around on my personal machine I have been able to download and re-tune the pretrained RoBERTa model – doing that same model as the blog post (with just all the defaults for the model), it takes around 7 hours of my GPU.

The simpletransformers library has a ton of different pre-set architectures for different problems. But I have had decent success with the ones I have played around with for labelled data (e.g. you have text data on the right hand side, and want to predict a binary or multinomial outcome).
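Here is a minimal sketch of what re-tuning a pretrained model with defaults can look like. It assumes the `ClassificationModel` interface as described in the simpletransformers documentation (a DataFrame with `text` and `labels` columns, then `train_model` and `predict`); the example narratives, labels, and helper names are hypothetical, and the slow training calls are left commented out.

```python
def to_training_rows(texts, labels):
    """Pair raw text with integer labels; no text cleaning is applied."""
    if len(texts) != len(labels):
        raise ValueError("texts and labels must be the same length")
    return [[text, int(label)] for text, label in zip(texts, labels)]

def fit_roberta(rows, use_cuda=False):
    """Fine-tune the pretrained roberta-base model using the library defaults."""
    # Imported lazily since these are heavy, optional dependencies
    import pandas as pd
    from simpletransformers.classification import ClassificationModel

    train_df = pd.DataFrame(rows, columns=["text", "labels"])
    model = ClassificationModel("roberta", "roberta-base", use_cuda=use_cuda)
    model.train_model(train_df)
    return model

# Hypothetical narratives and binary labels, purely for illustration
rows = to_training_rows(
    ["Suspect broke the car window and took a purse", "Verbal argument, no injuries"],
    [1, 0],
)
# model = fit_roberta(rows)   # downloads the pretrained weights, slow without a GPU
# preds, raw_outputs = model.predict(["Window smashed, laptop missing"])
```

Note the only preparation step is pairing text with labels, the raw text goes in as is.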

Another text library I have played around with (although have not had as much success in production) is dirty_cat. This is for unsupervised modeling, which unfortunately is a harder task to evaluate what is successful than supervised learning.

Alt Competition Sites

I recently spent two days trying to work on a recent Kaggle competition, a follow up to the toxic comments one above. My solution is nowhere close to the current leaderboard though, and given the prize total (and I expect something like 5,000 participants), this just isn’t worth my time to work on it more.

Two recent government competitions I did compete in though, the NIJ recidivism, and the NICHD maternal morbidity. (I will release my code for the maternal morbidity when the competition is fully scored, it is a fuzzy one not a predictive best accuracy one.) Each of these competitions had under 50 teams participate, so it is much less competition than Kaggle. The CDC has a new one as well, for using a network based approach to violence and drug problems.

For some reason these competitions are not on the Challenge.gov website. Another site I wanted to share as well is DataDriven competitions. If I had found that sooner I might have given the floodwater competition a shot.

I have mixed feelings about the competitions, and they are risky. I probably spent for NIJ and NICHD what I would consider something like $10,000 to $20,000 of my personal time on the code solutions (for each individually). I knew NIJ would not have many submissions (I did not participate in the geographic forecasting, and saw some people win with silly strategies). If you submitted anything in the student category you would have won close to the same amount as my team did (as not all the slots were filled up). And the NICHD was quite onerous to do all the paperwork, so I figured would also be low turnout (and the prizes are quite good). So whether I think it is worth it for me to give a shot is guessing the total competition pool, level of effort to submit a good submission, and how the prizes are divvied up as well as the total dollar amount.

The CDC violence one is strangely low prizes, so I wouldn’t bother to submit unless I already had some project I was working on anyway. I think a better use of the Fed challenges would be to have easier pilot work, and based on the pilot work fund larger projects. So consider the initial challenge sort of equivalent to a grant proposal. This especially makes sense for generating fairness algorithms (not so much for who has the best hypertuned XGBoost model on a particular train/test dataset).

Missing ASC

The American Society of Criminology conference is going on now in Chicago. A colleague emailed the other day asking if I was coming, and I do miss meeting up with friends. The majority of presentations are quite bad (both for content and presentation style), so it is more of an excuse to have a beer with friends than anything.

I debated with my wife about taking a family vacation to Chicago during this conference earlier in the year. We decided against it for the looming covid – I correctly predicted it would still be quite prevalent (and I am guessing it will be indefinitely at this point given vaccine hesitancy and new variants). I incorrectly predicted though I wouldn’t be able to get a vaccine shot until October (so very impressed with the distribution on that front). Even my son has a shot (didn’t even try to guess when that would happen). So I am not sure if I made the correct choice in retrospect – the risk of contraction is as high as I guessed, but the risk of adverse effects given we have the vaccines is very low.

CrimCon Roundtable: Flipping a Criminal Justice PhD to an alt-academic Data Science Career

This Thursday 11/19/2020 at 1 PM Eastern, I will be participating in a roundtable for the online CrimCon event. This is free for everyone to zoom in, and here is the link to the program, I am on Stream 3!

The title is above — I have been a private sector data scientist at HMS for not quite a year now. I wanted to organize a panel to help upcoming PhD’s in criminal justice get some more exposure to potential data science positions, outside the traditional tenure track. Here is the abstract:

Tenure-track positions in academia are becoming more challenging to obtain, and only a small portion of junior faculty continue in academia to the rank of full professor. Therefore, students may opt to explore alternate options to obtain employment after their PhD is finished. These alternatives to the tenure track are often called “alt-academic” jobs. This roundtable will be focused on discussing various opportunities that exist for PhD’s in criminal justice and behavioral sciences spanning the public sector, the private sector, and non-profits/think tanks. Panelists will also discuss gaps in the typical PhD curriculum, with the goal of aiding current students to identify steps they can take to make themselves more competitive for alt-academic roles.

And here are each of the panelists bios:

Dr. Andrew Wheeler is currently a Data Scientist at HMS working on problems related to predictive modeling and optimization in relation to health insurance claims. Before joining HMS, he received a PhD degree in Criminal Justice from SUNY Albany. While in academia his research focused on collaborating with police departments for various problems including: evaluating crime reduction initiatives, place based and person based predictive modeling, data analytics for crime analysis, and developing models for the efficient and fair delivery of police resources.

Dr. Jennifer Gonzalez is the Senior Director of Population Health at the Meadows Mental Health Policy Institute, where she manages the Institute’s research and data portfolio. She earned her doctoral degree in epidemiology and a M.S. degree in criminal justice. Before joining MMHPI, Dr. Gonzalez was a tenured associate professor at the University of Texas School of Public Health, where she maintained a portfolio of more than $10 million in research funding and published more than one hundred interdisciplinary articles focused on the health of those who come into contact with—and work within—the criminal justice system.

Dr. Kyleigh Clark-Moorman is a Senior Research Associate for the Public Safety Performance Project at The Pew Charitable Trusts, a non-profit public policy organization. Kyleigh began working at Pew in 2019 and completed her PhD in Criminology and Criminal Justice at the University of Massachusetts, Lowell in May 2020. As an early career researcher, Dr. Clark-Moorman’s work has been published in Criminal Justice and Behavior, Criminal Justice Studies, and the Journal of Criminal Justice. In her role at Pew, Kyleigh is responsible for research design and data analysis focused on various criminal justice topics while also working with external partners to produce high-impact reports and analyses to raise awareness and drive public policy.

Matt Vogel is Associate Professor in the School of Criminal Justice at the University at Albany, SUNY and the Director of the Laboratory for Decision Making in Criminology and Criminal Justice. Matt regularly assists local agencies with data and evaluation needs. Some of his ongoing collaborations include assessments of racial representation on capital juries in Missouri, a longitudinal evaluation of a school-based violence reduction program, and the implementation of a police-hospital collaboration to help address retaliatory violence in St. Louis. Prior to joining the faculty at UAlbany, Matt worked in the Department of Criminology and Criminal Justice at the University of Missouri – St. Louis and held a long-term visiting appointment with the Faculty of Architecture at TU Delft (the Netherlands).

If you have any upfront questions you would like addressed by the panel, always feel free to send me a pre-emptive email (or comment below).


Update: The final roundtable is now posted on YouTube. See below for the panel’s thoughts on pursuing non-tenure track jobs with your social science PhD.

A bunch of random shout outs

Busy, busy, busy! Hopefully I will have some time in the near future to write up some more data science posts. But for now, here is a small python snippet to help you build interaction variables between two sets of numpy arrays/dataframes.

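import numpy as np

def np_int(a, b):
    """Return all pairwise column products of a and b, computed row by row."""
    rows = a.shape[0]
    cols = a.shape[1] * b.shape[1]
    # out[i, j, k] = a[i, j] * b[i, k], then flattened to shape (rows, cols)
    return np.einsum('ij,ik->ijk', a, b).reshape((rows, cols))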

This works for pytorch as well (just replace np.einsum with torch.einsum). So coming up (eventually) I will illustrate encoding interaction between hidden layers in a deep learning model. But for now some quicker updates.
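As a quick usage sketch (the input arrays are made up), the example below repeats the function so it runs on its own and checks how the interaction columns are ordered: column `j*b.shape[1] + k` of the result holds `a[:, j] * b[:, k]`.

```python
import numpy as np

def np_int(a, b):
    rows = a.shape[0]
    cols = a.shape[1] * b.shape[1]
    return np.einsum('ij,ik->ijk', a, b).reshape((rows, cols))

# Hypothetical inputs: 2 rows, with 2 and 3 columns respectively
a = np.array([[1, 2], [3, 4]])
b = np.array([[10, 20, 30], [40, 50, 60]])
inter = np_int(a, b)

# Each interaction column is the elementwise product of one column from each input
for j in range(a.shape[1]):
    for k in range(b.shape[1]):
        assert np.array_equal(inter[:, j * b.shape[1] + k], a[:, j] * b[:, k])

print(inter.shape)  # (2, 6)
print(inter)
```

Knowing the ordering matters when you need to label the interaction columns afterwards, for example to match coefficients back to variable pairs.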

Shout out #1: Scott Jacques has continued to push the charge for open access to criminology journals. He has two recent posts about post-prints, and how our main journal (Criminology) has an excessive policy of not allowing authors to post post prints for over two years (whereas the majority of criminology journals allow you to post immediately).

Several aspects of open science are tricky – posting pre-prints/post-prints is not. If we can come together as a group this is an easy, no cost way to greatly improve the accessibility of our work to the greater public.

Shout out #2: The folks at Police Rewired have hosted a hackathon intended to Hack Hate. It is too late to participate, but they will be displaying the results this Sunday. I have not had the chance to participate in any code hackathons, I will need to make a concerted effort in the future to give at least one a shot. (It seems hard, how can you do any work in only a day or a week or two!? But the proof is in the pudding so to speak, I’ve seen some pretty cool things come out of various hackathons in the past.)

Shout out #3: My workplace, HMS, is involved in a data sharing collaborative called the Digital Health DRC. They also have a hackathon coming up, but this is related to Telehealth use. The Digital Health DRC is pretty cool though, it is basically a way for HMS (and several other private sector entities) to share various datasets with researchers over the globe.

The scope of HMS’s data is somewhat outside the realm of my old stomping grounds of criminology (but not entirely, a big part of my job is identifying potentially fraudulent patterns in claims data). But for folks who have a research question that could be answered using health insurance claims data, this is a good resource to look into. (HMS has pretty good coverage of Medicare claims across the US.)

Finally, I experimented a few days on the site with hosting ads. I managed to serve up a few thousand and make 10 cents. So I will turn that off for now. I debated on putting the button for folks to donate a coffee, but even that is not necessary. (I can afford the few bucks for the domain, and I use dropbox to back up my files anyway, so hosting extra materials is not a big deal.) I would rather folks just take my nerdy notes and make their own cool stuff (and share it with me!) I may need to figure out a better hosting solution for images though, as google photos is continuing to give me troubles I see (so if you see an image is not coming through feel free to let me know in the comments or send me an email).

Reasons Police Departments Should Consider Collaborating with Me

Much of my academic work involves collaborating and consulting with police departments on quantitative problems. Most of the work I’ve done so far is very ad-hoc, through either the network of other academics asking for help on some project or police departments cold contacting me directly.

In an effort to advertise a bit more clearly, I wrote a page that describes examples of prior work I have done in collaboration with police departments. That discusses what I have previously done, but doesn’t describe why a police department would bother to collaborate with me or hire me as a consultant. In fact, it probably makes more sense to contact me for things no one has previously done before (including myself).

So here is a more general way to think about (from a police department’s or criminal justice agency’s perspective) whether it would be beneficial to reach out to me.

Should I do X?

So no one is going to be against different evidence based policing practices, but not all strategies make sense for all jurisdictions. For example, while focussed deterrence has been successfully applied in many different cities, if you do not have much of a gang violence problem it probably does not make sense to apply that strategy in your jurisdiction. Implementing any particular strategy should take into consideration the cost as well as the potential benefits of the program.

Should I do X may involve more open ended questions. I’ve previously conducted in person training for crime analysts that goes over various evidence based practices. It also may involve something more specific, such as should I redistrict my police beats? Or I have a theft-from-vehicle problem, what strategies should I implement to reduce them?

I can suggest strategies to implement, or conduct cost-benefit analysis as to whether a specific program is worth it for your jurisdiction.

I want to do X, how do I do it?

This is actually the best scenario for me. It is much easier to design a program up front that allows a police department to evaluate its efficacy (such as designing a randomized trial and collecting key measures). I also enjoy tackling some of the nitty-gritty problems of implementing particular strategies more efficiently or developing predictive instruments.

So you want to do hotspots policing? What strategies do you want to do at the hotspots? How many hotspots do you want to target? Those are examples of where it would make sense to collaborate with me. Pretty much all police departments should be doing some type of hot spots policing strategy, but depending on your particular problems (and budget constraints), it will change how you do your hot spots. No budget doesn’t mean you can’t do anything — many strategies can be implemented by shifting your current resources around in particular ways, as opposed to paying for a special unit.

If you are a police department at this stage I can often help identify potential grant funding sources, such as the Smart Policing grants, that can be used to pay for particular elements of the strategy (that have a research component).

I’ve done X, should I continue to do it?

Have you done something innovative and want to see if it was effective? Or are you putting a bunch of money into some strategy and are skeptical it works? It is always preferable to design a study up front, but often you can conduct pretty effective post-hoc analysis using quasi-experimental methods to see if some crime reduction strategy works.

If I don’t think you can do a fair evaluation I will say so. For example I don’t think you can do a fair evaluation of chronic offender strategies that use officer intel with matching methods. In that case I would suggest how you can do an experiment going forward to evaluate the efficacy of the program.

Mutual Benefits of Academic-Practitioner Collaboration

Often I collaborate with police departments pro bono, so you may ask: what is in it for me? As an academic I get evaluated mostly by my research productivity, which involves writing peer reviewed papers and getting research grants. So money is not the main factor from my perspective. It is typically easier to write papers about innovative problems or programs. If it involves applying for a grant (on a project I am interested in) I will volunteer my services to help write the grant and design the study.

I could go through my career writing papers without collaborating with police departments. But my work with police departments is more meaningful. It is not zero-sum, I tend to get better ideas when understanding specific agencies’ problems.

So get in touch if you think I can help your agency!

The week at Stackexchange 5/21/2013 Edition

Posts I’ve found interesting during the week (or likely over longer periods!) at various forums I participate in.

CrossValidated

GIS

Academia

Others

SPSS Nabble Group

Hopefully I get more time to blog in the near future, but currently busy, busy, busy! Working on visualizing JTC flow data (presenting at ASC this fall), getting everyone to approve my prospectus, and I have a few more SPSS blog posts I have in mind (restricted cubic splines, visually weighted regression, and using Ripley’s K to analyze temporal crime sprees!)