The Trace the other day published an article giving a somewhat superficial critique of CrimeSolutions.gov. They are right that the particular scenario with the Bronx Defenders office highlights the need for a change in the way content aggregators like CrimeSolutions present overall recommendations. I have reviewed for CrimeSolutions, and I think they did a reasonable job in creating a standardized form, but I will give my opinion here about how we can think about social programs like the Bronx Defenders program beyond the typical null hypothesis significance testing – we need to think about the overall costs and benefits of the programs. The stat testing almost always focuses on just the benefits part, not the cost part.
But first, before I go into more details on CrimeSolutions, I want to address Thomas Abt’s comments about potential political interference in this process. This is pizzagate level conspiracy theory nonsense from Abt. The folks reviewing for CrimeSolutions are other professors like me (or more specifically in my case, a former professor). I’d like to see the logic from Abt for how Kate Bowers, a professor at University College London, is compromised by ties to Donald Trump or the Republican Party.
We professors get a standardized form to fill in the blanks on the study characteristics, so there is no reasonable way the standardized form biases reviews towards any particular political agenda. Studies are reviewed by multiple people (e.g. if I disagree with another researcher, we have emails back and forth to hash out why we had different ratings). So it would not only take individuals working for the man, but collusion among many of us researchers, to produce the political bias Abt suggests.
The only potential way I can see any political influence in the process is if people at DSG selectively choose particular studies. (This would only make sense, though, to say promote more CJ oriented interventions over other social service type interventions.) Since anyone can submit a study (even non-US ones!), I am highly skeptical that political bias happens in that aspect either. Pretty sure the DSG folks want people to submit more studies FYI.
FYI, Abt’s book Bleeding Out is excellent; I am not sure why he is spouting this nonsense about politics in this case though. So to be clear, claiming political bias in these reviews is total nonsense, but of course the current implementation of the CrimeSolutions final end recommendation could be improved. (I really like the Trace as well, and have talked to them before over Gio’s/my work on shooting fatalities. This article however doesn’t have much meat to critique CrimeSolutions beyond some study authors being unhappy and Abt’s suggestion of nefarious intentions.)
How does CrimeSolutions work now?
At a high level, CrimeSolutions wants to be a repository to help policy makers make simple decisions on different policy questions – what I take as a totally reasonable goal. So last I knew, they had five different end ratings a study could fall into (I am probably violating some TOS here sharing this screenshot, but whatever, we do a lot of work filling in the info as reviewers!). These include Effective, Promising, Ineffective, Null Effect, and Inconclusive.
You get weights based not only on the empirical evidence presented, but also on aspects of the design itself (e.g. experiments are given higher weight than quasi-experiments), the outcomes examined (shorter time periods get less weight than longer ones), the sample size, etc. It also includes fuzzier things like the description of the program (enough to replicate), and the evidence presented of adherence to the program (which gets the most points for quantitative evidence, but has categories for qualitative evidence and no evidence of fidelity as well).
So Promising is basically some evidence that it works, but the study design is not the strongest. You only get Null Effect if the study design is strong and there were no positive effects found. Again I mean ‘no positive effects’ in the limited sense of the crime endpoints specified, e.g. reduced recidivism, overall crime counts in an area, etc. (it is named CrimeSolutions). But there can of course be other non-crime beneficial aspects to the program (which is the main point of this blog post).
When I say at the beginning that the Trace article is a bit superficial, it is because it doesn’t actually present any problems with the CrimeSolutions instrument beyond the face-value argument of ‘hey, I think this recommendation should be different!’ If the only standard is whether someone is happy with the end result, someone will forever be unhappy with CrimeSolutions – you can no doubt ex ante make arguments all day long for why you are unhappy for any idiosyncratic reason. You need to objectively articulate the problems with the CrimeSolutions instrument if you want to make any progress.
So I can agree that the No Effects brand for the Bronx Defenders office does not tell the whole story. I can also say how the current CrimeSolutions instrument fails in this case, and can suggest solutions for how to amend it.
Going Beyond p-values
So in the case of the Bronx Defenders analysis, what happens is that the results are not statistically significant in terms of crime reductions. And because it is a large sample and a well done experimental design, it unfortunately falls into the more damning category of No Effects (Promising or Inconclusive are actually the more uncertain categories here).
One could potentially switch the hypothesis testing on its head and do non-inferiority tests to somewhat fit the current CrimeSolutions mold. But I have an approach I think is better overall – to evaluate the utility of a program, you need to consider both its benefits (often here we are talking about some sort of crime reduction), as well as its costs:
Utility = Benefits - Costs
So to justify any particular social program, we just want Benefits > Costs. We can draw this inequality as a diagram, with costs and benefits as the two axes (I will get to the delta triangle symbols in a minute). Any situation in which the benefits are greater than the costs, we are on the good side of the inequality – the top side of the line in the diagram. Social programs that are more costly will need more evidence of benefits to justify investment.
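The non-inferiority flip mentioned above can be sketched as a simple one-sided test. All the numbers here are hypothetical (a point estimate near zero for a recidivism effect and a made-up "no worse than" margin) – this is just a normal-approximation sketch, not the CrimeSolutions procedure:

```python
from statistics import NormalDist

# Hypothetical numbers, just for illustration
est = -0.01    # estimated change in recidivism (negative = reduction)
se = 0.02      # standard error of the estimate
margin = 0.05  # non-inferiority margin: "no worse" than a 0.05 increase

# H0: true effect >= margin (program is meaningfully worse)
# H1: true effect <  margin (program is non-inferior)
z = (est - margin) / se
p_value = NormalDist().cdf(z)
print(round(p_value, 4))  # → 0.0013
```

So even though the point estimate is not "significant" evidence of a benefit, this frames the question as whether the program is at least not meaningfully worse than the status quo.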
Often we are not examining a program in a vacuum, but are comparing the program to a counterfactual: what happens if the proposed program does not exist?
Utility_a = Benefits_a - Costs_a : Program A's utility
Utility_s = Benefits_s - Costs_s : Status Quo utility
So here we want in the end Utility_a > Utility_s – we would rather replace the current status quo with this program, as it improves overall utility. It could be the case that the current status quo is to do nothing, in which case Utility_s = Benefits_s - Costs_s = 0 - 0 = 0.
It could also be the case that even if Benefits_a > Costs_a, still Utility_a < Utility_s – in that scenario the program is beneficial on its own, but is worse in overall utility than the current status quo. So in that case, even if rated Effective in current CrimeSolutions parlance, a city would not necessarily be better off ponying up the cash for that program. We could also have the situation Benefits_a < Costs_a but Utility_a > Utility_s – that is, the benefits of the program are net negative, but it still has better utility than the current status quo.
So to get whether the new proposed program has added utility over the status quo, we take the difference in two equations:
Utility_a = Benefits_a - Costs_a : Program A's utility
- Utility_s = Benefits_s - Costs_s : Status Quo utility
--------------------------------------------------------
Δ Utility = Δ Benefits - Δ Costs
And we end up with the changes in the graph I showed before. Note that this implies a particular program can actually have negative effects on crime control benefits, but if it reduces costs enough it may still be worth it. For example, Megan Stevenson argues pre-trial detention is not worth the costs – although reducing detention will no doubt increase crime some, detention may still not be worth it. Although Stevenson focuses on harms to individuals, she may even be right just in terms of the straight up costs of incarceration.
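To make the delta equation concrete, here is a toy calculation with entirely made-up dollar figures, in the spirit of the pre-trial detention example (slightly worse crime-control benefits, but much lower costs):

```python
# Hypothetical (made-up) figures to illustrate the delta utility logic
benefits_a, costs_a = 90, 20    # Program A: slightly fewer crime benefits, much cheaper
benefits_s, costs_s = 100, 60   # Status quo: more crime control, but expensive

d_benefits = benefits_a - benefits_s   # -10: program A loses some crime benefits
d_costs = costs_a - costs_s            # -40: but saves a lot of cost
d_utility = d_benefits - d_costs       # -10 - (-40)
print(d_utility)  # → 30
```

So a negative Δ Benefits can still produce a positive Δ Utility when the cost savings are large enough.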
For the Bronx Defenders analysis, they showed no benefits in terms of reduced crime. But the intervention was a dramatic cost savings compared to the current status quo. I represent the Bronx Defenders results as a grey box in the diagram. It is centered on null effects for crime benefits, but is clearly in the positive utility part of the graph. If it happened to be expensive, or no different in costs, the box would shift right and would not clearly be in the effective portion.
For another example, I show the box in this graph not as a point, but as an area. An intervention can show some evidence of efficacy, but not reach the p-value < 0.05 threshold. The Chicago summer jobs program is an example of this; it is rated as No Effects. I think DSG could reasonably up the sample size requirement for individual recidivism studies, but even if this were changed to the Promising or Inconclusive recommendation in CrimeSolutions parlance, the problem of having a binary yes/no end decision still remains.
So here the box has some uncertainty associated with it in terms of the benefits, but has more area on the positive side of the utility line. (These are just generic diagrams, not meant to be an exact representation; it could be that more of the square's area should be above the positive utility line given the estimates.) If the authors want to argue that the correct counterfactual status quo is more expensive – which would shift the pink box to the left – it may as is be a good idea to invest in more. Otherwise it makes sense for the federal government to invest in more research programs trying to replicate, although from a local government perspective it may not be worth the risk to invest in something like this given the uncertainty. (Just based on the Chicago experiment it probably would be worth the risk for a local government IMO, but I believe jobs and crime programs overall have a less than stellar track record.)
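The uncertainty box can also be summarized numerically. Here is a quick Monte Carlo sketch – the distributions for the benefit and cost deltas are made up for illustration – of how much of the box lies on the positive utility side of the line:

```python
import random

random.seed(10)

# Hypothetical: noisy benefit estimate (not "significant" on its own),
# but fairly clear cost savings versus the status quo
n_sims = 100_000
positive = 0
for _ in range(n_sims):
    d_benefits = random.gauss(5, 10)   # uncertain crime-control benefit delta
    d_costs = random.gauss(-20, 3)     # cost savings delta, known more precisely
    if d_benefits - d_costs > 0:
        positive += 1

print(positive / n_sims)  # share of the "box" on the positive utility side
```

With these made-up numbers the vast majority of the simulated draws land on the positive utility side, even though the benefit estimate alone would not clear a significance threshold.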
So these diagrams are nice, but they leave implicit how CrimeSolutions would in practice measure costs to put a program on the diagram. In the worst case scenario costs are totally unknown (so they would span the entire X axis here), but in many scenarios I imagine people can give reasonable estimates of the costs of social programs. So I believe a simple solution to the current CrimeSolutions issue is two-fold.
- They should incorporate costs somewhere into their measurement instrument. This could either be as another weighted term in the Outcome Evidence/Primary Outcomes portion of the instrument, or as another totally separate section.
- It should have breakdowns on the website that are not just a single final decision endpoint, but show a range of potential results in a diagram like I show here. So while not quite as simple as the binary yes/no in the end, I believe that policy makers can handle that minor bit of added level of complexity.
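As a sketch of the first suggestion, a cost section could simply enter the existing weighted scoring. The item names, weights, and ratings below are entirely made up for illustration – this is not the actual CrimeSolutions instrument:

```python
# Hypothetical scoring items: (weight, 0-5 rating) -- NOT the real instrument
items = {
    "design_strength":  (2.0, 4),
    "outcome_evidence": (1.5, 3),
    "program_fidelity": (1.0, 2),
    "cost_evidence":    (1.5, 5),  # proposed new term: quality of cost estimates
}

score = sum(weight * rating for weight, rating in items.values())
print(score)  # → 22.0
```

Whether costs come in as one weighted term or as a separate section, the point is that cost evidence would move the final rating, not just the crime endpoints.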
Neither of these will make CrimeSolutions foolproof – but better to give suggestions to improve it than to suggest getting rid of it completely. I can foresee issues in defining what the relevant costs are in this framework. The Stevenson article I linked to earlier talks about individual harm; it may be that someone can argue that is not the right cost to calculate (and could do something like a willingness to pay experiment). But the same goes for the endpoint outcomes – we could argue about whether they are reasonable for the situation as well. So I imagine the CrimeSolutions/DSG folks can amend the instrument to take these cost aspects into account.