I translated my book for $7 using OpenAI

The other day an officer from the French Gendarmerie commented that they use my python for crime analysis book. I asked that individual, and he stated they all speak English. But given my book is written in plain text markdown and compiled using Quarto, it is not that difficult to pipe the text through a tool to translate it to other languages. (Knowing that epubs under the hood are just HTML, it would not surprise me if there is some epub reader that can use Google Translate.)

So you can see now I have available in the Crime De-Coder store four new books:

Ebook versions are normally $39.99, and print is $49.99 (both available worldwide). For the next few weeks, you can use promo code translate25 (until 11/15/2025) to purchase epub versions for $19.99.

If you want to see a preview of the books’ first two chapters, here are the PDFs:

And here I added a page on my crimede-coder site with testimonials.

As the title says, this in the end cost (less than) $7 to convert to French (and ditto to convert to Spanish).

Here is code demo’ing the conversion. It uses OpenAI’s GPT-5 model, but likely smaller and cheaper models would work just fine if you did not want to fork out $7. It ended up being a quite simple afternoon project (parsing the markdown ended up being the bigger pain).

So the markdown for the book in plain text looks like this:

It ends up that because markdown uses line breaks to denote different sections, that ends up being a fairly natural break to do the translation. These GenAI tools cannot repeat back very long sequences, but a paragraph is a good length. Long enough to have additional context, but short enough for the machine to not go off the rails when trying to just return the text you input. Then I just have extra logic to not parse code sections (that start/end with three backticks). I don’t even bother to parse out the other sections (like LaTeX or HTML), and I just include in the prompt to not modify those.

So I just read in the quarto document, split it on line breaks, then feed the text sections into OpenAI. I did not test this very much, I just used the current default gpt-5 model with medium reasoning. (It is quite possible a non-reasoning smaller model will do just as well. I suspect the open models will do fine.)
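As a rough sketch of that loop (not the actual code used for the book), the splitting and skipping logic could look something like the following. The names split_blocks and translate_markdown are made up for illustration, and translate is assumed to be any function that takes a paragraph and returns the translated paragraph; the call to the model, with the prompt telling it to leave LaTeX and HTML alone, would live inside that function.

```python
from typing import Callable, List, Tuple

FENCE = "`" * 3  # code sections start and end with three backticks


def split_blocks(text: str) -> List[Tuple[str, str]]:
    """Split markdown into ("text", ...) paragraphs and ("code", ...) sections."""
    blocks: List[Tuple[str, str]] = []
    current: List[str] = []
    in_code = False

    def flush(kind: str) -> None:
        # Store whatever has been collected so far as one block
        if current:
            blocks.append((kind, "\n".join(current)))
            current.clear()

    for line in text.split("\n"):
        if line.strip().startswith(FENCE):
            if in_code:
                # closing fence: finish the code section
                current.append(line)
                flush("code")
                in_code = False
            else:
                # opening fence: finish any pending paragraph first
                flush("text")
                current.append(line)
                in_code = True
        elif in_code:
            current.append(line)  # keep blank lines inside code as-is
        elif line.strip() == "":
            flush("text")  # a blank line ends the paragraph
        else:
            current.append(line)
    flush("code" if in_code else "text")
    return blocks


def translate_markdown(text: str, translate: Callable[[str], str]) -> str:
    """Translate paragraph by paragraph, passing code sections through untouched."""
    out = []
    for kind, block in split_blocks(text):
        out.append(block if kind == "code" else translate(block))
    return "\n\n".join(out)


# Stand-in "translator" so the sketch runs without any API key
demo = "# Title\n\nSome text\nmore text\n\n" + FENCE + "python\nx = 1\n\ny = 2\n" + FENCE + "\n\nLast paragraph"
result = translate_markdown(demo, str.upper)
```

Swapping str.upper for a function that sends the paragraph to a model is the only change needed to make this do real translation. Tracking the fence state line by line (instead of only splitting on blank lines) keeps code sections that contain blank lines in one piece.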

You will ultimately still want someone to spot check the results, and then do some light edits. For example, here is the French version when I am talking about running code in the REPL, first in English:

Running in the REPL

Now, we are going to run an interactive python session, sometimes people call this the REPL, read-eval-print-loop. Simply type python in the command prompt and hit enter. You will then be greeted with this screen, and you will be inside of a python session.

And then in French:

Exécution dans le REPL

Maintenant, nous allons lancer une session Python interactive, que certains appellent le REPL, boucle lire-évaluer-afficher. Tapez simplement python dans l’invite de commande et appuyez sur Entrée. Vous verrez alors cet écran et vous serez dans une session Python.

So the acronym is carried forward, but the description of the acronym is not. (And I went and edited that for the versions on my website.) But look at this section in the intro talking about GIS:

There are situations when paid for tools are appropriate as well. Statistical programs like SPSS and SAS do not store their entire dataset in memory, so can be very convenient for some large data tasks. ESRI’s GIS (Geographic Information System) tools can be more convenient for specific mapping tasks (such as calculating network distances or geocoding) than many of the open source solutions. (And ESRI’s tools you can automate by using python code as well, so it is not mutually exclusive.) But that being said, I can leverage python for nearly 100% of my day to day tasks. This is especially important for public sector crime analysts, as you may not have a budget to purchase closed source programs. Python is 100% free and open source.

And here in French:

Il existe également des situations où les outils payants sont appropriés. Les logiciels statistiques comme SPSS et SAS ne stockent pas l’intégralité de leur jeu de données en mémoire, ils peuvent donc être très pratiques pour certaines tâches impliquant de grands volumes de données. Les outils SIG d’ESRI (Système d’information géographique) peuvent être plus pratiques que de nombreuses solutions open source pour des tâches cartographiques spécifiques (comme le calcul des distances sur un réseau ou le géocodage). (Et les outils d’ESRI peuvent également être automatisés à l’aide de code Python, ce qui n’est pas mutuellement exclusif.) Cela dit, je peux m’appuyer sur Python pour près de 100 % de mes tâches quotidiennes. C’est particulièrement important pour les analystes de la criminalité du secteur public, car vous n’avez peut‑être pas de budget pour acheter des logiciels propriétaires. Python est 100 % gratuit et open source.

So it translated GIS to SIG in French (Système d’information géographique). Which seems quite reasonable to me.

I paid an individual to review the Spanish translation (if any readers are interested in giving me a quote for the French version copy-edits, I would appreciate it). She stated it is overall very readable, but just has many minor things. Here is a sample of suggestions:

Total number of edits she suggested were 77 (out of 310 pages).

If you are interested in another language just let me know. I am not sure about translation for the Asian languages, but I imagine it works OK out of the box for most languages that are derivative of Latin. Another benefit of self-publishing, I can just have the French version available now, but if I am able to find someone to help with the copy-edits

LinkedIn is the best social media site

The end goals I want for a social media site are:

  • promote my work
  • see other people’s work

Social media for other people may have other uses. I do comment and have minor interactions on the social media sites, but I do not use them primarily for that. So my context is more business oriented (I do not have Facebook, and have not considered it). I participate some on Reddit as well, but that is pretty sparingly.

LinkedIn is the best for both relative to X and BlueSky currently. So I encourage folks with my same interests to migrate to LinkedIn.

LinkedIn

So I started Crime De-Coder around 2 years ago. I first created a website, and then second started a LinkedIn page.

When I first created the business page, I invited most of my criminal justice contacts to follow the page. I had maybe 500 followers just based on that first wave of invites. At first I posted once or twice a week, and it was very steady growth, and grew to over 1500 followers in maybe just a month or two.

Now, LinkedIn has a reputation for more spammy lifecoach self promotion (for lack of a better description). I intentionally try to post somewhat technical material, but keep it brief and understandable. It is mostly things I am working on that I think will be of interest to crime analysts or the general academic community. Here is one of my recent posts on structured outputs:

Current follower count on LinkedIn for my business page (which in retrospect may have been a mistake, I think they promote business pages less than personal pages), is 3230, and I have fairly consistent growth of a few new followers per day.

I first started posting once a week, and with additional growth expanded to once every other day and at one point once a day. I have cut back recently (mostly just due to time). I did get more engagement, around 1000+ views per day when I was posting every day.

Probably the most important part though of advertising Crime De-Coder is the types of views I am getting. My followers are not just academic colleagues I was previously friends with, a decent number are outside my first degree network: police officers and other non-profit related folks. I have landed several contracts where I know those individuals reached out to me based on my LinkedIn posting. It could be higher, as my personal Crime De-Coder website ranks very poorly on Bing search, but my LinkedIn posts come up fairly high.

When I was first on Twitter I did have a few academic collaborations that I am not sure would have happened without it (a paper with Manne Gerell, and a paper with Gio Circo, although I had met Gio in real life before that). I do not remember getting any actual consulting work though.

I mentioned it is not only better for me for advertising my work, but also consuming other material. I did a quick experiment, just opened the home page and scrolled the first 3 non-advertisement posts on LinkedIn, X, and BlueSky. For LinkedIn

This is likely a person I do not want anything to do with, but I agree with their comment. Whenever I use Service Now at my day job I want to rage quit (just send a Teams chat or email and be done with it, LLMs can do smarter routing anymore). The next two are people I am directly connected with. Some snark by Nick Selby (I can understand the sentiment, albeit disagree with it, I will not bother to comment though). And something posted by Mindy Duong I likely would be interested in:

Then another advert, and then a post by Chief Patterson of Raleigh, whom I am not directly connected with, but was liked by Tamara Herold and Jamie Vaske (whom I am connected with).

So the adverts are annoying, but the suggested posts (the feeds are weird now, they are not chronological) are not bad. I would prefer if LinkedIn had “general” and “my friends” sections, but overall I am happier with the content I see on LinkedIn than I am with the other sites.

X & BlueSky

I first created a personal account on what was then Twitter in 2018. Nadine Connell suggested it, and it was nice then. When I first joined I think it was Cory Haberman who tweeted and said to follow my work, and I had a few hundred followers that first day. Then over the next two years, just posting blog posts and papers for the most part, I grew to over 1500 followers IIRC. I also consumed quite a bit of content from criminal justice colleagues. It was much more academic focused, but it was a very good source of recent research, CJ relevant news and content.

I then eventually deleted the Twitter account, due to a colleague being upset I liked a tweet. To be clear, the colleague was upset but it wasn’t a very big deal, I just did not want to deal with it.

I started a Crime De-Coder X account last year. I made an account to watch the Trump interview, and just decided to roll with it. I tried really hard to make X work – I posted daily, the same stuff I had been sharing on LinkedIn, just shorter form. After 4 months, I have 139 followers (again, when I joined Twitter in 2018 I had more than that on day 1). And some of those followers are porn accounts or bots. Majority of my posts get <=1 like and 0 reposts. It just hasn’t resulted in getting my work out there the same way in 2018 or on LinkedIn now.

So in terms of sharing work, the more recent X has been a bust. In terms of viewing other work, my X feed is dominated by short form video content (a mimic of TikTok) I don’t really care about. This is after extensively blocking/muting/saying I don’t like a lot of content. I promise I tried really hard to make X work.

So when I open up the Twitter home feed, it is two videos by Musk:

Then a thread by Per-Olof (whom I follow), and then another short video Death App joke:

So I thought this was satire, but clicking through that fellow’s posts I think he may actually be involved in promoting that app. I don’t know, but I don’t want any part of it.

BlueSky I have not been on as long, but given how easy it was to get started on Twitter and X, I am not going to worry about posting so much. I have 43 followers, and posts similar to X have basically been zero interaction for the most part. The content feed is different than X, but is still not something I care that much about.

We have Jeff Asher and his football takes:

I am connected with Jeff on LinkedIn, in which he only posts his technical material. So if you want to hear Jeff’s takes on football and UT-Austin stuff then go ahead and follow him on BlueSky. Then we have a promotional post by a psychologist (this person I likely would be interested in following his work, this particular post though is not very interesting). And a not funny Onion like post?

Then Gavin Hales, whom I follow, and typically shares good content. And another post I leave with no comment.

My BlueSky feed is mostly dominated by folks in the UK currently. It could be good, but it currently just does not have the uptake to make it worth it like I had with Twitter in 2018. It may be the case given my different goals, to advertise my consulting business, Twitter in 2018 would not be good either though.

So for folks who subscribe to this blog, I highly suggest giving LinkedIn a try for your social media consumption and sharing.

The story of my dissertation

My dissertation is freely available to read on my website (Wheeler, 2015). I still open up my hardcover I purchased every now and then. No one cites it, because no one reads dissertations, but it is easily the work I am the most proud of.

For most of the articles I write, there is some motivating story behind the work you would never know about just from reading the words. I think this is important, as the story often is tied to some more fundamental problem, and solving specific problems is the main way we make progress in science. The stifling way that academics write peer reviewed papers currently doesn’t allow that extra narrative in.

For example, my first article (and what ended up being my masters thesis, Albany at that time you could go directly into PhD from undergrad and get your masters on the way), was an article about the journey to crime after people move (Wheeler, 2012). The story behind that paper was, while working at the Finn Institute, Syracuse PD was interested in targeted enforcement of chronic offenders, many of whom drive around without licenses. I thought, why not look at the journey to crime to see where they are likely driving. When I did that analysis, I noticed a few hundred chronic offenders had something like a 5 fold number of home addresses in the sample. (If you are still wanting to know where they drive, they drive everywhere, chronic offenders have very wide spatial footprints.)

Part of the motivation behind that paper was if people move all the time, how can their home matter? They don’t really have a home. This is a good segue into the motivation of the dissertation.

More of my academic reading at that point had been on macro and neighborhood influences on crime. (Forgive me, as I am likely to get some of the timing wrong in my memory, but this writing is as best as I remember it.) I had a class with Colin Loftin that I do not remember the name of, but discussed things like the southern culture of violence, Rob Sampson’s work on neighborhoods and crime, and likely other macro work I cannot remember. Sampson’s work in Chicago made the biggest impression on me. I have a scanned copy of Shaw & McKay’s Juvenile Delinquency (2nd edition). I also took a spatial statistics class with Glenn Deane in the sociology department, and the major focus of the course was on areal units.

When thinking about the dissertation topic, the only advice I remember receiving was about scope. Shawn Bushway at one point told me about a stapler thesis (three independent papers bundled into a single dissertation). I just wanted something big, something important. I intentionally sought out to try to answer some more fundamental question.

So I had the first inkling of “how can neighborhoods matter if people don’t consistently live in the same neighborhood”? The second was that, in my work at the Finn Institute with police departments, hot spots were the only thing any police department cared about. It is not uncommon even now for an academic to fit a spatial model at the neighborhood level to crime and demographics, and have a throwaway paragraph in the discussion about how it would help police better allocate resources. It is comically absurd – you can just count up crimes at addresses or street segments and rank them and that will be a much more accurate and precise system (no demographics needed).
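To make the “count and rank” point concrete, here is a minimal sketch. The incident records and the rank_hot_spots helper are made up for illustration; a real version would read from a department’s records system.

```python
from collections import Counter

# Hypothetical incident records as (address, offense) pairs, made-up data
incidents = [
    ("100 Main St", "robbery"),
    ("100 Main St", "assault"),
    ("22 Oak Ave", "burglary"),
    ("100 Main St", "theft"),
    ("5 Elm Rd", "theft"),
    ("22 Oak Ave", "assault"),
]


def rank_hot_spots(records, top_n=10):
    """Count incidents per address and return the top_n addresses by count."""
    counts = Counter(address for address, _ in records)
    return counts.most_common(top_n)


ranked = rank_hot_spots(incidents, top_n=2)
print(ranked)  # [('100 Main St', 3), ('22 Oak Ave', 2)]
```

No model and no demographics are involved, just a tally and a sort.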

So I wanted to do work on micro level units of analysis. But I had on my dissertation Glenn and Colin – people very interested in macro and some neighborhood level processes. So I would need to justify looking at small units of analysis. Reading the literature, Weisburd and Sherman did not have to me clearly articulated reasons to be interested in micro places, beyond just utility for police. Sherman had the paper counting up crimes at addresses (Sherman et al., 1989), and none of Weisburd’s work had to me any clear causal reasoning to look at micro places to explain crime.

To be clear, wanting to look at small units as the only guidepost in choosing a topic is a terrible place to start. You should start from a more specific, articulable problem you wish to solve. (If others pursuing PhDs are reading.) But I did not have that level of clarity in my thinking at the time.

So I set out to articulate a reason why we would be interested to look at micro level areas that I thought would satisfy Glenn and Colin. I started out thinking about doing a simulation study, similar to what Stan Openshaw did (1984) that was motivated by Robinson’s (1950) ecological fallacy. While doing that I realized there was no point in doing the simulation, you could figure it out all in closed form (as have others before me). So I proved that random spatial aggregation would not result in the ecological fallacy, but aggregating nearby spatial areas would, assuming there is a spatial covariance between nearby areas. I thought at the time it was a novel proof – it was not (Footnote 1 on page 9 were all things I read after this). Even now the Wikipedia page on the ecological fallacy has an unsourced overview of the issue, that cross-spatial correlations make the micro and macro equations not equal.

This in and of itself is not interesting, but in the process did clearly articulate to me why you want to look at micro units. The example I like to give is as follows – imagine you have a bar you think causes crime. The bar can cause crime inside the bar, as well as the bar diffusing risk into the nearby area. Think people getting in fights in the bar, vs people being robbed walking away from a night of drinking. If you aggregate to large units of analysis, you cannot distinguish between “inside bar crime” vs “outside bar crime”. So that is a clear causal reasoning for when you want to look at particular units of analysis – the ability to estimate diffusion/displacement effects are highly dependent on the spatial unit of analysis. If you have an intervention that is “make the bar hire better security” (ala John Eck’s work) that should likely not have any impact outside the bar, only inside the bar. So local vs diffusion effects are not entirely academic, they can have specific real world implications.
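The bar example can be written out as a small worked equation. This is only a sketch under simplifying assumptions, not the proof in the dissertation: the symbols are introduced here for illustration, each micro unit i is assumed to have k neighbors N(i), the neighbor relation is assumed symmetric, and every unit in an aggregate area g is assumed to have all of its neighbors inside g.

```latex
% Micro level: local effect \beta, diffusion effect \gamma from neighbors
y_i = \alpha + \beta x_i + \gamma \sum_{j \in N(i)} x_j + \varepsilon_i

% Average over the n_g micro units in aggregate area g.
% Under the assumptions above each x_j is counted k times in the double sum:
\sum_{i \in g} \sum_{j \in N(i)} x_j = k \sum_{j \in g} x_j

% So the aggregate equation is
\bar{y}_g = \alpha + (\beta + k\gamma)\,\bar{x}_g + \bar{\varepsilon}_g
```

At the aggregate level only the sum β + kγ shows up, so the “inside bar” and “outside bar” effects cannot be told apart, while at the micro level β and γ are separate coefficients.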

This logic does not explicitly always value smaller spatial units of analysis though. Another example I liked to give is say you are evaluating a city wide gun buy back. You could look at more micro areas than the entire city, e.g. see if it decreased in neighborhood A and increased in neighborhood B, but it likely does not invalidate the macro city wide analysis. Which is just an aggregate estimate over the entire city – which in some cases is preferable.

Glenn Deane at some point told me that I am a reductionist, which was the first time I heard that word, but it did encapsulate my thinking. You could always go smaller, there is no atom to stop at. But often it just doesn’t matter – you could examine the differences in crime between the front stoop and the back porch, but there is not likely meaningful causal reasons to do so. This logic works for temporal aggregation and aggregating different crime types as well.

I would need to reread Great American City, but I did not take this to be necessarily contradictory to Sampson’s work on neighborhood processes. Rob came to SUNY Albany to give a talk at the sociology department (I don’t remember the year). Glenn invited me to whatever they were doing after the talk, and being a hillbilly I said I need to go back to work at DCJS, I am on my lunch break. (To be clear, no one at DCJS would have cared.) I am sure I would not have been able to articulate anything of importance to him, but I do wish I had taken that opportunity in retrospect.

So with the knowledge of how aggregation bias occurs in hand, I had formulated a few different empirical research projects. One was the idea behind bars and crime I have already given an example of. I had a few interesting findings, one of which is that diffusion effects are larger than the local effects. I also estimated the bias of bars selecting into high crime areas via a non-equivalent dependent variable design – the only time I have used a DAG in any of my work.

I gave a job talk at Florida State before the dissertation was finished. I had this idea in the hotel room the night before my talk. It was a terrible idea to add it to my talk, and I did not prepare what I was going to say sufficiently, so it came out like a jumbled mess. I am not sure whether I would want to remember or forget that series of events (which include me asking Ted Chiricos if you can fish in the Gulf of Mexico at dinner, I feel I am OK in one-on-one chats, group dinners I am more awkward than you can possibly imagine). It also included nice discussions though. Dan Mears asked me a question about emergent macro phenomena that I did not have a good answer to at the time, but now I would say simple causal processes having emergent phenomena is a reason to look at micro, not the macro. Eric Stewart asked me if there is any reason to look at neighborhoods and I said no at the time, but I should have given my gun buy back example.

The second empirical study I took from broken windows theory (Kelling & Wilson, 1982). For the majority of social science theories, some spatial diffusion is to be expected. Broken windows theory though had a very clear spatial hypothesis – you need to see disorder for it to impact your behavior. So you do not expect spatial diffusion, beyond someone’s line of sight, to occur. To measure disorder, I used 311 calls (I had this idea before I read Dan O’Brien’s work, see my prospectus, but Dan published his work on the topic shortly thereafter, O’Brien et al. 2015).

I confirmed this to be the case, conditional on controlling for neighborhood effects. I also discuss how if the underlying process is smooth, using discrete neighborhood boundaries can result in negative spatial autocorrelation, which I show some evidence of as well.

This suggests that using a smooth measure of neighborhoods, like Hipp’s idea of egohoods (Hipp et al., 2013), I think is probably more reasonable than discrete neighborhood boundaries (which are often quite arbitrary).

While I ended up publishing those two empirical applications (Wheeler, 2018; 2019), which was hard, I was too defeated to even worry about posting a more specific paper on the aggregation idea. (I think I submitted this paper to Criminology, but it was not well received.) I was partially burned out from the bars and crime paper, which went through at least one R&R at Criminology and was still rejected. And then I went through four rejections for the 311 paper. I had at that point multiple other papers that took years to publish. It is a slog and degrading to be rejected so much.

But that is really my only substantive contribution to theoretical criminology in any guise. After the dissertation, I just focused on either policy work or engineering/method applications. Which are much easier to publish.

References

Year in Review 2024

Past year in review posts I have made focused on showing blog stats. Writing this in early December, but total views will likely be down this year – I am projecting around 140,000 views in total for this site. But I have over 25k views for the Crime De-Coder site, so it is pretty much the same compared to 2023 combining the two sites.

I do not have a succinct elevator speech to tell people what I am working on. With the Crime De-Coder consulting gig, it can be quite eclectic. That Tukey quote, that being a statistician means you get to play in everyone’s backyard, is true. Here is a rundown of the paid work I conducted in the past year.

Evidence Based CompStat: Work with Renee Mitchell and the American Society of Evidence Based Policing on what I call Evidence Based CompStat. This mostly amounts to working directly with police departments (it is more project management than crime analysis) to help them get started with implementing evidence based practices. Reach out if that sounds like something your department would be interested in!

Estimating DV Violence: Work supported by the Council on CJ. I forget exactly the timing of events. This was an idea I had for a different topic (to figure out why stores and official reports of thefts were so misaligned). Alex approached me to help with measuring national level domestic violence trends, and I pitched this idea (use local NIBRS data and NCVS to get better local estimates).

Premises Liability: I don’t typically talk about ongoing cases, but you can see a rundown of some of the work I have done in the past. It is mostly using the same stats I used as a crime analyst, but in reference to civil litigation cases.

Patrol Workload Analysis: I would break workload analysis for PDs down into two categories, advanced stats and CALEA reports. I had one PD interested in the simpler CALEA reporting requirement (which I can do for quite a bit cheaper than the other main consulting firm that offers these services).

Kansas City Python Training: Went out to Kansas City for a few days to train their analysts up in using python for Focused Deterrence. If you think the agenda in the pic below looks cool get in touch, I would love to do more of these sessions with PDs. I make it custom for the PD based on your needs, so if you want “python and ArcGIS”, or “predictive models” or whatever, I will modify the material to go over those advanced applications. I have also been pitching the same idea (short courses) for PhD programs. (So many posers in private sector data science, I want more social science PhDs with stronger tech skills!)

Paterson Opioid Outreach: Statistical consulting with Eric Piza and Kevin Wolff on a street outreach intervention intended to reduce opioid overdose in Paterson, New Jersey. I don’t have a paper to share for that at the moment, but I used some of the same synthetic control in python code I developed.

Bookstore prices: Work with Scott Jacques, supported by some internal GSU money. Involves scraping course and bookstore data to identify the courses that students spend the most on textbooks. Ultimate goal in mind is to either purchase those books as unlimited epubs (to save the students money), or encourage professors to adopt better open source materials. It is a crazy amount of money students pour into textbooks. Several courses at GSU students cumulatively spend over $100k on course materials per semester. (And since GSU has a large proportion of Pell grant recipients, it means the federal government subsidizes over half of that cost.)

General Statistical Consulting: I do smaller stat consulting contracts on occasion as well. I have an ongoing contract to help with Pam Metzger’s group at the SMU Deason center. Did some small work for AH Datalytics on behind the scenes algorithms to identify anomalous reporting for the real time crime index. I have several times in my career consulted on totally different domains as well, this year had a contract on calculating regression spline curves for some external brain measures.

Data Science Book: And last (that I remember), I published Data Science for Crime Analysis with Python. I still have not gotten the 100 sales at which I would consider it a success – so if you have not bought a copy go do that right now. (Coupon code APWBLOG will get you $10 off for the next few weeks, either the epub or the paperback.)

Sometimes this seems like I am more successful than I am. I have stopped counting the smaller cold pitches I make (I should be more aggressive with folks, but most of this work is people reaching out to me). But in terms of larger grant proposals or RFPs in the past year, I have submitted quite a few (7 in total) and have landed none of them to date! For example, Gio and I submitted a big one to NIJ’s follow up survey solicitation, to support the place based surveys work for which we won the NIJ competition, and it was turned down. So it goes.

In addition to the paid work, I still on occasion publish peer reviewed articles. (I need to be careful with my time though.) I published a paper with Kim Rossmo on measuring the buffer zone in journey to crime data. I also published the work on measuring domestic violence supported by the Council on CJ with Alex Piquero.

I took the day gig in Data Science at the end of 2019. Citations are often used as a measure of a scholar’s influence on the field – they are crazy slow though.

I had 208 citations by the end of 2019, I now have over 1300. Of the 1100 post academia, only a very small number are from articles I wrote after I left (less than 40 total citations). A handful for the NIJ recidivism competition paper (with Gio), and a few for this Covid and shootings paper in Buffalo. The rest of the papers that have a post 2019 publishing date were entirely written before I left academia.

Always happy to chat with folks on teaming up on papers, but it is hard to take the time to work on a paper for free if I have other paid work at the moment. One of the things I need to do to grow the business is to get some more regular work. So if you have a group (academic, think tank, public sector) that is interested in part time (or fractional I guess is what the cool kids are calling it these days), I would love to chat and see if I could help your group out.

GenAI is not a serious solution to California’s homeless problem

This is a rejected op-ed (or at least none of the major papers in California I sent it to bothered to respond and say no-thanks, it could be none of them even looked at it). Might as well post it on my personal blog and have a few hundred folks read it.


Recently Gov. Newsom released a letter of interest (LOI) for different tech companies to propose how the state could use GenAI (generative artificial intelligence) to help with California’s homeless problem. The rise in homelessness is a major concern, not only for Californians but for individuals across the US. That said, the proposal is superficial and likely to be a waste of time.

A simple description of GenAI, for those not aware, is that these are tools where you ask the machine questions in text and get a response. So you can ask ChatGPT (a currently popular GenAI tool) something like “how can I write a python function to add two numbers together” and it will dutifully respond with computer code (python is a computer programming language) that answers your question.
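For readers who have never seen one, the kind of answer that question gets back looks roughly like this. This is a hand-written illustration, not actual ChatGPT output, and the function name is arbitrary:

```python
def add_numbers(a, b):
    """Return the sum of two numbers."""
    return a + b


print(add_numbers(2, 3))  # prints 5
```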

As someone who writes code for a living, this is useful, but not magic. Think of it more akin to auto-complete on your phone than something truly intelligent. The stated goals of Newsom’s LOI are either mostly trivial without the help of GenAI, or are hopeless and could never be addressed with GenAI.

For the first stated goal, “connecting people to treatment by better identifying available shelter and treatment beds, with GenAI solutions for a portable tool that the local jurisdictions can use for real-time access to treatment and shelter bed availability”. This is simply describing a database — one could mandate state funded treatment providers to provide this information on a daily basis. The technology infrastructure to accomplish this is not much more complex than making a website. Mandating treatment providers report that information accurately and on a timely basis is the hardest part.
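To show how little technology the first goal needs, here is a minimal sketch using the SQLite database that ships with python. The table layout, provider names, county, and the available_beds helper are all made up for illustration, and a real system would still need the hard part: providers actually reporting every day.

```python
import sqlite3

# In-memory database for the sketch; a real system would use a shared server
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE bed_reports (
        provider      TEXT NOT NULL,
        county        TEXT NOT NULL,
        report_date   TEXT NOT NULL,
        beds_total    INTEGER NOT NULL,
        beds_occupied INTEGER NOT NULL,
        PRIMARY KEY (provider, report_date)
    )
""")

# Hypothetical daily reports submitted by treatment providers
rows = [
    ("Shelter A", "County X", "2024-01-01", 40, 40),
    ("Shelter A", "County X", "2024-01-02", 40, 38),
    ("Shelter B", "County X", "2024-01-02", 25, 25),
    ("Clinic C", "County X", "2024-01-02", 10, 4),
]
conn.executemany("INSERT INTO bed_reports VALUES (?, ?, ?, ?, ?)", rows)


def available_beds(conn, county, report_date):
    """List providers in a county with open beds on a given date."""
    query = """
        SELECT provider, beds_total - beds_occupied AS beds_open
        FROM bed_reports
        WHERE county = ? AND report_date = ?
          AND beds_total > beds_occupied
        ORDER BY beds_open DESC
    """
    return conn.execute(query, (county, report_date)).fetchall()


open_beds = available_beds(conn, "County X", "2024-01-02")
print(open_beds)  # [('Clinic C', 6), ('Shelter A', 2)]
```

One table and one query cover the “real-time access to bed availability” in the LOI; nothing here involves GenAI.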

For the second stated goal, “Creating housing with more data and accountability by creating clearer insights into local permitting and development decisions”. Permitting decisions are dictated by the state as well as local ordinances. GenAI solutions will not uncover any suggested solution that most Californians don’t already know: housing is too expensive and not enough is being built. This is in part due to the regulatory structure, as well as local zoning opposition for particular projects. GenAI cannot change the state laws.

For the last stated goal of the program, “Supporting the state budget by helping state budget analysts with faster and more efficient policy”. Helping analysts generate results faster is potentially something GenAI can help with; more efficient policy is not. I do not doubt the state analysts can use GenAI solutions to help them write code (the same as I do now). But getting that budget analysis one day quicker will not solve any substantive homeless problem.

I hate to be the bearer of bad news, but there are no easy answers to solve California’s homeless crisis. If a machine could spit out trivial solutions to solve homelessness in a text message, like the Wizard of Oz gifting the Scarecrow brains, it would not be a problem to begin with.

Instead of asking for ambiguous GenAI solutions, the state would be better off thinking more seriously about how they can accomplish those specific tasks mentioned in the LOI. If California actually wants to make a database of treatment availability, that is something they could do right now with their own internal capacity.

Solutions to homelessness are not going to miraculously spew from a GenAI oracle, they are going to come from real people accomplishing specific goals.


If folks are reading this, check out my personal consulting firm, Crime De-Coder. I have experience building real applications. Most of the AI stuff on the market now is pure snake oil, so better to articulate what you specifically want and see if someone can help build that.

Crime De-Coder consulting

Hillbilly Lament

I recently read JD Vance’s Hillbilly Elegy. I grew up in very rural Pennsylvania in the Appalachian mountains, and I am the same age as Vance. So I was interested in hearing his story. My background was different, of course not all rural people are a monolith, but I commiserated with many of his experiences and feelings.

I think it is a good representation of rural life, more so than reading any sociology book. The struggles of rural people are in many ways the same as individuals living in poverty in urban areas. Vance highly empathized with Wilson’s Truly Disadvantaged, and in the end he focuses on cultural behaviors (erosion of core family, domestic violence, drug use). These are not unique to rural culture. Personally I view much of the current state of rural America through Murray’s Bell Curve, which Vance also discusses.

This is not a book review. I will tell some of my stories, and relate them to a bit of what Vance says. I think Murray’s demographic segregation (brain drain) is a better way to frame why rural America looks the way it does than Vance’s focus on cultural norms. This is not a critique of Vance’s perspective though, it is just a different emphasis. I hope you enjoy my story, the same way I enjoyed reading Vance’s story. I think they give a peek into rural life, but they aren’t a magic pill to really understand rural people. I do not even really understand rural people, I can only relate some of my personal experiences and feelings at the time.

It also is not any sort of political endorsement. I do encourage people to read his book – even if you do not like Vance’s current politics, the book has what I would consider zero political commentary.

Farming

Where I grew up is more akin to Jackson, Kentucky than Middletown, Ohio – I grew up in Bradford County, Pennsylvania. The town I went to school in has a population of under 2,000 individuals, and my class size was around 80 students. A Subway opened in town when I was a teenager and it was a big deal (the only fast food place in town at the time).

My grandfather on my mother’s side had a small dairy farm. One of the major cultural perspectives I have on rural people is somewhat contra to Vance – farmers have an incredible work ethic. Again this is not a critique of Vance’s book, I could find lazy people like Vance discusses as well. I just knew more farmers.

Farming is Sisyphean – you get up and you milk the cows. It does not matter if you are sick, does not matter if it is raining, does not matter if you are tired. It would be like you not feeding your pets in the morning, and you do not get paid.

There was always an expectation of working. I was doing farm work at an early age. I always remember being in the barn, but even before I was 10 years old I was taught to operate heavy machinery (skidsteer, tractors). So I was doing what I would consider “real work” at that young age, not just busy work you give a child to keep them preoccupied.

My grandfather retired and sold his farm when I was 11, but I went and worked for a neighbor’s farm shortly thereafter. We had a four-wheeler that I would drive up the road in the wee hours of the morning to go work. I was not an outlier among other kids my age. The modal kid in my school was not a farmer, but there were more than a dozen of my classmates who had the same schedule.

Farming, and manual labor in general, is brutal. When Vance talked about the floor tile company not being able to fill positions (despite the reasonable pay), I can hardly blame people for not taking those jobs. They are quite literally backbreaking.

Farming has become more consolidated/automated over time. The dairy farms I worked on were very small, having fewer than 100 cows. One of my memories is being exhausted stacking small square bales of hay. There is a machine that bundles up the hay and shoots it onto a wagon. I would be in the wagon stacking the bales. Then we would have to load the bales up a conveyor belt and stack them in the barn. The bales weighed around 50 pounds; it was crazy hard work.

Round bales were only starting to become more common in the area when I was a teenager. You can only move the larger round bales using farm equipment. I believe almost no one does the small square bales like I am describing anymore. The smaller square bales were also a much larger fire hazard. If the hay was wet when baled it would go through a fermentation process and potentially get hot enough in the center of the stack to catch fire.

This is one of the reasons I do not have much worry about automation taking jobs. A farmer not needing to hire extra labor to stack hay and instead use a tractor to do the same work is a good thing. This will apply to many manual labor jobs.

The Culture of Danger

One thing I had always been skeptical of when reading in sociological texts was the southern culture of violence. Saying southern people have an honor culture and then showing homicide rates in the south are higher is very weak evidence supporting that theory.

The first person who had told me of that hypothesis in my PhD program at Albany was a professor named Colin Loftin. I think I laughed when he said it in class and I straight up asked “is this a real thing.” He was from Alabama and assured me it is, and told some story of a student fighting after bumping into another person walking in a hallway as an example. For those who do not know Colin, imagine a nerdy grandpa accountant with a well groomed white beard stepped out of a Hallmark movie. And he tells you southern people are prone to violence over trivial matters of honor – I was not convinced. As I said I grew up in a very rural area; Pennsylvania is sometimes described as Philadelphia, Pittsburgh, and Alabama in between. But I did not experience that type of violence at all.

Vance’s book is the first confirmation of the southern culture of violence that I believed. The area I grew up in was substantively less violent. I would consider it more northeastern vibes than what Vance describes as southern honor culture. I am sure I could find some apocryphal violent stories similar to what Vance describes if I prodded my relatives enough, but I did not take that to be a core part of our identity the same way Vance does.

Culture is hard to define. The ideal of “you work at all costs” I take as part of farming culture. It was not an ideal more generally held among the broader population though. I definitely was familiar with adults who were lazy, or adults who had the volatile lifestyle of Vance’s mother he describes in his book. But I was personally familiar with more people who day in and day out performed incredibly hard manual labor.

Another aspect of growing up, which I have coined the culture of danger, is harder to associate specific behaviors with. But it is something that I now recognize was constantly in the background, something we all took for granted as a given, but is not something that is more broadly accepted in other parts of society.

Farming is incredibly dangerous. There are steel pipes that transport the milk from the cow to the bulk tank. After you are done with a round of milking, the interior of the pipeline gets washed with a round of acid. Then the pipes are rinsed with a second round of a much more concentrated caustic solution to get rid of the residual acid. When I was around 6 years old, me and my older sister were playing in the barn and I spilled the more caustic solution on myself. They were housed in large plastic containers that had a squirt top on them (no different than the pump on your liquid hand soap). All it took was a push, and it melted my nipple off (through the shirt I was wearing at the time). Slightly different trajectory and I would be missing half of my face.

Like I said previously, I learned to operate heavy machinery before I was 10. I distinctly remember doing things in both the skidsteer and the tractor when I was young that scared me. Driving them in areas and in situations where I was concerned I would roll the machines. Skidsteers are like mini tanks with a bucket on the front. I do not think it had a seat belt, but if I rolled the skidsteer I think my chances were better than 50/50 to make it out (it had a roll cage, but if you are ejected and it rolls over you, you are going to be a pancake). If I rolled a tractor (these are very old tractors, no cab), I think my chances were less than 50% of getting out without serious injury.

Learn to operate here meant I was given a short lesson and then expected to go do work, by myself without supervision. Looking back it was crazy, but of course when I was a kid it did not seem crazy.

Another example was climbing up silos. At the farm that I worked on, the owner did not have a machine to empty out the silo. So I would need to climb the silo, and fork out the silage for the cows’ feed. Imagine you took your lawn mower over a corn field, the clippings (both the cobs and the stalks) are what silage is.

I would climb up a ladder, around 50 feet, carrying a pitchfork. When I was 12. This was not a rare occurrence; this was one of my main responsibilities as a farmhand, I did this twice a day at least.

The cows were also fed a grain like mixture (that is similar to cereal, it did not taste bad). I have mixed feelings about the grass fed craze now, since the cows really enjoyed the grain and the silage (although I do not doubt a grass fed diet could be better for their overall health). And I do not know if feeding them just baled hay counts as grass fed, or if they need to be entirely fed with fresh grass from the field.

Some may read this and think the child labor was the issue. I do not think that was a problem at all. To me there is no bright line between doing chores around the house and doing chores in the barn. I was paid when I worked on the neighbor’s farm, and it was voluntary. It was hard work, but it did not directly impact my schooling in any obvious way. No more than when a kid does sports or scouts. Operating heavy machinery when I was a child was crazy though, and the working conditions were equally dangerous for everyone.

Even more broadly, just driving around the area where I lived was dangerous. There were six males I went to high school with who died in traffic accidents. So in a group of fewer than 300 males near me in age (+/- two years), six of them died. I cannot remember what speed limit the roads were posted at, but they were winding. I am not sure it matters what the speed limit is; there is no reasonable way to enforce driving standards on all those back roads.

Death and danger were just part of life. Johnny died in a car accident, Mary is having a yard sale, and Bobby rolled the tractor, his femur broke the skin but he was able to crawl to the road where Donny happened to be driving by, so he will be ok. So it goes.

The culture of danger as I remember it does not have as many direct manifest negative behaviors as does “honor culture” or “I am too lazy and too high on drugs to keep a job”. So maybe some real ethnographers would quibble with my description of it as a culture.

I do think this is distinct from how individuals in certain areas have a constant specter of interpersonal violence. I do not have any PTSD type symptoms from my experience, like Vance describes based on his experience with child abuse. In the end I suspect the area I grew up had worse early death rates than many urban areas with violence on a per capita basis, but the nature of it doesn’t have quite the same effect on the psyche. Ignorance is bliss I suppose, you get used to driving tractors on steep hills.

Steaks are for rich people

One of the places I remember eating at growing up was Hoss’s Steakhouse in Williamsport. So if we went to get clothes for school at the Lycoming Mall as a family, we would often eat there on the way home. It is a place with a nice salad bar. You could load up a salad with a pound of bacon bits, and have a soft serve ice cream on the side if you wanted.

Walking into the restaurant, before you are seated, there are pictures of the meals on the walls. I was maybe 14 at the time, and waiting to be seated I pointed to one of the steak meals and said “I would like to get that”. The immediate response from my mother was “No you are not. You are getting the salad bar like always.” One of the ironic parts of this story to me is that, because I was working as an independent farmhand, I had my own money. I could have certainly paid for a steak dinner with my own cash.

After I was 16 and could drive, I did end up doing the majority of my own clothes shopping. I remember splurging on some really ugly yellow and green Puma sneakers one year (totally worth it, they were $80 and did get the “whoa nice shoes” comments at school as intended).

I have only recently developed a palate for steak. My son likes it and requests we go to the local steak house on occasion. I have for a while actively encouraged him to buy whatever expensive steak is on the menu when we go out to eat (Wagyu beef at the sushi place, that sounds interesting you should try that). Makes me feel like the god damn king of the world.

Soda and Beer

When I was young (less than 10 years old), during summers I would spend most of my time on the farm, but once a week visit my other grandparents. I would do two activities with my grandfather: either golf or go fishing. For golf we had a par three 9 hole course in my town. I cannot hit a driver straight to save my life, but my iron game is at the level where I would not embarrass myself.

Fishing was just in little ponds in the area, mostly sunfish and bass. We would bring a sandwich, two Pepsis for myself, and two beers for Grandpa.

I tell this story both because it is a fond memory, as well as it highlights one of the stark differences in my lifestyle now (in terms of healthy eating) vs back then. I am pretty sure I drank more soda than water growing up. Our well water was sulfurous, so it was quite unpleasant to drink. Soda was just a regular thing I remember everyone having, including kids.

Charles Murray in his Bell Curve proactively addresses most of the negative commentary you hear about right in the writing. But one thing I thought was slightly cringe at the time I read the book (sometime while getting my PhD) was his “middle class values index”. To be clear these were not things about healthy eating, but were more basic “received a high school diploma” and “have a job”. The items I did not object to, but the moniker of “middle class” I thought was an unnecessary jab for something that was so subjective.

In retrospect though, “feeding kids soda like it is water” and “driving around with open containers of beer” are the most apropos “not middle class values” I can think of. So now I do not hold that name against Murray. These are not idiosyncratic to rural areas, you can identify people in poor urban areas who behave like this as well. But you definitely do not need to worry about being pulled over for an open container while driving where I grew up.

In Pennsylvania at this time, to buy beer you needed to go to a distributor. You could not get a six pack at the gas station. You needed to get at least a 24 pack. I figure this limited the number of people purchasing beer, but I do wonder for those who did buy beer if it increased the amount of binge drinking.

Role Models and Choices

For a crazy dichotomy with how I grew up versus what I know now, one of the only role model career choices I remember individuals talking about growing up was teaching. Getting a job as a teacher at the school district was a well respected (and well paying) career option in my town.

An aspect of this that can only be understood when you are outside of it is how insular this perspective is. A kid wants to be a teacher, because that is pretty much the only career they are exposed to. This is from the perspective “I may need to go to school, but I will come back here and get a job as a teacher”.

I personally did not have much respect for my teachers in high school, so I never seriously considered that as a career option. I was an incredibly snarky and sarcastic kid. My older sister (and then later my younger sister) were salutatorians of their classes. I (quite intentionally) did not try very hard.

For one story, my physics teacher (who was friends with my father) called home to ask if I was doing ok since I slept in class. My mother asked what my grade was, and since it was an A, she did not care. For another, which I am embarrassed about now, I would intentionally give wrong answers (it was history or civics I believe) because the teacher would get upset. I found it hilarious at the time, but I realize this is incredibly churlish now (he cared that we were learning, which I cannot say for all of my teachers). Sorry Mr. Kirby.

So, I was not a super engaged student.

Vance talks about it seeming like the choices you make do not matter. I can understand that, it did not seem to me I was making any choices at all when I think back on it. My parents did always have an expectation that my siblings and I would go to college. Working as a farmhand (for other people’s farms at that point) was never a serious option.

Both my parents had associates degrees, and my sister (who was two years older than me) went to Penn State for accounting. That was about the extent of college advice I got – you should go. I never had a serious conversation about where I should go or what I should go for. Choose your own adventure.

I remember signing myself up for the SAT. I took the test on a Saturday morning in a testing center a few towns over. I finished each of the sections very fast, and I scored decently for not practicing at all (a 1200 I believe, out of a possible 1600). I have consistently done excellently on math and poorly on the English portions of tests in my life; I think I had 700 in math and 500 in English for the SAT.

One funny part of this is that, until graduate school, I actually did not understand algebra. Given my area of expertise (statistical analysis and optimization now), many people think I am quite a math geek. I did have good teachers in high school, but I was able to score this high on the SAT through memorization. For example I knew the rule for derivatives for polynomials, but if you asked me to do a simple proof at that point I would not have had any clue how to do that.

When I say I did not understand algebra, I mean when I was given a word problem that I needed to use mathematical concepts to solve, I just figured it out in my head. It was not until graduate school that at some point I realized you can take words and translate them into mathematical equations.

I know now that this is somewhat common for intelligent people when learning math. I home school my son, and I noticed the same thing for him. So it took active engagement (forcing him to write down the algebraic equivalent, even when he could figure out the answer in his head). But just rote memorization can get you quite a good score on the SAT.

Wanting to get rid of standardized testing because poor people score worse on average is the wrong way to think about it. You should want to improve the opportunities for individuals to get better education, not stick your head in the sand and pretend those inequalities do not exist.

SAT results in hand, I remember asking the guidance counselor about school information for Bloomsburg University (I chose Bloomsburg because it was not Mansfield and not Penn State, and was cheaper). And her response was simply to hand me a single page flier.

I can understand my parents not giving decent college advice; they did not know about scholarships or what opportunities were available. The high school guidance counselor in her sloth though makes me angry in retrospect. Our grade had fewer than 80 kids – she could have spent a few minutes reviewing each of those kids’ backgrounds, and provided more tailored advice.

I am positive none of the kids in my class went to undergrad at any more advanced school than Penn State (and even that may have only been one student in my class). Cornell (an Ivy League school in Ithaca, New York) is actually closer than Penn State to where I lived – I did not know it existed when I was in high school. To be fair, I do not know if the guidance counselor knew my SAT score, but she could have asked. She certainly had access to my grades, could see I did well in STEM courses, and could have easily given suggestions like “you can apply for partial scholarships to many different places.”

This goes both ways, I knew several of my classmates that should not have gone to college. My best friend in high school was a solid B/C student, went to Mansfield for journalism, and stopped going in his junior year. He is doing fine now as a foreman for a natural gas company. Going to get a four year BA degree for him was a bad idea and waste of money. And it was doubly bad going for journalism.

The brain drain had not happened yet in my high school. The level of discourse in my high school classes was excellent. I noticed a significant regression in the level of discussion in my first classes at Bloomsburg relative to those in my high school. (Bloomsburg had an average SAT score for entrants of 1000.)

I am intelligent, but I was not the most intelligent in my class. There were easily 10 other people in my class that had comparable intelligence to me. All of whom would have likely qualified for at least partial Pell grant assistance.

For those in my class that did go to college, pretty much everyone went to one of the PASSHE schools (these are state schools in Pennsylvania, originally founded as normal schools that were intentionally spread out in rural areas). Most went to the closest nearby (Mansfield), but a few spread out to various institutions across the state (Lock Haven, Shippensburg, Indiana, etc.).

I have hobnobbed with Ivy League individuals (professors and students) since getting my PhD. There is no fundamental difference between kids who go to Ivy League schools and the kids I went to high school with. With a semester of SAT test prep, and a not lazy guidance counselor helping apply to scholarships, we could have had a double digit number of kids accepted to prestigious institutions for zero cost.

Do Not Talk About Money

I remember asking my grandfather why barns were red. He said it was because red paint was cheaper. That was the only conversation I can remember in my childhood that discussed money in any form.

When going to college I was filling out my FAFSA form, and asked my father how much money he made. His response was “enough”. Vance brings up the idea that, ironically, going to nicer colleges is cheaper for poor people. But they have no clue about that. I am fairly sure I would have qualified for partial Pell grant assistance – I just left the section on your parents’ income blank on the FAFSA form.

Besides inept counsel on college, even though I had worked all these different jobs I only remember actively thinking about pay when I was working different jobs in college. The floor board factory was over $8 per hour. The ribbon factory was $11 per hour. Later when I worked as a security guard for Bloomsburg University I made $13 per hour.

Similar to Vance’s experience in Middletown, $13 per hour is quite decent to get an apartment and put food on a table for a family in that area of PA (at least at that time, 2004-2008). You are not saving up for retirement, but you shouldn’t need to live in the dregs and go hungry either.

In retrospect the advice I needed at the time (but never received) was real talk about pursuing careers. This is wrapped up in college, you go to college to prepare yourself for a career (the expectation I go to college was certainly not only to obtain a liberal arts education!)

I ended up choosing Bloomsburg University because I knew many of my classmates were going to a closer school (Mansfield) and just wanted to be different. There was no thought into choosing criminal justice as a major either. When folks ask me the question “what did you want to be when you grew up”, I can not remember actively thinking about any specific career. Even when I was young and hitting baseballs in the back yard, I knew that I was not going to be a professional baseball player.

I remember at one point in the middle of undergrad at Bloomsburg realizing that a criminal justice degree is not really vocational, and I could quit if I really wanted to just go and be a police officer (the only vocation I likely associated with the major). Which I did not really want to do. So I was debating on transferring to Bucknell for sociology, or some community college for whatever degree you get to work on HVAC systems. (I do not know where the Bucknell idea came from, I must have thought “it was fancy” or something relative to Bloomsburg.)

I could have used other advice, like “you can negotiate your wage”, but likely not understanding my career options was the one thing that negatively affected my long term career progression. I do not mean to denigrate the HVAC job (given my background and what I know now, I am pretty sure that would have been a better return on investment than sociology at Bucknell!)

Not that I would go back in time and change anything (I received an excellent education at Bloomsburg in criminal justice, and ditto at SUNY Albany). But if someone somewhere said “hey Andy, you are pretty good at math, you should look into engineering”, my life trajectory would likely be very different.

Factory Jobs Suck

After I was able to drive at 16 I started to take other jobs outside of being a farmhand. These included being a line cook and dishwasher at a local restaurant where my aunt-in-law was a chef, and working for a company that did paving and seal-coating before I was 18. Cooking wasn’t bad. The restaurant was actually a mildly fancy steak and seafood place you needed a membership to eat at. Thinking back, I am honestly confused how that many people where I grew up could afford a membership that would make that business model work.

Paving and seal-coating was comparable to the level of effort of farming. It was safer in the short term than farming (in an “I probably will not be maimed” way), but breathing in the fumes I am guessing would be worse long term. I do not remember my hourly wage (it may have just been the minimum wage), but I did get a bunch of overtime in summer which was nice.

When I went to college I then did various jobs as well. I worked at Kentucky Fried Chicken at the cash register at one point. On campus, in jobs intended for undergraduate college students through the university, I worked as a carpenter building theater sets and as a tutor for the stats classes in the criminal justice department.

The last time I moved home though over summer break (after sophomore year at Bloomsburg) I got a job stacking uncut floorboards in a factory. This was my first factory job – we would stand on a conveyor belt down the machine that cut the boards. Our pallets would be stacked with a single size and we would rotate sizes after a while. So sometimes I am stacking 4 inch wide boards, another time I am stacking 12 inch wide boards, etc.

This was monotonous and hard work, but not crazy bad and I enjoyed my coworkers. Stacking hay bales was harder. Many of the people I worked with were on work release from the county jail. I got paid $8 an hour, they only got $4. But they were happy to do the work and not be sitting in jail. It is absurd that they did not receive the same pay. Most of them were in jail for DUIs.

After about a month of doing this job I had inflammation in my elbow. My elbow only had minor pain, but my arm would fall asleep when I slept and I would wake up in quite a lot of pain from that (so lack of sleep was really the bigger issue than my arm hurting). I asked to take the day off to go to the doctor (one of my coworkers said he had the same issue, but still worth working rather than sitting in jail). The owner mistakenly thought I was trying to get workers compensation, so told me no to the day off and I would be fired if I went to the doctor. (That was not my intention, I just wanted to get some pain medication.) So I just ended up quitting. Doctor said it was “tennis elbow,” and that it would only go away with rest, so I would have needed to quit anyway.

I then moved back to Bloomsburg for the summer (the town I grew up in was incredibly boring, hanging out in an empty college town was certainly a step up for a 20 year old). I got a job at a ribbon factory in Berwick (a neighboring town) for $11 an hour. You could consider Berwick a doppelganger for Middletown as Vance describes it.

Working at the ribbon factory was barely manual labor. I would sit on a conveyor belt and either count bags to fill in boxes, or look at ribbons as they rolled by only to throw away malformed ones. This was soul sucking work. There was only one other younger person I befriended while working there, most everyone else was middle aged. I wondered to myself how these people survived this existence. Of all the jobs I have had in my lifetime this was easily the worst.

Despite having the “work every day” mentality from when I was young, I just stopped going to this job a little over a month after I started working there. I did not tell my boss I quit, just literally stopped going. It wasn’t a hard job, the opposite, it was easy. A second grader could do the job.

So this is often what I think about when people say “the factory jobs are going away,” or Vance’s example of the tile company that cannot get people to work for them. You have a choice, break your back or watch ribbons go by on a conveyor belt. I recognize that having people just take a paycheck from the government is not good for people long term. I think people need something to strive for and take pride in. Working in a factory is not that.

It was at this point (in between being a sophomore and junior) I went from just doing the bare minimum to get by for my classes at Bloomsburg to being actively engaged in my course work and putting in real effort. Working at the ribbon factory was the nadir. Having the more advanced upper level classes did make me more engaged. It was around this time I began working as a security guard for the university. I made the most for that position of any job I had at that point in my life, $13 per hour.

I worked night shift for the security guard job. There was one point in my schedule where I needed to stay awake for over 48 hours. I would get off at 4AM, and if I went to sleep I would not wake back up (even with an alarm) for a 9AM class. So I would have to stay up, go to class, and then sleep for an extended period of time.

By my senior year of college, I was back in the working crazy all the time stage. At one point I had three jobs (college tutor for statistics classes, working as a security guard, and even had work to help with statistical analysis for the local school). This is in addition to being a full time student.

Trailers and Going to Grad School

In the summer between junior and senior year at Bloomsburg University, I had an internship with state parole. The officer I shadowed had an area that covered the counties around Scranton, so a mix of rundown rust belt towns (like Berwick) but also more rural areas. There were more people living in trailers on single lots than trailers in trailer parks.

The first house call I shadowed was an individual who only had a few more weeks on his sentence. He was very nervous and sweaty (this was the first house call I witnessed, so I did not think much of it at the time). The parole officer had the individual do a urine sample. I found out later that he failed (heroin), and the parole officer said the reason he was nervous is that they take multiple officers to arrest individuals, so he likely thought he was being taken back to prison. It probably was not the failed drug test, which, with him being that close to finished, would just result in a warning.

Shadowing parole was an eye opening experience. I had lived in rural areas, but I had been mostly sheltered from the decrepit lifestyle some people lived. Some houses the parole officer would make his parolees meet us outside, as he would refuse to go inside the house. People would not let their animals out (so the house smelled of the strong ammonia scent from the urine, much worse than the barn). Houses with kids sleeping on mattresses in the living room and fleas. I knew people like this existed, but seeing them firsthand was different.

Matthew Desmond’s book, Evicted, in which he follows the lifestyle of various individuals trying to scrape by in Milwaukee, reminded me quite a bit of my time with parole. A bunch of people who could not make two good decisions in a row if their life depended on it. I presume getting drunk before your scheduled parole visit or not cleaning your sink and getting evicted are consequences of the same inability to make good long term decisions.

All of those individuals had no fundamental reason they needed to live in filthy conditions. You can take the cat litter out. People dig on trailers and trailer parks, but living in a trailer is not fundamentally bad. It is no different than living in a small apartment.

So I had planned on applying to be a parole officer after this experience. It was likely I would not be assigned the field area where I did my internship, but either be assigned a position in a prison (they have officers inside state prisons help with offenders’ release plans) or maybe be assigned in the field in the Philadelphia area. So I took the civil service exam to be a parole officer in the fall semester of my senior year.

I had made a mistake though, I had taken the exam too early. They called and asked if I could go to training in the spring. I said I could not do that, as I wanted to finish my degree. (The parole officer I shadowed had quit one semester early, and he did say he regretted that.) The civil service exams in Pennsylvania had a rule that you could not re-take them within a certain time frame, so I did not have the ability to take them again when the timing would have worked out better.

So at this point in the fall semester I decided to apply to graduate school, not really knowing what I was getting into. The other option was applying to different police departments in the spring (I remembered Allentown and Baltimore had come to classes to recruit). I did well on the GREs, and was accepted into SUNY Albany (my top choice) quite early. I had also applied to Delaware. SUNY was somewhat unique, in that you could apply straight into the PhD program from undergrad. I did not realize this at the time, but this was very fortunate, as PhD programs were funded. I would have racked up a bill for the masters degree at Delaware.

When I ended up getting into grad school at SUNY Albany, Julie Horney called me in the afternoon of one of my binge sleep sessions in my night security guard schedule to say I was invited for orientation. I do not remember what I said on the phone call, I remember getting up later and not being sure if that was a dream or it had really happened.

Later that spring when I visited Albany I headed straight up from my night shift to the orientation day. I remember being confused, thinking this was an interview and still not 100% guaranteed I was in. I said something to Dana Peterson and her response was along the lines of “you do not have to worry Andy, you have gotten in”.

Going to Albany ended up being one of the greatest decisions of my life. The academic atmosphere of a PhD program was totally different and fundamentally changed me. It would be a lie though if I said it was something I intentionally pursued, as opposed to a series of random happenstance factors in my life that pushed me to do that. I really had no clue what I was getting into.

Drugs

Growing up in Bradford county I had very little exposure to drugs. My friends and I would pinch beer and liquor from our parents on occasion, but in the grand scheme of things I was a pretty square kid. I knew some individuals smoked pot, but I did not know anyone in high school who did heroin or other opiates. Unlike Vance, serious drug or alcohol abuse was not something I personally witnessed in my family.

It was around the beginning of 2000’s that the gradual increase in opioid overdose deaths started to happen across the US. In the town with the ribbon factory, Berwick, heroin usage was an issue. My sister in law (who grew up in Bloomsburg) ultimately died due to long term heroin usage. I worked on a grant to help Columbia county analyze the Monitoring the Future survey (a behavioral health survey that all students in Pennsylvania took). There were a few students in each grade cohort (as young as seventh grade) who stated they used heroin.

If you look at maps of drug overdose deaths in this time period, you can see a cluster start to form around the Scranton area by around 2005. This area in northeastern Pennsylvania is close enough to commute to New York City. It is possible the supply networks for heroin from more urban areas were established that way.

I suspect it is also related to working labor jobs though. One reason my grandfather retired from farming was because he had chronic shoulder pain. I would drive him to his visits to the VA hospital in Harrisburg on occasion, in a full size van, when I was a teenager. He was prescribed oxycodone, but knew that it was addictive, so he would take them one week and then abstain the following week.

I do not know how you work these labor jobs and not have some type of chronic pain. It is hard for me to imagine working these jobs for thirty years without them killing you. I recently had a kidney stone, and I was stuck waiting for several hours in the ER before I was able to get a fentanyl drip. Went from pulsating pain and throwing up to relief almost instantly. I was prescribed oxycodone to use at home before I passed the stone. I did not take it.

When I was a professor at the University of Texas at Dallas, a well respected qualitative criminologist came to give a talk. He discussed his recent work, an ethnography of methamphetamine users in rural Alabama. His main thesis in the talk, which is something I think only an academic sociologist could come up with, is that women were influenced by their boyfriends to begin taking meth. (This is true, but you could do the same talk and say men were introduced to drugs by their female partners.) I asked at the end of his talk whether he thought his findings extended to heroin users, and his response was that heroin is an urban drug problem.

Another main point of his talk was to take pictures of his subjects in realistic settings. The idea being that most drug users are depicted in the media in a negative light. So we should take pictures of them so people do not think they are monsters. The lecture was mostly pictures of people’s trailers, and pillow talk discussed in his interviews that resulted in snickers from the audience at various points.

Taking pictures of trailers does not humanize poor people. It makes you look like you are Steve Irwin describing wildlife in the outback – I personally thought it was incredibly degrading.

The idea that you shouldn’t show the negative impacts of hard drug use is such a comically white knight perspective I am not sure whether it makes me want to laugh or cry. I did not take that oxycodone because I have seen, with my own eyes, what happens to people who are addicted to opioids. The most recent wave of fentanyl laced with xylazine can result in open sores and losing phalanges. I do not believe those people are monsters (does anyone?) but it is grotesque what drug addiction can do to people.

I suggest reading Vance’s discussion of his life growing up, over any sociologist, because of this. When what we have to offer is “some people think people who take drugs are monsters” and “you should take nice pictures of them”, people are well justified to ignore academics as out of touch and absurd.

Lament

If I had the chance to sit down with Vance, one thing I would ask him is his choice of elegy in the title of his book. I name my blogpost lament. I have moved on with my life, my work focuses on working with police departments on public safety. This work entirely focuses on urban areas.

I do not think anything my research relates to could, even at the margins, help materially improve the lives of people I grew up with. There are things I think could marginally improve individuals’ outcomes, such as getting better advice about colleges. But there is nothing reasonable to be done to prevent traffic accidents or improve farm safety. I mean you could attempt stricter safety regulations, but realistically enforcing them is another matter.

Vance in his book does not really talk about politics, but towards the end gives some examples of policies he thinks could on the margins help – such as restricting the number of section 8 vouchers in a neighborhood (to prevent segregation). He is right that you cannot magically subsidize factory jobs and all will be well – it will not. These jobs, as I said above, suck.

I view the current state of rural America, as I experienced it, via Murray’s Bell Curve. Specifically the idea of brain drain, and more broadly intellectual segregation. One of Murray’s theses was that, historically, rural communities had a mix of intelligent (and not so intelligent) individuals. Gradually over time, the world has become more globalized, so it is easier to move from the farm to industrialized areas.

This results in intelligent people – those who can go to college and keep a job – moving away. What is left over is a higher proportion of the types of people Vance focuses on more in his book – individuals with persistent life problems. Criminologists will recognize this process as fundamentally the same as that of the urban blight areas described by the Chicago school of crime. Vance focuses on the culture that is the end result of this demographic process – the only people who don’t move away are the ones who live the hard lives that Vance describes in his book.

Automation is the long term progression of farming in America. Farming in rural areas will eventually be just the minimal number of humans needed to oversee the automated machinery. I am not sure the town I grew up in will exist in 100 years.

And this to me is not a bad thing. I left to pursue opportunities that were not available to me if I stayed in Bradford county. I am not sad that I do not need to sling square bales of hay. My suggestion, to help give better advice to students about pursuing college and careers, will only hasten the demise of rural areas, not save them. This is the lament.

Politics aside, I found Vance’s biography of his life growing up worth reading. If you find my stories interesting, I suspect you will find his as well.

Aoristic analysis, ebooks vs paperback, website footer design, and social media

For a few minor updates, I have created a new Twitter/X account to advertise Crime De-Coder. I do not know if there is some setting that people ignore all unverified accounts, but would appreciate the follow and reshare if you are still on the platform.

I also have an account on LinkedIn, and sometimes comment on the Crime Analysis Reddit.

I try to share cool data visualizations and technical posts. I know LinkedIn in particular can be quite vapid self-help guru type advice, which I avoid. I know being more technical limits the audience but that is ok. So appreciate the follow if you are on those platforms and resharing the work.

Ebooks vs Paperbacks

Part of the reason to start the X account back up is to just try more advertising. I have sold not quite 80 copies to date (including pre-sales). My baseline goal was 100.

Excluding pre-sales, I have sold 35% ebooks and 65% paperbacks. So spending some time to distribute your book in paperback seems to me to be worth it.

Again, I feel that for most academics who publish technical books, self-publishing is a very good idea. So read the above linked post about some of the logistics of self-publishing.

Aoristic analysis in python

On the CRIME De-Coder blog, check out my post on Aoristic analysis. It has links to python code on github for those who just want the end result. It has several methods though to do hour of day and hour by day of week breakdowns. With the ability to do it by categories in data. And post hoc generate a few graphs. I like the line graphs the best:

But I can understand why people like the more common heatmap:
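For readers unfamiliar with the technique, here is a minimal sketch of the core idea (it is not the code from the linked post). When a crime is only known to have happened somewhere between a begin and an end time, aoristic analysis spreads that event's weight of 1 across the hours in the window. The function name `aoristic_weights` is hypothetical, and this simplified version gives every hour the window touches an equal share, whereas a fuller implementation may weight by the minutes covered:

```python
from datetime import datetime, timedelta

def aoristic_weights(begin, end):
    """Spread one event's weight evenly over the hours of day its window touches."""
    # start at the top of the hour containing the begin time
    cur = begin.replace(minute=0, second=0, microsecond=0)
    hours = []
    while cur <= end:
        hours.append(cur.hour)
        cur += timedelta(hours=1)
    share = 1.0 / len(hours)
    weights = [0.0] * 24  # one slot per hour of day
    for h in hours:
        weights[h] += share
    return weights

# a burglary known only to have occurred between 10:30 PM and 1:15 AM
w = aoristic_weights(datetime(2024, 1, 1, 22, 30), datetime(2024, 1, 2, 1, 15))
print([(hour, val) for hour, val in enumerate(w) if val > 0])
```

Summing these weight vectors over all incidents gives the hour of day profile that the line graphs and heatmaps display.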

Website Design

I have a few minor website design updates. The homepage is more svelte. Wife suggested that it should be easier to see what I do right when you are on homepage, so put the jumbotron box at the bottom and the services tiles (with no pictures) at the top.

It does not look bad on mobile either (I only recently figured out that Chrome’s DevTools has a button to turn on mobile view, very helpful!)

Final part is that I made a footer for my pages:

I am not real happy with this. One of the things you notice when you start doing web-design is everyone’s web-page looks the same. There are some basic templates for WordPress or Wix (and probably other CMS generators). Here is Supabase’s footer for example:

And now that I have shown you, you will see quite a few websites have that design. So I did the svg links to social media, but I may change that. (And go with no footer again, there is not a real obvious need for it.) So open to suggestions!

I intentionally made many of the decisions for the way the Crime De-Coder site looks not only to make it usable but to make it at least somewhat different. It avoids super long scrolls and has a sticky header (that still works quite well on phones). The header is quite dense with many sub-pages (I like it though).

I think a lot of the data dashboards that public sector agencies are putting out now do not look very nice. Many are just iframed Tableau dashboards. If you want help with those data visualizations embedded in a more organic way in your site, that is something Crime De-Coder can help with.

LinkedIn posting and link promotion: impression vs reality

For folks who are interested in following my work, my advice is either email or RSS. On this site you should see ‘follow blog via email’ and the RSS link on the right hand side. I sometimes post a note here on crimede-coder stuff, but not always, so just do the same (RSS, or use the If This Then That service to turn RSS into email) on that site if you want to keep abreast of all my posts.

Another way to follow my work though is on LinkedIn. So feel free to connect with me or follow my content:

I post short form blogs/reactions on occasion (plus share my other posts/work). Social media promoting your work is often cringy, but I try to post informative and technical content (and not totally vapid self-help stuff). And I write things for people to view them, so I think it is important to promote my work.

One of the most recent things I have heard a few influencers mention is that they think embedding links directly in LinkedIn posts de-promotes their work. See this discussion on HackerNews, or this person’s advice for two examples.

I formed a few opinions based on my regular postings over the past year+, but impressions of things over extended periods can often be wrong. So I actually downloaded the data to see! In terms of the thing about links and being de-promoted, I don’t see that in my data at all – this is a table of impressions broken down by the domain I linked to (for domains with at least 2+ posts over the prior year):

I did notice however two different domains – youtube and newsobserver (the Raleigh newspaper) tend to not have much engagement. So it may be certain domains are not as promoted. It is of course possible that particular content was not popular (I thought my crim observations on the Mark Rober glitterbombs would be more popular, but maybe not). But I think this is a large enough sample to at least give a good hint that they are not promoted in the same way my other links are. My no URL posts have slightly less engagement than my posts to this blog or the crimede-coder site, so overall the idea that links are penalized doesn’t appear to me to be true without more conditional statements.

Data is important, as again I think impressions can be bad for things that repeatedly happen over a long period of time. So offhand I thought I had less engagement on Tue/Thu, so I stopped posting on those days. What does the data say?

| Day | Avg Impressions | Number Posts |
|-----|-----------------|--------------|
| Sun |         1,860   |         32   |
| Mon |         1,370   |         44   |
| Tue |         1,220   |         35   |
| Wed |         1,273   |         41   |
| Thu |         1,170   |         34   |
| Fri |         1,039   |         39   |
| Sat |         1,602   |         38   |

The data says Sun/Sat have higher impressions, and weekdays are lower. If anything Friday is the low day, not Tuesday/Thursday.
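A day of week table like this is a simple groupby. Here is a minimal sketch, using the `Created date` and `Impressions` column names from the LinkedIn export. The month/day/year date format is an assumption about the file, the rows are made up to stand in for the real data, and `impressions_by_day` is a hypothetical helper name:

```python
import pandas as pd

# made-up rows standing in for the real LinkedIn export
posts = pd.DataFrame({
    'Created date': ['07/20/2024', '07/21/2024', '07/22/2024',
                     '07/23/2024', '07/27/2024'],
    'Impressions': [1500, 1900, 1300, 1200, 1700],
})

def impressions_by_day(df):
    # assumes dates are month/day/year strings; adjust format if the export differs
    dates = pd.to_datetime(df['Created date'], format='%m/%d/%Y')
    day = dates.dt.day_name().str[:3]  # 'Sun', 'Mon', ...
    out = df.groupby(day)['Impressions'].agg(['mean', 'count'])
    # put days in calendar order; days with no posts show up as missing
    return out.reindex(['Sun', 'Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat'])

print(impressions_by_day(posts))
```

Keeping the count next to the mean matters here, since an average over only a handful of posts can be swung by a single popular one.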

I have had other examples in my career of practitioners arguing with me, in crime analysis or academic circles, that strike me as similar, in that perceptions (that people strongly believed in) did not align with actual data. So I just don’t think ‘the average of impressions over posts over the past year’ is something that you can really know just based on passive observation. Your perceptions are likely to be dominated by a few examples, which may be off the mark. Ditto for knowing how much crime happens at a particular location, or knowing how much different things impact survival rates for gunshots.

It is definitely possible that my small page experience (currently at a few over 2700 followers on LinkedIn) is not the same as the large influencers. But without looking at actual data, I don’t trust people’s instincts on aggregate metrics all that much.

Another meta LinkedIn tip (I received from Rob Fornango) is to post tall images, so when people are scrolling your content stays on the screen longer. Here is an example post of Rob’s:

It is hard for me to test this though, the links on LinkedIn sometimes expand the link to bigger images and sometimes not (and sometimes I edit the image it displays as well). And I think after a while they turn them into tiny images as well. Someone tell the folks on LinkedIn to allow us to use markdown!

So I mean I could spend a full time job tinkering, but looking at the data I have at hand I don’t plan on changing much. Just posting links to my work, and having an occasional comment as well if I think it will be of interest to more people than myself. Content over micro optimization that is (since the algorithm could change tomorrow anyway).

One of the things I have debated on is buying adverts to promote my python book. I think they are just on the cusp of a net loss though given clickthrough rates and margins on my book. So for example, LinkedIn estimates if I spend $140 to promote a post, I will get 23-99 clicks. My buy rate on the site is around 5%, so that would generate 1-5 book sales. My margins are not that high on a sale, so I would not make money on that.
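The back of the envelope math above can be written out as a short sketch. The $140 spend, the 23-99 click range, and the 5% buy rate come from the paragraph above. The per-sale margin is not stated, so the `margin_per_sale=20.0` below is a purely hypothetical placeholder, as are the function names:

```python
def ad_outcome(spend, clicks, buy_rate, margin_per_sale):
    """Expected sales and net profit from one promoted post."""
    sales = clicks * buy_rate
    profit = sales * margin_per_sale - spend
    return sales, profit

def breakeven_margin(spend, clicks, buy_rate):
    """Margin per sale needed for the advert to pay for itself."""
    return spend / (clicks * buy_rate)

# low and high ends of LinkedIn's click estimate for a $140 spend
for clicks in (23, 99):
    sales, profit = ad_outcome(140, clicks, 0.05, margin_per_sale=20.0)  # hypothetical margin
    needed = breakeven_margin(140, clicks, 0.05)
    print(f'{clicks} clicks: {sales:.2f} sales, profit {profit:.2f}, '
          f'break-even margin {needed:.2f}')
```

The break-even margin is the more useful number, since it does not depend on guessing the margin up front: the advert only pays off if the margin on a sale exceeds it.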

I have been wondering if I posted direct adverts on Reddit for the book to the learn python forum how that would go. But I think it would be much of the same as LinkedIn (too low of clickthrough to make it worth it). But if I do those tests in the future will write up a blog post on my experience!


For LinkedIn, I can only find how to download my stats on the company crimede-coder page, not my personal page. Here is the script I used to convert the LinkedIn short urls back to the original domains I linked, plus the analysis:

'''
python Code to parse the domains from my
crimede-coder linkedin posts
run on 7/24/2024, so only has posts
from that date through the prior year
'''

import requests
import traceback
import pandas as pd
import time
from urllib.parse import urlparse

errors = {}

def get_link(url):
    time.sleep(2)
    try:
        res = requests.get(url)
    except Exception:
        er = traceback.format_exc()
        print(f'Error message is \n\n{er}')
        return ''
    if res.ok:
        it = res.text.split()
        it = [i for i in it if i[:4] == 'href']
        rl = it[3]
    else:
        print(f'Not ok, {url}, response: {res.reason}')
        errors[url] = res
        return ''
    return rl[6:].replace('/">','').replace('">','')

# more often than not, linkedin converts the link in the post
# to a lnkd.in short url
def get_refer(txt):
    rs = txt.split()
    rs = [i for i in rs if i[:8] == 'https://']
    if rs:
        url = rs[0]  # if more than one link, only grabs the first
        if url[:15] == 'https://lnkd.in':
            return get_link(url)
        else:
            return url
    else:
        return ''


# this is data exported from LinkedIn on my Crime De-Coder page only goes back one year
df = pd.read_excel('crime-de-coder_content_1721834275879.xls',sheet_name='All posts',header=1)

# only need to keep a few columns
keep_cols = ['Post title','Post link','Created date','Impressions','Clicks','Likes','Comments','Reposts']
df = df[keep_cols].copy()

df['url'] = df['Post title'].apply(get_refer)

def domain(url):
    if url == '':
        return 'NO URL'
    else:
        pu = urlparse(url)
        return pu.netloc

df['domain'] = df['url'].apply(domain)

# caching out file, so do not need to reget url info
df.to_csv('ParseInfo.csv',index=False)

# Can aggregate to domain
agg_stats = df.groupby('domain',as_index=False)['Impressions'].describe()
agg_stats.sort_values(by=['count','mean'],ascending=False,ignore_index=True,inplace=True)
count_cols = list(agg_stats)[1:]
agg_stats[count_cols] = agg_stats[count_cols].fillna(0).astype(int)

# This is a nice way to print/view the results in terminal
print('\n\n' + agg_stats.head(22).to_markdown() + '\n\n')

Some notes on self-publishing a tech book

So my book, Data Science for Crime Analysis with Python, is finally out for purchase on my Crime De-Coder website. Folks anywhere in the world can purchase a paperback or epub copy of the book. You can see this post on Crime De-Coder for a preview of the first two chapters, but I wanted to share some of my notes on self publishing. It was some work, but in retrospect it was worth it. Prior books I have been involved with (Wheeler 2017; Wheeler et al. 2021) plus my peer review experience I knew I did not need help copy-editing, so the notes are mostly about creating the physical book and logistics of selling it.

Academics may wish to go with a publisher for prestige reasons (I get it, I was once a professor as well). But it is quite nice once you have done the legwork to publish it yourself. You have control of pricing, and if you want to make money you can, or have it cheap/free for students.

Here I will detail some of the set up of compiling the book, and then the bit of work to distribute it.

Compiling the documents

So the way I compiled the book is via Quarto. I posted my config notes on how to get the book contents to look how I wanted on GitHub. Quarto is meant to run code at the same time (so works nicely for a learning to code book). But even if I just wanted a more typical science/tech book with text/images/equations, I would personally use Quarto since I am familiar with the set up at this point. (If you do not need to run dynamic code you could do it in Pandoc directly, not sure if there is a way to translate a Quarto yaml config to the equivalent Pandoc code it turns into.)

One thing that I think will interest many individuals – you write in plain text markdown. So my writing looks like:

# Chapter Heading

blah, blah blah

## Subheading

Cool stuff here ....

These go in a series of text files, one for each chapter of the book. And then I run quarto render, and it turns my writing in those text files into both an Epub and a PDF (and other formats if you cared, such as word or html). You can set up the configuration for the book to be different for the different formats (for example I use different fonts in the PDF vs the epub, nice fonts in one look quite bad in the other). See the _quarto.yml file for the set up, in particular config options that are different for both PDF and Epub.

One thing is that ebooks are hard to format nicely – if I had a book I wanted to redo to be an epub, I would translate it to markdown. There are services online that will translate, they will do a bad job though with scientific texts with many figures (and surely will not help you choose nice fonts). So just learn markdown to translate. Folks who write in one format and save to the other (either Epub/HTML to PDF, or PDF to Epub/HTML) are doing it wrong and the translated format will look very bad. Most advice online is for people who have just books with just text, so science people with figures (and footnotes, citations, hyperlinks, equations, etc.) it is almost all bad advice.

So even for qualitative people, learning how to write in markdown to self-publish is a good skill to learn in my opinion.

Setting up the online store

For a while I have been confused about how SaaS companies offer payment plans. (Many websites just seem to copy from generic node templates.) Looking at the Stripe API, it just seems over the top for me to script up my own solution to integrate Stripe directly. If I wanted to do a subscription I may need to figure that out, but it ended up that for my Hostinger website I can set up a sub-page that is WordPress (even though the entire website is not), and turn on WooCommerce for that sub-page.

WooCommerce ends up being easy, and you can set up the store to host web-assets to download on demand (so when you purchase it generates a unique URL that obfuscates where the digital asset is saved). No programming involved to set up my webstore, it was all just point and click to set things up one time and not that much work in the end.

I am not sure about setting up any DRM for the epub (so in reality people will purchase epub and share it illegally). I don’t know of a way to prevent this without using Amazon+Kindle to distribute the book. But the print book should be OK. (If there were a way for me to donate a single epub copy to all libraries in the US I would totally do that.)

I originally planned on having it on Amazon, but the low margins on both plus the formatting of their idiosyncratic kindle book format (as far as I can tell, I cannot really choose my fonts) made me decide against doing either the print or ebook on Amazon.

Print on Demand using LuLu

For print on demand, I use LuLu.com. They have a nice feature to integrate with WooCommerce; the only thing I wish is that shipping was dynamically calculated. (I need to make a flat shipping rate for different areas around the globe the way it is set up now, which is slightly annoying and will change the profit margins depending on area.)

LuLu is a few more dollars to print than Amazon, but it is worth it for my circumstance I believe. Now if I had a book I expected to get many “random Amazon search buys” I could see wanting it on Amazon. I expect more sales will be via personal advertising (like here on the blog, social media, or other crime analyst events). My Crime De-Coder site (and this blog) will likely be quite high in google searches for some of the keywords fairly quickly, so who knows, maybe just having it on my personal site will result in just as many sales.

LuLu does have an option to turn on distribution to other wholesalers (like Barnes & Noble and Amazon) – I have not turned that on but maybe I will in the future.

LuLu has a pricing calculator on their website to see how much it costs to print. Paperback and basically the cheapest color option for letter sized paper (which is quite large) is just over $17 for my 310 page book (Amazon was just over $15). For folks whose books are less image heavy and more text, you could get away with a smaller size book (and maybe black/white), which I suspect will be much cheaper. LuLu’s printing of this book is higher quality compared to Amazon as well (better printing of the colors and nicer stock for the paperback cover).

Another nice thing about print on demand is I can go in and edit/update the book as I see fit. No need to worry about new versions. Not sure what that exactly means for citing the work (I could always go and change it), you can’t have a static version of record and an easy way to update at the same time.

Other Random Book Stuff

I purchased ISBNs on Bowker, something like 10 ISBNs for $200. (You want a unique ISBN for each type of the book, so you may want three in the end if you have epub/paperback/hardback.) Amazon and LuLu have options to give you an ISBN though, so that may not have been necessary. I set the imprint to be my LLC in Bowker, so CRIME De-Coder is the publisher.

You don’t technically need an ISBN at all, but it is a simple thing, and there may be ways for me to donate to libraries in the future. (If a University picks it up as a class text, I have been at places you need at least one copy for rent at the Uni library.)

I have not created an index – I may have a go at feeding my book through LLMs and seeing if I can auto-generate a nice index. (I just need a list of key words, after that I can just go and find-replace the relevant text in the book to fill in so it auto-compiles an index.) I am not sure that is really necessary though for a how-to book, you should just look at the table of contents to see the individual (fairly small) sections. For epub you can just do a direct text search, so I am not sure if people use an index at all in epubs.

Personal Goals

So I debated on releasing the book open source, but I do want to try and see if I can make some money. I don’t have this expectation, but there is potential to get some “data science” spillover, and if that is the case sales could in theory be quite high. (I was surprised in searching the “data science python” market on Amazon, it is definitely not saturated.) Personally I will consider at least 100 sales to be my floor for success. That is, if I can sell at least 100 copies, I will consider writing more books. If I can’t sell 100 copies I have a hard time justifying the effort – it would just be too few people buying the book to have the types of positive spillovers I want.

To make back money relative to the amount of work I put in, I would need more than 1000 sales (which I think is unrealistic). I think 500 sales is about the best case, guesstimating the size of the crime analyst community that may be interested plus some additional sales for grad students. To reach 1000 sales, it would need multiple professors using it as a class book over several years. (If you are a professor and interested in this for a class let me know, I will give your class a discount.)

Another common way for individuals to make money off of books is not from sales, but from trainings oriented around the book. I am hoping to do more of that for crime analysts directly in the future, but those opportunities I presume will be correlated with total sales.

I do enjoy writing, but I am busy, so cannot just say “I am going to drop 200 hours writing a book”. I would like to write on additional python topics oriented towards crime analysts/criminology grad students like:

  • GIS analysis in python
  • Regression
  • Machine Learning & Optimization
  • Statistics for Crime Analysis
  • More advanced project management in python

Having figured out much of this grunt work definitely makes me more motivated, but in the end I need to have a certain level of sales to justify the effort. So please, if you like the blog, pick up a copy and tell a friend you like my work!

My word template for Quarto

I have posted on GitHub my notes on creating a word template to use with Quarto. And since Quarto is just feeding into pandoc, those who are just using pandoc (so not doing intermediate computations) should find that template worthwhile as well.

So first, why word? Quarto by default looks pretty nice for HTML. It is fine for them to prioritize that, but for the majority of reports I want to use Quarto for, HTML is not the best format. Many times I want a report that can be emailed in PDF and/or printed. And sometimes I (or my clients) want a semi-automated report that can be edited after the fact. In those cases word is a good choice.

Editing LaTeX is too hard, and I am pretty happy with this template for small reports. I will be sharing my notes on writing my python book in Quarto soonish, but for now wanted to share how I created a word template.

Note some of the items may seem gratuitous (why so many CRIME De-Coder logos?). Part of those are just notes though (like how to insert an image after your author name, I have done this to insert my signature in reports for example). The qmd file has most of the things I am interested in doing in documents, such as how to format markdown tables in python, doing sections/footnotes, references, table/figure captions, etc.

I do like my logo in the header though (it is even hyperlinked, so in subsequent PDFs if you click the logo it will go to my website), and the footer page numbers I commonly need in reports as well. And my title page and TOC do not look bad either IMO. I am not one to discuss fonts, but I like small caps for titles and the Verdana font is nice to make it look somewhat different.

Creating the Template

So first, you can do from the command line:

quarto pandoc -o custom-reference-doc.docx --print-default-data-file reference.docx

From there, you should edit the generated custom-reference-doc.docx file to get what you want, and point your document at it with the reference-doc option under the docx format in the YAML header. So for example, if you want to change the font used for code snippets, in Word you can open up Styles, and on the right hand side select different elements and edit them:

Here for example to change the font for code snippets, you modify the HTML code style (I like Consolas):

There ended up being a ton of things I edited, and I did not keep a list. Offhand you will want to modify the Title, Headings 1 & 2, First Paragraph, and Body Text. And then you can edit things like the page numbers and header/footer.

So when rendering a document, you can sometimes click on the element in the rendered document and figure out what style it inherits from. Here for example you can see in the test.docx file that the quote section uses the “Block Text” style:

This does not always work though, and it can take some digging/experimentation in the original template file to get the right style modifier. It doesn’t work for the code segments for example. (If you are having a really hard time, change the word document's extension to .zip, and dig into the XML documents. You can see the style it inherits from in the XML tree.) Do not render a document and edit the style in that document. Only edit the original template file that was generated from the command line (via --print-default-data-file reference.docx) to update your template.
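
If you would rather not click through the XML by hand, that digging can be sketched in python with just the standard library. This is a rough example, not part of the template repo: the function name `list_docx_styles` is made up, and it assumes the usual docx layout where styles live in `word/styles.xml`. A .docx file is already a zip archive, so `zipfile` can open it directly without renaming.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used throughout word/styles.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

def list_docx_styles(docx_path):
    """Return (style_id, display_name, based_on) tuples from a .docx file.

    docx_path can be a file path or a binary file-like object.
    based_on is None when the style does not inherit from another style.
    """
    with zipfile.ZipFile(docx_path) as zf:
        root = ET.fromstring(zf.read("word/styles.xml"))
    styles = []
    for style in root.iter(f"{{{W_NS}}}style"):
        style_id = style.get(f"{{{W_NS}}}styleId")
        name_el = style.find(f"{{{W_NS}}}name")
        based_el = style.find(f"{{{W_NS}}}basedOn")
        name = name_el.get(f"{{{W_NS}}}val") if name_el is not None else None
        based_on = based_el.get(f"{{{W_NS}}}val") if based_el is not None else None
        styles.append((style_id, name, based_on))
    return styles

# Example usage (the file name is whatever your template is called):
# for style_id, name, based_on in list_docx_styles("custom-reference-doc.docx"):
#     print(style_id, "|", name, "| based on:", based_on)
```

The `based_on` column is the useful part: if editing one style unexpectedly changes another, the second one probably lists the first as its parent.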

I have placed a few notes in the readme on Github, but one of my main things was making tables look nice. So this plays nicely with markdown tables, which I can use python to render directly. Here is an example of spreading tables across multiple pages.
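
As a rough illustration of rendering a markdown table from python (not the code in the qmd file, and `markdown_table` is a made-up helper name), a pipe table can be built from plain lists with no extra libraries. If your data is already in pandas, `DataFrame.to_markdown` is another option, though it needs the tabulate package installed.

```python
def markdown_table(header, rows, align=None):
    """Render a pipe-style markdown table as a string.

    align is an optional list of 'l', 'r', or 'c', one per column
    (defaults to left-aligned everywhere).
    """
    ncol = len(header)
    align = align or ["l"] * ncol
    cells = [[str(v) for v in header]] + [[str(v) for v in row] for row in rows]
    # pad every column to its widest cell (minimum 3 so the separator is valid)
    widths = [max(3, *(len(r[i]) for r in cells)) for i in range(ncol)]

    def fmt(row):
        return "| " + " | ".join(c.ljust(w) for c, w in zip(row, widths)) + " |"

    # separator row: colons mark the alignment of each column
    seps = {
        "l": lambda w: ":" + "-" * (w - 1),
        "r": lambda w: "-" * (w - 1) + ":",
        "c": lambda w: ":" + "-" * (w - 2) + ":",
    }
    sep_row = "| " + " | ".join(seps[a](w) for a, w in zip(align, widths)) + " |"
    return "\n".join([fmt(cells[0]), sep_row] + [fmt(r) for r in cells[1:]])

# Made-up example data
print(markdown_table(["Crime", "Count"], [["Burglary", 12], ["Theft", 340]], align=["l", "r"]))
```

In a Quarto python chunk, printing the string with the `output: asis` chunk option should pass it through as markdown, so it picks up the table style from the word template.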

One thing to note though is that this has limits. Different styles are interrelated, so sometimes I would change one and it would propagate errors to different elements. (I can’t figure out how to change the default bullets to squares instead of circles, for example, without having bullets in places they should not be in tables. Try to figure that one out. I also cannot figure out how to change the default font in tables, where I would use monospace, without changing the font for other text elements in normal blocks.) So this template was the best I could figure out without breaking other parts.

I have a few notes in the qmd file as well, showing how to use different aspects of markdown, as well as some sneaky things to do extra stuff (like formatting fourth level headings to produce a page break, I do not think I will need that deep of headings).

Even for those not using Quarto for computational workflows, writing in markdown is a really useful skill. You write in plain text, and can then have the output in different formats. Even for qualitative folks (or people in industry creating documents), I think many people would be well served by writing content in plain text markdown, and then rendering to whatever output they wanted.