How much do students pay for textbooks at GSU?

Given I am a big proponent of open data, replicable scientific results, and open access publishing, I struck up a friendship with Scott Jacques at Georgia State University. One of the projects we pursued was a pretty simple, but could potentially save students a ton of money. If you have checked out your universities online library system recently, you may have noticed they have digital books (mostly from academic presses) that you can just read. No limits like the local library, they are just available to all students.

So the idea Scott had was identify books students are paying for, and then see if the library can negotiate with the publisher to have it for all students. This shifts the cost from the student to the university, but the licensing fees for the books are not that large (think less than $1000). This can save money especially if it is a class with many students, so say a $30 book with 100 students, that is $3000 students are ponying up in toto.

To do this we would need course enrollments and the books they are having students buy. Of course, this is data that does exist, but I knew going in that it was just not going to happen that someone just nicely gave us a spreadsheet of data. So I set about to scrape the data, you can see that work on Github if you care too.

The github repo in the data folder has fall 2024 and spring 2025 Excel spreadsheets if you want to see the data. I also have a filterable dashboard on my crime de-coder site.

You can filter for specific colleges, look up individual books, etc. (This is a preliminary dashboard that has a few kinks, if you get too sick of the filtering acting wonky I would suggest just downloading the Excel spreadsheets.)

One of the aspects though of doing this analysis, the types of academic publishers me and Scott set out to identify are pretty small fish. The largest happen to be Academic textbook publishers (like Pearson and McGraw Hill). The biggest, coming in at over $300,000 students spend on in a year is a Pearson text on Algebra.

You may wonder why so many students are buying an algebra book. It is assigned across the Pre-calculus courses. GSU is a predominantly low income serving institution, with the majority of students on Pell grants. Those students at least will get their textbooks reimbursed via the Pell grants (at least before the grant money runs out).

Being a former professor, these course bundles in my area (criminal justice) were comically poor quality. I accede the math ones could be higher quality, I have not purchased this one specifically, but this offers two solutions. One, Universities should directly contract with Pearson to buy licensing for the materials at a discount. The bookstore prices are often slightly higher than just buying from other sources (Pearson or Amazon) directly. (Students on Pell Grants need to buy from the bookstore though to be reimbursed.)

A second option is simply to pay someone to create open access materials to swap out. Universities often have an option for taking a sabbatical to write a text book. I am pretty sure GSU could throw 30k at an adjunct and they would write just as high (if not higher) quality material. For basic material like that, the current LLM tools could help speed the process by quite a bit.

For these types of textbooks, professors use them because they are convenient, so if a lower cost option were available that met the same needs, I am pretty sure you could convince the math department to have those materials as the standard. If we go to page two of the dashboard though, we see some new types of books pop up:

You may wonder, what is Conley Smith Publishing? It happens to be an idiosyncratic self publishing platform. Look, I have a self published book as well, but having 800 business students a semester buy your self published $100 using excel book, that is just a racket. And it is a racket that when I give that example to friends almost everyone has experienced in their college career.

There is no solution to the latter professors ripping off their students. It is not illegal as far as I’m aware. I am just guessing at the margins, that business prof is maybe making $30k bonus a semester forcing their students to buy their textbook. Unlike the academic textbook scenario, this individual will not swap out with materials, even if the alternative materials are higher quality.

To solve the issue will take senior administration in universities caring that professors are gouging their (mostly low income) students and put a stop to it.

This is not a unique problem to GSU, this is a problem at all universities. Universities could aim to make low/no-cost, and use that as advertisement. This should be particularly effective advertisement for low income serving universities.

If you are interested in a similar analysis for your own university, feel free to get in touch with either myself or Scott. We would like to expand our cost saving projects beyond GSU.

Musings on Project Organization, Books and Courses

Is there a type of procrastination via which people write lists of things? I have that condition.

I have been recently thinking about project organization. At work we have been using the Cookie Cutter Data Science project set up – and I really hate it. I have been thinking about this more recently, as I have taken over several other data scientists models at work. The Cookie Cutter Template is waaaay too complicated, and mixes logic of building python packages (e.g. setup.py, a LICENSE folder) with data science in production code (who makes their functions pip installable for a production pipeline?). Here is the Cookie Cutter directory structure (even slightly cut off):

Cookie cutter has way too many folders (data folder in source, and data folder itself), multiple nested folders (what is the difference between external data, interim, and raw data?, what is the difference between features and data in the src folder?) I can see cases for individual parts of these needed sometimes (e.g. an external data file defining lookups for ICD codes), but why start with 100 extra folders that you don’t need. I find this very difficult taking over other peoples projects in that I don’t know where there are things and where there are not (most of these folders are empty).

So I’ve reorganized some of my projects at work, and they now look like this:

├── README.md           <- High level overview of project + any special notes
├── requirements.txt    <- Default python libraries we often use (eg sklearn, sqlalchemy)
├                         + special instructions for conda environments in our VMs
├── .gitignore          <- ignore `models/*.pkl`, `*.csv`, etc.
├── /models             <- place to store trained and serialized models
├── /notebooks          <- I don't even use notebooks very often, more like a scratch/EDA folder
├── /reports            <- Powerpoint reports to business (using HMS template)
├── /src                <- Place to store functions

And then depending on the project, we either use secret environment variables, or have a YAML file that has database connection strings etc. (And that YAML is specified in .gitignore.)

And then over time in the root folder it will typically have shell scripts call whatever production pipeline or API we are building. All the function files in source is fine, although it can grow to more modules if you really want it to.

And this got me thinking about how to teach this program management stuff to new data scientists we are hiring, and if I was still a professor how I would structure a course to teach this type of stuff in a social science program.

Courses

So in my procrastination I made a generic syllabi for what this software developement course would look like, Software & Project Development For Social Scientists. It would have a class/week on using the command prompt, then a week on github, then a few weeks building a python library, then ditto for an R package. And along the way sprinkle in literate programming (notebooks and markdown and Latex), unit testing, and docker.

And here we could discuss how projects are organized. And social science students get exposed to way more stuff that is relevant in a typical data science role. I have over the years also dreamt up other data science related courses as well.

Stats Programming for CJ. This goes through the basics of data manipulation using statistical programming. I would likely have tutorials for R, python, SPSS, and Stata for this. My experience with students is that even if they have had multiple stats classes in grad school, if you ask them “take this incident dataset with dates, and prepare a weekly level file with counts of crimes per week” they don’t know how to do even that simple task (an aggregation). So students need an entry level data manipulation course.

Optimization for Criminal Justice (or alt title Operations Research and Machine Learning for CJ). This one is not as developed as some of my other courses, but I think I could make it work for a semester. I think learning linear programming is a really great skill not taught at all in any CJ program I am aware of. I have some small notes on machine learning in my Research Design class for PhD students, but that could be expanded out (week for decision trees/forests, week for boosting, week for neural networks, etc.).

And last, I have made syllabi for the one credit entry level course for undergrad students, and the equivalent course for the new PhD students, College Prep. These classes I had I don’t think did a very good job. My intro one at Bloomsburg for undergrad had a textbook lol! The only thing I remember about my PhD one was fear mongering over publications (which at that point I had no idea what was going on), and spending the last class with Julie Horney and David McDowell at whatever the place next to the Washington Tavern in Albany was called (?Gingerbread?).

These are of course just in my head at the moment. I have posted my course materials over the years that I have delivered.

I have pitched to a few programs to hire me as a semi teaching professor (and still keep my private sector gig). This set up is not that uncommon in comp sci departments, but no CJ ones I think are interested. Even though I like musing about courses, adjunct pay is way too low to justify this investment, and should be paid to both develop the material as well as deliver the class.

Books

I have similarly made outlines for books over the years as well. One is Data Science for Crime Analysis with Python. I think there is an opening in the crime analysis market to advance to more professional coding, and so a python book would be good. But the market is overall tiny, my high end guesstimates are only around 800, so hard to justify the effort. (It would be mainly just a collection of my blog posts, but all in a nicer format for everyone to walk through/replicate.)

Another is a reader book, Handbook of Advanced Crime Analysis. That may not be needed though, as Cory Haberman and Liz Groff did a recent book that has quite a bit of overlap (can’t find it at the moment, maybe it is not out yet). Many current advanced techniques are scattered and sometimes difficult to replicate, I figured a reader that also includes code walkthroughs would help quite a few PhD students.

And again if I was still in the publishing game I would like to turn my Poisson course notes into a little Sage green book.

If I was still a professor, this would go hand in hand with developing courses. I know Uni’s do sometimes have grants to develop open source teaching materials, and these would probably best fit those molds. These aren’t going to generate revenue directly from sales.

So complaints and snippets on blog posts are all you are going to get for now from me.