Busy, busy, busy! Hopefully I will have some time in the near future to write up some more data science posts. But for now, here is a small python snippet to help you build interaction variables between two sets of numpy arrays/dataframes.
import numpy as np
def np_int(a,b):
rows = a.shape[0]
cols = a.shape[1]*b.shape[1]
return np.einsum('ij,ik->ijk', a, b).reshape((rows,cols))
This works for pytorch as well (just replace np.einsum
with torch.einsum
). So coming up (eventually) I will illustrate encoding interaction between hidden layers in a deep learning model. But for now some quicker updates.
Shout out #1: Scott Jacques has continued to push the charge for open access to criminology journals. He has two recent posts about post-prints, and how our main journal (Criminology) has an excessive policy of not allowing authors to post post prints for over two years (whereas the majority of criminology journals allow you to post immediately).
- Open (Access) Letter to Criminologists: Why and How Criminology Should Be Free
- Open Letter to Criminology Units: Sidestep Embargoes on Green Access by Adopting and Taking Advantage of a Rights-Retention Policy
Several aspects of open science are tricky – posting pre-prints/post-prints is not. If we can come together as a group this is an easy, no cost way to greatly improve the accessibility of our work to the greater public.
Shout out #2: The folks at Police Rewired have hosted a hackathon intended to Hack Hate. It is too late to participate, but they will be displaying the results this Sunday. I have not had the chance to participate in any code hackathons, I will need to make a concerted effort in the future to give at least one a shot. (It seems hard, how can you do any work in only a day or a week or two!? But the proof is in the pudding so to speak, I’ve have seen some pretty cool things come out of various hackathons in the past.)
Shout out #3: My workplace, HMS, is involved in a data sharing collaborative called the Digital Health DRC. They also have a hackathon coming up, but this is related to Telehealth use. The Digital Health DRC is pretty cool though, it is basically a way for HMS (and several other private sector entities) to share various datasets with researchers over the globe.
The scope of HMS’s data is somewhat outside the realm of my old stomping grounds of criminology (but not entirely, a big part of my job is identifying potentially fraudulent patterns in claims data). But for folks who have a research question that could be answered using health insurance claims data, this is a good resource to look into. (HMS has pretty good coverage of Medicare claims across the US.)
Finally, I experimented a few days on the site with hosting ads. I managed to serve up a few thousand and make 10 cents. So I will turn that off for now. I debated on putting the button for folks to donate a coffee, but even that is not necessary. (I can afford the few bucks for the domain, and I use dropbox to back up my files anyway, so hosting extra materials is not a big deal.) I rather folks just take my nerdy notes and make your own cool stuff (and share them with me!) I may need to figure out a better hosting solution for images though — google photos is continuing to give me troubles I see (so if you see an image is not coming through feel free to let me know in the comments or send me an email).