pre-commit hooks and github actions

In keeping up with learning about code development in python and R, two things I have added to my retenmod python package recently are pre-commit hooks and github actions. I am not going to give a code example here, you can google pre-commit python black flake and get like a dozen different blog posts to describe the process. Ditto for github actions. I think it is good to do some googling on the processes, and then you can see the final yaml files I have made to do either process:

The idea behind pre-commit, is that it runs a set of commands to check your py files before you commit your changes to github on your local system. So here the pre-commit I created for this package does three things:

  • runs black code formatter for python (formats whitespace nicely where it can)
  • runs flake8 checks (checks whether py files meet pep standards)
  • updates my readme document

Note one thing I want to be able to do but cannot currently with pre-commit is to update all jupyter notebooks in place as well. Unfortunately this does not work, as notebooks generate some time meta-data under the hood (so the file is modified, and fails the test). There is probably a way to fix this (maybe some smart config via nbdime), but it is not a big deal for me at the moment. But it works fine executing notebooks and saving to different files, so the readme hook in that example works just fine. (And won’t be as painful as say for Rmarkdown as for Jupyter with that time meta-data.)

Pre-commit hooks run on your local system, so you might not want to have it do pytests. My retenmod package though is tiny (intentionally did it as a very simple example to learn). At work I do development on both a windows machine and Red Hat virtual machines, and fortunately so far have not had issues with needing different yaml files, although I could potentially seeing that happen in more complicated set ups. Although if that were the case, I would like just not install on windows (I use local windows laptop to edit powerpoint presentations, and use Red Hat for pretty much everything else).

Github actions does not run on your personal machine, but runs after you have pushed your changes to github. And then github basically spins up virtual environments and does whatever tests. This makes sense for package development, to make sure your package can be installed on multiple operating systems. And you can run other unit tests at this stage on those systems as well. But for certain tests that only make sense on your local system (say functions to generate database connections) github actions will not make sense.

Next up I will need to learn whether it makes sense to generate artifacts via github actions (for that Jupyter notebook example where pre-commit is not working so well?). Also I have never really figured out sphinx docs for python. So I will see if I can get that up and running as well for this retenmod package.

Git excluding specific files when merging branches

The other day at work I had a mildly annoying problem – merging only selected files between a test and production branch in github. My particular use case was I had a test branch that needed to only interact with a test database, and the master branch needed to talk to the prod database. So I had particular config files with essentially different SQLAlchemy connection strings, but nothing else. Note I did not want these files ignored, just not merged between branches. (If I edit them I will need to make sure to edit both master & test branches in the end.)

I often use the GitHub desktop GUI to commit changes (when working on my local laptop). You can use the GUI to make a pull request, but when accepting the pull request in the browser I think it is all or nothing. I also need an entire command line solution for when I am working on a remote headerless machine with no GUI as well anyway. So here are my notes on how I solved the issue.

So just for illustration I added a test branch to my Blog_Code repository, and then some junk files just to illustrate. Via the git bash shell, if you navigate to your repository and do git diff master test --name-only, it shows you the different files in the two branches:

So you can see that I have 5 different files in total. Two config files and three different text files. If we do git diff master test -- special_config1, we can see the more specific differences between those two config files:

So you can see that the test branch version (in red) and the master branch version (in green), just have a minor difference. But in the end I want to keep those two files different between the branches, and not merge this config file (along with the other config file).

So here is the particular logic I put together, piping a bunch of commands together:

git switch master
git diff master test --name-only |
grep -v 'special_config1' |
grep -v 'Python/special_config2' |
sed 's/.*/"&"/' |
xargs git checkout test

The first line git switch is pretty self explanatory – I switch to the master branch (I will typically be doing work on test). Second I grab all the files that are different using git diff branch1 branch2, and only print out the file names. Third/Fourth lines I use grep to get rid of my specific config files out of that resulting list of files. You could also do grep -v 'file1.txt|file2.txt' |, but in this case this was giving me fits (maybe due to the forward slash not being escaped the right way for grep?).

The fifth sed line I wrap the files in quotes (if you have a file that has a space it will cause problems otherwise).

Sixth line I then use xargs to pass git checkout from the test environment, and pass in all of my files (minus my two config files). This is advice taken from this blog, just a slicker way to grab all of the files that are different minus a few specific config files. So instead of typing git checkout test file1.txt file2.txt etc. and typing the files by hand, I just grab all the files that are different and check them all out together.

Then once that is done it is the usual to commit the updated files. And then here in the end I switch the active environment back to test.

git commit -m 'Example only merging select files'
git push
git switch test

Maybe one of these days I will entirely ditch the GUI behind. But for now will just have to get by with my limited command line fu compared to these real computer programmers I work with more regularly.