Setting conda environments in crontab

I prefer using conda environments to manage python (partly out of familiarity). Conda is a bit different though, in that it is often set up locally for a users environment, and not globally as an installed package. This makes using it in bash scripts (or on windows .bat files) somewhat tricky.

So first, in a Unix environment, you can choose where to install conda. Then it adds into your .bashrc profile a line that looks something like:

__conda_setup="$('/mnt/miniconda/bin/conda' 'shell.bash' 'hook' 2> /dev/null)"
if [ $? -eq 0 ]; then
    eval "$__conda_setup"
else
    if [ -f "/lib/miniconda/etc/profile.d/conda.sh" ]; then
        . "/lib/miniconda/etc/profile.d/conda.sh"
    else
        export PATH="/lib/miniconda/bin:$PATH"
    fi
fi
unset __conda_setup

Where here I installed it in /lib. This looks complicated at first glance, but really all it is doing is sourcing the conda.sh script and pre-pending miniconda/bin to the path.

Now to be able to run python code on a regular basis in crontab, I typically have crontab run shell scripts, not python directly, so say that is a file run_code.sh:

#!/bin/sh

# Example shell script
time_start=`date +%Y_%m_%d_%H:%M`
echo "Begin script $time_start"

# Sourcing conda
source /lib/miniconda/etc/profile.d/conda.sh

# activating your particular environment
# may need to give full path, not just the name
conda activate your_env

# if you want to check environment
python --version

# you may need to change the directory at this point
echo "Current Directory is set to $PWD"
cd ...

# run your python script
log_file="main_log_$time_start.txt"
python main.py > $log_file 2>&1

I do not need to additionally add to the path in my experience, just sourcing that script is sufficient. Now edit your crontab (via crontab -e and using the VI editor) to look something like:

20 3 * * * bash /.../run_code.sh >> /.../cron_log.txt 2>&1

Where /.../ is shorthand for an explicit path to where the shell script and cron log lives.

This will run the shell script at 3:20 AM and append all of the stuff. In crontab if you just want conda available for all jobs, I believe you could do something like:

# global environment, can set keys, run scripts
some_key=abc
export some_key
source /lib/miniconda/etc/profile.d/conda.sh

20 3 * * * bash /.../run_code.sh >> /.../cron_log.txt 2>&1

But I have not tested this. If this works, you could technically run python scripts directly, but if you need to change environments you would still really need a shell script. It is good to know to be able to inject environment variables though in the crontab environment.

About the only other gotcha is file permissions. Sometimes in business applications you have service accounts running things, so a crontab as the service account. And you just need to make sure to chmod files so the service account has appropriate permissions. I tend to have more issues with log files by accident than I do conda environments though.

Note for people setting up scheduled jobs on windows, I have an example of setting a conda environment in a windows bat file.

Additional random pro-tip with conda environments while I am here – if you by default don’t want conda to set up new environments in your home directory (due to space or production processes), as well as download packages into a different cache location, you can do something like:

conda config --add pkgs_dirs /lib/py_packages
conda config --add envs_dirs /lib/conda_env

Have had issues in the past of having too much junk in home.

Git excluding specific files when merging branches

The other day at work I had a mildly annoying problem – merging only selected files between a test and production branch in github. My particular use case was I had a test branch that needed to only interact with a test database, and the master branch needed to talk to the prod database. So I had particular config files with essentially different SQLAlchemy connection strings, but nothing else. Note I did not want these files ignored, just not merged between branches. (If I edit them I will need to make sure to edit both master & test branches in the end.)

I often use the GitHub desktop GUI to commit changes (when working on my local laptop). You can use the GUI to make a pull request, but when accepting the pull request in the browser I think it is all or nothing. I also need an entire command line solution for when I am working on a remote headerless machine with no GUI as well anyway. So here are my notes on how I solved the issue.

So just for illustration I added a test branch to my Blog_Code repository, and then some junk files just to illustrate. Via the git bash shell, if you navigate to your repository and do git diff master test --name-only, it shows you the different files in the two branches:

So you can see that I have 5 different files in total. Two config files and three different text files. If we do git diff master test -- special_config1, we can see the more specific differences between those two config files:

So you can see that the test branch version (in red) and the master branch version (in green), just have a minor difference. But in the end I want to keep those two files different between the branches, and not merge this config file (along with the other config file).

So here is the particular logic I put together, piping a bunch of commands together:

git switch master
git diff master test --name-only |
grep -v 'special_config1' |
grep -v 'Python/special_config2' |
sed 's/.*/"&"/' |
xargs git checkout test

The first line git switch is pretty self explanatory – I switch to the master branch (I will typically be doing work on test). Second I grab all the files that are different using git diff branch1 branch2, and only print out the file names. Third/Fourth lines I use grep to get rid of my specific config files out of that resulting list of files. You could also do grep -v 'file1.txt|file2.txt' |, but in this case this was giving me fits (maybe due to the forward slash not being escaped the right way for grep?).

The fifth sed line I wrap the files in quotes (if you have a file that has a space it will cause problems otherwise).

Sixth line I then use xargs to pass git checkout from the test environment, and pass in all of my files (minus my two config files). This is advice taken from this blog, just a slicker way to grab all of the files that are different minus a few specific config files. So instead of typing git checkout test file1.txt file2.txt etc. and typing the files by hand, I just grab all the files that are different and check them all out together.

Then once that is done it is the usual to commit the updated files. And then here in the end I switch the active environment back to test.

git commit -m 'Example only merging select files'
git push
git switch test

Maybe one of these days I will entirely ditch the GUI behind. But for now will just have to get by with my limited command line fu compared to these real computer programmers I work with more regularly.