Saving data files in wheel packages

As a small update, I continue to use my retenmod python package as a means to illustrate various python packaging and CICD tricks. Most recently, I have added in an example of saving local data files inside of the wheel package. So instead of just packaging the .py files in the wheel package, it also bundles up two data files: a csv file and a json data file for illustration.

The use case I have seen for this, sometimes I see individual .py files in peoples packages that have thousands of lines – they just are typically lookup tables. It is is better to save those lookup tables in more traditional formats than it is to coerce them into python objects.

It is not too difficult, but here are the two steps you need:

Step 1, in setup.cfg in the root, I have added this package_data option.

* = *.csv, *.json

Step 2, create a set of new functions to load in the data. You need to use pkg_resources to do this. It is simple enough to just copy-paste the entire file here in a blog post to illustrate:

import json
import numpy as np
import pkg_resources

# Reading the csv data
def staff():
    stream = pkg_resources.resource_stream(__name__, "agency.csv")
    df = np.genfromtxt(stream, delimiter=",", skip_header=1)
    return df

# Reading the metadata
def metaf():
    stream = pkg_resources.resource_stream(__name__, "notes.json")
    res = json.load(stream)
    return res

# just having it available as object
metadata = metaf()

So instead of doing something like pd.read_csv('agency.csv') (or here I use numpy, as I don’t have pandas as a package dependency for retenmod). You create a stream object, and the __name__ is just the way for python to figure out all of the relative path junk. Depending on the different downstream modules, you may need to, but here for both json and numpy you can just pass them to their subsequent read functions and it works as intended.

And again you can checkout the github actions to see in the works of testing the package, and generating the wheel file all in one go.

If you install the latest via the github repo:

pip install

And to test out this, you can do:

from retenmod import data_funcs

data_funcs.metadata # dictionary from json
data_funcs.staff() # loading in numpy array from CSV file

If you go check out wherever you package is installed to on your machine, you can see that it will have the agency.csv and the notes.json file, along with the .py files with the functions.

Next on the todo list, auto uploading to pypi and incrementing minor tags via CICD pipeline. So if you know of example packages that do that already let me know!

Leave a comment

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: