Saving data files in wheel packages

As a small update, I continue to use my retenmod python package as a means to illustrate various python packaging and CICD tricks. Most recently, I have added in an example of saving local data files inside of the wheel package. So instead of just packaging the .py files in the wheel package, it also bundles up two data files: a csv file and a json data file for illustration.

The use case I have seen for this: sometimes I see individual .py files in people's packages that have thousands of lines, and they are typically just lookup tables. It is better to save those lookup tables in more traditional formats than it is to coerce them into python objects.

It is not too difficult, but here are the two steps you need:

Step 1, in setup.cfg in the root, I have added this package_data option.

[options.package_data]
* = *.csv, *.json

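Step 2, create a set of new functions to load in the data. I use pkg_resources (which ships with setuptools) to do this; on Python 3.9 and later, importlib.resources in the standard library covers the same use case. It is simple enough to just copy-paste the entire data_funcs.py file here in a blog post to illustrate: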

import json
import numpy as np
import pkg_resources

# Reading the csv data
def staff():
    stream = pkg_resources.resource_stream(__name__, "agency.csv")
    df = np.genfromtxt(stream, delimiter=",", skip_header=1)
    return df

# Reading the metadata
def metaf():
    stream = pkg_resources.resource_stream(__name__, "notes.json")
    res = json.load(stream)
    return res

# just having it available as object
metadata = metaf()

So instead of doing something like pd.read_csv('agency.csv') (or here numpy, as I don’t have pandas as a package dependency for retenmod), you create a stream object. Passing __name__ lets python work out where the installed package lives, so you do not have to handle the relative path junk yourself. Depending on the downstream function, you may need to call stream.read() first, but here both json.load and np.genfromtxt accept the stream directly and work as intended.
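For comparison, here is a minimal sketch of the same pattern using importlib.resources from the standard library (Python 3.9+) instead of pkg_resources. This is not what retenmod does. To keep the example runnable without retenmod installed, it builds a throwaway package named demo_pkg with made-up data, and it reads the CSV with the csv module rather than numpy; the helper names load_json and load_csv_rows are hypothetical.

```python
import csv
import importlib
import importlib.resources
import json
import sys
import tempfile
from pathlib import Path

# Build a throwaway package on disk (hypothetical name and made-up data),
# so the example does not depend on retenmod being installed.
tmp_root = Path(tempfile.mkdtemp())
pkg_dir = tmp_root / "demo_pkg"
pkg_dir.mkdir()
(pkg_dir / "__init__.py").write_text("")
(pkg_dir / "agency.csv").write_text("year,staff\n2019,100\n2020,95\n")
(pkg_dir / "notes.json").write_text(json.dumps({"source": "made-up demo data"}))
sys.path.insert(0, str(tmp_root))
importlib.invalidate_caches()


def load_json(package, filename):
    """Load a JSON file that sits inside an importable package."""
    resource = importlib.resources.files(package).joinpath(filename)
    with resource.open("r", encoding="utf-8") as stream:
        return json.load(stream)


def load_csv_rows(package, filename):
    """Load numeric rows from a CSV file inside a package, skipping the header."""
    resource = importlib.resources.files(package).joinpath(filename)
    with resource.open("r", encoding="utf-8", newline="") as stream:
        reader = csv.reader(stream)
        next(reader)  # skip the header row
        return [[float(value) for value in row] for row in reader]


metadata = load_json("demo_pkg", "notes.json")
rows = load_csv_rows("demo_pkg", "agency.csv")
print(metadata)
print(rows)
```

The with blocks close each stream once the file has been read, which the pkg_resources version above leaves to the garbage collector.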

And again you can check out the github actions to see it in action, testing the package and generating the wheel file all in one go.

You can install the latest via the github repo:

pip install https://github.com/apwheele/retenmod/blob/main/dist/retenmod-0.0.1-py3-none-any.whl?raw=true

And to test this out, you can do:

from retenmod import data_funcs

data_funcs.metadata # dictionary from json
data_funcs.staff() # loading in numpy array from CSV file

If you go check out wherever your package is installed on your machine, you can see that it will have the agency.csv and the notes.json files, along with the .py files with the functions.
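If you would rather check this from python than browse the install folder, something like the sketch below should work. The function list_data_files is a hypothetical helper, not part of retenmod, and it is demonstrated on a throwaway package (demo_pkg_listing, with empty placeholder files) so it runs without anything installed. In principle you could pass "retenmod" instead once the wheel is installed.

```python
import importlib
import importlib.resources
import sys
import tempfile
from pathlib import Path


def list_data_files(package, suffixes=(".csv", ".json")):
    """Return the sorted names of files in a package ending in one of `suffixes`."""
    root = importlib.resources.files(package)
    return sorted(
        entry.name
        for entry in root.iterdir()
        if entry.is_file() and entry.name.endswith(suffixes)
    )


# Throwaway package with empty placeholder files (hypothetical, for the demo only).
tmp_root = Path(tempfile.mkdtemp())
pkg_dir = tmp_root / "demo_pkg_listing"
pkg_dir.mkdir()
for name in ("__init__.py", "data_funcs.py", "agency.csv", "notes.json"):
    (pkg_dir / name).write_text("")
sys.path.insert(0, str(tmp_root))
importlib.invalidate_caches()

found = list_data_files("demo_pkg_listing")
print(found)  # ['agency.csv', 'notes.json']
```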

Next on the todo list: auto uploading to pypi and incrementing minor tags via a CICD pipeline. So if you know of example packages that do that already, let me know!

Andrew Wheeler
