Improving the MLModel Base Class

Or, how to make ML models easier to install, document, and release

In general, I want to show how to make ML code easier to install and use.

When I was doing research for this blog post, I found a great post by Mateusz Bednarski showing how to build machine learning models as Python packages. There are some similarities between that post and what I will show here; however, this post focuses more on deploying ML models into production systems, whereas Mateusz's post focuses on packaging the training code.

Making the Iris Model into a Python Package

Another improvement that we can make to the example code is to turn it into a full-fledged Python package, which makes it easier to install and use in other projects. The goal here is to treat an ML model as just another Python package, which makes it possible to leverage all of the tools that Python has for packaging and reusing code. A good guide for structuring Python packages can be found here.

A common pattern in ML code is that it is almost always hard to use and deploy. Teams that do machine learning know this very well, since the code written by a data scientist almost always needs to be rewritten by a software engineer before it can be deployed to production systems. Luckily, we have a lot of tools that make the transition from experimental model to production model a smoother process. In this section I will show a few simple steps that turn the example model from the last blog post into an installable Python package. To accomplish this, we will add version information to the package, add a command line interface to the training script, add Sphinx documentation, and add a setup.py file to the project. As an additional touch, we will automate the documentation of the ML model's interface.

- project_root
  - docs (a folder, package documentation goes in here)
  - iris_model (a folder, the iris package code goes in here)
    - model_files (a folder, the model files go in here)
    - iris_predict.py (the prediction code goes here)
    - iris_train.py (the training script goes here)
  - tests (unit tests for the iris_model package go here)
  - ml_model_abc.py (the MLModel base class goes here)
  - requirements.txt
  - setup.py (the package installation script goes here)

Adding Package Versioning

Python packages are usually versioned using semantic versioning. Software packages that use semantic versioning must declare a public API. This is complicated when we want to do versioning of ML models because we have two APIs: the API for making model predictions and the API for training the model. We can deal with this complexity by tying the different components of the semantic version of the package to the prediction API and the training API of the package.

__version_info__ = (0, 1, 0)
__version__ = ".".join([str(n) for n in __version_info__])
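To see why tying the major version to the prediction API is useful, here is a minimal sketch of how a deployment script could guard against breaking changes. The version-checking function is hypothetical and not part of the iris_model package; it only illustrates the convention.

```python
# stands in for iris_model.__version_info__, for illustration
__version_info__ = (0, 1, 0)

# the major version is tied to the prediction API, so a major version
# bump signals a breaking change that a hosting service may not support
SUPPORTED_MAJOR_VERSION = 0


def check_model_compatibility(version_info):
    """Return True if the model's prediction API is supported."""
    return version_info[0] == SUPPORTED_MAJOR_VERSION


print(check_model_compatibility(__version_info__))  # True
print(check_model_compatibility((1, 0, 0)))         # False: breaking change
```

Under this convention, retraining the model with new data or hyperparameters only bumps the minor or patch version, so consumers of the prediction API can upgrade safely.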

Adding a CLI interface to the Training Script

When building ML models, the training code is often written in Jupyter notebooks. While there are ways to automate the training process with notebooks, it's a lot easier to do it from the command line. To do this, we will add a simple command line interface to the Iris model training script. We will create the interface using the argparse package and then add a main() function that calls the train() function when the script is run from the command line.

import argparse

# train() is defined earlier in the iris_train.py script


def argument_parser():
    parser = argparse.ArgumentParser(
        description='Command to train the Iris model.')
    parser.add_argument('-gamma', action='store', dest='gamma', type=float,
                        help='Gamma value used to train the SVM model.')
    parser.add_argument('-c', action='store', dest='c', type=float,
                        help='C value used to train the SVM model.')
    return parser


def main():
    parser = argument_parser()
    results = parser.parse_args()
    try:
        if results.gamma is None and results.c is None:
            # no options provided, training with default hyperparameters
            train()
        elif results.gamma is not None and results.c is None:
            train(gamma=results.gamma)
        elif results.gamma is None and results.c is not None:
            train(c=results.c)
        else:
            train(gamma=results.gamma, c=results.c)
    except Exception as e:
        print("Error: {}".format(str(e)))

Adding Sphinx Documentation

One of the great parts of working in the Python ecosystem is the Sphinx package, which is used for creating documentation from source files. There are a lot of great guides for documenting your package using Sphinx, so I won’t go through it again here. For this blog post, I followed these guides to create a simple documentation page and hosted it on Github pages. Adding documentation is a simple process and it is done by almost all Python packages that have more than a few users. After putting together the basic documentation, I followed a few simple extra steps to fully automate the creation of the documentation for the model.

.. jsonschema:: ../build/input_schema.json

.. argparse::
   :module: iris_model.iris_train
   :func: argument_parser
   :prog: iris_train
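For reference, the directives above require the corresponding Sphinx extensions to be enabled in the documentation configuration. A minimal conf.py fragment might look like this, assuming the sphinx-jsonschema and sphinx-argparse packages are installed:

```python
# docs/conf.py (fragment): enabling the extensions used by the directives
extensions = [
    'sphinx.ext.autodoc',   # pulls docstrings out of the package code
    'sphinx-jsonschema',    # renders the model's JSON schema files
    'sphinxarg.ext',        # renders the argparse CLI documentation
]
```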

Adding a setup.py File

Now that we have the ML model code structured as a Python package, versioned, and documented, we'll add a setup.py file to the project folder. The setup.py file is used by the setuptools package to install Python packages, and it makes the ML model easily installable in a virtual environment. A great guide for writing the setup.py file for your package can be found here.

entry_points={
    'console_scripts': [
        'iris_train=iris_model.iris_train:main',
    ]}
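For context, the console_scripts entry point shown above lives inside the setup() call. A minimal setup.py for this project might look something like the following sketch; the dependency list and package_data pattern are assumptions based on the packages and files used elsewhere in the code, not the author's exact file.

```python
from setuptools import setup, find_packages

setup(
    name='iris_model',
    version='0.1.0',
    packages=find_packages(exclude=['tests']),
    # making sure the pickled model files ship with the package
    package_data={'iris_model': ['model_files/*.pickle']},
    # assumed dependencies, based on the packages used in the code
    install_requires=['numpy', 'scikit-learn', 'schema'],
    entry_points={
        'console_scripts': [
            'iris_train=iris_model.iris_train:main',
        ],
    },
)
```

The console_scripts entry point is what makes the iris_train command available on the PATH after the package is installed.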
mkdir example
cd example
# creating a virtual environment
python3 -m venv venv
# activating the virtual environment, on a mac computer
source venv/bin/activate
# installing the iris_model package from the github repository
pip install git+
>>> from iris_model.iris_predict import IrisModel
>>> model = IrisModel()
>>> model
<iris_model.iris_predict.IrisModel object at 0x105d1e940>
>>> model.input_schema
Schema({'sepal_length': <class 'float'>, 'sepal_width': <class 'float'>, 'petal_length': <class 'float'>, 'petal_width': <class 'float'>})
>>> model.output_schema
Schema({'species': <class 'str'>})
iris_train -c=10.0 -gamma=0.01
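To illustrate what the iris_train command does with these options, here is a self-contained sketch that re-creates the training script's argument parser and parses the same flags; the train() call itself is left out.

```python
import argparse

# re-creating the training script's argument parser for illustration
parser = argparse.ArgumentParser(description='Command to train the Iris model.')
parser.add_argument('-gamma', action='store', dest='gamma', type=float,
                    help='Gamma value used to train the SVM model.')
parser.add_argument('-c', action='store', dest='c', type=float,
                    help='C value used to train the SVM model.')

# parsing the same flags used in the shell command above
results = parser.parse_args(['-c=10.0', '-gamma=0.01'])
print(results.c)      # 10.0
print(results.gamma)  # 0.01
```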

Model Metadata in the MLModel Base Class

In the previous blog post we showed an MLModel base class with two required abstract properties: "input_schema" and "output_schema". Any class that derives from the MLModel base class must provide these two properties, which publish schema metadata about the model's input and output data. To keep things simple, I chose not to expose more metadata through class properties; however, there are several other pieces of metadata that would be useful to expose to the outside world. For example:

  • display_name, a property that returns a human-readable display name for the model
  • qualified_name, a property that returns the qualified name of the model; a qualified name is an unambiguous identifier for the model
  • description, a property that returns a description of the model
  • major_version, a property that returns the model's major version
  • minor_version, a property that returns the model's minor version
from abc import ABC, abstractmethod


class MLModel(ABC):
    @property
    @abstractmethod
    def display_name(self):
        raise NotImplementedError()

    @property
    @abstractmethod
    def qualified_name(self):
        raise NotImplementedError()

    @property
    @abstractmethod
    def description(self):
        raise NotImplementedError()

    @property
    @abstractmethod
    def major_version(self):
        raise NotImplementedError()

    @property
    @abstractmethod
    def minor_version(self):
        raise NotImplementedError()

    @property
    @abstractmethod
    def input_schema(self):
        raise NotImplementedError()

    @property
    @abstractmethod
    def output_schema(self):
        raise NotImplementedError()

    @abstractmethod
    def __init__(self):
        raise NotImplementedError()

    @abstractmethod
    def predict(self, data):
        raise NotImplementedError()
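One benefit of declaring the properties as abstract is that Python refuses to instantiate a subclass that forgets to provide them. Here is a minimal sketch of that behavior, using a stripped-down two-member version of the base class for illustration:

```python
from abc import ABC, abstractmethod


# a stripped-down version of the base class, for illustration
class MLModel(ABC):
    @property
    @abstractmethod
    def display_name(self):
        raise NotImplementedError()

    @abstractmethod
    def predict(self, data):
        raise NotImplementedError()


# this subclass forgets to implement display_name
class IncompleteModel(MLModel):
    def predict(self, data):
        return data


try:
    model = IncompleteModel()
except TypeError as e:
    print("Cannot instantiate: {}".format(str(e)))


# a complete subclass instantiates normally
class CompleteModel(MLModel):
    display_name = "Complete Model"

    def predict(self, data):
        return data


print(CompleteModel().display_name)  # Complete Model
```

This check happens at instantiation time, so an incompletely implemented model fails fast rather than breaking later inside a deployment system.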
# a display name for the model
__display_name__ = "Iris Model"

# returning the package name as the qualified name for the model
__qualified_name__ = __name__.split(".")[0]

# a description of the model
__description__ = "A machine learning model for predicting the species of a flower based on its measurements."
import os
import pickle

from numpy import array
from schema import Schema

from ml_model_abc import MLModel, MLModelSchemaValidationException
from iris_model import __version_info__, __display_name__, \
    __qualified_name__, __description__


class IrisModel(MLModel):
    # accessing the package metadata
    display_name = __display_name__
    qualified_name = __qualified_name__
    description = __description__
    major_version = __version_info__[0]
    minor_version = __version_info__[1]

    # stating the input schema of the model as a Schema object
    input_schema = Schema({'sepal_length': float,
                           'sepal_width': float,
                           'petal_length': float,
                           'petal_width': float})

    # stating the output schema of the model as a Schema object
    output_schema = Schema({'species': str})

    def __init__(self):
        # loading the pickled SVM model from the model_files folder
        dir_path = os.path.dirname(os.path.realpath(__file__))
        with open(os.path.join(dir_path, "model_files", "svc_model.pickle"),
                  'rb') as file:
            self._svm_model = pickle.load(file)

    def predict(self, data):
        # validating the input data against the input schema
        try:
            self.input_schema.validate(data)
        except Exception as e:
            raise MLModelSchemaValidationException(
                "Failed to validate input data: {}".format(str(e)))

        # making a prediction with the SVM model
        X = array([data["sepal_length"],
                   data["sepal_width"],
                   data["petal_length"],
                   data["petal_width"]]).reshape(1, -1)
        y_hat = int(self._svm_model.predict(X)[0])
        targets = ['setosa', 'versicolor', 'virginica']
        species = targets[y_hat]
        return {"species": species}
>>> from iris_model.iris_predict import IrisModel
>>> iris_model = IrisModel()
>>> iris_model.qualified_name
'iris_model'
>>> iris_model.display_name
'Iris Model'

Future Improvements

In this blog post we showed how to version an ML model using the standard conventions of Python packages; however, the model parameters of the Iris model also need to be versioned over time, and metadata about them needs to be kept. This is a problem that I will tackle in a future blog post.
