A RESTful ML Model Service

Building a Service for Deploying ML Models

Introduction

In previous blog posts we’ve built many different types of services that can host ML models. In this blog post we’ll aim at building a reusable service that can host an ML model behind a RESTful API. APIs are called RESTful when they follow the guidelines of the REST architectural style. REST stands for Representational State Transfer and is a set of conventions, usually built on top of the HTTP protocol, that is useful for building web applications. RESTful APIs are widely used in production systems and are an industry standard for integrating different systems.

The features that we want this reusable service to have are simple. We want to be able to install the service code as a package, that is, through the pip Python package manager. We want the API of the service to follow well-established standards; in this case we’ll follow the REST style for web APIs. We want to be able to configure the service to host any number of ML models. Lastly, we want the service to be self-documenting, so that we don’t have to create OpenAPI documentation for it manually.

All of these things are possible and indeed easy to implement because we will be relying on a common interface for all the ML models that the service will host. This interface is the MLModel interface and it is defined in another package that we’ve already created. This interface and the package are fully described in a previous blog post. By requiring that every model we want to host in the service fulfill the requirements of the interface, we are able to write the service once and reuse it.

The MLModel interface is very simple. It requires that a model class be created that contains two methods: an __init__ method that initializes the model object and a predict method that actually makes a prediction. This approach is very similar to the approach taken by Uber in their internal ML platform; they describe how they structure their ML model code here. SeldonCore is an open source project for deploying ML models which also takes a similar approach, described here. In this blog post we will leverage the standardization that the MLModel interface makes possible to write a RESTful service that can host any model that follows the standard.
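
As a rough illustration (the real interface lives in the package described in the earlier post, so the names and details below are assumptions), an MLModel-style interface might look like this:

```python
from abc import ABC, abstractmethod


class MLModel(ABC):
    """Illustrative base class for hosted models."""

    @abstractmethod
    def __init__(self):
        """Load any artifacts and prepare the model to predict."""

    @abstractmethod
    def predict(self, data):
        """Make a prediction from the input data."""


class EchoModel(MLModel):
    """Trivial model used only to show the contract."""

    def __init__(self):
        pass

    def predict(self, data):
        # A real model would run inference here; this one echoes its input.
        return data
```

Because every model exposes the same two methods, the service can load and serve any of them without model-specific code.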

Package Structure

- rest_model_service
  - __init__.py
  - configuration.py # data models for configuration
  - generate_openapi.py # script to generate an openapi spec
  - main.py # entry point for service
  - routes.py # controllers for routes
  - schemas.py # service schemas
- tests
- requirements.txt
- setup.py
- test_requirements.txt

This structure can be seen in the GitHub repository.

FastAPI

We’ll build up our understanding of how the service works by exploring the individual endpoints of the service. An endpoint is simply a point through which the service interacts with the outside world. The service has two types of endpoints: the metadata endpoint and all of the model endpoints. We’ll talk about the metadata endpoint first.

Model Metadata Endpoint

class ModelMetadata(BaseModel):
    """Metadata of a model."""
    display_name: str = Field(description="The display name of the model.")
    qualified_name: str = Field(description="The qualified name of the model.")
    description: str = Field(description="The description of the model.")
    version: str = Field(description="The version of the model.")

The code above can be found here.

The ModelMetadata object represents one model that is being hosted by the service. We actually want to be able to host many models within the service, so we need to create a “collection” data model that can hold many ModelMetadata objects:

class ModelMetadataCollection(BaseModel):
    """Collection of model metadata."""
    models: List[ModelMetadata] = Field(description="A collection of model descriptions.")

The code above can be found here.

Now that we have the data models, we can build the function that the client will interact with to get the model metadata:

async def get_models():
    try:
        model_manager = ModelManager()
        models_metadata_collection = model_manager.get_models()
        models_metadata_collection = ModelMetadataCollection(
            **{"models": models_metadata_collection}).dict()
        return JSONResponse(status_code=200,
                            content=models_metadata_collection)
    except Exception as e:
        error = Error(type="ServiceError", message=str(e)).dict()
        return JSONResponse(status_code=500, content=error)

The code above can be found here.

The function does not accept any parameters because we don’t need to select a specific model; we want to return metadata about all of the models. The first thing the function does is instantiate the ModelManager singleton. The ModelManager is a simple utility that we use to manage model instances; we described how it operates in a previous blog post. The ModelManager object should already contain instances of models, and by calling the get_models() method, we can get the metadata that we will return to the client.

The models_metadata_collection object is instantiated using the data model we created above, and returned as a JSONResponse to the client. If anything goes wrong, the function catches the exception object and returns a JSONResponse with the error details and a 500 status code.
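
Since the ModelManager itself is covered in another post, here is a hedged, dependency-free sketch of how a singleton like it might work; the real implementation may differ:

```python
class ModelManager:
    """Illustrative singleton: every call to the constructor
    returns the same shared instance."""

    _instance = None

    def __new__(cls):
        if cls._instance is None:
            # First construction: create the instance and its model list.
            cls._instance = super().__new__(cls)
            cls._instance._models = []
        return cls._instance

    def load_model(self, class_path):
        # The real class imports and instantiates the model; this
        # sketch just records the class path.
        self._models.append(class_path)

    def get_models(self):
        return list(self._models)
```

The singleton pattern is what lets the endpoint function instantiate ModelManager and still see the models that were loaded at application startup.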

Prediction Endpoint

class PredictionController(object):
    def __init__(self, model: MLModel) -> None:
        self._model = model

The code above can be found here.

The class is initialized with a reference to the instance of the model that it will be hosting. In this way, we can instantiate one controller object for each model that is living inside of the model service. To make predictions with the model, we’ll add a method:

def __call__(self, data):
    try:
        prediction = self._model.predict(data).dict()
        return JSONResponse(status_code=200, content=prediction)
    except MLModelSchemaValidationException as e:
        error = Error(type="SchemaValidationError",
                      message=str(e)).dict()
        return JSONResponse(status_code=400, content=error)
    except Exception as e:
        error = Error(type="ServiceError", message=str(e)).dict()
        return JSONResponse(status_code=500, content=error)

The code above can be found here.

The method is a dunder method named "__call__". This type of dunder method makes an object instantiated from the class behave like a function, which means that once we instantiate it, we’ll be able to register it as an endpoint on the service.

The method is pretty simple: it takes the data object and sends it to the model to make a prediction. It then returns a JSONResponse that contains the prediction and a 200 status code. This response will be returned by the service if everything goes well. If the model raises an MLModelSchemaValidationException, then the method will return a JSONResponse with the 400 status code. For any other exceptions the method will return a 500 status code.
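
To see why __call__ matters here, consider this minimal, dependency-free example; Greeter is an invented class, not part of the service:

```python
class Greeter:
    """An instance of this class can be invoked like a function."""

    def __init__(self, greeting):
        self._greeting = greeting

    def __call__(self, name):
        # Calling the instance runs this method.
        return "{}, {}!".format(self._greeting, name)
```

An instance like Greeter("Hello") can be passed anywhere a plain function is expected, which is exactly how a PredictionController instance can be handed to FastAPI as an endpoint handler.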

In the next section we’ll see how this class is instantiated in order to allow the service to host any number of MLModel instances. We’ll also see how we use the input and output models provided by each model object to create the documentation automatically.

Application Startup

if os.environ.get("REST_CONFIG") is not None:
    file_path = os.environ["REST_CONFIG"]
else:
    file_path = "rest_config.yaml"

if path.exists(file_path) and path.isfile(file_path):
    with open(file_path) as file:
        configuration = yaml.full_load(file)
    configuration = Configuration(**configuration)
    app = create_app(configuration.service_title,
                     configuration.models)
else:
    raise ValueError("Could not find configuration file "
                     "'{}'.".format(file_path))

The code above can be found here.

The default configuration file path is "rest_config.yaml", which is used if no other path is provided to the service. To provide an alternative path, we can set it in the "REST_CONFIG" environment variable. Once we have the YAML file loaded, we can call the create_app() function, which creates the FastAPI application object.
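
Before moving on, note that the configuration lookup described above reduces to a small helper; the sketch below is stdlib-only and illustrative, not code from the package:

```python
import os


def resolve_config_path(default="rest_config.yaml"):
    """Return the configuration file path, preferring the REST_CONFIG
    environment variable over the built-in default."""
    return os.environ.get("REST_CONFIG", default)
```

This lookup order (environment variable first, then a default) makes it easy to run the same service code locally and in different deployment environments.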

def create_app(service_title: str, models: List[Model]) -> FastAPI:
    app: FastAPI = FastAPI(title=service_title, version=__version__)
    app.add_api_route("/",
                      get_root,
                      methods=["GET"])
    app.add_api_route("/api/models",
                      get_models,
                      methods=["GET"],
                      response_model=ModelMetadataCollection,
                      responses={
                          500: {"model": Error}
                      })

The code above can be found here.

The create_app() function first creates the app object with the service title that we loaded from the configuration file and the version. We then add two routes to the app: the root route and the model metadata route. The root route simply reroutes the request to the /docs route which hosts the auto-generated documentation. The model metadata route returns metadata for all of the models hosted by the service.

The next thing the function does is actually load the models:

    model_manager = ModelManager()
    for model in models:
        model_manager.load_model(model.class_path)
        if model.create_endpoint:
            model = model_manager.get_model(model.qualified_name)
            controller = PredictionController(model=model)
            controller.__call__.__annotations__["data"] = model.input_schema
            app.add_api_route("/api/models/{}/prediction".format(model.qualified_name),
                              controller,
                              methods=["POST"],
                              response_model=model.output_schema,
                              description=model.description,
                              responses={
                                  400: {"model": Error},
                                  500: {"model": Error}
                              })
        else:
            logger.info("Skipped creating an endpoint for model "
                        "{}".format(model.qualified_name))
    return app

The code above can be found here.

The first thing we do is instantiate the ModelManager singleton. Next, we’ll process each model in the configuration. For each model, we’ll load it into the ModelManager and then create an endpoint for it. An endpoint is only created for a model if the configuration sets the "create_endpoint" option to true for that model.

Creating an endpoint for a model is a little tricky because we need to dynamically create an endpoint and add all of the options that FastAPI supports.

To create an endpoint for a model, we first need to get a reference to the model from the ModelManager singleton. We then instantiate the PredictionController class and pass the reference to the model to the __init__() method of the class. We now have a callable object that we can register with the FastAPI application as an endpoint controller. Before we can do that, we need to add an annotation to the controller that will allow FastAPI to automatically create documentation for the endpoint. We’ll annotate the controller function with the pydantic type that the model accepts as input. Now we are ready to register the controller. When we do that, we also provide the FastAPI app with the HTTP method, response pydantic model, description, and error response models. All of these options give the FastAPI app information about the endpoint which will be used later to auto-generate the documentation.
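
The annotation trick just described can be illustrated with a small, dependency-free sketch; FakeInputSchema and EchoController below are hypothetical stand-ins for a model’s input_schema and the PredictionController:

```python
class FakeInputSchema:
    """Hypothetical stand-in for a model's input_schema."""


class EchoController:
    """Hypothetical stand-in for PredictionController."""

    def __call__(self, data):
        return data


# Mutating the function's annotations at runtime changes the type hints
# that frameworks like FastAPI inspect when generating documentation.
EchoController.__call__.__annotations__["data"] = FakeInputSchema
```

Because each controller instance gets its own model’s schema attached this way, FastAPI can document every model endpoint with the correct input type even though they all share one controller class.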

Creating a Package

To enable all of this, the rest_model_service code is published as a package that can be installed from PyPI using the pip package manager. To install the package into your project you can execute this command:

pip install rest_model_service

Once the service package is installed, we can use it within an ML model project to create a RESTful service for the model.

Using the Service

class IrisModelInput(BaseModel):
    sepal_length: float = Field(gt=5.0, lt=8.0, description="Length of the sepal of the flower.")
    sepal_width: float = Field(gt=2.0, lt=6.0, description="Width of the sepal of the flower.")
    petal_length: float = Field(gt=1.0, lt=6.8, description="Length of the petal of the flower.")
    petal_width: float = Field(gt=0.0, lt=3.0, description="Width of the petal of the flower.")


class Species(str, Enum):
    iris_setosa = "Iris setosa"
    iris_versicolor = "Iris versicolor"
    iris_virginica = "Iris virginica"


class IrisModelOutput(BaseModel):
    species: Species = Field(description="Predicted species of the flower.")


class IrisModel(MLModel):
    display_name = "Iris Model"
    qualified_name = "iris_model"
    description = "Model for predicting the species of a flower based on its measurements."
    version = "1.0.0"
    input_schema = IrisModelInput
    output_schema = IrisModelOutput

    def __init__(self):
        pass

    def predict(self, data):
        return IrisModelOutput(species="Iris setosa")

The code above can be found here.

The mock model class works just like any other MLModel class, but it always returns a prediction of “Iris setosa”. As you can see, the model references the IrisModelInput and IrisModelOutput pydantic models for its input and output.
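
Stripped of the MLModel and pydantic machinery, the mock’s behavior boils down to ignoring its input; this dependency-free stand-in (MockIrisModel is an invented name) shows the idea:

```python
class MockIrisModel:
    """Dependency-free stand-in for the mocked IrisModel above."""

    qualified_name = "iris_model"

    def predict(self, data):
        # The mock ignores the input and always predicts the same species.
        return {"species": "Iris setosa"}
```

A mock like this is handy for exercising the service end to end without shipping real model artifacts.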

Once we have a model, we’ll need to add a configuration file to the project that will be used by the model service to find the models that we want to deploy. The configuration file should look like this:

service_title: REST Model Service
models:
  - qualified_name: iris_model
    class_path: tests.mocks.IrisModel
    create_endpoint: true

This file can be found in the examples folder here.

To start up the service locally, we need to point the service at the configuration file using an environment variable and then execute the uvicorn command:

export PYTHONPATH=./
export REST_CONFIG=examples/rest_config.yaml
uvicorn rest_model_service.main:app --reload

The service should start and we can view the documentation page on port 8000:

As you can see, the root endpoint and model metadata endpoint are part of the API. We also have an automatically generated endpoint for the iris_model mocked model that we added to the service through the configuration. The model’s input and output data models are also added to the documentation:

We can even try out a prediction:

Of course, the prediction will always be the same because it’s a mocked model.

Generating the OpenAPI Contract

export PYTHONPATH=./
export REST_CONFIG=examples/rest_config.yaml
generate_openapi --output_file=example.yaml

The script uses the same configuration that the service uses, but it doesn’t run the web service. It instead uses the FastAPI framework to generate the contract and saves it to the output file.

The generated contract will look like this:

info:
  title: REST Model Service
  version: <version_placeholder>
openapi: 3.0.2
paths:
  /:
    get:
      description: Root of API.
      operationId: get_root__get
      responses:
        '200':
          content:
            application/json:
              schema: {}
          description: Successful Response
      summary: Get Root
  /api/models:
    get:
      description: List of models available.

Closing

The service currently does not allow any code other than model code to be hosted. When deploying a model into a production setting, we often have extra logic that needs to be deployed alongside the model but is not technically part of it. This is usually called the "business logic" of the solution, and the service currently does not support adding it alongside the model logic. Granted, it is possible to put the business logic into the model class and deploy that, but this mixes two kinds of code in one class and makes the code harder to test and reason about. To fix this shortcoming, we can add "plugin points" that let us run our own business logic before and after the model executes.

One of the ways in which we could improve the service in the future is to allow more configuration of the models when they are instantiated by the service. Right now it’s not possible to customize a model when it is created by the service at startup time. In the future, it would be nice to allow the service configuration to hold parameters that would be passed to the model classes when they are instantiated.
