As your data science team grows and you start deploying models to production, the need for proper ML infrastructure becomes crucial – a standard way to design, train and deploy models.
In this guide, together we will build a basic ML Platform using open-source tools like Cookiecutter, DVC, MLFlow, FastAPI, Pulumi and more.
Keep in mind that this type of project can be huge, often taking a lot of time and resources. Our toy ML Platform therefore won’t have tons of features, just the basics, but it should teach you the basic principles of building your own ML platform.
This how-to guide is based on a workshop I did for the MLOps Community.
Our toy ML Platform will use DVC for data versioning, MLFlow for experiment management, and FastAPI for model serving.
We’re going to build all of this on top of AWS, but in theory you could also use Azure, Google Cloud or any other cloud provider.
It’s important to note that when building your own machine learning platform, you should NOT take these tools for granted. You should evaluate the alternatives – as they may be more appropriate for your specific use case.
The first component in our machine learning platform is going to be the model template, which we’re going to build using Cookiecutter for templating, and Poetry for package management.
The idea is that when a data scientist starts working on a new project, they will clone our model template (which contains a standard folder structure, Python linting, etc), develop their model, and easily deploy it when it’s ready for production.
The model template will contain basic training and serving code.
The training code in the model template will use the MLFlow client to track experiments.
Those experiments will be sent to the MLFlow Server that we’ll run on top of Kubernetes (EKS).
The model artifact itself is going to be saved in an S3 bucket (the Artifact Storage), and metadata about experiments will be saved in a PostgreSQL database.
We’ll also track versions of the dataset using DVC, in an S3 bucket.
For model serving, we will build a FastAPI server that’ll be responsible for preprocessing, making predictions, etc.
These model servers are going to run on Kubernetes, and we’ll expose them to the internet using Traefik.
All our infrastructure is going to be deployed using Pulumi, which is an Infrastructure as Code tool similar to Terraform.
Here are some major advantages of using this method:
With Pulumi, you can choose to write your infrastructure in a real programming language, such as TypeScript, Python, C# and more.
Even though the natural choice for an ML platform would be Python, I chose TypeScript because at the time of writing this post (July 2021), Pulumi’s TypeScript support was more feature complete.
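We’re going to have 2 GitHub repositories: mlplatform-infra, which holds the infrastructure shared between models, and model-template, which holds the template that data scientists clone for each new model.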
For CI/CD we’re going to use GitHub Actions.
Let’s start by setting up Kubernetes with MLFlow server.
Create a new GitHub repo called mlplatform-infra. This repo will contain the underlying infrastructure that’s shared between models.
Clone it and run:
pulumi new
Open index.ts and write the code for creating a new EKS cluster:
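import * as aws from '@pulumi/aws';
import * as eks from '@pulumi/eks';
import * as k8s from '@pulumi/kubernetes';
import * as random from '@pulumi/random';

// Create a Kubernetes cluster.
const cluster = new eks.Cluster('mlplatform-eks', {
  createOidcProvider: true,
});
export const kubeconfig = cluster.kubeconfig;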
The createOidcProvider option is required because MLFlow is going to access the artifact storage (see architecture), which is an S3 bucket, so we need to create a Kubernetes ServiceAccount that can access S3 buckets.
This API has more interesting arguments; you should definitely check them out before deploying to production.
To install packages like MLFlow on Kubernetes, we’re going to use Helm, which is a very popular package manager for Kubernetes.
Specifically, we’re going to use the larribas/mlflow Helm chart:
// Install MLFlow
const mlflow = new k8s.helm.v3.Chart("mlflow", {
  chart: "mlflow",
  fetchOpts: { repo: "https://larribas.me/helm-charts" },
}, { provider: cluster.provider });
This should install an MLFlow server, but unfortunately it’s not going to work because we still need to connect it to an artifact storage and a model metadata database.
Let’s create an S3 bucket for the artifact storage, and configure MLFlow to use it using the Helm chart’s values:
// Create S3 bucket for MLFlow artifact storage
const artifactStorage = new aws.s3.Bucket("artifact-storage", {
  // Keep the bucket private; model artifacts should not be publicly readable or writable
  acl: "private",
});
// Install MLFlow
const mlflow = new k8s.helm.v3.Chart("mlflow", {
  chart: "mlflow",
  values: {
    defaultArtifactRoot: artifactStorage.bucket.apply((bucketName: string) => `s3://${bucketName}`),
  },
  fetchOpts: { repo: "https://larribas.me/helm-charts" },
}, { provider: cluster.provider });
This should almost work, but there’s one problem: the MLFlow server, which is running on Kubernetes, isn’t going to have access to S3 buckets (see comment above on createOidcProvider).
To fix the permissions issue, let’s create a ServiceAccount that can access S3 buckets and use it:
// Create S3 bucket for MLFlow artifact storage
const artifactStorage = new aws.s3.Bucket("artifact-storage", {
  // Keep the bucket private; MLFlow gets access through the ServiceAccount below
  acl: "private",
});
// Create a k8s ServiceAccount that can access S3 buckets for MLFlow
const mlflowServiceAccount = new S3ServiceAccount('mlflow-service-account', {
  namespace: "default",
  oidcProvider: cluster.core.oidcProvider!,
  readOnly: false,
}, { provider: cluster.provider });
// Install MLFlow
const mlflow = new k8s.helm.v3.Chart("mlflow", {
  chart: "mlflow",
  values: {
    defaultArtifactRoot: artifactStorage.bucket.apply((bucketName: string) => `s3://${bucketName}`),
    serviceAccount: {
      create: false,
      name: mlflowServiceAccount.name,
    }
  },
  fetchOpts: { repo: "https://larribas.me/helm-charts" },
}, { provider: cluster.provider });
The S3ServiceAccount is available here.
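The S3ServiceAccount component isn’t shown in this guide, so here is a minimal Python sketch of the idea behind it, not its actual code. On EKS, a ServiceAccount gets AWS permissions through an IAM role whose trust policy names the cluster’s OIDC provider, which is why we enabled createOidcProvider. The function name, account ID and OIDC ID below are made up for illustration.

```python
import json


def irsa_trust_policy(oidc_provider_arn, oidc_provider_url, namespace, service_account):
    """Build an IAM trust policy that lets one Kubernetes ServiceAccount assume a role."""
    # The condition key uses the OIDC issuer URL without its https:// scheme.
    issuer = oidc_provider_url
    if issuer.startswith("https://"):
        issuer = issuer[len("https://"):]
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Federated": oidc_provider_arn},
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringEquals": {
                    # Only this namespace/ServiceAccount pair may assume the role.
                    f"{issuer}:sub": f"system:serviceaccount:{namespace}:{service_account}",
                },
            },
        }],
    }


# Hypothetical identifiers, for illustration only.
policy = irsa_trust_policy(
    oidc_provider_arn="arn:aws:iam::123456789012:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE",
    oidc_provider_url="https://oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE",
    namespace="default",
    service_account="mlflow-service-account",
)
print(json.dumps(policy, indent=2))
```

The ServiceAccount is then tied to that role through the eks.amazonaws.com/role-arn annotation, and the role’s permissions policy decides what it can do with S3.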
Finally, we need to set up the Postgres database for model metadata. We’ll use RDS for that:
// Create Postgres database for MLFlow
const dbPassword = new random.RandomPassword('mlplatform-db-password', { length: 16, special: false });
const db = new aws.rds.Instance('mlflow-db', {
  allocatedStorage: 10,
  engine: "postgres",
  engineVersion: "11.10",
  instanceClass: "db.t3.medium",
  name: "mlflow",
  password: dbPassword.result,
  skipFinalSnapshot: true,
  username: "postgres",
  // Make sure EKS has access to this db
  vpcSecurityGroupIds: [cluster.clusterSecurityGroup.id, cluster.nodeSecurityGroup.id],
});
// Install MLFlow
const mlflow = new k8s.helm.v3.Chart("mlflow", {
  chart: "mlflow",
  values: {
    ....
    backendStore: {
      postgres: {
        username: db.username,
        password: db.password,
        host: db.address,
        port: db.port,
        database: "mlflow"
      }
    },
  },
  fetchOpts: { repo: "https://larribas.me/helm-charts" },
}, { provider: cluster.provider });
That’s it! Run pulumi up and you should see MLFlow running on your new Kubernetes cluster.
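For context, the MLFlow server takes its metadata database as a single SQLAlchemy-style URI (the --backend-store-uri option). I’m assuming the Helm chart assembles that URI from the backendStore values we passed. The helper below is a hypothetical sketch of that assembly with a made-up hostname, not the chart’s code.

```python
from urllib.parse import quote


def postgres_backend_store_uri(username, password, host, port, database):
    """Return a SQLAlchemy-style PostgreSQL URI, as accepted by --backend-store-uri."""
    # Percent-encode credentials so characters like '@' or ':' cannot break the URI.
    user = quote(username, safe="")
    secret = quote(password, safe="")
    return f"postgresql://{user}:{secret}@{host}:{port}/{database}"


# Hypothetical RDS endpoint and password, for illustration only.
uri = postgres_backend_store_uri(
    "postgres", "p@ss:word", "mlflow-db.example.us-east-1.rds.amazonaws.com", 5432, "mlflow"
)
print(uri)
```

Because we generated the password with special: false, the encoding step is a no-op in this guide, but it matters as soon as you allow special characters.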
Even though MLFlow is running successfully, you have no way of accessing it. Let’s open MLFlow to the internet!
Security Warning
In real life you shouldn’t just expose MLFlow to the Internet, of course 🙂 Instead, think of authentication and authorization mechanisms that work for your organization.
To expose Kubernetes applications to the Internet, we are going to use Traefik – an open-source API gateway.
To install Traefik:
const traefik = new k8s.helm.v3.Chart('traefik', {
  chart: 'traefik',
  fetchOpts: { repo: 'https://containous.github.io/traefik-helm-chart' },
}, { provider: cluster.provider });
To expose MLFlow on the Traefik we just installed:
// Expose MLFlow in Traefik as /mlflow
new TraefikRoute('mlflow', {
  prefix: '/mlflow',
  service: mlflow.getResource('v1/Service', 'mlflow', 'mlflow'),
  namespace: 'default',
}, { provider: cluster.provider, dependsOn: [mlflow] });
The TraefikRoute component is available here.
To get Traefik’s public hostname, add the following line:
export const hostname = traefik.getResource('v1/Service', 'traefik').status.loadBalancer.ingress[0].hostname;
After running pulumi up again, you should be able to get that hostname by running:
pulumi stack output hostname
You can now create a CNAME record pointing to that hostname in your domain’s DNS provider.
You should now be able to access MLFlow at http://yourdomain.com/mlflow 🙂
Security Warning
In real life you would want to set up HTTPS and remove HTTP in Traefik’s Helm chart values. In this guide we use HTTP only.
Let’s start building our model template. Clone the model-template repo and start a new Poetry package:
poetry new --src my_model
We’ll call it my_model for now, and change it to Cookiecutter variables later on.
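To give an idea of what “changing to Cookiecutter variables” means: Cookiecutter templates use Jinja-style placeholders such as {{ cookiecutter.package_name }} in both file paths and file contents, with the values coming from a cookiecutter.json file. The snippet below is a small stand-in that mimics the substitution with a regular expression. It is not how Cookiecutter is implemented, and the variable names and values are hypothetical.

```python
import re

# Hypothetical variables, in the shape of a cookiecutter.json file.
context = {"package_name": "churn_model", "author": "Data Science Team"}

# Matches placeholders such as {{ cookiecutter.package_name }}.
PLACEHOLDER = re.compile(r"\{\{\s*cookiecutter\.(\w+)\s*\}\}")


def render(template, variables):
    """Replace every {{ cookiecutter.<name> }} placeholder with its value."""
    return PLACEHOLDER.sub(lambda match: variables[match.group(1)], template)


# Both paths and file contents get templated.
path = render("src/{{ cookiecutter.package_name }}/training/train.py", context)
script = render('train = "src.{{ cookiecutter.package_name }}.training.train:main"', context)
print(path)
print(script)
```

In the real template, every occurrence of my_model (folder names, imports, pyproject.toml entries) would become such a placeholder.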
Create a my_model.training package and add your training starting point. Here we’ll use a simple LightGBM example:
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, log_loss
import pandas as pd
import lightgbm as lgb

# Prepare training data
df = pd.read_csv('data/iris.csv')
flower_names = {'Setosa': 0, 'Versicolor': 1, 'Virginica': 2}
X = df[['sepal.length', 'sepal.width', 'petal.length', 'petal.width']]
y = df['variety'].map(flower_names)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
train_data = lgb.Dataset(X_train, label=y_train)


def main():
    # Train model
    params = {
        "objective": "multiclass",
        "num_class": 3,
        "learning_rate": 0.2,
        "metric": "multi_logloss",
        "feature_fraction": 0.8,
        "bagging_fraction": 0.9,
        "seed": 42,
    }
    model = lgb.train(params, train_data, valid_sets=[train_data])

    # Evaluate model
    y_proba = model.predict(X_test)
    y_pred = y_proba.argmax(axis=1)
    loss = log_loss(y_test, y_proba)
    acc = accuracy_score(y_test, y_pred)
    # Print the metrics so the evaluation results are visible
    print(f"log_loss: {loss:.4f}, accuracy: {acc:.4f}")


if __name__ == "__main__":
    main()
To make it work, you will need to download the iris.csv dataset from here.
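If the evaluation step looks like magic, here is what it computes, written out in plain Python on three made-up predictions. For a multiclass objective, model.predict returns one row of class probabilities per sample; the predicted class is the argmax of each row, accuracy is the share of correct classes, and log loss is the average negative log of the probability given to the true class. This is a sketch of the definitions only (scikit-learn’s log_loss additionally clips probabilities away from 0 and 1).

```python
import math

# Hypothetical class probabilities for 3 samples over 3 classes,
# in the same shape as model.predict(X_test) for a multiclass objective.
y_proba = [
    [0.9, 0.05, 0.05],
    [0.1, 0.7, 0.2],
    [0.2, 0.5, 0.3],
]
y_true = [0, 1, 2]


def argmax(row):
    """Index of the largest value in a row."""
    return max(range(len(row)), key=row.__getitem__)


# Predicted class per sample: [0, 1, 1] (the last sample is misclassified).
y_pred = [argmax(row) for row in y_proba]

# Share of samples whose predicted class matches the true class: 2 out of 3.
accuracy = sum(p == t for p, t in zip(y_pred, y_true)) / len(y_true)

# Average negative log-probability assigned to the true class.
log_loss = -sum(math.log(row[t]) for row, t in zip(y_proba, y_true)) / len(y_true)

print(y_pred, accuracy, log_loss)
```

Note how the third sample hurts both metrics: it is counted as wrong by accuracy, and its low probability for the true class (0.3) dominates the log loss.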
You can also add a Poetry script to make training easy to run. Add this to your pyproject.toml file:
[tool.poetry.scripts]
train = "src.my_model.training.train:main"
And now you can run:
poetry run train
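OK, let’s add the MLFlow client to this training code. The changes are the two mlflow imports, the tracking URI and autolog calls, the mlflow.start_run() block, and the custom metrics logging at the end: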
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, log_loss
import pandas as pd
import lightgbm as lgb
import mlflow
import mlflow.lightgbm

# Enable auto logging
mlflow.set_tracking_uri('http://yourdomain.com/mlflow')
mlflow.lightgbm.autolog()

# Prepare training data
df = pd.read_csv('data/iris.csv')
flower_names = {'Setosa': 0, 'Versicolor': 1, 'Virginica': 2}
X = df[['sepal.length', 'sepal.width', 'petal.length', 'petal.width']]
y = df['variety'].map(flower_names)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
train_data = lgb.Dataset(X_train, label=y_train)


def main():
    with mlflow.start_run() as run:
        # Train model
        params = {
            "objective": "multiclass",
            "num_class": 3,
            "learning_rate": 0.2,
            "metric": "multi_logloss",
            "feature_fraction": 0.8,
            "bagging_fraction": 0.9,
            "seed": 42,
        }
        model = lgb.train(params, train_data, valid_sets=[train_data])

        # Evaluate model
        y_proba = model.predict(X_test)
        y_pred = y_proba.argmax(axis=1)
        loss = log_loss(y_test, y_proba)
        acc = accuracy_score(y_test, y_pred)

        # Log custom metrics if you want
        mlflow.log_metrics({
            "log_loss": loss,
            "accuracy": acc
        })
        print("Run ID:", run.info.run_id)


if __name__ == "__main__":
    main()
Try to train the model again and go to your MLFlow server. You should now see the new experiment and be able to download the model file from the browser 🙂
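One thing you may want to change in a shared template is the hardcoded tracking URI. The MLFlow client already reads the MLFLOW_TRACKING_URI environment variable when no URI is set in code, so a small helper can prefer the environment and fall back to a default. This is a sketch; the helper name and the fallback value (the placeholder domain used in this guide) are my own choices.

```python
import os

# Placeholder domain used throughout this guide; replace with your own.
DEFAULT_TRACKING_URI = "http://yourdomain.com/mlflow"


def resolve_tracking_uri(env=None):
    """Prefer MLFLOW_TRACKING_URI from the environment, else use the default."""
    env = os.environ if env is None else env
    return env.get("MLFLOW_TRACKING_URI") or DEFAULT_TRACKING_URI


# In the training script you would then call:
#   mlflow.set_tracking_uri(resolve_tracking_uri())
print(resolve_tracking_uri({"MLFLOW_TRACKING_URI": "http://localhost:5000"}))
```

This lets a data scientist point the same training code at a local MLFlow server for experiments without editing the template.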
Let’s implement the model server based on FastAPI.
Create a my_model.serving package, and create a basic FastAPI server that can make predictions for your basic model. In our case:
import os
import uvicorn
import mlflow
import mlflow.lightgbm
import numpy as np
import pandas as pd
from fastapi import FastAPI
from pydantic import BaseModel


class FlowerPartSize(BaseModel):
    length: float
    width: float


class PredictRequest(BaseModel):
    sepal: FlowerPartSize
    petal: FlowerPartSize


app = FastAPI()

# Load model
model = mlflow.lightgbm.load_model(f'runs:/{os.environ["MLFLOW_RUN_ID"]}/model')
flower_name_by_index = {0: 'Setosa', 1: 'Versicolor', 2: 'Virginica'}


@app.post("/predict")
def predict(request: PredictRequest):
    df = pd.DataFrame(columns=['sepal.length', 'sepal.width', 'petal.length', 'petal.width'],
                      data=[[request.sepal.length, request.sepal.width, request.petal.length, request.petal.width]])
    # model.predict returns one row of class probabilities; pick the most likely class
    y_pred = int(np.argmax(model.predict(df)))
    return {"flower": flower_name_by_index[y_pred]}


def main():
    uvicorn.run(app, host="0.0.0.0", port=8000)


if __name__ == "__main__":
    main()
Note that the model is loaded from the Artifact Storage (the S3 bucket) through MLFlow.
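To try the server out, you can send it a JSON body that mirrors the PredictRequest model. Here is a minimal client sketch using only the standard library. It assumes the server is running locally on port 8000 (as in main() above); the helper name and the measurements are made up.

```python
import json
import urllib.request


def build_predict_request(base_url, sepal, petal):
    """Build a POST request whose JSON body matches the PredictRequest model."""
    payload = {
        "sepal": {"length": sepal[0], "width": sepal[1]},
        "petal": {"length": petal[0], "width": petal[1]},
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/predict",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical measurements (length, width) in the same units as iris.csv.
request = build_predict_request("http://localhost:8000", sepal=(5.1, 3.5), petal=(1.4, 0.2))

# Sending the request requires the server to be running:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

The response should be a JSON object with a single "flower" key, as returned by the predict function.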
OK, we’ll now need to do some DevOps to make this server run on our Kubernetes. Let’s start by containerizing it.
Add a Dockerfile:
FROM python:3.8-slim
WORKDIR /my_model
STOPSIGNAL SIGINT
ENV LISTEN_PORT 80
# System dependencies
RUN apt update && apt install -y libgomp1
RUN pip3 install poetry
# Project dependencies
COPY poetry.lock pyproject.toml ./
RUN poetry config virtualenvs.create false
RUN poetry install --no-interaction --no-ansi --no-dev
COPY . .
WORKDIR /my_model/src
ENTRYPOINT uvicorn my_model.serving.main:app --host 0.0.0.0 --port $LISTEN_PORT --workers 2
Next, let’s create a Pulumi package for the model that can build & push this Docker image to ECR, and deploy it to Kubernetes.
Create an infra directory inside the model template and run pulumi new, as before.
The Pulumi code should look something like:
import * as pulumi from '@pulumi/pulumi';
import * as awsx from '@pulumi/awsx';
import * as k8s from '@pulumi/kubernetes';
import * as kx from '@pulumi/kubernetesx';
import TraefikRoute from './TraefikRoute';

const config = new pulumi.Config();
const baseStack = new pulumi.StackReference(config.require('baseStackName'));

// Connect to the Kubernetes we created in mlplatform-infra
const provider = new k8s.Provider('provider', {
  kubeconfig: baseStack.requireOutput('kubeconfig'),
});

// Build & push Docker image to ECR
const image = awsx.ecr.buildAndPushImage('my-model-image', {
  context: '../',
});

const podBuilder = new kx.PodBuilder({
  containers: [{
    image: image.imageValue,
    ports: { http: 80 },
    env: {
      'LISTEN_PORT': '80',
      'MLFLOW_TRACKING_URI': 'http://yourcompany.com/mlflow',
      'MLFLOW_RUN_ID': config.require('runID'),
    }
  }],
  serviceAccountName: baseStack.requireOutput('modelsServiceAccountName'),
});

const deployment = new kx.Deployment('my-model-serving', {
  spec: podBuilder.asDeploymentSpec({ replicas: 3 })
}, { provider });

const service = deployment.createService();

// Expose model in Traefik
new TraefikRoute('my-model', {
  prefix: '/models/my-model',
  service,
  namespace: 'default',
}, { provider, dependsOn: [service] });
Note that serviceAccountName refers to a special Kubernetes ServiceAccount (exported from mlplatform-infra as modelsServiceAccountName) that can read from S3 buckets.
This is similar to the ServiceAccount we created for MLFlow, but it is read-only – you can just copy-paste that piece of code in mlplatform-infra and change readOnly to true.
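To make the readOnly distinction concrete, here is a hypothetical sketch of the kind of IAM permissions statement that could sit behind each variant. I’m assuming a bucket-scoped policy built from standard S3 actions; the real S3ServiceAccount component may do this differently (for example, by attaching AWS managed policies), and the bucket ARN is made up.

```python
def s3_policy_statement(bucket_arn, read_only):
    """Build an IAM statement granting read-only or read-write access to one bucket."""
    actions = ["s3:GetObject", "s3:ListBucket"]
    if not read_only:
        # Writers (such as the MLFlow server) also need to create and delete objects.
        actions += ["s3:PutObject", "s3:DeleteObject"]
    return {
        "Effect": "Allow",
        "Action": actions,
        # ListBucket applies to the bucket itself, object actions to the keys inside it.
        "Resource": [bucket_arn, f"{bucket_arn}/*"],
    }


# Hypothetical bucket ARN, for illustration only.
models_statement = s3_policy_statement("arn:aws:s3:::artifact-storage-example", read_only=True)
mlflow_statement = s3_policy_statement("arn:aws:s3:::artifact-storage-example", read_only=False)
print(models_statement["Action"])
print(mlflow_statement["Action"])
```

Model servers only download artifacts, so giving them the read-only variant limits the damage if a model container is ever compromised.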