Using the Custom Model in ThanoSQL¶
- Tutorial Difficulty: ★★☆☆☆
- 10 min read
- Languages: SQL (50%), Python (50%)
- File location: tutorial_en/thanosql_ml/udm_tutorial.ipynb
- References: Beans Dataset
Tutorial Introduction¶
This feature is fully supported in paid versions of ThanoSQL.
ThanoSQL lets you upload models you have built yourself to the ThanoSQL workspace and database, and then use them for prediction.
In This Tutorial
👉 This tutorial uses the Beans dataset. The dataset consists of bean leaf images taken in the field in different districts of Uganda by the Makerere AI Lab in collaboration with the National Crops Resources Research Institute (NaCRRI), the national body in charge of agricultural research in Uganda.
#. Prepare the Model and Dataset Using Python¶
Prepare Dataset¶
Download and Unzip Data¶
import os
from shutil import unpack_archive
from urllib.request import urlretrieve

url = "https://storage.googleapis.com/ibeans"
for split in ["train", "validation", "test"]:
    urlretrieve(f"{url}/{split}.zip", f"{split}.zip")
    unpack_archive(f"{split}.zip", ".")
    os.remove(f"{split}.zip")
Install Necessary Packages¶
!pip install torch torchvision
from torch.utils.data import DataLoader
from torchvision import transforms as T
from torchvision.datasets import ImageFolder

data_transforms = {
    "train": T.Compose(
        [
            T.RandomResizedCrop(224),
            T.RandomHorizontalFlip(),
            T.ToTensor(),
            T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
        ]
    ),
    "validation": T.Compose(
        [
            T.Resize(224),
            T.CenterCrop(224),
            T.ToTensor(),
            T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
        ]
    ),
}
image_datasets = {
    split: ImageFolder(split, data_transforms[split])
    for split in ["train", "validation"]
}
dataloaders = {
    split: DataLoader(image_datasets[split], batch_size=8, shuffle=split == "train")
    for split in ["train", "validation"]
}
dataset_sizes = {split: len(image_datasets[split]) for split in ["train", "validation"]}
Prepare the Model¶
Create a Model Training Function¶
import time
import copy
import torch

def train_model(model, criterion, optimizer, num_epochs=3):
    start_time = time.time()
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model.to(device)
    best_model_weights = copy.deepcopy(model.state_dict())
    best_acc = 0.0

    for epoch in range(num_epochs):
        print(f"Epoch {epoch}/{num_epochs - 1}")
        print("-" * 10)

        # Every epoch goes through a training and validation phase
        for phase in ["train", "validation"]:
            if phase == "train":
                model.train()
            else:
                model.eval()

            running_loss = 0.0
            running_corrects = 0

            for inputs, labels in dataloaders[phase]:
                inputs = inputs.to(device)
                labels = labels.to(device)
                optimizer.zero_grad()

                # Forward propagation
                with torch.set_grad_enabled(phase == "train"):
                    outputs = model(inputs)
                    preds = torch.argmax(outputs, dim=1)
                    loss = criterion(outputs, labels)

                    # Backward propagation during training phase only
                    if phase == "train":
                        loss.backward()
                        optimizer.step()

                # Statistics
                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels.data)

            epoch_loss = running_loss / dataset_sizes[phase]
            epoch_acc = running_corrects / dataset_sizes[phase]
            print(f"{phase} Loss: {epoch_loss:.4f} Acc: {epoch_acc:.4f}")

            # Save if the model accuracy is higher than the previous accuracy
            if phase == "validation" and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_model_weights = copy.deepcopy(model.state_dict())

        print()

    time_elapsed = time.time() - start_time
    print(f"Training complete in {time_elapsed // 60:.0f}m {time_elapsed % 60:.0f}s")
    print(f"Best val Acc: {best_acc:4f}")
    model.load_state_dict(best_model_weights)
    return model
Load the Model¶
This tutorial uses MobileViT v2 because it achieves high accuracy for a lightweight model.
model = torch.hub.load("rwightman/pytorch-image-models", "mobilevitv2_050", pretrained=True, num_classes=3)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
criterion = torch.nn.CrossEntropyLoss()
Train and Save a Model¶
trained_model = train_model(model, criterion, optimizer, num_epochs=1)
Epoch 0/0
----------
train Loss: 0.5641 Acc: 0.8008
validation Loss: 0.2618 Acc: 0.8947

Training complete in 1m 6s
Best val Acc: 0.894737
torch.save(trained_model, "trained_model.pth")
Create a Dataframe to Insert into ThanoSQL¶
import numpy as np
import pandas as pd
test_dataset = ImageFolder("test", data_transforms["validation"])
data = np.stack([img.numpy() for img, _ in test_dataset])
df = pd.DataFrame(pd.Series(data.tolist()), columns=["image"])  # the column must be named "image"
df.to_pickle("test_data.pkl")
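Before pickling, it can help to confirm that the frame has the layout the steps above produce: a single column named "image" whose cells are nested lists of identical shape. The following is a minimal sketch of such a check. The `check_image_frame` helper is hypothetical (it is not part of ThanoSQL), and small random arrays stand in for the real test images.

```python
import numpy as np
import pandas as pd


def check_image_frame(df, expected_shape):
    """Raise ValueError unless df has a single "image" column of equally shaped nested lists."""
    if list(df.columns) != ["image"]:
        raise ValueError(f'expected a single "image" column, got {list(df.columns)}')
    for index, cell in df["image"].items():
        shape = np.asarray(cell).shape
        if shape != expected_shape:
            raise ValueError(f"row {index} has shape {shape}, expected {expected_shape}")
    return len(df)


# Small random arrays stand in for the real 3 x 224 x 224 test images
fake = np.random.rand(5, 3, 8, 8).astype(np.float32)
frame = pd.DataFrame({"image": fake.tolist()})
rows = check_image_frame(frame, expected_shape=(3, 8, 8))
```

With the tutorial's data, the equivalent call would be `check_image_frame(df, (3, 224, 224))`.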
0. Prepare Dataset¶
As mentioned in the ThanoSQL Workspace, you must create an API token and run the commands below before you can execute ThanoSQL queries.
%load_ext thanosql
%thanosql API_TOKEN=<Issued_API_TOKEN>
Prepare Dataset¶
%%thanosql
COPY beans_test
OPTIONS (if_exists='replace')
FROM 'test_data.pkl'
Success
Query Details
- "COPY" specifies the name of the dataset to be saved as a database table.
- "OPTIONS" specifies the option values to be used for the COPY clause.
- "if_exists": determines what happens when the table already exists: raise an error, append to the existing table, or replace the existing table (str, optional, 'fail'|'replace'|'append', default: 'fail')
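The three "if_exists" modes can be pictured with a small in-memory sketch. It only illustrates the semantics described above and is not ThanoSQL's implementation; the `copy_table` helper and the dictionary standing in for the database are hypothetical.

```python
def copy_table(database, name, rows, if_exists="fail"):
    """Mimic the documented if_exists modes on a dict of table name -> list of rows."""
    if if_exists not in ("fail", "replace", "append"):
        raise ValueError(f"unknown if_exists value: {if_exists!r}")
    if name not in database or if_exists == "replace":
        database[name] = list(rows)       # create the table, or overwrite it
    elif if_exists == "append":
        database[name].extend(rows)       # keep existing rows and add new ones
    else:
        raise ValueError(f"table {name!r} already exists")  # 'fail' mode, the default
    return database[name]


db = {}
copy_table(db, "beans_test", [1, 2])                     # table does not exist yet
copy_table(db, "beans_test", [3], if_exists="append")    # rows are now [1, 2, 3]
copy_table(db, "beans_test", [9], if_exists="replace")   # rows are now [9]
```

Using 'replace', as this tutorial does, makes the cell safe to re-run, at the cost of discarding whatever the table held before.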
1. Check Dataset¶
For this tutorial, we use the beans_test table located in the ThanoSQL workspace database. Run the query below to check the contents of the table.
%%thanosql
SELECT *
FROM beans_test
LIMIT 5
| | image |
|---|---|
| 0 | [[[-0.028684020042419434, -0.04580877348780632... |
| 1 | [[[-0.0629335269331932, -0.0629335269331932, -... |
| 2 | [[[1.9577873945236206, 1.8721636533737183, 1.7... |
| 3 | [[[0.21106265485286713, 0.0569397434592247, -0... |
| 4 | [[[-1.3815395832061768, -1.432913899421692, -1... |
Understanding the Data Table
The beans_test table contains the following information.
- image: the preprocessed image, stored as a nested array of pixel values converted from a NumPy array
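Because each cell holds normalized pixel values rather than a viewable picture, a value has to be un-normalized before it can be displayed. The sketch below reverses the `T.Normalize` step using the mean and standard deviation from the preprocessing code. The `to_displayable` helper is hypothetical, and a tiny synthetic image stands in for a real row.

```python
import numpy as np

# Statistics passed to T.Normalize in the preprocessing step
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def to_displayable(cell):
    """Turn one normalized (channels, height, width) cell into a (height, width, channels) array in [0, 1]."""
    chw = np.asarray(cell, dtype=np.float32)
    unnormalized = chw * STD[:, None, None] + MEAN[:, None, None]  # undo (x - mean) / std
    return np.clip(unnormalized.transpose(1, 2, 0), 0.0, 1.0)


# Synthetic stand-in for one row of the "image" column: a mid-gray 3 x 4 x 4 image
gray = np.full((3, 4, 4), 0.5, dtype=np.float32)
cell = ((gray - MEAN[:, None, None]) / STD[:, None, None]).tolist()
restored = to_displayable(cell)
```

The resulting array has the channel axis last, which is the layout most image viewers expect.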
2. Upload Custom Model¶
To upload the model created with Python in the previous step, run the following query, which saves the model as beans_mobilevit.
%%thanosql
UPLOAD MODEL beans_mobilevit
OPTIONS (
framework='pytorch',
overwrite=True
)
FROM 'trained_model.pth'
Success
Query Details
- "UPLOAD MODEL" uploads the model to the ThanoSQL workspace under the name beans_mobilevit.
- "OPTIONS" specifies the option values to be used for the UPLOAD MODEL clause.
- "framework": specifies the model framework (str, default: 'pytorch')
- "overwrite": determines whether to overwrite a model if it already exists. If set as True, the old model is replaced with the new model (bool, optional, True|False, default: False)
Currently, ThanoSQL supports only PyTorch models for the UPLOAD MODEL clause.
3. Predict Using a Custom Model¶
To predict the class of each bean image using the custom model, run the following query.
%%thanosql
PREDICT USING beans_mobilevit
OPTIONS (
result_col='predicted'
)
AS (
SELECT *
FROM beans_test
ORDER BY RANDOM()
LIMIT 5
)
| | image | predicted |
|---|---|---|
| 0 | [[[-0.7650483846664429, -0.7821731567382812, -... | [-2.5088071823120117, -0.03282929211854935, 2.... |
| 1 | [[[1.4097952842712402, 1.3926706314086914, 1.3... | [-1.7204804420471191, -1.7354539632797241, 3.5... |
| 2 | [[[-1.1760425567626953, -1.1931673288345337, -... | [-0.5441469550132751, 2.5831964015960693, -2.0... |
| 3 | [[[-1.2445416450500488, -1.278791069984436, -1... | [-1.5955406427383423, -2.174574613571167, 3.78... |
| 4 | [[[0.4165596663951874, 0.33093592524528503, 0.... | [-1.5648517608642578, -1.0658249855041504, 2.6... |
Query Details
- "PREDICT USING" predicts the outcome using the beans_mobilevit model.
- "OPTIONS" specifies the option values to be used for prediction.
- "result_col": the column that contains the predicted results (str, optional, default: 'predict_result')
pred_df = _  # get the result of the last executed cell
# The query set result_col='predicted', so the raw model outputs are in the "predicted" column
pred_df["predicted"] = pred_df["predicted"].apply(np.argmax)
pred_df["predicted"] = pred_df["predicted"].apply(test_dataset.classes.__getitem__)
pred_df
| | image | predicted |
|---|---|---|
| 0 | [[[-0.7650483846664429, -0.7821731567382812, -... | healthy |
| 1 | [[[1.4097952842712402, 1.3926706314086914, 1.3... | healthy |
| 2 | [[[-1.1760425567626953, -1.1931673288345337, -... | bean_rust |
| 3 | [[[-1.2445416450500488, -1.278791069984436, -1... | healthy |
| 4 | [[[0.4165596663951874, 0.33093592524528503, 0.... | healthy |
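The model returns one raw score (logit) per class, and the post-processing keeps only the index of the largest one. If a confidence value is also useful, a softmax can be applied first. The sketch below shows one way to do that. The `describe` helper is hypothetical, the logits are illustrative rather than taken from the table, and the class order is an assumption based on `ImageFolder` sorting class folder names alphabetically.

```python
import numpy as np

# Assumed class order: ImageFolder sorts class folder names alphabetically
CLASSES = ["angular_leaf_spot", "bean_rust", "healthy"]


def describe(logits):
    """Return (class name, softmax probability) for one row of raw model outputs."""
    logits = np.asarray(logits, dtype=np.float64)
    shifted = np.exp(logits - logits.max())   # subtract the max for numerical stability
    probs = shifted / shifted.sum()
    best = int(np.argmax(probs))
    return CLASSES[best], float(probs[best])


# Illustrative logits, not taken from the table above
label, confidence = describe([-1.5, -1.0, 2.5])
```

Softmax does not change which class wins, so the label matches what a plain arg max would give.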
4. In Conclusion¶
In this tutorial, we uploaded a custom model to ThanoSQL and used it to predict the classes of bean leaf images. You can refer back to this tutorial to upload your own custom model and use it within ThanoSQL.
- How to Upload My Data to the ThanoSQL Workspace
- How to Create a Table Using My Data
- How to Upload My Model to the ThanoSQL Workspace
Inquiries About Deploying a Model for Your Own Service
If you have any difficulties creating your own model using ThanoSQL or applying it to your service, please feel free to contact us below😊
For inquiries regarding building a user-defined model: contact@smartmind.team