{ "cells": [ { "attachments": {}, "cell_type": "markdown", "id": "dcba9de7-327b-425a-81c8-22bebfdf88ca", "metadata": {}, "source": [ "# __Create a Classification Model Using AutoML__" ] }, { "attachments": {}, "cell_type": "markdown", "id": "c088a892", "metadata": {}, "source": [ "- Tutorial difficulty: ★☆☆☆☆\n", "- 4 min read\n", "- Languages: [SQL](https://en.wikipedia.org/wiki/SQL) (100%)\n", "- File location: tutorial_en/thanosql_ml/classification/automl_classification.ipynb\n", "- References: [(Kaggle) Titanic - Machine Learning from Disaster](https://www.kaggle.com/competitions/titanic/overview)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "4a197985", "metadata": {}, "source": [ "## Tutorial Introduction\n", "\n", "
Classification is a type of machine learning that predicts which category (or class) a target belongs to. Classification tasks include both binary classification (for example, classifying a person as male or female) and multiclass classification (for example, predicting an animal species such as dog, cat, or rabbit).
👉 This tutorial creates a classification model that predicts survivors, using the Titanic: Machine Learning from Disaster dataset from Kaggle, a machine learning competition platform. The goals of this competition are as follows:\n", " (For reference, the dataset lists real passengers who were on board when the Titanic sank on April 15, 1912.)

|   | passengerid | survived | pclass | name | sex | age | sibsp | parch | ticket | fare | cabin | embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | None | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | None | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | None | S |

The titanic_train dataset contains the following columns.

|   | metric | score |
|---|---|---|
| 0 | Accuracy | 0.923681 |
| 1 | ROCAUC | 0.927237 |
| 2 | Recall | 0.939103 |
| 3 | Precision | 0.856725 |
| 4 | F1-Score | 0.896024 |
| 5 | Kappa | 0.835941 |
| 6 | MCC | 0.838139 |

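
Most of the metrics in the table above can be derived from the four counts of a binary confusion matrix. The following is a minimal sketch in plain Python, not ThanoSQL's implementation: the function name `classification_metrics` and the example counts are hypothetical and chosen only for illustration. ROC AUC is left out because it depends on predicted probabilities rather than on counts.

```python
import math


def classification_metrics(tp, fp, fn, tn):
    """Compute binary-classification metrics from confusion-matrix counts.

    tp/fp/fn/tn are the numbers of true positives, false positives,
    false negatives, and true negatives.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )

    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_yes = ((tp + fp) / total) * ((tp + fn) / total)
    p_no = ((fn + tn) / total) * ((fp + tn) / total)
    p_chance = p_yes + p_no
    kappa = (accuracy - p_chance) / (1 - p_chance) if p_chance != 1 else 0.0

    # Matthews correlation coefficient (MCC).
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {
        "Accuracy": accuracy,
        "Recall": recall,
        "Precision": precision,
        "F1-Score": f1,
        "Kappa": kappa,
        "MCC": mcc,
    }


# Hypothetical counts for illustration only (not taken from the Titanic model).
m = classification_metrics(tp=40, fp=10, fn=5, tn=45)
for name, score in m.items():
    print(f"{name}: {score:.6f}")
```

Kappa and MCC are useful alongside accuracy because both account for class imbalance, which accuracy alone can hide.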
Normally, the training dataset should not be used for evaluation, because scores measured on data the model has already seen tend to be optimistic. This tutorial uses the training dataset anyway for convenience.
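
A common alternative to evaluating on the training data is to hold out part of it for validation. The sketch below shows the idea in plain Python. It is not a ThanoSQL feature: the helper `train_validation_split`, the 20% ratio, and the seed are hypothetical choices, and the ID list is a stand-in that assumes the training table holds passenger IDs 1 through 891 (the prediction table in this tutorial starts at 892).

```python
import random


def train_validation_split(rows, validation_ratio=0.2, seed=42):
    """Shuffle rows reproducibly and split them into (train, validation)."""
    rng = random.Random(seed)
    shuffled = list(rows)  # copy so the caller's list is not modified
    rng.shuffle(shuffled)
    n_validation = int(len(shuffled) * validation_ratio)
    return shuffled[n_validation:], shuffled[:n_validation]


# Stand-in for the training rows: passenger IDs 1..891 (an assumption).
passenger_ids = list(range(1, 892))
train_ids, validation_ids = train_validation_split(passenger_ids)

print(len(train_ids), len(validation_ids))
```

The model would then be built from the rows in `train_ids` and evaluated only on the rows in `validation_ids`, so the reported scores reflect data the model has not seen.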

|   | passengerid | pclass | name | sex | age | sibsp | parch | ticket | fare | cabin | embarked | predict_result |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 892 | 3 | Kelly, Mr. James | male | 34.5 | 0 | 0 | 330911 | 7.8292 | None | Q | 0 |
| 1 | 893 | 3 | Wilkes, Mrs. James (Ellen Needs) | female | 47.0 | 1 | 0 | 363272 | 7.0000 | None | S | 0 |
| 2 | 894 | 2 | Myles, Mr. Thomas Francis | male | 62.0 | 0 | 0 | 240276 | 9.6875 | None | Q | 0 |
| 3 | 895 | 3 | Wirz, Mr. Albert | male | 27.0 | 0 | 0 | 315154 | 8.6625 | None | S | 0 |
| 4 | 896 | 3 | Hirvonen, Mrs. Alexander (Helga E Lindqvist) | female | 22.0 | 1 | 1 | 3101298 | 12.2875 | None | S | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 413 | 1305 | 3 | Spector, Mr. Woolf | male | NaN | 0 | 0 | A.5. 3236 | 8.0500 | None | S | 0 |
| 414 | 1306 | 1 | Oliva y Ocana, Dona. Fermina | female | 39.0 | 0 | 0 | PC 17758 | 108.9000 | C105 | C | 1 |
| 415 | 1307 | 3 | Saether, Mr. Simon Sivertsen | male | 38.5 | 0 | 0 | SOTON/O.Q. 3101262 | 7.2500 | None | S | 0 |
| 416 | 1308 | 3 | Ware, Mr. Frederick | male | NaN | 0 | 0 | 359309 | 8.0500 | None | S | 0 |
| 417 | 1309 | 3 | Peter, Master. Michael J | male | NaN | 1 | 1 | 2668 | 22.3583 | None | C | 0 |

418 rows × 12 columns
\n", "If you have any difficulty creating your own model with ThanoSQL or applying it to your service, please feel free to contact us at the address below 😊
\n", "For inquiries regarding building a classification model: contact@smartmind.team
\n", "