{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "dcba9de7-327b-425a-81c8-22bebfdf88ca",
   "metadata": {},
   "source": [
    "# __Create a Classification Model Using AutoML__"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "c088a892",
   "metadata": {},
   "source": [
    "- Tutorial difficulty: ★☆☆☆☆\n",
    "- 4 min read\n",
    "- Languages: [SQL](https://en.wikipedia.org/wiki/SQL) (100%)\n",
    "- File location: tutorial_en/thanosql_ml/classification/automl_classification.ipynb\n",
    "- References: [(Kaggle) Titanic - Machine Learning from Disaster](https://www.kaggle.com/competitions/titanic/overview)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "4a197985",
   "metadata": {},
   "source": [
    "## Tutorial Introduction\n",
    "\n",
    "<div class=\"admonition note\">\n",
    "    <h4 class=\"admonition-title\">Understanding Classification</h4>\n",
    "    <p>Classification is a type of <a href=\"https://en.wikipedia.org/wiki/Machine_learning\">Machine Learning</a> that predicts which category(Category or Class) the target belongs to. For example, both binary classifications(used for classifying men or women) and multiple classifications(used to predict animal species such as dogs, cats, rabbits, etc.) are included in the classification tasks. <br></p>\n",
    "</div>\n",
    "\n",
    "To predict whether or not a potential customer will react positively to a particular marketing promotion in your company, you can use your customer's [Customer Relationship Management (CRM)](https://en.wikipedia.org/wiki/Customer_relationship_management) data(demographic information, customer behavior/search data, etc.). In this case, the <a href=\"https://en.wikipedia.org/wiki/Feature_(machine_learning)\">features</a> expressed in the CRM data are used as the input data, and the target value, which is the value to be predicted, is whether the target customer's response to the promotion is positive(1 or True) or negative(0 or False). By using this classification model, you can predict the reaction of customers who have not been exposed to advertisements and target the appropriate customers, thereby continuously increasing marketing efficiency.\n",
    "\n",
    "__The following are examples and applications of the ThanoSQL classification model.__\n",
    "\n",
    "- The classification model enables early detection of current user deviations and allows proactive response to problems(deviations). Collected data can help you identify the features of leaving customers and allow you to take appropriate action by discovering leaving customers in advance. This can help prevent customer defections and increase sales.\n",
    "- You can predict the [Market Segmentation](https://en.wikipedia.org/wiki/Market_segmentation) involved in your online platform. Most service users have different characteristics, behaviors, and needs. Classification models utilize the users' features to identify granular groups and enable them to develop strategies tailored to them.  \n",
    "\n",
    "<div class=\"admonition note\">\n",
    "    <h4 class=\"admonition-title\">In This Tutorial</h4>\n",
    "    <p>👉 Create a classification model for survivors using the  <strong>Titanic: Machine Learning from Disaster</strong> dataset from the machine learning contest platform <a href=\"https://www.kaggle.com/\">Kaggle</a>. The goals of this competition are as follows:\n",
    "    (For reference, the data for the event is a list of real passengers who were on board during the Titanic incident on April 15, 1912.)</p>\n",
    "</div>\n",
    "\n",
    "__Predicting Passengers Who Would Survive The Titanic Incident__\n",
    "\n",
    "ThanoSQL provides automated machine learning(__AutoML__) tools. This tutorial uses __AutoML__ to predict passengers who would survive the Titanic incident. ThanoSQL's __AutoML__ automates the process for model development and enables data collection and storage along with machine learning model development and distribution(end-to-end machine learning pipelines) using a single language.\n",
    "\n",
    "__Automated ML has the following advantages:__\n",
    "\n",
    "1. Implementation and deployment of machine learning solutions without extensive programming or data science knowledge\n",
    "2. Saving time and resources for deployment of development models\n",
    "3. Quickly solve problems using the data you have for decision-making\n",
    "\n",
    "Now let's use ThanoSQL to create a classification model that predicts passengers who would survive the Titanic incident."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "1051840d-421e-4cab-9629-bcf6e37b7db6",
   "metadata": {},
   "source": [
    "## __0. Prepare Dataset__\n",
    "\n",
    "As mentioned in the [ThanoSQL Workspace](https://docs.thanosql.ai/1.5/en/getting_started/paas/workspace/lab/), you must create an API token and run the query below to execute the query of ThanoSQL. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "32fbea16-ab9a-45cb-948b-dfecf06902a0",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "%load_ext thanosql\n",
    "%thanosql API_TOKEN=<Issued_API_TOKEN>"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "073a6182",
   "metadata": {},
   "source": [
    "### __Prepare Dataset__"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "5163faa7",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2023-01-19T10:46:23.842758Z",
     "iopub.status.busy": "2023-01-19T10:46:23.842511Z",
     "iopub.status.idle": "2023-01-19T10:46:25.104465Z",
     "shell.execute_reply": "2023-01-19T10:46:25.103878Z",
     "shell.execute_reply.started": "2023-01-19T10:46:23.842740Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Success\n"
     ]
    }
   ],
   "source": [
    "%%thanosql\n",
    "GET THANOSQL DATASET titanic_data\n",
    "OPTIONS (overwrite=True)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "1ec2d024",
   "metadata": {},
   "source": [
    "<div class=\"admonition note\">\n",
    "    <h4 class=\"admonition-title\">Query Details</h4>\n",
    "    <ul>\n",
    "        <li>\"<strong>GET THANOSQL DATASET</strong>\" downloads the specified dataset to the workspace.</li>\n",
    "        <li>\"<strong>OPTIONS</strong>\" specifies the option values to be used for the <strong>GET THANOSQL DATASET</strong> clause.\n",
    "        <ul>\n",
    "            <li>\"overwrite\": determines whether to overwrite a dataset if it already exists. If set as True, the old dataset is replaced with the new dataset (bool, optional, True|False, default: False)</li>\n",
    "        </ul>\n",
    "        </li>\n",
    "    </ul>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "c7b0c89c-8bdb-4065-9394-6f57e33be899",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2023-01-19T10:46:25.105455Z",
     "iopub.status.busy": "2023-01-19T10:46:25.105262Z",
     "iopub.status.idle": "2023-01-19T10:46:27.233283Z",
     "shell.execute_reply": "2023-01-19T10:46:27.232486Z",
     "shell.execute_reply.started": "2023-01-19T10:46:25.105436Z"
    },
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Success\n"
     ]
    }
   ],
   "source": [
    "%%thanosql\n",
    "COPY titanic_train \n",
    "OPTIONS (if_exists='replace')\n",
    "FROM 'thanosql-dataset/titanic_data/titanic_train.csv'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "26e63b28-7b06-4378-a789-2c56929df59a",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2023-01-19T10:46:27.234273Z",
     "iopub.status.busy": "2023-01-19T10:46:27.234100Z",
     "iopub.status.idle": "2023-01-19T10:46:28.205829Z",
     "shell.execute_reply": "2023-01-19T10:46:28.205140Z",
     "shell.execute_reply.started": "2023-01-19T10:46:27.234257Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Success\n"
     ]
    }
   ],
   "source": [
    "%%thanosql\n",
    "COPY titanic_test \n",
    "OPTIONS (if_exists='replace')\n",
    "FROM 'thanosql-dataset/titanic_data/titanic_test.csv'"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "984aefd3",
   "metadata": {},
   "source": [
    "<div class=\"admonition note\">\n",
    "    <h4 class=\"admonition-title\">Query Details</h4>\n",
    "    <ul>\n",
    "        <li>\"<strong>COPY</strong>\" specifies the name of the dataset to be saved as a database table.</li>\n",
    "        <li>\"<strong>OPTIONS</strong>\" specifies the option values to be used for the <strong>COPY</strong> clause.\n",
    "        <ul>\n",
    "           <li>\"if_exists\": determines how the function should handle the case where the table already exists, it can either raise an error, append to the existing table, or replace the existing table (str, optional, 'fail'|'replace'|'append', default: 'fail')</li>\n",
    "        </ul>\n",
    "        </li>\n",
    "    </ul>\n",
    "</div>"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "eff7350d-3b29-44a2-84d6-132d3b41a989",
   "metadata": {},
   "source": [
    "## __1. Check Dataset__\n",
    "\n",
    "To create the survivor classification model, we use the <strong>titanic_train</strong> table located in the ThanoSQL workspace database. Run the query below to check the contents of the table."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "14b95522-168a-482c-a039-5af17deafc6a",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2023-01-19T10:46:28.207403Z",
     "iopub.status.busy": "2023-01-19T10:46:28.207212Z",
     "iopub.status.idle": "2023-01-19T10:46:29.196745Z",
     "shell.execute_reply": "2023-01-19T10:46:29.196027Z",
     "shell.execute_reply.started": "2023-01-19T10:46:28.207386Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>passengerid</th>\n",
       "      <th>survived</th>\n",
       "      <th>pclass</th>\n",
       "      <th>name</th>\n",
       "      <th>sex</th>\n",
       "      <th>age</th>\n",
       "      <th>sibsp</th>\n",
       "      <th>parch</th>\n",
       "      <th>ticket</th>\n",
       "      <th>fare</th>\n",
       "      <th>cabin</th>\n",
       "      <th>embarked</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>Braund, Mr. Owen Harris</td>\n",
       "      <td>male</td>\n",
       "      <td>22.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>A/5 21171</td>\n",
       "      <td>7.2500</td>\n",
       "      <td>None</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Cumings, Mrs. John Bradley (Florence Briggs Th...</td>\n",
       "      <td>female</td>\n",
       "      <td>38.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>PC 17599</td>\n",
       "      <td>71.2833</td>\n",
       "      <td>C85</td>\n",
       "      <td>C</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>Heikkinen, Miss. Laina</td>\n",
       "      <td>female</td>\n",
       "      <td>26.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>STON/O2. 3101282</td>\n",
       "      <td>7.9250</td>\n",
       "      <td>None</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Futrelle, Mrs. Jacques Heath (Lily May Peel)</td>\n",
       "      <td>female</td>\n",
       "      <td>35.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>113803</td>\n",
       "      <td>53.1000</td>\n",
       "      <td>C123</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>5</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>Allen, Mr. William Henry</td>\n",
       "      <td>male</td>\n",
       "      <td>35.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>373450</td>\n",
       "      <td>8.0500</td>\n",
       "      <td>None</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   passengerid  survived  pclass  \\\n",
       "0            1         0       3   \n",
       "1            2         1       1   \n",
       "2            3         1       3   \n",
       "3            4         1       1   \n",
       "4            5         0       3   \n",
       "\n",
       "                                                name     sex   age  sibsp  \\\n",
       "0                            Braund, Mr. Owen Harris    male  22.0      1   \n",
       "1  Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0      1   \n",
       "2                             Heikkinen, Miss. Laina  female  26.0      0   \n",
       "3       Futrelle, Mrs. Jacques Heath (Lily May Peel)  female  35.0      1   \n",
       "4                           Allen, Mr. William Henry    male  35.0      0   \n",
       "\n",
       "   parch            ticket     fare cabin embarked  \n",
       "0      0         A/5 21171   7.2500  None        S  \n",
       "1      0          PC 17599  71.2833   C85        C  \n",
       "2      0  STON/O2. 3101282   7.9250  None        S  \n",
       "3      0            113803  53.1000  C123        S  \n",
       "4      0            373450   8.0500  None        S  "
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "%%thanosql\n",
    "SELECT * \n",
    "FROM titanic_train\n",
    "LIMIT 5 "
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "cd8c9617",
   "metadata": {},
   "source": [
    "<div class=\"admonition note\">\n",
    "    <h4 class=\"admonition-title\">Understanding the Data Table</h4>\n",
    "    <p>The <strong>tianic_train</strong> dataset contains the following columns.</p>\n",
    "    <ul>\n",
    "        <li>passengerid: passenger ID</li>\n",
    "        <li>survived: whether the passenger on board survived</li>\n",
    "        <li>pclass: passenger ticket class</li>\n",
    "        <li>name: passenger name</li>\n",
    "        <li>sex: passenger gender</li>\n",
    "        <li>age: passenger age</li>\n",
    "        <li>sibsp: number of siblings or spouses on board</li>\n",
    "        <li>parch: number of parents or children on board</li>\n",
    "        <li>ticket: ticket number</li>\n",
    "        <li>fare: fare</li>\n",
    "        <li>cabin: cabin number</li>\n",
    "        <li>embarked: boarding location or port</li>\n",
    "    </ul>\n",
    "</div>\n",
    "\n",
    "In this tutorial, we will exclude the name, ticket, cabin, and passengerid columns since they require additional data preprocessing."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "af169f51-b815-4230-b398-2f9f3e46b7bd",
   "metadata": {},
   "source": [
    "## __2. Build a Classification Model__\n",
    "\n",
    "To create a survivor classification model with the name __titanic_automl_classification__ using the __titanic_train__ table, run the following query.  \n",
    "(Estimated duration of query execution: 8 min)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "2fb6601a-81af-4c99-af73-4c6894e0ac72",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2023-01-19T10:46:29.197533Z",
     "iopub.status.busy": "2023-01-19T10:46:29.197366Z",
     "iopub.status.idle": "2023-01-19T10:51:37.798108Z",
     "shell.execute_reply": "2023-01-19T10:51:37.797504Z",
     "shell.execute_reply.started": "2023-01-19T10:46:29.197517Z"
    },
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Success\n"
     ]
    }
   ],
   "source": [
    "%%thanosql\n",
    "BUILD MODEL titanic_automl_classification\n",
    "USING AutomlClassifier \n",
    "OPTIONS (\n",
    "    target_col='survived', \n",
    "    impute_type='iterative',  \n",
    "    features_to_drop=['name', 'ticket', 'passengerid', 'cabin'],\n",
    "    time_left_for_this_task=300,\n",
    "    overwrite=True\n",
    "    ) \n",
    "AS \n",
    "SELECT * \n",
    "FROM titanic_train"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "6b0fce36",
   "metadata": {},
   "source": [
    "<div class=\"admonition note\">\n",
    "    <h4 class=\"admonition-title\">Query Details</h4>\n",
    "    <ul>\n",
    "        <li>\"<strong>BUILD MODEL</strong>\" creates and trains a model named <strong>titanic_automl_classification</strong>.</li>\n",
    "        <li>\"<strong>OPTIONS</strong>\" specifies the option values used to create the model.\n",
    "        <ul> \n",
    "            <li>\"target_col\": the name of the column containing the target value of the classification model (str, default: 'target') </li>\n",
    "            <li>\"impute_type\": determines how empty values ​​(NaNs) are handled (str, optional, 'simple'|'iterative' , default: 'simple') </li>\n",
    "            <li>\"features_to_drop\": selects columns that cannot be used for training (list[str], optional)</li>\n",
    "            <li>\"time_left_for_this_task\": the total time given to find a suitable classification model in seconds (int, optional, default: 60)</li>\n",
    "            <li>\"overwrite\": determines whether to overwrite a model if it already exists. If set as True, the old model is replaced with the new model (bool, optional, True|False, default: False)</li>\n",
    "        </ul>\n",
    "        </li>\n",
    "    </ul>\n",
    "</div>\n"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "63a96457-78da-4d01-a760-953febcf08cd",
   "metadata": {},
   "source": [
    "## __3. Evaluate the Model__\n",
    "\n",
    "To evaluate the performance of the model created in the previous step, run the following query."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "67159796-2489-4576-942d-5939f24d962f",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2023-01-19T10:51:37.799075Z",
     "iopub.status.busy": "2023-01-19T10:51:37.798895Z",
     "iopub.status.idle": "2023-01-19T10:51:39.868382Z",
     "shell.execute_reply": "2023-01-19T10:51:39.867844Z",
     "shell.execute_reply.started": "2023-01-19T10:51:37.799059Z"
    },
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>metric</th>\n",
       "      <th>score</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Accuracy</td>\n",
       "      <td>0.923681</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>ROCAUC</td>\n",
       "      <td>0.927237</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Recall</td>\n",
       "      <td>0.939103</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Precision</td>\n",
       "      <td>0.856725</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>F1-Score</td>\n",
       "      <td>0.896024</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Kappa</td>\n",
       "      <td>0.835941</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>MCC</td>\n",
       "      <td>0.838139</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      metric     score\n",
       "0   Accuracy  0.923681\n",
       "1     ROCAUC  0.927237\n",
       "2     Recall  0.939103\n",
       "3  Precision  0.856725\n",
       "4   F1-Score  0.896024\n",
       "5      Kappa  0.835941\n",
       "6        MCC  0.838139"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "%%thanosql \n",
    "EVALUATE USING titanic_automl_classification \n",
    "OPTIONS (\n",
    "    target_col='survived'\n",
    "    )\n",
    "AS\n",
    "SELECT *\n",
    "FROM titanic_train"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "b4df22cc",
   "metadata": {},
   "source": [
    "<div class=\"admonition note\">\n",
    "    <h4 class=\"admonition-title\">Query Details</h4>\n",
    "    <ul>\n",
    "        <li>\"<strong>EVALUATE USING</strong>\" evaluates the <strong>titanic_automl_classification model</strong>. </li>\n",
    "        <li>\"<strong>OPTIONS</strong>\" specifies the option values used to evaluate the model.\n",
    "        <ul> \n",
    "            <li>\"target_col\": the name of the column containing the target value of the classification model (str, default: 'target') </li>\n",
    "        </ul>\n",
    "        </li>\n",
    "    </ul>\n",
    "</div>\n",
    "\n",
    "<div class=\"admonition warning\">\n",
    "    <h4 class=\"admonition-title\">Dataset for Evaluation</h4>\n",
    "    <p>Normally, train datasets should not be used for evaluation. However, for this tutorial, the train dataset is used for convenience.</p>\n",
    "</div>"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "aa047726-113b-48ae-8247-135a37592945",
   "metadata": {},
   "source": [
    "## __4. Predict Survivors__\n",
    "\n",
    "To use the classification model created in the previous step for prediction of <strong>titanic_test</strong>, run the following query."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "1df25db0-3352-4893-b0af-6e10cea8f7b0",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2023-01-19T10:51:39.869671Z",
     "iopub.status.busy": "2023-01-19T10:51:39.869135Z",
     "iopub.status.idle": "2023-01-19T10:51:41.669690Z",
     "shell.execute_reply": "2023-01-19T10:51:41.669020Z",
     "shell.execute_reply.started": "2023-01-19T10:51:39.869649Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>passengerid</th>\n",
       "      <th>pclass</th>\n",
       "      <th>name</th>\n",
       "      <th>sex</th>\n",
       "      <th>age</th>\n",
       "      <th>sibsp</th>\n",
       "      <th>parch</th>\n",
       "      <th>ticket</th>\n",
       "      <th>fare</th>\n",
       "      <th>cabin</th>\n",
       "      <th>embarked</th>\n",
       "      <th>predict_result</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>892</td>\n",
       "      <td>3</td>\n",
       "      <td>Kelly, Mr. James</td>\n",
       "      <td>male</td>\n",
       "      <td>34.5</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>330911</td>\n",
       "      <td>7.8292</td>\n",
       "      <td>None</td>\n",
       "      <td>Q</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>893</td>\n",
       "      <td>3</td>\n",
       "      <td>Wilkes, Mrs. James (Ellen Needs)</td>\n",
       "      <td>female</td>\n",
       "      <td>47.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>363272</td>\n",
       "      <td>7.0000</td>\n",
       "      <td>None</td>\n",
       "      <td>S</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>894</td>\n",
       "      <td>2</td>\n",
       "      <td>Myles, Mr. Thomas Francis</td>\n",
       "      <td>male</td>\n",
       "      <td>62.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>240276</td>\n",
       "      <td>9.6875</td>\n",
       "      <td>None</td>\n",
       "      <td>Q</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>895</td>\n",
       "      <td>3</td>\n",
       "      <td>Wirz, Mr. Albert</td>\n",
       "      <td>male</td>\n",
       "      <td>27.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>315154</td>\n",
       "      <td>8.6625</td>\n",
       "      <td>None</td>\n",
       "      <td>S</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>896</td>\n",
       "      <td>3</td>\n",
       "      <td>Hirvonen, Mrs. Alexander (Helga E Lindqvist)</td>\n",
       "      <td>female</td>\n",
       "      <td>22.0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>3101298</td>\n",
       "      <td>12.2875</td>\n",
       "      <td>None</td>\n",
       "      <td>S</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>413</th>\n",
       "      <td>1305</td>\n",
       "      <td>3</td>\n",
       "      <td>Spector, Mr. Woolf</td>\n",
       "      <td>male</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>A.5. 3236</td>\n",
       "      <td>8.0500</td>\n",
       "      <td>None</td>\n",
       "      <td>S</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>414</th>\n",
       "      <td>1306</td>\n",
       "      <td>1</td>\n",
       "      <td>Oliva y Ocana, Dona. Fermina</td>\n",
       "      <td>female</td>\n",
       "      <td>39.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>PC 17758</td>\n",
       "      <td>108.9000</td>\n",
       "      <td>C105</td>\n",
       "      <td>C</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>415</th>\n",
       "      <td>1307</td>\n",
       "      <td>3</td>\n",
       "      <td>Saether, Mr. Simon Sivertsen</td>\n",
       "      <td>male</td>\n",
       "      <td>38.5</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>SOTON/O.Q. 3101262</td>\n",
       "      <td>7.2500</td>\n",
       "      <td>None</td>\n",
       "      <td>S</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>416</th>\n",
       "      <td>1308</td>\n",
       "      <td>3</td>\n",
       "      <td>Ware, Mr. Frederick</td>\n",
       "      <td>male</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>359309</td>\n",
       "      <td>8.0500</td>\n",
       "      <td>None</td>\n",
       "      <td>S</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>417</th>\n",
       "      <td>1309</td>\n",
       "      <td>3</td>\n",
       "      <td>Peter, Master. Michael J</td>\n",
       "      <td>male</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>2668</td>\n",
       "      <td>22.3583</td>\n",
       "      <td>None</td>\n",
       "      <td>C</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>418 rows × 12 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "     passengerid  pclass                                          name  \\\n",
       "0            892       3                              Kelly, Mr. James   \n",
       "1            893       3              Wilkes, Mrs. James (Ellen Needs)   \n",
       "2            894       2                     Myles, Mr. Thomas Francis   \n",
       "3            895       3                              Wirz, Mr. Albert   \n",
       "4            896       3  Hirvonen, Mrs. Alexander (Helga E Lindqvist)   \n",
       "..           ...     ...                                           ...   \n",
       "413         1305       3                            Spector, Mr. Woolf   \n",
       "414         1306       1                  Oliva y Ocana, Dona. Fermina   \n",
       "415         1307       3                  Saether, Mr. Simon Sivertsen   \n",
       "416         1308       3                           Ware, Mr. Frederick   \n",
       "417         1309       3                      Peter, Master. Michael J   \n",
       "\n",
       "        sex   age  sibsp  parch              ticket      fare cabin embarked  \\\n",
       "0      male  34.5      0      0              330911    7.8292  None        Q   \n",
       "1    female  47.0      1      0              363272    7.0000  None        S   \n",
       "2      male  62.0      0      0              240276    9.6875  None        Q   \n",
       "3      male  27.0      0      0              315154    8.6625  None        S   \n",
       "4    female  22.0      1      1             3101298   12.2875  None        S   \n",
       "..      ...   ...    ...    ...                 ...       ...   ...      ...   \n",
       "413    male   NaN      0      0           A.5. 3236    8.0500  None        S   \n",
       "414  female  39.0      0      0            PC 17758  108.9000  C105        C   \n",
       "415    male  38.5      0      0  SOTON/O.Q. 3101262    7.2500  None        S   \n",
       "416    male   NaN      0      0              359309    8.0500  None        S   \n",
       "417    male   NaN      1      1                2668   22.3583  None        C   \n",
       "\n",
       "     predict_result  \n",
       "0                 0  \n",
       "1                 0  \n",
       "2                 0  \n",
       "3                 0  \n",
       "4                 0  \n",
       "..              ...  \n",
       "413               0  \n",
       "414               1  \n",
       "415               0  \n",
       "416               0  \n",
       "417               0  \n",
       "\n",
       "[418 rows x 12 columns]"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "%%thanosql \n",
    "PREDICT USING titanic_automl_classification\n",
    "OPTIONS (\n",
    "    result_col='predict_result'\n",
    "    )\n",
    "AS \n",
    "SELECT * \n",
    "FROM titanic_test"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "a0ef3104",
   "metadata": {},
   "source": [
    "<div class=\"admonition note\">\n",
    "    <h4 class=\"admonition-title\">Query Details</h4>\n",
    "    <ul>\n",
    "        <li>\"<strong>PREDICT USING</strong>\" predicts the outcome using the <strong>titanic_automl_classification</strong>.</li>\n",
    "        <li>\"<strong>OPTIONS</strong>\" specifies the option values to be used for prediction.\n",
    "        <ul>\n",
    "            <li>\"result_col\": the column that contains the predicted results (str, optional, default: 'predict_result')</li>\n",
    "        </ul>\n",
    "        </li>\n",
    "    </ul>\n",
    "</div>"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "5389cb4a",
   "metadata": {},
   "source": [
    "## __5. In Conclusion__\n",
    "\n",
    "In this tutorial, we created a Titanic survivor classification model using the Titanic: Machine Learning from Disaster dataset from [Kaggle](https://www.kaggle.com/). As this is a beginner-level tutorial, we focused on the process rather than accuracy.\n",
    "\n",
    "* [How to Upload My Data to the ThanoSQL Workspace](https://docs.thanosql.ai/1.5/en/getting_started/data_upload/)\n",
    "* [How to Create a Table Using My Data](https://docs.thanosql.ai/1.5/en/how-to_guides/ThanoSQL_query/COPY_SYNTAX/)\n",
    "* [How to Upload My Model to the ThanoSQL Workspace](https://docs.thanosql.ai/1.5/en/how-to_guides/ThanoSQL_query/UPLOAD_MODEL_SYNTAX/)\n",
    "\n",
    "<div class=\"admonition tip\">\n",
    "    <h4 class=\"admonition-title\">Inquiries About Deploying a Model for Your Own Service</h4>\n",
    "    <p>If you have any difficulties creating your own model using ThanoSQL or applying it to your service, please feel free to contact us below😊</p>\n",
    "    <p>For inquiries regarding building a classification model: <a href=\"mailto:contact@smartmind.team\">contact@smartmind.team</a></p>\n",
    "</div>"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.13"
  },
  "vscode": {
   "interpreter": {
    "hash": "54a1ec72395a4a5a649013bb47cb6c1a711fb4b3d33a07524a09f31d6d2ee0ec"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}