# __Create a Classification Model Using AutoML__

- Tutorial difficulty: ★☆☆☆☆
- 4 min read
- Languages: [SQL](https://en.wikipedia.org/wiki/SQL) (100%)
- File location: tutorial_en/thanosql_ml/classification/automl_classification.ipynb
- References: [(Kaggle) Titanic - Machine Learning from Disaster](https://www.kaggle.com/competitions/titanic/overview)

## Tutorial Introduction

<div class="admonition note">
    <h4 class="admonition-title">Understanding Classification</h4>
    <p>Classification is a type of <a href="https://en.wikipedia.org/wiki/Machine_learning">machine learning</a> that predicts which category (class) a target belongs to. Classification tasks include both binary classification (for example, classifying men or women) and multiclass classification (for example, predicting an animal species such as dog, cat, or rabbit).</p>
</div>

To predict whether a potential customer will react positively to a particular marketing promotion in your company, you can use your [Customer Relationship Management (CRM)](https://en.wikipedia.org/wiki/Customer_relationship_management) data (demographic information, customer behavior and search data, etc.). In this case, the <a href="https://en.wikipedia.org/wiki/Feature_(machine_learning)">features</a> in the CRM data are the input, and the target value to be predicted is whether the customer's response to the promotion is positive (1 or True) or negative (0 or False). With this classification model, you can predict how customers who have not yet seen the promotion would react and target the most suitable ones, improving marketing efficiency.

__The following are examples and applications of the ThanoSQL classification model.__

- The classification model enables early detection of customer churn and a proactive response to it. Collected data can help you identify the characteristics of customers who are about to leave, so you can spot them in advance and take appropriate action. This can help prevent customer defections and increase sales.
- You can predict the [market segment](https://en.wikipedia.org/wiki/Market_segmentation) that each user of your online platform belongs to. Service users differ in characteristics, behaviors, and needs. Classification models use these features to identify fine-grained groups, so you can develop strategies tailored to each group.

<div class="admonition note">
    <h4 class="admonition-title">In This Tutorial</h4>
    <p>👉 Create a classification model for survivors using the <strong>Titanic: Machine Learning from Disaster</strong> dataset from the machine learning competition platform <a href="https://www.kaggle.com/">Kaggle</a>.
    (For reference, the dataset lists real passengers who were on board during the Titanic incident on April 15, 1912.) The goal of this competition is as follows:</p>
</div>

__Predicting Passengers Who Would Survive The Titanic Incident__

ThanoSQL provides automated machine learning (__AutoML__) tools. This tutorial uses __AutoML__ to predict which passengers would survive the Titanic incident. ThanoSQL's __AutoML__ automates model development and lets you build an end-to-end machine learning pipeline, from data collection and storage to model development and deployment, using a single language.

__Automated ML has the following advantages:__

1. Implement and deploy machine learning solutions without extensive programming or data science knowledge
2. Save time and resources when developing and deploying models
3. Quickly solve problems using the data you already have for decision-making

Now let's use ThanoSQL to create a classification model that predicts passengers who would survive the Titanic incident.

## __0. Prepare Dataset__

As mentioned in the [ThanoSQL Workspace](https://docs.thanosql.ai/1.5/en/getting_started/paas/workspace/lab/), you must create an API token and run the query below before you can execute ThanoSQL queries.

In [None]:
%load_ext thanosql
%thanosql API_TOKEN=<Issued_API_TOKEN>

### __Prepare Dataset__

In [2]:
%%thanosql
GET THANOSQL DATASET titanic_data
OPTIONS (overwrite=True)

Success


<div class="admonition note">
    <h4 class="admonition-title">Query Details</h4>
    <ul>
        <li>"<strong>GET THANOSQL DATASET</strong>" downloads the specified dataset to the workspace.</li>
        <li>"<strong>OPTIONS</strong>" specifies the option values to be used for the <strong>GET THANOSQL DATASET</strong> clause.
        <ul>
            <li>"overwrite": determines whether to overwrite a dataset if it already exists. If set to True, the old dataset is replaced with the new dataset (bool, optional, True|False, default: False)</li>
        </ul>
        </li>
    </ul>
</div>

In [3]:
%%thanosql
COPY titanic_train 
OPTIONS (if_exists='replace')
FROM 'thanosql-dataset/titanic_data/titanic_train.csv'

Success


In [4]:
%%thanosql
COPY titanic_test 
OPTIONS (if_exists='replace')
FROM 'thanosql-dataset/titanic_data/titanic_test.csv'

Success


<div class="admonition note">
    <h4 class="admonition-title">Query Details</h4>
    <ul>
        <li>"<strong>COPY</strong>" specifies the name of the dataset to be saved as a database table.</li>
        <li>"<strong>OPTIONS</strong>" specifies the option values to be used for the <strong>COPY</strong> clause.
        <ul>
           <li>"if_exists": determines what happens when the table already exists: raise an error, append to the existing table, or replace the existing table (str, optional, 'fail'|'replace'|'append', default: 'fail')</li>
        </ul>
        </li>
    </ul>
</div>

## __1. Check Dataset__

To create the survivor classification model, we use the <strong>titanic_train</strong> table located in the ThanoSQL workspace database. Run the query below to check the contents of the table.

In [5]:
%%thanosql
SELECT * 
FROM titanic_train
LIMIT 5 

|   | passengerid | survived | pclass | name | sex | age | sibsp | parch | ticket | fare | cabin | embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.25 |  | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.925 |  | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.05 |  | S |


<div class="admonition note">
    <h4 class="admonition-title">Understanding the Data Table</h4>
    <p>The <strong>titanic_train</strong> dataset contains the following columns.</p>
    <ul>
        <li>passengerid: passenger ID</li>
        <li>survived: whether the passenger on board survived</li>
        <li>pclass: passenger ticket class</li>
        <li>name: passenger name</li>
        <li>sex: passenger gender</li>
        <li>age: passenger age</li>
        <li>sibsp: number of siblings or spouses on board</li>
        <li>parch: number of parents or children on board</li>
        <li>ticket: ticket number</li>
        <li>fare: fare</li>
        <li>cabin: cabin number</li>
        <li>embarked: boarding location or port</li>
    </ul>
</div>

In this tutorial, we will exclude the name, ticket, cabin, and passengerid columns since they require additional data preprocessing.

## __2. Build a Classification Model__

To create a survivor classification model with the name __titanic_automl_classification__ using the __titanic_train__ table, run the following query.  
(Estimated duration of query execution: 8 min)

In [6]:
%%thanosql
BUILD MODEL titanic_automl_classification
USING AutomlClassifier 
OPTIONS (
    target_col='survived', 
    impute_type='iterative',  
    features_to_drop=['name', 'ticket', 'passengerid', 'cabin'],
    time_left_for_this_task=300,
    overwrite=True
    ) 
AS 
SELECT * 
FROM titanic_train

Success


<div class="admonition note">
    <h4 class="admonition-title">Query Details</h4>
    <ul>
        <li>"<strong>BUILD MODEL</strong>" creates and trains a model named <strong>titanic_automl_classification</strong>.</li>
        <li>"<strong>OPTIONS</strong>" specifies the option values used to create the model.
        <ul> 
            <li>"target_col": the name of the column containing the target value of the classification model (str, default: 'target') </li>
            <li>"impute_type": determines how empty values (NaNs) are handled (str, optional, 'simple'|'iterative', default: 'simple') </li>
            <li>"features_to_drop": the columns to exclude from training (list[str], optional)</li>
            <li>"time_left_for_this_task": the total time, in seconds, given to find a suitable classification model (int, optional, default: 60)</li>
            <li>"overwrite": determines whether to overwrite a model if it already exists. If set to True, the old model is replaced with the new model (bool, optional, True|False, default: False)</li>
        </ul>
        </li>
    </ul>
</div>
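
To give a feel for what imputation means, the sketch below fills missing values with the mean of the observed values, which is one common form of "simple" imputation. This is an assumption for illustration only: the tutorial does not state which strategy ThanoSQL applies for `impute_type='simple'`, and `impute_mean` is a hypothetical helper name. Iterative imputation, by contrast, generally estimates each missing value from the other columns and is not shown here.

```python
def impute_mean(values):
    """Replace None entries with the mean of the non-missing entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


# Ages from the first five training rows shown earlier, plus one
# missing value added for illustration.
ages = [22.0, 38.0, 26.0, 35.0, 35.0, None]
print(impute_mean(ages))
```

The missing entry is replaced by the average of the five observed ages, while the observed values are left unchanged.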


## __3. Evaluate the Model__

To evaluate the performance of the model created in the previous step, run the following query.

In [7]:
%%thanosql 
EVALUATE USING titanic_automl_classification 
OPTIONS (
    target_col='survived'
    )
AS
SELECT *
FROM titanic_train

|   | metric | score |
|---|---|---|
| 0 | Accuracy | 0.923681 |
| 1 | ROCAUC | 0.927237 |
| 2 | Recall | 0.939103 |
| 3 | Precision | 0.856725 |
| 4 | F1-Score | 0.896024 |
| 5 | Kappa | 0.835941 |
| 6 | MCC | 0.838139 |


<div class="admonition note">
    <h4 class="admonition-title">Query Details</h4>
    <ul>
        <li>"<strong>EVALUATE USING</strong>" evaluates the <strong>titanic_automl_classification</strong> model.</li>
        <li>"<strong>OPTIONS</strong>" specifies the option values used to evaluate the model.
        <ul> 
            <li>"target_col": the name of the column containing the target value of the classification model (str, default: 'target') </li>
        </ul>
        </li>
    </ul>
</div>
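
Most of the metrics reported by the evaluation query can be derived from the four counts of a binary confusion matrix. The following is a minimal Python sketch of the standard formulas, not ThanoSQL's internal implementation. The function name `classification_metrics` and the example counts are illustrative assumptions and are not taken from the Titanic results. ROCAUC is omitted because it requires predicted scores rather than counts.

```python
import math


def classification_metrics(tp, fp, fn, tn):
    """Compute common binary classification metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("at least one sample is required")

    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    # Cohen's kappa: agreement beyond what chance alone would produce.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / total ** 2
    kappa = (accuracy - expected) / (1 - expected) if expected != 1 else 0.0

    # Matthews correlation coefficient (MCC).
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {
        "Accuracy": accuracy,
        "Recall": recall,
        "Precision": precision,
        "F1-Score": f1,
        "Kappa": kappa,
        "MCC": mcc,
    }


# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=40, fp=10, fn=5, tn=45)
for name, score in metrics.items():
    print(f"{name}: {score:.6f}")
```

Kappa and MCC both correct for chance agreement, which makes them more informative than accuracy alone when the two classes are imbalanced.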

<div class="admonition warning">
    <h4 class="admonition-title">Dataset for Evaluation</h4>
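    <p>Normally, the training dataset should not be used for evaluation, because scores measured on data the model has already seen tend to overstate its performance on new data. However, for this tutorial, the training dataset is used for convenience.</p>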
</div>
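
One common way to avoid evaluating on training data is to hold out part of the labeled rows before training. The following is a minimal Python sketch of that idea, independent of ThanoSQL. The function name `train_validation_split`, the 80/20 ratio, the seed, and the stand-in IDs are illustrative assumptions. In practice, the two parts would be supplied separately to the model-building and evaluation queries.

```python
import random


def train_validation_split(rows, validation_fraction=0.2, seed=42):
    """Shuffle rows reproducibly and split them into (train, validation)."""
    if not 0 < validation_fraction < 1:
        raise ValueError("validation_fraction must be between 0 and 1")
    shuffled = list(rows)                  # copy so the input is not modified
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_validation = int(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]


# Stand-in passenger IDs 1..891 (the test table above starts at ID 892).
passenger_ids = list(range(1, 892))
train_ids, validation_ids = train_validation_split(passenger_ids)
print(len(train_ids), len(validation_ids))
```

Because the shuffle is seeded, the same rows land in each part on every run, which keeps evaluation scores comparable between experiments.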

## __4. Predict Survivors__

To use the classification model created in the previous step to make predictions on the <strong>titanic_test</strong> table, run the following query.

In [8]:
%%thanosql 
PREDICT USING titanic_automl_classification
OPTIONS (
    result_col='predict_result'
    )
AS 
SELECT * 
FROM titanic_test

|   | passengerid | pclass | name | sex | age | sibsp | parch | ticket | fare | cabin | embarked | predict_result |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 892 | 3 | Kelly, Mr. James | male | 34.5 | 0 | 0 | 330911 | 7.8292 |  | Q | 0 |
| 1 | 893 | 3 | Wilkes, Mrs. James (Ellen Needs) | female | 47.0 | 1 | 0 | 363272 | 7.0000 |  | S | 0 |
| 2 | 894 | 2 | Myles, Mr. Thomas Francis | male | 62.0 | 0 | 0 | 240276 | 9.6875 |  | Q | 0 |
| 3 | 895 | 3 | Wirz, Mr. Albert | male | 27.0 | 0 | 0 | 315154 | 8.6625 |  | S | 0 |
| 4 | 896 | 3 | Hirvonen, Mrs. Alexander (Helga E Lindqvist) | female | 22.0 | 1 | 1 | 3101298 | 12.2875 |  | S | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 413 | 1305 | 3 | Spector, Mr. Woolf | male |  | 0 | 0 | A.5. 3236 | 8.0500 |  | S | 0 |
| 414 | 1306 | 1 | Oliva y Ocana, Dona. Fermina | female | 39.0 | 0 | 0 | PC 17758 | 108.9000 | C105 | C | 1 |
| 415 | 1307 | 3 | Saether, Mr. Simon Sivertsen | male | 38.5 | 0 | 0 | SOTON/O.Q. 3101262 | 7.2500 |  | S | 0 |
| 416 | 1308 | 3 | Ware, Mr. Frederick | male |  | 0 | 0 | 359309 | 8.0500 |  | S | 0 |


<div class="admonition note">
    <h4 class="admonition-title">Query Details</h4>
    <ul>
        <li>"<strong>PREDICT USING</strong>" predicts the outcome using the <strong>titanic_automl_classification</strong> model.</li>
        <li>"<strong>OPTIONS</strong>" specifies the option values to be used for prediction.
        <ul>
            <li>"result_col": the name of the column that will contain the predicted results (str, optional, default: 'predict_result')</li>
        </ul>
        </li>
    </ul>
</div>

## __5. In Conclusion__

In this tutorial, we created a Titanic survivor classification model using the Titanic: Machine Learning from Disaster dataset from [Kaggle](https://www.kaggle.com/). As this is a beginner-level tutorial, we focused on the process rather than accuracy.

* [How to Upload My Data to the ThanoSQL Workspace](https://docs.thanosql.ai/1.5/en/getting_started/data_upload/)
* [How to Create a Table Using My Data](https://docs.thanosql.ai/1.5/en/how-to_guides/ThanoSQL_query/COPY_SYNTAX/)
* [How to Upload My Model to the ThanoSQL Workspace](https://docs.thanosql.ai/1.5/en/how-to_guides/ThanoSQL_query/UPLOAD_MODEL_SYNTAX/)

<div class="admonition tip">
    <h4 class="admonition-title">Inquiries About Deploying a Model for Your Own Service</h4>
    <p>If you have any difficulties creating your own model using ThanoSQL or applying it to your service, please feel free to contact us below 😊</p>
    <p>For inquiries regarding building a classification model: <a href="mailto:contact@smartmind.team">contact@smartmind.team</a></p>
</div>