AutomlClassifier
Notation Conventions
- Parentheses () indicate literal parentheses.
- Braces {} are used to bind combinations of options.
- Brackets [] indicate an optional clause.
- An ellipsis following a comma in brackets, [,...], means that the preceding item can be repeated as a comma-separated list.
- The vertical bar | represents the logical OR.
- VALUE represents a regular value.
- literal: a fixed or unchangeable value, also known as a constant. Each literal has a specific data type, such as the data type of a column in the table.
BUILD MODEL Syntax
Use the "BUILD MODEL" statement to create and train an AI model. The "BUILD MODEL" statement trains a model on the dataset defined by the query_expr that comes after the "AS" clause.
query_statement:
query_expr
BUILD MODEL (model_name_expression)
USING AutomlClassifier
OPTIONS (
expression [ , ...]
)
AS
(query_expr)
OPTIONS Clause
OPTIONS (
(target_col=column_name),
[features_to_drop=[column_name, ...]],
[impute_type={'simple'|'iterative'}],
[datetime_attribs=[column_name, ...]],
[outlier_method={'knn'|'iso'|'pca'}],
[time_left_for_this_task=VALUE],
[overwrite={True|False}]
)
The "OPTIONS" clause sets the value of each parameter. The parameters are defined as follows.
- "target_col": the name of the column containing the target value of the classification model (str, default: 'target')
- "features_to_drop": the columns to exclude from training (list[str], optional)
- "impute_type": determines how empty values (NaNs) are handled (str, optional, 'simple'|'iterative', default: 'simple')
  - "simple": empty values in categorical variables are filled with the most frequent value, and empty values in continuous variables are filled with the mean
  - "iterative": applies an algorithm that predicts each empty value from the remaining features
- "datetime_attribs": the columns that contain dates (list[str], optional)
- "outlier_method": determines how outliers in the table are handled. If None, outliers are kept in the table (str, optional, 'knn'|'iso'|'pca', default: None)
  - "knn": uses a K-nearest-neighbors (K-NN) approach to detect abnormal samples based on the distances between data points
  - "iso": uses an Isolation Forest, which randomly partitions the table with trees until every observation is isolated, to detect abnormal samples (works efficiently on datasets with many variables)
  - "pca": detects abnormal samples by reducing and then restoring the dimensions with Principal Component Analysis (PCA)
- "time_left_for_this_task": the total time, in seconds, allowed for finding a suitable classification model (int, optional, default: 60)
- "overwrite": determines whether to overwrite a model with the same name if it already exists. If True, the old model is replaced with the new model (bool, optional, True|False, default: False)
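To make the "simple" impute_type concrete, the plain-Python sketch below fills empty values the way the option is described: the most frequent value for categorical columns and the mean for continuous columns. It is an illustration under stated assumptions, not ThanoSQL's implementation. The function names simple_impute and is_missing and the list-of-dicts row layout are hypothetical, and the caller is assumed to say which columns are categorical and which are continuous.

```python
import math
from collections import Counter
from statistics import mean

def is_missing(value):
    """Treat None and float NaN as empty values."""
    return value is None or (isinstance(value, float) and math.isnan(value))

def simple_impute(rows, categorical, continuous):
    """Return copies of rows with empty values filled in.

    Categorical columns get their most frequent observed value and continuous
    columns get their observed mean. Assumes every listed column has at least
    one non-empty value.
    """
    fill = {}
    for col in categorical:
        observed = [r[col] for r in rows if not is_missing(r[col])]
        fill[col] = Counter(observed).most_common(1)[0][0]
    for col in continuous:
        observed = [r[col] for r in rows if not is_missing(r[col])]
        fill[col] = mean(observed)
    # Build new dicts so the caller's rows are left unchanged.
    return [
        {k: fill[k] if k in fill and is_missing(v) else v for k, v in r.items()}
        for r in rows
    ]

rows = [
    {"sex": "male", "age": 22.0},
    {"sex": "female", "age": float("nan")},
    {"sex": None, "age": 38.0},
    {"sex": "male", "age": 26.0},
]
print(simple_impute(rows, categorical=["sex"], continuous=["age"]))
# sex None -> 'male' (most frequent); age NaN -> mean(22, 38, 26) ≈ 28.67
```

The "iterative" option replaces these fixed fill values with predictions from the other features, which is why it can be more accurate but is also slower.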
BUILD MODEL Example
An example "BUILD MODEL" query can be found in Create a Classification Model Using AutoML.
%%thanosql
BUILD MODEL titanic_automl_classification
USING AutomlClassifier
OPTIONS (
target_col='survived',
features_to_drop=['name', 'ticket', 'passengerid', 'cabin'],
impute_type='iterative',
time_left_for_this_task=300,
overwrite=True
)
AS
SELECT *
FROM titanic_train
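When queries are generated from Python, for example in the same notebook that runs %%thanosql cells, a small helper can assemble the statement and reject option values that the OPTIONS clause does not list. The sketch below is hypothetical: build_model_query and ALLOWED are illustrative names, not part of ThanoSQL. Rendering values with Python's repr() is an assumption that happens to match the literal style of the example above. The helper only builds the text; submitting the query is out of scope.

```python
# Allowed values copied from the OPTIONS clause documented above (hypothetical helper).
ALLOWED = {
    "impute_type": {"simple", "iterative"},
    "outlier_method": {"knn", "iso", "pca", None},
}

def build_model_query(model_name, options, query, algorithm="AutomlClassifier"):
    """Assemble a BUILD MODEL statement from a dict of OPTIONS values."""
    for key, allowed in ALLOWED.items():
        if key in options and options[key] not in allowed:
            raise ValueError(f"{key} must be one of {sorted(map(str, allowed))}")
    # repr() writes strings as 'text', lists as ['a', 'b'] and booleans as True/False.
    # Strings that themselves contain quotes are not handled specially here.
    rendered = ",\n".join(f"    {key}={value!r}" for key, value in options.items())
    return f"BUILD MODEL {model_name}\nUSING {algorithm}\nOPTIONS (\n{rendered}\n)\nAS\n{query}"

sql = build_model_query(
    "titanic_automl_classification",
    {
        "target_col": "survived",
        "features_to_drop": ["name", "ticket", "passengerid", "cabin"],
        "impute_type": "iterative",
        "time_left_for_this_task": 300,
        "overwrite": True,
    },
    "SELECT *\nFROM titanic_train",
)
print(sql)
```

Printed, sql reproduces the example statement above. Passing impute_type='median' raises ValueError before any query is built.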
FIT MODEL Syntax
Use the "FIT MODEL" statement to retrain an AI model. The "FIT MODEL" statement retrains a model using the dataset defined by the query_expr that comes after the "AS" clause. The label (target) of the data used for retraining should be the same as the label used in the previous training.
query_statement:
query_expr
FIT MODEL (model_name_expression)
USING (model_name_expression)
OPTIONS (
expression [ , ...]
)
AS
(query_expr)
OPTIONS Clause
OPTIONS (
(target_col=column_name),
[features_to_drop=[column_name, ...]],
[impute_type={'simple'|'iterative'}],
[datetime_attribs=[column_name, ...]],
[outlier_method={'knn'|'iso'|'pca'}],
[time_left_for_this_task=VALUE],
[overwrite={True|False}]
)
The "OPTIONS" clause sets the value of each parameter. The parameters are defined as follows.
- "target_col": the name of the column containing the target value of the classification model (str, default: 'target')
- "features_to_drop": the columns to exclude from training (list[str], optional)
- "impute_type": determines how empty values (NaNs) are handled (str, optional, 'simple'|'iterative', default: 'simple')
  - "simple": empty values in categorical variables are filled with the most frequent value, and empty values in continuous variables are filled with the mean
  - "iterative": applies an algorithm that predicts each empty value from the remaining features
- "datetime_attribs": the columns that contain dates (list[str], optional)
- "outlier_method": determines how outliers in the table are handled. If None, outliers are kept in the table (str, optional, 'knn'|'iso'|'pca', default: None)
  - "knn": uses a K-nearest-neighbors (K-NN) approach to detect abnormal samples based on the distances between data points
  - "iso": uses an Isolation Forest, which randomly partitions the table with trees until every observation is isolated, to detect abnormal samples (works efficiently on datasets with many variables)
  - "pca": detects abnormal samples by reducing and then restoring the dimensions with Principal Component Analysis (PCA)
- "time_left_for_this_task": the total time, in seconds, allowed for finding a suitable classification model (int, optional, default: 60)
- "overwrite": determines whether to overwrite a model with the same name if it already exists. If True, the old model is replaced with the new model (bool, optional, True|False, default: False)
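The "knn" outlier_method detects abnormal samples from distances between data points. One common rule of this kind scores each row by its distance to its k-th nearest neighbor and flags the highest-scoring fraction. The sketch below implements that rule in plain Python for illustration only. This section does not say which distance, k, or threshold ThanoSQL uses, so the Euclidean distance, the k and contamination parameters, and the function names are all assumptions.

```python
import math

def knn_outlier_scores(points, k=2):
    """Score each point by the Euclidean distance to its k-th nearest neighbor."""
    if not 1 <= k < len(points):
        raise ValueError("k must be between 1 and len(points) - 1")
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores

def knn_outliers(points, k=2, contamination=0.2):
    """Return the indices of the highest-scoring points (an assumed outlier fraction)."""
    scores = knn_outlier_scores(points, k)
    n_out = max(1, round(contamination * len(points)))
    ranked = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_out])

points = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.0), (8.0, 8.0)]
print(knn_outliers(points, k=2, contamination=0.2))  # [4]
```

Distance-based scoring like this compares every pair of rows, so it grows quadratically with table size. That is one reason tree-based methods such as "iso" are often preferred on larger or wider tables.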
PREDICT Syntax
Use the "PREDICT" statement to apply an AI model for prediction, classification, recommendation, and more. The "PREDICT" statement preprocesses the dataset defined by the query_expr that comes after the "AS" clause and applies the model to it.
query_statement:
query_expr
PREDICT USING (model_name_expression)
OPTIONS (
expression [ , ...]
)
AS
(query_expr)
OPTIONS Clause
OPTIONS (
[result_col=column_name]
)
The "OPTIONS" clause sets the value of each parameter. The parameters are defined as follows.
- "result_col": the column that contains the predicted results (str, optional, default: 'predict_result')
PREDICT Example
An example "PREDICT" query can be found in Create a Classification Model Using AutoML.
%%thanosql
PREDICT USING titanic_automl_classification
OPTIONS (
result_col='predict_result'
)
AS
SELECT *
FROM titanic_test
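The result_col option names the column that holds the predictions. The sketch below shows that idea on a list of rows. It assumes, without confirmation from this section, that the output keeps the input columns and adds one prediction per row. The function name attach_predictions is hypothetical, and refusing to overwrite an existing column is a design choice of this sketch, not documented ThanoSQL behavior.

```python
def attach_predictions(rows, predictions, result_col="predict_result"):
    """Return copies of rows with each prediction stored under result_col."""
    if len(rows) != len(predictions):
        raise ValueError("one prediction is required per row")
    out = []
    for row, pred in zip(rows, predictions):
        # Sketch-only safeguard: do not silently replace an input column.
        if result_col in row:
            raise ValueError(f"column {result_col!r} already exists")
        out.append({**row, result_col: pred})
    return out

test_rows = [{"passengerid": 1, "sex": "male"}, {"passengerid": 2, "sex": "female"}]
print(attach_predictions(test_rows, [0, 1]))
```

Choosing a result_col that does not clash with an existing column name keeps the predictions distinguishable from the input data.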
EVALUATE Syntax
Use the "EVALUATE" statement to evaluate an AI model. The "EVALUATE" statement evaluates the model on the dataset defined by the query_expr that comes after the "AS" clause.
query_statement:
query_expr
EVALUATE USING (model_name_expression)
OPTIONS (
expression [ , ...]
)
AS
(query_expr)
OPTIONS Clause
OPTIONS (
(target_col=column_name)
)
The "OPTIONS" clause sets the value of each parameter. The parameters are defined as follows.
- "target_col": the name of the column containing the target value of the classification model (str, default: 'target')
EVALUATE Example
An example "EVALUATE" query can be found in Create a Classification Model Using AutoML.
%%thanosql
EVALUATE USING titanic_automl_classification
OPTIONS (
target_col='survived'
)
AS
SELECT *
FROM titanic_train
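This section does not list which metrics "EVALUATE" reports. For reference only, the sketch below compares the values in a target column such as target_col with a model's predictions and computes accuracy plus macro-averaged precision, recall, and F1, which are common classification metrics. The metric choice and the function name classification_metrics are assumptions, not the output format of EVALUATE.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall and F1 over all labels seen."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return {
        "accuracy": sum(t == p for t, p in pairs) / len(pairs),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }

y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1]
print(classification_metrics(y_true, y_pred))  # every value is 2/3 here
```

Because the example above evaluates on titanic_train, the same data used for training, its scores are likely to be optimistic. A held-out table gives a fairer estimate.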