AutomlRegressor

Notation Conventions

  • Parentheses () indicate literal parentheses.
  • Braces {} are used to bind combinations of options.
  • Brackets [] indicate an optional clause.
  • An ellipsis following a comma in brackets [,...] means that the preceding item can be repeated as a comma-separated list.
  • The vertical bar | represents a logical OR.
  • VALUE represents a regular value.
  • literal: a fixed or unchangeable value, also known as a Constant.

    Each literal has a specific data type, such as that of a column in a table.

BUILD MODEL Syntax

Use the "BUILD MODEL" statement to build an AI model. The model is trained on the dataset defined by the query_expr that follows the "AS" clause.

query_statement:
    query_expr

BUILD MODEL (model_name_expression)
USING AutomlRegressor
OPTIONS (
    expression [ , ...]
    )
AS
(query_expr)

OPTIONS Clause

OPTIONS (
    (target_col=column_name),
    [features_to_drop=[column_name, ...]],
    [impute_type={'simple'|'iterative'}],
    [datetime_attribs=[column_name, ...]],
    [outlier_method={'knn'|'iso'|'pca'}],
    [time_left_for_this_task=VALUE],
    [overwrite={True|False}]
    )

The "OPTIONS" clause allows you to change the value of a parameter. The definition of each parameter is as follows.

  • "target_col": the name of the column containing the target value of the regression model (str, default: 'target')
  • "features_to_drop": selects columns that cannot be used for training (list[str], optional)
  • "impute_type": determines how empty values (NaNs) are handled (str, optional, 'simple'|'iterative', default: 'simple')

    "simple": empty values in categorical variables are filled with the most common value, and empty values in continuous variables are filled with the mean.
    "iterative": applies an algorithm that predicts empty values from the remaining attributes.

  • "datetime_attribs": selects columns corresponding to the date (list[str], optional)
  • "outlier_method": determines how outliers are handled in the table. If None, outliers are kept in the table (str, optional, 'knn'|'iso'|'pca', default: None)

    "knn": uses a k-NN-based approach to detect abnormal samples from the distances between data points.
    "iso": uses Isolation Forest, which randomly splits the data table with trees to isolate every observation, and detects abnormal samples (works efficiently on datasets with many variables).
    "pca": detects abnormal samples by reducing and then restoring dimensions using Principal Component Analysis (PCA).

  • "time_left_for_this_task": the total time given to find a suitable regression model in seconds (int, optional, default: 60)
  • "overwrite": determines whether to overwrite a model if it already exists. If set as True, the old model is replaced with the new model (bool, optional, True|False, default: False)
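
The 'simple' setting of "impute_type" described above can be illustrated with a minimal Python sketch. This is not ThanoSQL's implementation: the function name simple_impute, the use of None in place of NaN, and the sample rows are assumptions made for illustration only.

```python
from collections import Counter
from statistics import mean


def simple_impute(rows, categorical_cols):
    """Fill None values: most common value for categorical columns, mean otherwise.

    Hypothetical helper for illustration; not part of ThanoSQL.
    """
    columns = list(rows[0].keys())
    fill = {}
    for col in columns:
        present = [r[col] for r in rows if r[col] is not None]
        if col in categorical_cols:
            # Categorical column: use the most common observed value.
            fill[col] = Counter(present).most_common(1)[0][0]
        else:
            # Continuous column: use the mean of the observed values.
            fill[col] = mean(present)
    return [
        {col: (fill[col] if r[col] is None else r[col]) for col in columns}
        for r in rows
    ]


# Made-up sample data with one missing value per column.
rows = [
    {"season": "summer", "temp": 30.0},
    {"season": None, "temp": 20.0},
    {"season": "summer", "temp": None},
    {"season": "winter", "temp": 10.0},
]
imputed = simple_impute(rows, categorical_cols={"season"})
```

In this sketch the missing "season" is filled with 'summer' (the most common value) and the missing "temp" with the mean of the three observed temperatures.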

BUILD MODEL Example

An example "BUILD MODEL" query can be found in Create a Regression Model Using AutoML.

%%thanosql
BUILD MODEL bike_regression
USING AutomlRegressor
OPTIONS (
    target_col='count', 
    impute_type='simple', 
    datetime_attribs=['datetime'],
    time_left_for_this_task=300,
    overwrite=True
    ) 
AS
SELECT *
FROM bike_sharing_train
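
Because the "OPTIONS" clause is a comma-separated list of name=value expressions, the statement text can also be assembled programmatically. The following is a minimal Python sketch under stated assumptions: build_model_query and format_value are hypothetical helper names, not part of ThanoSQL, and the sketch only produces the statement text (the %%thanosql cell magic and query execution are left out).

```python
def format_value(value):
    """Render a Python value as an OPTIONS literal (hypothetical helper)."""
    if isinstance(value, bool):
        # Check bool before other types: bool is a subclass of int in Python.
        return "True" if value else "False"
    if isinstance(value, str):
        if "'" in value:
            # Quote escaping rules are not covered here, so reject such values.
            raise ValueError("single quotes are not supported in this sketch")
        return "'" + value + "'"
    if isinstance(value, (list, tuple)):
        return "[" + ", ".join(format_value(v) for v in value) + "]"
    return str(value)


def build_model_query(model_name, options, select_sql):
    """Assemble a BUILD MODEL ... USING AutomlRegressor statement as text."""
    option_lines = ",\n".join(
        f"    {name}={format_value(value)}" for name, value in options.items()
    )
    return (
        f"BUILD MODEL {model_name}\n"
        "USING AutomlRegressor\n"
        f"OPTIONS (\n{option_lines}\n    )\n"
        f"AS\n{select_sql}"
    )


# Reproduces the options used in the example query above.
query = build_model_query(
    "bike_regression",
    {
        "target_col": "count",
        "impute_type": "simple",
        "datetime_attribs": ["datetime"],
        "time_left_for_this_task": 300,
        "overwrite": True,
    },
    "SELECT *\nFROM bike_sharing_train",
)
print(query)
```

Keeping the options in a dictionary makes it easy to vary a single parameter, such as time_left_for_this_task, between runs.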

FIT MODEL Syntax

Use the "FIT MODEL" statement to retrain an AI model. The model is retrained on the dataset defined by the query_expr that follows the "AS" clause. The label of the data used for retraining must be the same as the label used for the previous training.

query_statement:
    query_expr

FIT MODEL (model_name_expression)
USING (model_name_expression)
OPTIONS (
    expression [ , ...]
    )
AS
(query_expr)

OPTIONS Clause

OPTIONS (
    (target_col=column_name),
    [features_to_drop=[column_name, ...]],
    [impute_type={'simple'|'iterative'}],
    [datetime_attribs=[column_name, ...]],
    [outlier_method={'knn'|'iso'|'pca'}],
    [time_left_for_this_task=VALUE],
    [overwrite={True|False}]
    )

The "OPTIONS" clause allows you to change the value of a parameter. The definition of each parameter is as follows.

  • "target_col": the name of the column containing the target value of the regression model (str, default: 'target')
  • "features_to_drop": selects columns that cannot be used for training (list[str], optional)
  • "impute_type": determines how empty values (NaNs) are handled (str, optional, 'simple'|'iterative', default: 'simple')

    "simple": empty values in categorical variables are filled with the most common value, and empty values in continuous variables are filled with the mean.
    "iterative": applies an algorithm that predicts empty values from the remaining attributes.

  • "datetime_attribs": selects columns corresponding to the date (list[str], optional)
  • "outlier_method": determines how outliers are handled in the table. If None, outliers are kept in the table (str, optional, 'knn'|'iso'|'pca', default: None)

    "knn": uses a k-NN-based approach to detect abnormal samples from the distances between data points.
    "iso": uses Isolation Forest, which randomly splits the data table with trees to isolate every observation, and detects abnormal samples (works efficiently on datasets with many variables).
    "pca": detects abnormal samples by reducing and then restoring dimensions using Principal Component Analysis (PCA).

  • "time_left_for_this_task": the total time given to find a suitable regression model in seconds (int, optional, default: 300)
  • "overwrite": determines whether to overwrite a model if it already exists. If set as True, the old model is replaced with the new model (bool, optional, True|False, default: False)
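
The distance-based idea behind the 'knn' setting of "outlier_method" can be sketched in plain Python. This is an illustration only, not ThanoSQL's implementation: the function names, the choice of k, and the rule that flags a point when its score exceeds three times the median score are all assumptions.

```python
import math


def knn_outlier_scores(points, k):
    """Score each point by its mean Euclidean distance to its k nearest neighbours."""
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        distances = sorted(
            math.dist(p, q) for j, q in enumerate(points) if j != i
        )
        scores.append(sum(distances[:k]) / k)
    return scores


def flag_outliers(points, k=2, factor=3.0):
    """Flag points whose score exceeds `factor` times the median score.

    The threshold rule is an assumption chosen for this sketch.
    """
    scores = knn_outlier_scores(points, k)
    median = sorted(scores)[len(scores) // 2]
    return [score > factor * median for score in scores]


# Four made-up points close together and one far away.
points = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.1), (1.0, 1.2), (8.0, 8.0)]
flags = flag_outliers(points)
```

Only the last point, which lies far from the cluster, is flagged; the four clustered points have small neighbour distances and stay below the threshold.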

PREDICT Syntax

Use the "PREDICT" statement to apply AI models to perform prediction, classification, recommendation, and more. The "PREDICT" statement can preprocess the dataset defined by the query_expr that comes after the "AS" clause.

query_statement:
    query_expr

PREDICT USING (model_name_expression)
OPTIONS (
    expression [ , ...]
    )
AS
(query_expr)

OPTIONS Clause

OPTIONS (
    [result_col=column_name]
    )

The "OPTIONS" clause allows you to change the value of a parameter. The definition of each parameter is as follows.

  • "result_col": the column that contains the predicted results (str, optional, default: 'predict_result')

PREDICT Example

An example "PREDICT" query can be found in Create a Regression Model Using AutoML.

%%thanosql
PREDICT USING bike_regression
OPTIONS (
    result_col='predict_result'
    )
AS
SELECT *
FROM bike_sharing_test
LIMIT 10

EVALUATE Syntax

Use the "EVALUATE" statement to evaluate an AI model. The "EVALUATE" statement evaluates a model using the dataset defined by the query_expr that follows the "AS" clause.

query_statement:
    query_expr

EVALUATE USING (model_name_expression)
OPTIONS (
    expression [ , ...]
    )
AS
(query_expr)

OPTIONS Clause

OPTIONS (
    (target_col=column_name)
    )

The "OPTIONS" clause allows you to change the value of a parameter. The definition of each parameter is as follows.

  • "target_col": the name of the column containing the target value of the regression model (str, default: 'target')

EVALUATE Example

An example "EVALUATE" query can be found in Create a Regression Model Using AutoML.

%%thanosql
EVALUATE USING bike_regression 
OPTIONS (
    target_col='count'
    ) 
AS
SELECT *
FROM bike_sharing_train

Last update: 2023-08-09