
Albert & Electra

Notation Conventions

  • Parentheses () indicate literal parentheses.
  • Braces {} are used to bind combinations of options.
  • Brackets [] indicate an optional clause.
  • An ellipsis following a comma in brackets [,...] means that the preceding item can be repeated as a comma-separated list.
  • The vertical bar | represents a logical OR.
  • VALUE represents a regular value.
  • literal: a fixed, unchangeable value, also known as a constant.

    Each literal has a specific data type, such as the type of a column in a table.

BUILD MODEL Syntax

Use the "BUILD MODEL" statement to build an AI model. The "BUILD MODEL" statement trains a model on the dataset defined by the query_expr that follows the "AS" clause.

query_statement:
    query_expr

BUILD MODEL (model_name_expression)
USING { AlbertKo | AlbertEn | ElectraKo | ElectraEn }
OPTIONS (
    expression [ , ...]
    )
AS
(query_expr)

OPTIONS Clause

OPTIONS (
    (text_col=column_name),
    (label_col=column_name),
    [batch_size=VALUE],
    [max_epochs=VALUE],
    [learning_rate=VALUE],
    [overwrite={True|False}]
    )

The "OPTIONS" clause allows you to change the value of a parameter. The definition of each parameter is as follows.

  • "text_col": the name of the column containing the text to be used for the training (str, default: 'text')
  • "label_col": the name of the column containing information about the target (str, default: 'label')
  • "batch_size": the number of samples processed together in a single training step (int, optional, default: 16)
  • "max_epochs": the number of passes over the training dataset (int, optional, default: 3)
  • "learning_rate": the learning rate of the model (float, optional, default: 1e-4)
  • "overwrite": determines whether to overwrite a model if it already exists. If set to True, the old model is replaced with the new model (bool, optional, True|False, default: False)

BUILD MODEL Example

An example "BUILD MODEL" query can be found in Create a Text Classification Model.

%%thanosql
BUILD MODEL my_movie_review_classifier
USING ElectraEn
OPTIONS (
  text_col='review',
  label_col='sentiment',
  max_epochs=1,
  batch_size=4,
  overwrite=True
  )
AS
SELECT *
FROM movie_review_train

FIT MODEL Syntax

Use the "FIT MODEL" statement to retrain an AI model. The "FIT MODEL" statement retrains a model on the dataset defined by the query_expr that follows the "AS" clause. The labels in the retraining data must be the same as the labels used in the previous training.

query_statement:
    query_expr

FIT MODEL (model_name_expression)
USING (model_name_expression)
OPTIONS (
    expression [ , ...]
    )
AS
(query_expr)

OPTIONS Clause

OPTIONS (
    (text_col=column_name),
    (label_col=column_name),
    [batch_size=VALUE],
    [max_epochs=VALUE],
    [learning_rate=VALUE],
    [overwrite={True|False}]
    )

The "OPTIONS" clause allows you to change the value of a parameter. The definition of each parameter is as follows.

  • "text_col": the name of the column containing the text to be used for the training (str, default: 'text')
  • "label_col": the name of the column containing information about the target (str, default: 'label')
  • "batch_size": the number of samples processed together in a single training step (int, optional, default: 16)
  • "max_epochs": the number of passes over the training dataset (int, optional, default: 3)
  • "learning_rate": the learning rate of the model (float, optional, default: 1e-4)
  • "overwrite": determines whether to overwrite a model if it already exists. If set to True, the old model is replaced with the new model (bool, optional, True|False, default: False)
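
The syntax above can be sketched as a query in the same style as the "BUILD MODEL" example. This is a minimal sketch, not an official example: it assumes that the name after "FIT MODEL" is the retrained model to be saved and that the name after "USING" is a previously built model. The names my_movie_review_classifier_v2 and movie_review_train_new are hypothetical, and the "--" lines are annotations that assume standard SQL comment syntax; remove them if the parser rejects them.

```sql
%%thanosql
-- Hypothetical names: my_movie_review_classifier_v2 (retrained model)
-- and movie_review_train_new (table holding the additional training data).
-- Assumption: the model after USING was built earlier with BUILD MODEL.
FIT MODEL my_movie_review_classifier_v2
USING my_movie_review_classifier
OPTIONS (
  text_col='review',
  label_col='sentiment',
  max_epochs=1,
  batch_size=4,
  overwrite=True
  )
AS
SELECT *
FROM movie_review_train_new
```

The 'sentiment' column in the retraining table is assumed to hold the same labels as the data used to build the original model, as the statement requires.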

PREDICT Syntax

Use the "PREDICT" statement to apply AI models to perform prediction, classification, recommendation, and more. The "PREDICT" statement runs the model on the dataset defined by the query_expr that follows the "AS" clause.

query_statement:
    query_expr

PREDICT USING (model_name_expression)
OPTIONS (
    expression [ , ...]
    )
AS
(query_expr)

OPTIONS Clause

OPTIONS (
    (text_col=column_name),
    [batch_size=VALUE],
    [result_col=column_name]
    )

The "OPTIONS" clause allows you to change the value of a parameter. The definition of each parameter is as follows.

  • "text_col": the column containing the text to be used for prediction (str, default: 'text')
  • "batch_size": the number of samples processed together in a single prediction step (int, optional, default: 16)
  • "result_col": the column that contains the predicted results (str, optional, default: 'predict_result')

PREDICT Example

An example "PREDICT" query can be found in Create a Text Classification Model.

%%thanosql
PREDICT USING my_movie_review_classifier
OPTIONS (
    text_col='review'
    )
AS
SELECT *
FROM movie_review_test
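
The optional parameters can be set alongside "text_col" in the same clause. The following sketch reuses the model and table from the example above; the batch size of 32 and the column name 'predicted_sentiment' are illustrative choices rather than values from the documentation, and the "--" lines are annotations that assume standard SQL comment syntax.

```sql
%%thanosql
-- Illustrative values: batch_size=32 and result_col='predicted_sentiment'
-- are assumptions chosen for this sketch.
PREDICT USING my_movie_review_classifier
OPTIONS (
    text_col='review',
    batch_size=32,
    result_col='predicted_sentiment'
    )
AS
SELECT *
FROM movie_review_test
```

If "result_col" is omitted, the predictions are written to the default column 'predict_result'.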

EVALUATE Syntax

Use the "EVALUATE" statement to evaluate the AI model. The "EVALUATE" statement evaluates a model using the dataset defined by the query_expr that comes after the "AS" clause.

query_statement:
    query_expr

EVALUATE USING (model_name_expression)
OPTIONS (
    expression [ , ...]
    )
AS
(query_expr)

OPTIONS Clause

OPTIONS (
    (text_col=column_name),
    (label_col=column_name),
    [batch_size=VALUE]
    )

The "OPTIONS" clause allows you to change the value of a parameter. The definition of each parameter is as follows.

  • "text_col": the column containing the text to be used for evaluation (str, default: 'text')
  • "label_col": the name of the column containing information about the target (str, default: 'label')
  • "batch_size": the number of samples processed together in a single evaluation step (int, optional, default: 16)
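
The syntax above can be sketched as a query in the same style as the other examples. This is a minimal sketch under stated assumptions: it reuses the model name from the "BUILD MODEL" example and assumes that the movie_review_test table contains both a 'review' column and a 'sentiment' column with the true labels. The "--" lines are annotations that assume standard SQL comment syntax.

```sql
%%thanosql
-- Assumption: movie_review_test has a 'sentiment' column holding the
-- true labels, so predictions can be compared against them.
EVALUATE USING my_movie_review_classifier
OPTIONS (
    text_col='review',
    label_col='sentiment',
    batch_size=16
    )
AS
SELECT *
FROM movie_review_test
```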

Last update: 2023-08-09