Skip to content

Compute

1. Compute Statement

The "Compute" statement allows users to process data using ThanoSQL's "Compute" functions.

2. Compute Functions

2-1. groupby_emb_mean

The "groupby_emb_mean" is a function that groups by labels if embeddings and a label column exist in the data table and averages the embedding values.

groupby_emb_mean Syntax

query_statement:
    query_expr

FUNCTION compute.groupby_emb_mean
OPTIONS (
    expression [ , ...]
    )
{ AS (query_expr) | FROM {file_path_expression} } 

OPTIONS Clause

OPTIONS (
    (label_col=column_name),
    (emb_col=column_name),
    [result_col=expression]
    )

The "OPTIONS" clause allows you to change the value of a parameter. The definition of each parameter is as follows.

  • "label_col": the name of the column containing information about the label (str)
  • "emb_col": the name of the column containing information about the embedding (str)
  • "result_col": the name of the column containing the embeddings' mean values (str, optional, default: 'emb_mean_result')

groupby_emb_mean Example

%%thanosql
FUNCTION compute.groupby_emb_mean
OPTIONS (
    label_col='label',
    emb_col='convert_result',
    result_col='result_col'
    )
AS
SELECT * FROM mnist_train_test

2-2. similarity

The "similarity" is a function that uses the average embedding value of the unlabeled data to obtain similarity and label the data.

similarity Syntax

query_statement:
    query_expr

FUNCTION compute.similarity
OPTIONS (
    expression [ , ...]
    )
{ AS (query_expr) | FROM {file_path_expression} } 

OPTIONS Clause

OPTIONS (
    (emb_col=column_name),
    (src_table=expression),
    (src_emb_col=column_name),
    (src_label_col=column_name),
    [top_k=VALUE],
    [result_col=expression]
    )

The "OPTIONS" clause allows you to change the value of a parameter. The definition of each parameter is as follows.

  • "emb_col": the name of the column to be labeled (str)
  • "src_table": the name of the table containing information about the average embedding value (str)
  • "src_emb_col": the name of the column containing the embedding value (str)
  • "src_label_col": the name of the column containing the label (str)
  • "top_k": the number of label to be provided (int, optional, default: 1)
  • "result_col": the name of the column containing the similarity values (str, optional, default: 'similarity_result')

similarity Example

%%thanosql
FUNCTION compute.similarity
OPTIONS (
    emb_col='convert_result',
    src_table='mnist_train_test',
    src_emb_col='convert_result',
    src_label_col='label',
    top_k=3,
    result_col='result_col'
    )
AS
SELECT * FROM mnist_train_test

Last update: 2023-01-03