Skip to content

COPY

1. COPY Statement

The "COPY" statement allows users to create data tables in the ThanoSQL workspace database with their data files, data folders, and Pandas DataFrame within their workspace.

Supported File Types

  • csv
  • pkl, pickle
  • xls, xlsx, xlsm, xlsb
  • pqt, parquet

Warning

  • The PostgreSQL COPY clause is not supported.

2. COPY Syntax

COPY (table_name_expression)
OPTIONS (
    expression [ , ...]
    )
FROM {file_path | dir_path}

Query Details

  • The "OPTIONS" clause allows you to change the value of a parameter. The definition of each parameter is as follows.
    • "if_exists": determines how the function should handle the case where the table already exists, it can either raise an error, append to the existing table, or replace the existing table (str, optional, 'fail'|'replace'|'append', default: 'fail')
      • "fail": raise an error if there is already an existing table
      • "replace": replace a table if there is already an existing table
      • "append": append given dataframe to an existing table

3. COPY Example

3-1. Using the Path of the Data File

The example below demonstrates how to use a data file for the "COPY" clause. A specified file with a path as an input would be read by the ThanoSQL Engine and recreated as a table within a database.

%%thanosql
COPY mytable
OPTIONS (if_exists='replace')
FROM 'data/example.csv'

3-2. Using the Path of the Data Folder

The example below demonstrates how to use a data directory for the "COPY" clause. A specified folder with a path as an input would be read by the ThanoSQL Engine and recreated as a table within a database. 

Using COPY with Data Folders

  • If the path to the folder containing the images, audios, or videos is given as an input, the “COPY” clause will translate each file as a row and recreate it as a data table.
%%thanosql
COPY mytable
OPTIONS (if_exists='replace')
FROM 'diet_image_data/'

3-3. Using a Pandas DataFrame

The example below demonstrates how to use a Pandas DataFrame for the "COPY" clause. A specified dataframe as an input would be read by the ThanoSQL Engine and recreated as a table within a database. 

Prepare a Pandas DataFrame

# create a Pandas DataFrame
df = pd.read_csv("./diet_image_data/sample.csv")
# df must to be converted to JSON and 'orient' must be specified as 'records' 
df_in_json = df.to_json(orient="records")

# use a f-string to wrap 'df_in_json' within the COPY clause 
copy_pandas_df = f'''
COPY mytable
OPTIONS (if_exists='replace')
FROM '{df_in_json}'
'''

COPY a Pandas DataFrame

%thanosql $copy_pandas_df

Warning

  • A Pandas DataFrame must be converted to JSON before being wrapped in the COPY clause. 
  • Converted JSON variable must be enclosed with double quotes inside the COPY clause.
  • ${variable_name} should be followed by the %thanosql.

Last update: 2023-11-21