Search Video by Text
- Tutorial Difficulty: ★★☆☆☆
- 7 min read
- Languages: SQL (100%)
- File location: tutorial_en/thanosql_search/search_video_by_text.ipynb
- References: Kinetics-700, X-CLIP
Tutorial Introduction
Understanding Multi-modal Learning
Multi-modal refers to an environment in which information is communicated in several forms; a modality is a type of data. Machine learning on multi-modal data learns jointly from different forms of data, such as images, text, and sensor readings, which enables an integrated analysis.
OpenAI's CLIP is an image-text multimodal deep learning model specialized in understanding text and images together.
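To make the idea of "understanding text and images together" concrete, here is a minimal sketch of how a CLIP-style model compares the two: both inputs are mapped to vectors in one shared space, and their match is scored with cosine similarity. The embeddings and names below are made up for illustration; a real model produces vectors with hundreds of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical 3-dimensional embeddings (illustrative values only).
text_embedding = [0.9, 0.1, 0.0]  # stands in for a text such as "a dog running"
image_embeddings = {
    "dog_photo": [0.8, 0.2, 0.1],
    "city_photo": [0.0, 0.3, 0.9],
}

# Score every image against the text and pick the closest one.
scores = {name: cosine_similarity(text_embedding, vec)
          for name, vec in image_embeddings.items()}
best_match = max(scores, key=scores.get)
```

The image whose vector points in nearly the same direction as the text vector gets the highest score. Text-video search follows the same pattern, with video embeddings in place of image embeddings.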
The following are examples and applications of the ThanoSQL text-video search algorithm.
- Use text descriptions to search your own videos and return the ones containing the scenes you want.
- Use text to find the scenes you want in YouTube videos and similar sources.
In This Tutorial
👉 This tutorial uses the kinetics700-2020 dataset. Kinetics is a large video dataset of human actions released by DeepMind. Kinetics 700-2020, released in 2020, is a newer version of the Kinetics dataset and contains video clips covering 700 action classes.
ThanoSQL's X-CLIP model is a pre-built model that extends the image-text multimodal CLIP model to understand the relationship between video and text. In this tutorial, we use this model to search for videos in the ThanoSQL workspace database with a text input.
0. Prepare Dataset & Model
As described in the ThanoSQL Workspace guide, you must issue an API token and run the commands below before you can execute ThanoSQL queries.
%load_ext thanosql
%thanosql API_TOKEN=<Issued_API_TOKEN>
Prepare Dataset
%%thanosql
GET THANOSQL DATASET kinetics700_data
OPTIONS (overwrite=True)
Success
Query Details
- "GET THANOSQL DATASET" downloads the specified dataset to the workspace.
- "OPTIONS" specifies the option values to be used for the GET THANOSQL DATASET clause.
- "overwrite": determines whether to overwrite a dataset if it already exists. If set as True, the old dataset is replaced with the new dataset (bool, optional, True|False, default: False)
%%thanosql
COPY kinetics700
OPTIONS (if_exists='replace')
FROM 'thanosql-dataset/kinetics700_data/kinetics700.csv'
Success
Query Details
- "COPY" specifies the name of the dataset to be saved as a database table.
- "OPTIONS" specifies the option values to be used for the COPY clause.
- "if_exists": determines what happens when the table already exists. The query can raise an error, append to the existing table, or replace the existing table (str, optional, 'fail'|'replace'|'append', default: 'fail')
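The three "if_exists" modes described above can be sketched with Python's built-in sqlite3 module. This is only an analogy for the semantics, not ThanoSQL's implementation; the `copy_rows` helper, the `demo` table, and the sample rows are hypothetical.

```python
import sqlite3

def copy_rows(conn, table, rows, if_exists="fail"):
    """Load (video_path, label) rows into `table`, mimicking the if_exists modes."""
    if if_exists not in ("fail", "replace", "append"):
        raise ValueError(f"unknown if_exists value: {if_exists!r}")
    exists = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type='table' AND name=?", (table,)
    ).fetchone() is not None
    if exists and if_exists == "fail":
        raise ValueError(f"table {table!r} already exists")
    if exists and if_exists == "replace":
        conn.execute(f'DROP TABLE "{table}"')  # discard the old table entirely
        exists = False
    if not exists:
        conn.execute(f'CREATE TABLE "{table}" (video_path TEXT, label TEXT)')
    conn.executemany(f'INSERT INTO "{table}" VALUES (?, ?)', rows)

conn = sqlite3.connect(":memory:")
rows = [("a.mp4", "testifying"), ("b.mp4", "kitesurfing")]  # made-up sample rows
copy_rows(conn, "demo", rows)                       # table is created: 2 rows
copy_rows(conn, "demo", rows, if_exists="append")   # rows are added: 4 rows
copy_rows(conn, "demo", rows, if_exists="replace")  # table is rebuilt: 2 rows
row_count = conn.execute('SELECT COUNT(*) FROM "demo"').fetchone()[0]
```

Using 'replace', as this tutorial does, makes the cell safe to re-run: each run starts from a fresh table instead of failing or duplicating rows.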
Prepare the Model
%%thanosql
GET THANOSQL MODEL xclip
OPTIONS (
model_name='tutorial_search_xclip',
overwrite=True
)
Success
Query Details
- "GET THANOSQL MODEL" downloads the specified model to the workspace.
- "OPTIONS" specifies the option values to be used for the GET THANOSQL MODEL clause.
- "model_name": the model name to store a given model in the ThanoSQL workspace (str, optional)
- "overwrite": determines whether to overwrite a model if it already exists. If set as True, the old model is replaced with the new model (bool, optional, True|False, default: False)
1. Check Dataset
For this tutorial, we use the kinetics700 table located in the ThanoSQL workspace database. Run the query below to check the contents of the table.
%%thanosql
SELECT *
FROM kinetics700
LIMIT 5
| | video_path | label | duration |
|---|---|---|---|
| 0 | thanosql-dataset/kinetics700_data/video/-dhP2A... | checking tires | 10 |
| 1 | thanosql-dataset/kinetics700_data/video/1ejgHK... | testifying | 10 |
| 2 | thanosql-dataset/kinetics700_data/video/2Yvab3... | checking tires | 10 |
| 3 | thanosql-dataset/kinetics700_data/video/3nFLLc... | punching person (boxing) | 10 |
| 4 | thanosql-dataset/kinetics700_data/video/5PfhCJ... | kitesurfing | 10 |
Understanding the Data Table
The kinetics700 table contains the following information.
- video_path: video path
- label: video label
- duration: video length
%%thanosql
PRINT VIDEO
AS
SELECT video_path
FROM kinetics700
LIMIT 2
/home/jovyan/thanosql-dataset/kinetics700_data/video/-dhP2AH0eqI.mp4
/home/jovyan/thanosql-dataset/kinetics700_data/video/1ejgHKw8E3Y.mp4
2. Convert Using a Pre-built Model
To vectorize the kinetics700 videos, run the "CONVERT USING" query. The vectorized results are stored in a user-defined column (default: 'convert_result') in the kinetics700 table.
%%thanosql
CONVERT USING tutorial_search_xclip
OPTIONS (
video_col='video_path',
result_col='convert_result'
)
AS
SELECT *
FROM kinetics700
| | video_path | label | duration | convert_result |
|---|---|---|---|---|
| 0 | thanosql-dataset/kinetics700_data/video/-dhP2A... | checking tires | 10 | [b'\x16', b'm', b'\xfb', b'\xbe', b'\xba', b'!... |
| 1 | thanosql-dataset/kinetics700_data/video/1ejgHK... | testifying | 10 | [b'X', b'\x05', b'\xe7', b'\xbe', b'\xf1', b'\... |
| 2 | thanosql-dataset/kinetics700_data/video/2Yvab3... | checking tires | 10 | [b'\x10', b'\x96', b'\xfa', b'\xbe', b'\xff', ... |
| 3 | thanosql-dataset/kinetics700_data/video/3nFLLc... | punching person (boxing) | 10 | [b'\x19', b'O', b' ', b'?', b'\xf8', b'\xbf', ... |
| 4 | thanosql-dataset/kinetics700_data/video/5PfhCJ... | kitesurfing | 10 | [b' ', b'u', b'\xa3', b'?', b'\x12', b'D', b'L... |
| ... | ... | ... | ... | ... |
| 93 | thanosql-dataset/kinetics700_data/video/wwgl_8... | land sailing | 10 | [b'\xb6', b'7', b'\xa0', b'?', b'\x9e', b']', ... |
| 94 | thanosql-dataset/kinetics700_data/video/xICkLB... | cutting nails | 10 | [b'\xcd', b'\xe3', b'\x04', b'\xbf', b'\x94', ... |
| 95 | thanosql-dataset/kinetics700_data/video/xlRC0n... | testifying | 10 | [b'\x86', b'\xbb', b'_', b'\xbd', b'\x97', b'\... |
| 96 | thanosql-dataset/kinetics700_data/video/yyy2Vy... | bench pressing | 10 | [b'B', b'\xd0', b'%', b'?', b'\xae', b'=', b'\... |
| 97 | thanosql-dataset/kinetics700_data/video/zb9HGN... | country line dancing | 10 | [b'>', b'\x1b', b'u', b'?', b'\xbe', b't', b'\... |
98 rows × 4 columns
Query Details
- "CONVERT USING" uses tutorial_search_xclip as the algorithm for video vectorization.
- "OPTIONS" specifies the options to be used for video vectorization.
- "video_col": the name of the column containing the video path (str, default: 'video_path')
- "result_col": defines the column name that contains the vectorized results (str, optional, default: 'convert_result')
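The convert_result column in the output above is displayed as a list of single bytes. The sketch below shows one way such a cell could be turned back into numbers, under the assumption (not stated in this tutorial) that the bytes are little-endian 32-bit floats laid out back to back. The `decode_embedding` helper is hypothetical and not part of ThanoSQL.

```python
import struct

def decode_embedding(byte_cells):
    """Join single-byte cells and unpack them as little-endian float32 values."""
    raw = b"".join(byte_cells)
    usable = len(raw) - len(raw) % 4  # ignore a trailing partial value, if any
    return [value for (value,) in struct.iter_unpack("<f", raw[:usable])]

# The first four bytes of row 0, exactly as printed in the output above.
first_cells = [b"\x16", b"m", b"\xfb", b"\xbe"]
first_value = decode_embedding(first_cells)[0]
```

Under that assumption, the first four bytes of row 0 decode to about -0.49, a plausible magnitude for an embedding component. In normal use there is no need to decode the column, since "SEARCH VIDEO" reads it directly.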
Execute the "CONVERT USING" query statement below and save the converted result in a new table so that it can be used with other ThanoSQL query statements.
%%thanosql
CREATE TABLE kinetics700_convert_en AS
SELECT * FROM (
CONVERT USING tutorial_search_xclip
OPTIONS (
video_col='video_path',
result_col='convert_result'
)
AS
SELECT *
FROM kinetics700
)
Success
3. Search
Perform a text-based video search using the "SEARCH VIDEO" query statement and the tutorial_search_xclip model. Execute the following query with the text value "bench press" and the embedded kinetics700 videos to calculate the similarity.
%%thanosql
SELECT video_path, label, score
FROM (
SEARCH VIDEO
USING tutorial_search_xclip
OPTIONS (
search_by='text',
search_input='bench press',
emb_col='convert_result',
result_col='score',
top_k=10
)
AS
SELECT *
FROM kinetics700_convert_en
)
| | video_path | label | score |
|---|---|---|---|
| 0 | thanosql-dataset/kinetics700_data/video/qNB9qv... | bench pressing | 0.312154 |
| 1 | thanosql-dataset/kinetics700_data/video/yyy2Vy... | bench pressing | 0.274932 |
| 2 | thanosql-dataset/kinetics700_data/video/a9S4Ox... | golf chipping | 0.202286 |
| 3 | thanosql-dataset/kinetics700_data/video/ML7Oll... | snowkiting | 0.198646 |
| 4 | thanosql-dataset/kinetics700_data/video/zb9HGN... | country line dancing | 0.196104 |
| 5 | thanosql-dataset/kinetics700_data/video/AfKqHI... | parasailing | 0.193912 |
| 6 | thanosql-dataset/kinetics700_data/video/6MWLkJ... | kitesurfing | 0.192512 |
| 7 | thanosql-dataset/kinetics700_data/video/BEnKTN... | snowkiting | 0.190333 |
| 8 | thanosql-dataset/kinetics700_data/video/aKcKTY... | opening bottle (not wine) | 0.187058 |
| 9 | thanosql-dataset/kinetics700_data/video/8DIU9c... | playing squash or racquetball | 0.186208 |
Query Details
- "SEARCH VIDEO" searches for videos. The text description of the video is passed through the "search_input" option, with "search_by" set to 'text'.
- "USING" specifies tutorial_search_xclip as the model.
- "OPTIONS" specifies the option values required for the video search.
- "search_by": defines the type of input used for the search (str, 'image'|'text'|'audio'|'video')
- "search_input": defines the input to be used for the search (str)
- "emb_col": the column that contains the vectorized results (str)
- "result_col": defines the name of the column that contains the search results (str, optional, default: 'search_result')
- "top_k": number of rows to return. If set as None, returns the entire data table (int, optional, default: 1000)
- "AS" defines the embedding table to be used for search. In this example, the kinetics700_convert_en table is used.
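The effect of "top_k" can be sketched in a few lines of Python, using the labels and scores from the result table above. This assumes that results are ranked by descending score, as the output suggests; the `top_k_rows` helper is hypothetical and only illustrates the option's behavior.

```python
import heapq

# (label, score) pairs copied from the search result table above.
scored_rows = [
    ("bench pressing", 0.312154),
    ("bench pressing", 0.274932),
    ("golf chipping", 0.202286),
    ("snowkiting", 0.198646),
    ("country line dancing", 0.196104),
    ("parasailing", 0.193912),
    ("kitesurfing", 0.192512),
    ("snowkiting", 0.190333),
    ("opening bottle (not wine)", 0.187058),
    ("playing squash or racquetball", 0.186208),
]

def top_k_rows(rows, top_k=None):
    """Return rows ordered by descending score, keeping only the best top_k."""
    if top_k is None:  # mirror the documented behavior: None keeps every row
        return sorted(rows, key=lambda row: row[1], reverse=True)
    return heapq.nlargest(top_k, rows, key=lambda row: row[1])

top_two = top_k_rows(scored_rows, top_k=2)
```

With top_k=2, only the two "bench pressing" rows remain. The scores drop sharply after those two rows (from about 0.27 to about 0.20), so a small top_k is enough to isolate the relevant videos for this search text.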
%%thanosql
PRINT VIDEO
AS (
SELECT video_path
FROM (
SEARCH VIDEO
USING tutorial_search_xclip
OPTIONS (
search_by='text',
search_input='bench press',
emb_col='convert_result',
result_col='score',
top_k=2
)
AS
SELECT *
FROM kinetics700_convert_en
)
)
/home/jovyan/thanosql-dataset/kinetics700_data/video/qNB9qv6PqwI.mp4
/home/jovyan/thanosql-dataset/kinetics700_data/video/yyy2Vy_5DjI.mp4
4. In Conclusion
In this tutorial, we searched for videos in the kinetics700 dataset by text using a multi-modal text/video vectorization model. As this is a beginner-level tutorial, we focused on the process and on showing visible results rather than on accuracy. Trying a variety of search texts can help the video search return more accurate results.
- How to Upload My Data to the ThanoSQL Workspace
- How to Create a Table Using My Data
- How to Upload My Model to the ThanoSQL Workspace
Inquiries About Deploying a Model for Your Own Service
If you have any difficulties creating your own model using ThanoSQL or applying it to your services, please feel free to contact us below😊
For inquiries regarding building a text-video search model: contact@smartmind.team