Search Image by Text¶
- Tutorial Difficulty: ★★☆☆☆
- 7 min read
- Languages: SQL (100%)
- File location: tutorial_en/thanosql_search/search_image_by_text.ipynb
- References: Unsplash Dataset - Lite, Learning Transferable Visual Models From Natural Language Supervision
Tutorial Introduction¶
Understanding Vectorizaion
For computers to understand human language, it must be vectorized. Recently, studies on pre-built models such as BERT and GPT-3 have been actively carried out, showing remarkable results. These models identify the meaning of each sentence based on Self-Supervised Learning, and sentences with similar meanings are vectorized and placed close to each other in a low-dimensional space. Self-supervised learning allows learning without labeling by determining whether each sentence/context is true/false. It randomly shuffles the order between sentences or masks some words.
The handling of different forms of input, such as texts and images, together is called multi-modal. "CLIP: Connecting Text and Image" utilizes a multi-modal model to understand low-dimensional vectors. While previous models only trained features of the image, the multi-modal model can train both images and texts. It can even learn the features of the texts that describe the images. In addition, by placing texts and images together in a low-dimensional space, the similarity between texts and images can be calculated. From this, a search algorithm can be derived.
ThanoSQL uses machine learning algorithms to vectorize datasets. The vectorized data is stored in a database column, and is used to search for similar images. This can be done by calculating the similarity scores.
The following are examples and applications of the ThanoSQL text-image search algorithm.
- Describe the desired scene of an image or video you have with text and search for the image that is the most similar to it. Using text-based descriptions rather than the keywords of a product, search for the most similar product images.
- Search for a timestamp in a Youtube video to place an advertisement in. Easily search for mountain or camping scenes to insert your travel advertisement.
In This Tutorial
👉 Unsplash released images taken by more than 200,000 photographers for free to be used as an AI dataset. Unsplash Dataset - Lite consists of 25,000 nature-themed images with 25,000 keywords.
In this tutorial, we will use the text-image search model to search for images from the Unsplash Dataset - Lite dataset using text.
0. Prepare Dataset and Model¶
As mentioned in the ThanoSQL Workspace, you must create an API token and run the query below to execute the query of ThanoSQL.
%load_ext thanosql
%thanosql API_TOKEN=<Issued_API_TOKEN>
Prepare Dataset¶
%%thanosql
GET THANOSQL DATASET unsplash_data
OPTIONS (overwrite=True)
Success
Query Details
- "GET THANOSQL DATASET" downloads the specified dataset to the workspace.
- "OPTIONS" specifies the option values to be used for the GET THANOSQL DATASET clause.
- "overwrite": determines whether to overwrite a dataset if it already exists. If set as True, the old dataset is replaced with the new dataset (bool, optional, True|False, default: False)
%%thanosql
COPY unsplash_data
OPTIONS (if_exists='replace')
FROM 'thanosql-dataset/unsplash_data/unsplash.csv'
Success
Query Details
- "COPY" specifies the name of the dataset to be saved as a database table.
- "OPTIONS" specifies the option values to be used for the COPY clause.
- "if_exists": determines how the function should handle the case where the table already exists, it can either raise an error, append to the existing table, or replace the existing table (str, optional, 'fail'|'replace'|'append', default: 'fail')
Prepare the Model¶
%%thanosql
GET THANOSQL MODEL clip
OPTIONS (
model_name='tutorial_search_clip',
overwrite=True
)
Success
Query Details
- "GET THANOSQL MODEL" downloads the specified model to the workspace.
- "OPTIONS" specifies the option values to be used for the GET THANOSQL MODEL clause.
- "model_name": the model name to store a given model in the ThanoSQL workspace (str, optional)
- "overwrite": determines whether to overwrite a model if it already exists. If set as True, the old model is replaced with the new model (bool, optional, True|False, default: False)
1. Check Dataset¶
For this tutorial, we use the unsplash_data table located in the ThanoSQL workspace database. Run the query below to check the contents of the table.
%%thanosql
SELECT *
FROM unsplash_data
LIMIT 5
photo_id | image_path | photo_image_url | photo_description | ai_description | |
---|---|---|---|---|---|
0 | XMyPniM9LF0 | thanosql-dataset/unsplash_data/XMyPniM9LF0.jpg | https://images.unsplash.com/uploads/1411949294... | Woman exploring a forest | woman walking in the middle of forest |
1 | rDLBArZUl1c | thanosql-dataset/unsplash_data/rDLBArZUl1c.jpg | https://images.unsplash.com/photo-141633941111... | Succulents in a terrarium | succulent plants in clear glass terrarium |
2 | cNDGZ2sQ3Bo | thanosql-dataset/unsplash_data/cNDGZ2sQ3Bo.jpg | https://images.unsplash.com/photo-142014251503... | Rural winter mountainside | rocky mountain under gray sky at daytime |
3 | iuZ_D1eoq9k | thanosql-dataset/unsplash_data/iuZ_D1eoq9k.jpg | https://images.unsplash.com/photo-141487280988... | Poppy seeds and flowers | red common poppy flower selective focus phography |
4 | BeD3vjQ8SI0 | thanosql-dataset/unsplash_data/BeD3vjQ8SI0.jpg | https://images.unsplash.com/photo-141700759404... | Silhouette near dark trees | trees during night time |
Understanding the Data Table
The unsplash_data table contains the following information.
- photo_id: unique id column of the image.
- image_path: the name of the column that stores the image path.
- photo_image_url: the name of the column indicating the address of the original image in the unsplash website.
- photo_description: the name of the column that represents a short description of the image.
- ai_description: the name of the column that describes the image generated by AI.
%%thanosql
PRINT IMAGE
AS
SELECT image_path
FROM unsplash_data
LIMIT 5
/home/jovyan/thanosql-dataset/unsplash_data/XMyPniM9LF0.jpg
/home/jovyan/thanosql-dataset/unsplash_data/rDLBArZUl1c.jpg
/home/jovyan/thanosql-dataset/unsplash_data/cNDGZ2sQ3Bo.jpg
/home/jovyan/thanosql-dataset/unsplash_data/iuZ_D1eoq9k.jpg
/home/jovyan/thanosql-dataset/unsplash_data/BeD3vjQ8SI0.jpg
2. Convert Using a Pre-built Model¶
Notes
Because the text-image algorithm takes a long time to train and since it uses a pre-built model that used 400 million datasets to train, we omit the training process using the "BUILD MODEL" query in this tutorial. The tutorial_search_clip model named above utilizes a pre-built model that uses CLIPEn. When the "CONVERT USING" statement is executed, a user-defined column(default: convert_result) containing the vectorized images is created. When the "SEARCH IMAGE" statement is executed, a user-defined column(default: search_result) containing the similarities is created.
To vectorize the unsplash_data images, run the "CONVERT USING" query. Results are stored in the new convert_result column.
(Estimated duration of query execution: 3 min)
%%thanosql
CONVERT USING tutorial_search_clip
OPTIONS (
image_col='image_path',
convert_type='image',
batch_size=128,
result_col='convert_result'
)
AS
SELECT *
FROM unsplash_data
photo_id | image_path | photo_image_url | photo_description | ai_description | convert_result | |
---|---|---|---|---|---|---|
0 | XMyPniM9LF0 | thanosql-dataset/unsplash_data/XMyPniM9LF0.jpg | https://images.unsplash.com/uploads/1411949294... | Woman exploring a forest | woman walking in the middle of forest | [b'\xf4', b'\xc6', b'2', b'\xbe', b'\xb1', b'"... |
1 | rDLBArZUl1c | thanosql-dataset/unsplash_data/rDLBArZUl1c.jpg | https://images.unsplash.com/photo-141633941111... | Succulents in a terrarium | succulent plants in clear glass terrarium | [b'F', b'\x08', b'\xbf', b'\xbe', b'\xc5', b'\... |
2 | cNDGZ2sQ3Bo | thanosql-dataset/unsplash_data/cNDGZ2sQ3Bo.jpg | https://images.unsplash.com/photo-142014251503... | Rural winter mountainside | rocky mountain under gray sky at daytime | [b'G', b'\x07', b'\xb8', b'\xbe', b'C', b'\x93... |
3 | iuZ_D1eoq9k | thanosql-dataset/unsplash_data/iuZ_D1eoq9k.jpg | https://images.unsplash.com/photo-141487280988... | Poppy seeds and flowers | red common poppy flower selective focus phography | [b'H', b'\x19', b'\xae', b'<', b'=', b'\xbe', ... |
4 | BeD3vjQ8SI0 | thanosql-dataset/unsplash_data/BeD3vjQ8SI0.jpg | https://images.unsplash.com/photo-141700759404... | Silhouette near dark trees | trees during night time | [b'\xaa', b'\x8c', b'\x88', b'\xbe', b'\xbb', ... |
... | ... | ... | ... | ... | ... | ... |
24963 | c7OrOMxrurA | thanosql-dataset/unsplash_data/c7OrOMxrurA.jpg | https://images.unsplash.com/photo-159300793778... | None | black metal fence during daytime | [b'N', b'\x88', b'\n', b'\xbe', b'p', b'\xcf',... |
24964 | 15IuQ5a0Qwg | thanosql-dataset/unsplash_data/15IuQ5a0Qwg.jpg | https://images.unsplash.com/photo-159296761254... | Pearl earrings and seashells | white and brown seashell on white surface | [b':', b'/', b'\xa1', b'\xbe', b'\xf4', b'\xbb... |
24965 | w8nrcXz8pwk | thanosql-dataset/unsplash_data/w8nrcXz8pwk.jpg | https://images.unsplash.com/photo-159299937329... | None | leopard on brown tree trunk during daytime | [b'\x96', b'i', b'\x96', b'=', b'\xb6', b'\x96... |
24966 | n1jHrRhehUI | thanosql-dataset/unsplash_data/n1jHrRhehUI.jpg | https://images.unsplash.com/photo-159192792878... | Floral truck in the streets of Rome | woman in beige coat and white hat standing on ... | [b'\x82', b'\xf0', b'c', b'=', b'`', b'e', b'm... |
24967 | Ic74ACoaAX0 | thanosql-dataset/unsplash_data/Ic74ACoaAX0.jpg | https://images.unsplash.com/photo-159240763188... | None | green plants on brown rocky mountain under blu... | [b'U', b'\x19', b'%', b'\xbe', b'!', b'Y', b'+... |
24968 rows × 6 columns
Query Details
- "CONVERT USING" uses the tutorial_search_clip model as an algorithm for image vectorization.
- "OPTIONS" specifies the option values required for image vectorization.
- "image_col": the name of the column containing the image path (str, default: 'image_path')
- "convert_type": file type for vectorization (str, 'image'|'text', default: 'image')
- "batch_size": the size of dataset bundle utilized in a single cycle of training. The larger the number, the better the learning performance. However, considering the size of the memory, only 128 is used in this case (int, optional, default: 16)
- "result_col": defines the column name that contains the vectorized results (str, optional, default: 'convert_result')
Execute the "CONVERT USING" query statement below and save the converted result in a new table so that it can be used with other ThanoSQL query statements.
%%thanosql
CREATE TABLE unsplash_data_convert_en AS
SELECT * FROM (
CONVERT USING tutorial_search_clip
OPTIONS (
image_col='image_path',
convert_type='image',
batch_size=128,
result_col='convert_result'
)
AS
SELECT *
FROM unsplash_data
)
Success
3. Search¶
Perform a text-based image search using the "SEARCH IMAGE" query statement and the tutorial_search_clip model created. Execute the following query with the text value of "a black cat" and embedded unsplash_data images to calculate the similarity. The result values are saved into the newly added search_result column.
%%thanosql
SEARCH IMAGE
USING tutorial_search_clip
OPTIONS (
search_by='text',
search_input='a black cat',
emb_col='convert_result',
result_col='search_result'
)
AS
SELECT *
FROM unsplash_data_convert_en
photo_id | image_path | photo_image_url | photo_description | ai_description | convert_result | search_result | |
---|---|---|---|---|---|---|---|
0 | UMyfDjQ6Ep8 | thanosql-dataset/unsplash_data/UMyfDjQ6Ep8.jpg | https://images.unsplash.com/photo-157712719502... | None | black cat | [b'[', b'Z', b'\xfe', b'>', b'\x94', b'\x95', ... | 0.316560 |
1 | 7XJ3d0xK444 | thanosql-dataset/unsplash_data/7XJ3d0xK444.jpg | https://images.unsplash.com/photo-157217373317... | None | black cat | [b'\x9c', b'\xec', b'\x80', b'>', b'#', b'j', ... | 0.311931 |
2 | m8HsSWh-y6E | thanosql-dataset/unsplash_data/m8HsSWh-y6E.jpg | https://images.unsplash.com/photo-156855266009... | simon the kitty. | silver tabby cat | [b'\xff', b')', b'\xa1', b'>', b'O', b'\xe2', ... | 0.310819 |
3 | 6ST6S6i9IGM | thanosql-dataset/unsplash_data/6ST6S6i9IGM.jpg | https://images.unsplash.com/photo-1548620848-d... | The cutest black cat to wake up to on a Sunday... | close-up photography of bombay cat | [b'Z', b'`', b'x', b'>', b'\x83', b'E', b'\x15... | 0.310214 |
4 | aFyD5aWKu6k | thanosql-dataset/unsplash_data/aFyD5aWKu6k.jpg | https://images.unsplash.com/photo-157850934606... | None | black cat | [b'\xc6', b'\x97', b'V', b'>', b'\x0f', b'@', ... | 0.309158 |
... | ... | ... | ... | ... | ... | ... | ... |
995 | VQ41v-gnd1M | thanosql-dataset/unsplash_data/VQ41v-gnd1M.jpg | https://images.unsplash.com/photo-158956048611... | None | purple smoke in black background | [b'\xb7', b'\xba', b'\x16', b'>', b'G', b'l', ... | 0.221887 |
996 | AtSgtZcxZFc | thanosql-dataset/unsplash_data/AtSgtZcxZFc.jpg | https://images.unsplash.com/photo-150329107570... | In the Smoke of Thinking | None | [b'\xa8', b'\xa7', b'\xb3', b'\xbc', b'\xd4', ... | 0.221874 |
997 | XzOMokbcp0Q | thanosql-dataset/unsplash_data/XzOMokbcp0Q.jpg | https://images.unsplash.com/photo-157616182589... | None | green-leafed plant during daytime | [b'\xd8', b'\x94', b'\xc1', b'\xbd', b'T', b'\... | 0.221858 |
998 | aWcJuh1mUhc | thanosql-dataset/unsplash_data/aWcJuh1mUhc.jpg | https://images.unsplash.com/photo-1544460671-b... | None | brown tabby cat on bed | [b',', b'\x9a', b'Y', b'>', b'\xf4', b'\x93', ... | 0.221827 |
999 | Zs6T2rub2zw | thanosql-dataset/unsplash_data/Zs6T2rub2zw.jpg | https://images.unsplash.com/photo-158179166724... | None | green pine trees covered with snow | [b'?', b'\x8e', b'\x1c', b'\xbf', b'^', b'\xa4... | 0.221822 |
1000 rows × 7 columns
Query Details
- "SEARCH IMAGE" searches for images. Input the text description of the image using the "text" variable.
- "USING" specifies tutorial_search_clip as the model.
- "OPTIONS" specifies the options to be used for image searching.
- "search_by": defines the image|text|audio|video type to be used for the search (str)
- "search_input": defines the input to be used for the search (str)
- "emb_col": the column that contains the vectorized results (str)
- "result_col": defines the name of the column that contains the search results (str, optional, default: 'search_result')
Execute the query below to view the similarity of the five most similar images to 'a black cat'.
%%thanosql
SELECT image_path, search_result
FROM (
SEARCH IMAGE
USING tutorial_search_clip
OPTIONS (
search_by='text',
search_input='a black cat',
emb_col='convert_result',
result_col='search_result',
top_k=5
)
AS
SELECT *
FROM unsplash_data_convert_en
)
image_path | search_result | |
---|---|---|
0 | thanosql-dataset/unsplash_data/UMyfDjQ6Ep8.jpg | 0.316560 |
1 | thanosql-dataset/unsplash_data/7XJ3d0xK444.jpg | 0.311931 |
2 | thanosql-dataset/unsplash_data/m8HsSWh-y6E.jpg | 0.310819 |
3 | thanosql-dataset/unsplash_data/6ST6S6i9IGM.jpg | 0.310214 |
4 | thanosql-dataset/unsplash_data/aFyD5aWKu6k.jpg | 0.309158 |
Query Details
- "SEARCH IMAGE" calculates and returns the similarity between the input text and the image.
- The first "SELECT" returns the image_path column and the search_result column from the result nested query.
To immediately print out the resulting images, nest the previous query with a "PRINT" clause.
%%thanosql
PRINT IMAGE
AS (
SELECT image_path, search_result
FROM (
SEARCH IMAGE
USING tutorial_search_clip
OPTIONS (
search_by='text',
search_input='a black cat',
emb_col='convert_result',
result_col='search_result',
top_k=5
)
AS
SELECT *
FROM unsplash_data_convert_en
)
)
/home/jovyan/thanosql-dataset/unsplash_data/UMyfDjQ6Ep8.jpg
/home/jovyan/thanosql-dataset/unsplash_data/7XJ3d0xK444.jpg
/home/jovyan/thanosql-dataset/unsplash_data/m8HsSWh-y6E.jpg
/home/jovyan/thanosql-dataset/unsplash_data/6ST6S6i9IGM.jpg
/home/jovyan/thanosql-dataset/unsplash_data/aFyD5aWKu6k.jpg
Query Details
This query, combined with the query above, is made of three levels.
- "SELECT", in the first parentheses, produces the result of the step immediately above.
- "PRINT IMAGE" prints the results of the query as images.
%%thanosql
PRINT IMAGE
AS (
SELECT image_path, search_result
FROM (
SEARCH IMAGE
USING tutorial_search_clip
OPTIONS (
search_by='text',
search_input='a dog on a chair',
emb_col='convert_result',
result_col='search_result',
top_k=5
)
AS
SELECT *
FROM unsplash_data_convert_en
)
)
/home/jovyan/thanosql-dataset/unsplash_data/jZUr3AuI8io.jpg
/home/jovyan/thanosql-dataset/unsplash_data/nnKBZTWzlMQ.jpg
/home/jovyan/thanosql-dataset/unsplash_data/HG2KpOO0vSc.jpg
/home/jovyan/thanosql-dataset/unsplash_data/f6qFneRNwNI.jpg
/home/jovyan/thanosql-dataset/unsplash_data/GKY4WDO3QgY.jpg
%%thanosql
PRINT IMAGE
AS (
SELECT image_path, search_result
FROM (
SEARCH IMAGE
USING tutorial_search_clip
OPTIONS (
search_by='text',
search_input='gloomy photos',
emb_col='convert_result',
result_col='search_result',
top_k=5
)
AS
SELECT *
FROM unsplash_data_convert_en
)
)
/home/jovyan/thanosql-dataset/unsplash_data/Xo4vJrtrmmA.jpg
/home/jovyan/thanosql-dataset/unsplash_data/QheWOfwEUio.jpg
/home/jovyan/thanosql-dataset/unsplash_data/_zHYUQmWrzk.jpg
/home/jovyan/thanosql-dataset/unsplash_data/Tu_lH5CFFZw.jpg
/home/jovyan/thanosql-dataset/unsplash_data/DfYPBHaOR04.jpg
%%thanosql
PRINT IMAGE
AS (
SELECT image_path, search_result
FROM (
SEARCH IMAGE
USING tutorial_search_clip
OPTIONS (
search_by='text',
search_input='the feeling when your program finally works',
emb_col='convert_result',
result_col='search_result',
top_k=5
)
AS
SELECT *
FROM unsplash_data_convert_en
)
)
/home/jovyan/thanosql-dataset/unsplash_data/nDLYtRqJtMw.jpg
/home/jovyan/thanosql-dataset/unsplash_data/qNJpGSCv_Jc.jpg
/home/jovyan/thanosql-dataset/unsplash_data/Yb5OBk-OxJY.jpg
/home/jovyan/thanosql-dataset/unsplash_data/6etH6346MHE.jpg
/home/jovyan/thanosql-dataset/unsplash_data/7GX5aICb5i4.jpg
4. In Conclusion¶
In this tutorial, we searched for images in the unsplash dataset by text using a multi-modal text/image vectorization model. As this is a beginner-level tutorial, we focused on the process and showing visible results rather than accuracy. The image search can retrieve more accurate results by utilizing various queries.
- How to Upload My Data to the ThanoSQL Workspace
- How to Create a Table Using My Data
- How to Upload My Model to the ThanoSQL Workspace
Inquiries About Deploying a Model for Your Own Service
If you have any difficulties creating your own model using ThanoSQL or applying it to your services, please feel free to contact us below😊
For inquiries regarding building a text-image search models: contact@smartmind.team