Search Image by Text¶

Tutorial Difficulty: ★★☆☆☆
7 min read
Languages: SQL (100%)
File location: tutorial_en/thanosql_search/search_image_by_text.ipynb
References: Unsplash Dataset - Lite, Learning Transferable Visual Models From Natural Language Supervision

Tutorial Introduction¶

Understanding Vectorizaion

For computers to understand human language, it must be vectorized. Recently, studies on pre-built models such as BERT and GPT-3 have been actively carried out, showing remarkable results. These models identify the meaning of each sentence based on Self-Supervised Learning, and sentences with similar meanings are vectorized and placed close to each other in a low-dimensional space. Self-supervised learning allows learning without labeling by determining whether each sentence/context is true/false. It randomly shuffles the order between sentences or masks some words.

The handling of different forms of input, such as texts and images, together is called multi-modal. "CLIP: Connecting Text and Image" utilizes a multi-modal model to understand low-dimensional vectors. While previous models only trained features of the image, the multi-modal model can train both images and texts. It can even learn the features of the texts that describe the images. In addition, by placing texts and images together in a low-dimensional space, the similarity between texts and images can be calculated. From this, a search algorithm can be derived.

ThanoSQL uses machine learning algorithms to vectorize datasets. The vectorized data is stored in a database column, and is used to search for similar images. This can be done by calculating the similarity scores.

The following are examples and applications of the ThanoSQL text-image search algorithm.

Describe the desired scene of an image or video you have with text and search for the image that is the most similar to it. Using text-based descriptions rather than the keywords of a product, search for the most similar product images.
Search for a timestamp in a Youtube video to place an advertisement in. Easily search for mountain or camping scenes to insert your travel advertisement.

In This Tutorial

👉 Unsplash released images taken by more than 200,000 photographers for free to be used as an AI dataset. Unsplash Dataset - Lite consists of 25,000 nature-themed images with 25,000 keywords.

In this tutorial, we will use the text-image search model to search for images from the Unsplash Dataset - Lite dataset using text.

0. Prepare Dataset and Model¶

As mentioned in the ThanoSQL Workspace, you must create an API token and run the query below to execute the query of ThanoSQL.

In [ ]:

            
                Copied!
                
%load_ext thanosql
%thanosql API_TOKEN=<Issued_API_TOKEN>
%load_ext thanosql
%thanosql API_TOKEN=

Prepare Dataset¶

In [4]:

            
                Copied!
                
%%thanosql
GET THANOSQL DATASET unsplash_data
OPTIONS (overwrite=True)
%%thanosql
GET THANOSQL DATASET unsplash_data
OPTIONS (overwrite=True)

Success

Query Details

"GET THANOSQL DATASET" downloads the specified dataset to the workspace.
"OPTIONS" specifies the option values to be used for the GET THANOSQL DATASET clause.
- "overwrite": determines whether to overwrite a dataset if it already exists. If set as True, the old dataset is replaced with the new dataset (bool, optional, True|False, default: False)

In [5]:

            
                Copied!
                
%%thanosql
COPY unsplash_data 
OPTIONS (if_exists='replace')
FROM 'thanosql-dataset/unsplash_data/unsplash.csv'
%%thanosql
COPY unsplash_data 
OPTIONS (if_exists='replace')
FROM 'thanosql-dataset/unsplash_data/unsplash.csv'

Success

Query Details

"COPY" specifies the name of the dataset to be saved as a database table.
"OPTIONS" specifies the option values to be used for the COPY clause.
- "if_exists": determines how the function should handle the case where the table already exists, it can either raise an error, append to the existing table, or replace the existing table (str, optional, 'fail'|'replace'|'append', default: 'fail')

Prepare the Model¶

In [6]:

            
                Copied!
                
                    
                    
                
                

        
%%thanosql
GET THANOSQL MODEL clip
OPTIONS (
    model_name='tutorial_search_clip',
    overwrite=True
    )
%%thanosql
GET THANOSQL MODEL clip
OPTIONS (
    model_name='tutorial_search_clip',
    overwrite=True
    )

Success

Query Details

"GET THANOSQL MODEL" downloads the specified model to the workspace.
"OPTIONS" specifies the option values to be used for the GET THANOSQL MODEL clause.
- "model_name": the model name to store a given model in the ThanoSQL workspace (str, optional)
- "overwrite": determines whether to overwrite a model if it already exists. If set as True, the old model is replaced with the new model (bool, optional, True|False, default: False)

1. Check Dataset¶

For this tutorial, we use the unsplash_data table located in the ThanoSQL workspace database. Run the query below to check the contents of the table.

In [7]:

            
                Copied!
                
%%thanosql
SELECT *
FROM unsplash_data
LIMIT 5
%%thanosql
SELECT *
FROM unsplash_data
LIMIT 5

Out[7]:

	photo_id	image_path	photo_image_url	photo_description	ai_description
0	XMyPniM9LF0	thanosql-dataset/unsplash_data/XMyPniM9LF0.jpg	https://images.unsplash.com/uploads/1411949294...	Woman exploring a forest	woman walking in the middle of forest
1	rDLBArZUl1c	thanosql-dataset/unsplash_data/rDLBArZUl1c.jpg	https://images.unsplash.com/photo-141633941111...	Succulents in a terrarium	succulent plants in clear glass terrarium
2	cNDGZ2sQ3Bo	thanosql-dataset/unsplash_data/cNDGZ2sQ3Bo.jpg	https://images.unsplash.com/photo-142014251503...	Rural winter mountainside	rocky mountain under gray sky at daytime
3	iuZ_D1eoq9k	thanosql-dataset/unsplash_data/iuZ_D1eoq9k.jpg	https://images.unsplash.com/photo-141487280988...	Poppy seeds and flowers	red common poppy flower selective focus phography
4	BeD3vjQ8SI0	thanosql-dataset/unsplash_data/BeD3vjQ8SI0.jpg	https://images.unsplash.com/photo-141700759404...	Silhouette near dark trees	trees during night time

Understanding the Data Table

The unsplash_data table contains the following information.

photo_id: unique id column of the image.
image_path: the name of the column that stores the image path.
photo_image_url: the name of the column indicating the address of the original image in the unsplash website.
photo_description: the name of the column that represents a short description of the image.
ai_description: the name of the column that describes the image generated by AI.

In [8]:

            
                Copied!
                
                    
                    
                
                

        
%%thanosql
PRINT IMAGE 
AS
SELECT image_path 
FROM unsplash_data 
LIMIT 5
%%thanosql
PRINT IMAGE 
AS
SELECT image_path 
FROM unsplash_data 
LIMIT 5

/home/jovyan/thanosql-dataset/unsplash_data/XMyPniM9LF0.jpg

/home/jovyan/thanosql-dataset/unsplash_data/rDLBArZUl1c.jpg

/home/jovyan/thanosql-dataset/unsplash_data/cNDGZ2sQ3Bo.jpg

/home/jovyan/thanosql-dataset/unsplash_data/iuZ_D1eoq9k.jpg

/home/jovyan/thanosql-dataset/unsplash_data/BeD3vjQ8SI0.jpg

2. Convert Using a Pre-built Model¶

Notes

Because the text-image algorithm takes a long time to train and since it uses a pre-built model that used 400 million datasets to train, we omit the training process using the "BUILD MODEL" query in this tutorial. The tutorial_search_clip model named above utilizes a pre-built model that uses CLIPEn. When the "CONVERT USING" statement is executed, a user-defined column(default: convert_result) containing the vectorized images is created. When the "SEARCH IMAGE" statement is executed, a user-defined column(default: search_result) containing the similarities is created.

To vectorize the unsplash_data images, run the "CONVERT USING" query. Results are stored in the new convert_result column.
(Estimated duration of query execution: 3 min)

In [9]:

            
                Copied!
                
                    
                    
                
                

        
%%thanosql
CONVERT USING tutorial_search_clip
OPTIONS (
    image_col='image_path', 
    convert_type='image',
    batch_size=128,
    result_col='convert_result'
    )
AS 
SELECT *
FROM unsplash_data
%%thanosql
CONVERT USING tutorial_search_clip
OPTIONS (
    image_col='image_path', 
    convert_type='image',
    batch_size=128,
    result_col='convert_result'
    )
AS 
SELECT *
FROM unsplash_data

Out[9]:

	photo_id	image_path	photo_image_url	photo_description	ai_description	convert_result
0	XMyPniM9LF0	thanosql-dataset/unsplash_data/XMyPniM9LF0.jpg	https://images.unsplash.com/uploads/1411949294...	Woman exploring a forest	woman walking in the middle of forest	[b'\xf4', b'\xc6', b'2', b'\xbe', b'\xb1', b'"...
1	rDLBArZUl1c	thanosql-dataset/unsplash_data/rDLBArZUl1c.jpg	https://images.unsplash.com/photo-141633941111...	Succulents in a terrarium	succulent plants in clear glass terrarium	[b'F', b'\x08', b'\xbf', b'\xbe', b'\xc5', b'\...
2	cNDGZ2sQ3Bo	thanosql-dataset/unsplash_data/cNDGZ2sQ3Bo.jpg	https://images.unsplash.com/photo-142014251503...	Rural winter mountainside	rocky mountain under gray sky at daytime	[b'G', b'\x07', b'\xb8', b'\xbe', b'C', b'\x93...
3	iuZ_D1eoq9k	thanosql-dataset/unsplash_data/iuZ_D1eoq9k.jpg	https://images.unsplash.com/photo-141487280988...	Poppy seeds and flowers	red common poppy flower selective focus phography	[b'H', b'\x19', b'\xae', b'<', b'=', b'\xbe', ...
4	BeD3vjQ8SI0	thanosql-dataset/unsplash_data/BeD3vjQ8SI0.jpg	https://images.unsplash.com/photo-141700759404...	Silhouette near dark trees	trees during night time	[b'\xaa', b'\x8c', b'\x88', b'\xbe', b'\xbb', ...
...	...	...	...	...	...	...
24963	c7OrOMxrurA	thanosql-dataset/unsplash_data/c7OrOMxrurA.jpg	https://images.unsplash.com/photo-159300793778...	None	black metal fence during daytime	[b'N', b'\x88', b'\n', b'\xbe', b'p', b'\xcf',...
24964	15IuQ5a0Qwg	thanosql-dataset/unsplash_data/15IuQ5a0Qwg.jpg	https://images.unsplash.com/photo-159296761254...	Pearl earrings and seashells	white and brown seashell on white surface	[b':', b'/', b'\xa1', b'\xbe', b'\xf4', b'\xbb...
24965	w8nrcXz8pwk	thanosql-dataset/unsplash_data/w8nrcXz8pwk.jpg	https://images.unsplash.com/photo-159299937329...	None	leopard on brown tree trunk during daytime	[b'\x96', b'i', b'\x96', b'=', b'\xb6', b'\x96...
24966	n1jHrRhehUI	thanosql-dataset/unsplash_data/n1jHrRhehUI.jpg	https://images.unsplash.com/photo-159192792878...	Floral truck in the streets of Rome	woman in beige coat and white hat standing on ...	[b'\x82', b'\xf0', b'c', b'=', b'`', b'e', b'm...
24967	Ic74ACoaAX0	thanosql-dataset/unsplash_data/Ic74ACoaAX0.jpg	https://images.unsplash.com/photo-159240763188...	None	green plants on brown rocky mountain under blu...	[b'U', b'\x19', b'%', b'\xbe', b'!', b'Y', b'+...

24968 rows × 6 columns

Query Details

"CONVERT USING" uses the tutorial_search_clip model as an algorithm for image vectorization.
"OPTIONS" specifies the option values required for image vectorization.
- "image_col": the name of the column containing the image path (str, default: 'image_path')
- "convert_type": file type for vectorization (str, 'image'|'text', default: 'image')
- "batch_size": the size of dataset bundle utilized in a single cycle of training. The larger the number, the better the learning performance. However, considering the size of the memory, only 128 is used in this case (int, optional, default: 16)
- "result_col": defines the column name that contains the vectorized results (str, optional, default: 'convert_result')

Execute the "CONVERT USING" query statement below and save the converted result in a new table so that it can be used with other ThanoSQL query statements.

In [11]:

            
                Copied!
                
                    
                    
                
                

        
%%thanosql
CREATE TABLE unsplash_data_convert_en AS 
SELECT * FROM ( 
    CONVERT USING tutorial_search_clip
    OPTIONS (
        image_col='image_path', 
        convert_type='image',
        batch_size=128,
        result_col='convert_result'
        )
    AS 
    SELECT *
    FROM unsplash_data
)
%%thanosql
CREATE TABLE unsplash_data_convert_en AS 
SELECT * FROM ( 
    CONVERT USING tutorial_search_clip
    OPTIONS (
        image_col='image_path', 
        convert_type='image',
        batch_size=128,
        result_col='convert_result'
        )
    AS 
    SELECT *
    FROM unsplash_data
)

Success

3. Search¶

Perform a text-based image search using the "SEARCH IMAGE" query statement and the tutorial_search_clip model created. Execute the following query with the text value of "a black cat" and embedded unsplash_data images to calculate the similarity. The result values are saved into the newly added search_result column.

In [12]:

            
                Copied!
                
                    
                    
                
                

        
%%thanosql
SEARCH IMAGE 
USING tutorial_search_clip
OPTIONS (
    search_by='text',
    search_input='a black cat',
    emb_col='convert_result',
    result_col='search_result'
    )
AS 
SELECT * 
FROM unsplash_data_convert_en
%%thanosql
SEARCH IMAGE 
USING tutorial_search_clip
OPTIONS (
    search_by='text',
    search_input='a black cat',
    emb_col='convert_result',
    result_col='search_result'
    )
AS 
SELECT * 
FROM unsplash_data_convert_en

Out[12]:

	photo_id	image_path	photo_image_url	photo_description	ai_description	convert_result	search_result
0	UMyfDjQ6Ep8	thanosql-dataset/unsplash_data/UMyfDjQ6Ep8.jpg	https://images.unsplash.com/photo-157712719502...	None	black cat	[b'[', b'Z', b'\xfe', b'>', b'\x94', b'\x95', ...	0.316560
1	7XJ3d0xK444	thanosql-dataset/unsplash_data/7XJ3d0xK444.jpg	https://images.unsplash.com/photo-157217373317...	None	black cat	[b'\x9c', b'\xec', b'\x80', b'>', b'#', b'j', ...	0.311931
2	m8HsSWh-y6E	thanosql-dataset/unsplash_data/m8HsSWh-y6E.jpg	https://images.unsplash.com/photo-156855266009...	simon the kitty.	silver tabby cat	[b'\xff', b')', b'\xa1', b'>', b'O', b'\xe2', ...	0.310819
3	6ST6S6i9IGM	thanosql-dataset/unsplash_data/6ST6S6i9IGM.jpg	https://images.unsplash.com/photo-1548620848-d...	The cutest black cat to wake up to on a Sunday...	close-up photography of bombay cat	[b'Z', b'`', b'x', b'>', b'\x83', b'E', b'\x15...	0.310214
4	aFyD5aWKu6k	thanosql-dataset/unsplash_data/aFyD5aWKu6k.jpg	https://images.unsplash.com/photo-157850934606...	None	black cat	[b'\xc6', b'\x97', b'V', b'>', b'\x0f', b'@', ...	0.309158
...	...	...	...	...	...	...	...
995	VQ41v-gnd1M	thanosql-dataset/unsplash_data/VQ41v-gnd1M.jpg	https://images.unsplash.com/photo-158956048611...	None	purple smoke in black background	[b'\xb7', b'\xba', b'\x16', b'>', b'G', b'l', ...	0.221887
996	AtSgtZcxZFc	thanosql-dataset/unsplash_data/AtSgtZcxZFc.jpg	https://images.unsplash.com/photo-150329107570...	In the Smoke of Thinking	None	[b'\xa8', b'\xa7', b'\xb3', b'\xbc', b'\xd4', ...	0.221874
997	XzOMokbcp0Q	thanosql-dataset/unsplash_data/XzOMokbcp0Q.jpg	https://images.unsplash.com/photo-157616182589...	None	green-leafed plant during daytime	[b'\xd8', b'\x94', b'\xc1', b'\xbd', b'T', b'\...	0.221858
998	aWcJuh1mUhc	thanosql-dataset/unsplash_data/aWcJuh1mUhc.jpg	https://images.unsplash.com/photo-1544460671-b...	None	brown tabby cat on bed	[b',', b'\x9a', b'Y', b'>', b'\xf4', b'\x93', ...	0.221827
999	Zs6T2rub2zw	thanosql-dataset/unsplash_data/Zs6T2rub2zw.jpg	https://images.unsplash.com/photo-158179166724...	None	green pine trees covered with snow	[b'?', b'\x8e', b'\x1c', b'\xbf', b'^', b'\xa4...	0.221822

1000 rows × 7 columns

Query Details

"SEARCH IMAGE" searches for images. Input the text description of the image using the "text" variable.
"USING" specifies tutorial_search_clip as the model.
"OPTIONS" specifies the options to be used for image searching.
- "search_by": defines the image|text|audio|video type to be used for the search (str)
- "search_input": defines the input to be used for the search (str)
- "emb_col": the column that contains the vectorized results (str)
- "result_col": defines the name of the column that contains the search results (str, optional, default: 'search_result')

Execute the query below to view the similarity of the five most similar images to 'a black cat'.

In [13]:

            
                Copied!
                
                    
                    
                
                

        
%%thanosql
SELECT image_path, search_result 
FROM (
    SEARCH IMAGE 
    USING tutorial_search_clip
    OPTIONS (
        search_by='text',
        search_input='a black cat',
        emb_col='convert_result',
        result_col='search_result',
        top_k=5
        )
    AS 
    SELECT * 
    FROM unsplash_data_convert_en
    )
%%thanosql
SELECT image_path, search_result 
FROM (
    SEARCH IMAGE 
    USING tutorial_search_clip
    OPTIONS (
        search_by='text',
        search_input='a black cat',
        emb_col='convert_result',
        result_col='search_result',
        top_k=5
        )
    AS 
    SELECT * 
    FROM unsplash_data_convert_en
    )

Out[13]:

	image_path	search_result
0	thanosql-dataset/unsplash_data/UMyfDjQ6Ep8.jpg	0.316560
1	thanosql-dataset/unsplash_data/7XJ3d0xK444.jpg	0.311931
2	thanosql-dataset/unsplash_data/m8HsSWh-y6E.jpg	0.310819
3	thanosql-dataset/unsplash_data/6ST6S6i9IGM.jpg	0.310214
4	thanosql-dataset/unsplash_data/aFyD5aWKu6k.jpg	0.309158

Query Details

"SEARCH IMAGE" calculates and returns the similarity between the input text and the image.
The first "SELECT" returns the image_path column and the search_result column from the result nested query.

To immediately print out the resulting images, nest the previous query with a "PRINT" clause.

In [14]:

            
                Copied!
                
                    
                    
                
                

        
%%thanosql
PRINT IMAGE 
AS (
    SELECT image_path, search_result 
    FROM (
        SEARCH IMAGE 
        USING tutorial_search_clip
        OPTIONS (
            search_by='text',
            search_input='a black cat',
            emb_col='convert_result',
            result_col='search_result',
            top_k=5
            )
        AS 
        SELECT * 
        FROM unsplash_data_convert_en
        )
    )
%%thanosql
PRINT IMAGE 
AS (
    SELECT image_path, search_result 
    FROM (
        SEARCH IMAGE 
        USING tutorial_search_clip
        OPTIONS (
            search_by='text',
            search_input='a black cat',
            emb_col='convert_result',
            result_col='search_result',
            top_k=5
            )
        AS 
        SELECT * 
        FROM unsplash_data_convert_en
        )
    )

/home/jovyan/thanosql-dataset/unsplash_data/UMyfDjQ6Ep8.jpg

/home/jovyan/thanosql-dataset/unsplash_data/7XJ3d0xK444.jpg

/home/jovyan/thanosql-dataset/unsplash_data/m8HsSWh-y6E.jpg

/home/jovyan/thanosql-dataset/unsplash_data/6ST6S6i9IGM.jpg

/home/jovyan/thanosql-dataset/unsplash_data/aFyD5aWKu6k.jpg

Query Details

This query, combined with the query above, is made of three levels.

"SELECT", in the first parentheses, produces the result of the step immediately above.
"PRINT IMAGE" prints the results of the query as images.

In [15]:

            
                Copied!
                
                    
                    
                
                

        
%%thanosql
PRINT IMAGE 
AS (
    SELECT image_path, search_result 
    FROM (
        SEARCH IMAGE 
        USING tutorial_search_clip
        OPTIONS (
            search_by='text',
            search_input='a dog on a chair',
            emb_col='convert_result',
            result_col='search_result',
            top_k=5
            )
        AS 
        SELECT * 
        FROM unsplash_data_convert_en
        )
    )
%%thanosql
PRINT IMAGE 
AS (
    SELECT image_path, search_result 
    FROM (
        SEARCH IMAGE 
        USING tutorial_search_clip
        OPTIONS (
            search_by='text',
            search_input='a dog on a chair',
            emb_col='convert_result',
            result_col='search_result',
            top_k=5
            )
        AS 
        SELECT * 
        FROM unsplash_data_convert_en
        )
    )

/home/jovyan/thanosql-dataset/unsplash_data/jZUr3AuI8io.jpg

/home/jovyan/thanosql-dataset/unsplash_data/nnKBZTWzlMQ.jpg

/home/jovyan/thanosql-dataset/unsplash_data/HG2KpOO0vSc.jpg

/home/jovyan/thanosql-dataset/unsplash_data/f6qFneRNwNI.jpg

/home/jovyan/thanosql-dataset/unsplash_data/GKY4WDO3QgY.jpg

In [16]:

            
                Copied!
                
                    
                    
                
                

        
%%thanosql
PRINT IMAGE 
AS (
    SELECT image_path, search_result 
    FROM (
        SEARCH IMAGE 
        USING tutorial_search_clip
        OPTIONS (
            search_by='text',
            search_input='gloomy photos',
            emb_col='convert_result',
            result_col='search_result',
            top_k=5
            )
        AS 
        SELECT * 
        FROM unsplash_data_convert_en
        )
    )
%%thanosql
PRINT IMAGE 
AS (
    SELECT image_path, search_result 
    FROM (
        SEARCH IMAGE 
        USING tutorial_search_clip
        OPTIONS (
            search_by='text',
            search_input='gloomy photos',
            emb_col='convert_result',
            result_col='search_result',
            top_k=5
            )
        AS 
        SELECT * 
        FROM unsplash_data_convert_en
        )
    )

/home/jovyan/thanosql-dataset/unsplash_data/Xo4vJrtrmmA.jpg

/home/jovyan/thanosql-dataset/unsplash_data/QheWOfwEUio.jpg

/home/jovyan/thanosql-dataset/unsplash_data/_zHYUQmWrzk.jpg

/home/jovyan/thanosql-dataset/unsplash_data/Tu_lH5CFFZw.jpg

/home/jovyan/thanosql-dataset/unsplash_data/DfYPBHaOR04.jpg

In [17]:

            
                Copied!
                
                    
                    
                
                

        
%%thanosql
PRINT IMAGE 
AS (
    SELECT image_path, search_result 
    FROM (
        SEARCH IMAGE 
        USING tutorial_search_clip
        OPTIONS (
            search_by='text',
            search_input='the feeling when your program finally works',
            emb_col='convert_result',
            result_col='search_result',
            top_k=5
            )
        AS 
        SELECT * 
        FROM unsplash_data_convert_en
        )
    )
%%thanosql
PRINT IMAGE 
AS (
    SELECT image_path, search_result 
    FROM (
        SEARCH IMAGE 
        USING tutorial_search_clip
        OPTIONS (
            search_by='text',
            search_input='the feeling when your program finally works',
            emb_col='convert_result',
            result_col='search_result',
            top_k=5
            )
        AS 
        SELECT * 
        FROM unsplash_data_convert_en
        )
    )

/home/jovyan/thanosql-dataset/unsplash_data/nDLYtRqJtMw.jpg

/home/jovyan/thanosql-dataset/unsplash_data/qNJpGSCv_Jc.jpg

/home/jovyan/thanosql-dataset/unsplash_data/Yb5OBk-OxJY.jpg

/home/jovyan/thanosql-dataset/unsplash_data/6etH6346MHE.jpg

/home/jovyan/thanosql-dataset/unsplash_data/7GX5aICb5i4.jpg

4. In Conclusion¶

In this tutorial, we searched for images in the unsplash dataset by text using a multi-modal text/image vectorization model. As this is a beginner-level tutorial, we focused on the process and showing visible results rather than accuracy. The image search can retrieve more accurate results by utilizing various queries.

Inquiries About Deploying a Model for Your Own Service

If you have any difficulties creating your own model using ThanoSQL or applying it to your services, please feel free to contact us below😊

For inquiries regarding building a text-image search models: contact@smartmind.team

Last update: 2023-08-31