{ "cells": [ { "cell_type": "markdown", "id": "7ad7df84", "metadata": {}, "source": [ "# __Search Image by Text__" ] }, { "cell_type": "markdown", "id": "d01de980", "metadata": {}, "source": [ "- Tutorial Difficulty: ★★☆☆☆\n", "- 7 min read\n", "- Languages: [SQL](https://en.wikipedia.org/wiki/SQL) (100%)\n", "- File location: tutorial_en/thanosql_search/search_image_by_text.ipynb\n", "- References: [Unsplash Dataset - Lite](https://unsplash.com/data), [Learning Transferable Visual Models From Natural Language Supervision](https://arxiv.org/abs/2103.00020)" ] }, { "cell_type": "markdown", "id": "b5e19c02", "metadata": {}, "source": [ "## Tutorial Introduction\n", "\n", "
For a computer to process human language, text must first be converted into vectors. Pre-trained models such as BERT and GPT-3 have been studied actively in recent years and have produced remarkable results. These models learn the meaning of sentences through self-supervised learning, so that sentences with similar meanings are mapped to vectors that lie close to each other in a low-dimensional vector space. Self-supervised learning needs no manual labels: training examples are generated from the text itself, for example by shuffling the order of sentences or masking some words, and the model learns to judge whether the order is correct or to predict the masked words.
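
As a rough illustration of the masking idea described above, the sketch below hides a fraction of the words in a sentence and records the originals as prediction targets. It is a toy example, not the actual training procedure of BERT or GPT-3: the function name mask_tokens, the 15% rate, the [MASK] placeholder, and the sample sentence are all assumptions chosen for illustration.

```python
import random


def mask_tokens(tokens, mask_rate=0.15, seed=0, mask_token='[MASK]'):
    # Returns a masked copy of tokens plus a {position: original_word} dict.
    # Hypothetical helper for illustration only.
    if not tokens:
        return [], {}
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n_mask = max(1, round(len(tokens) * mask_rate))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    masked = list(tokens)
    targets = {}
    for pos in positions:
        targets[pos] = masked[pos]  # the label comes from the text itself
        masked[pos] = mask_token
    return masked, targets


sentence = 'a black cat sleeps on the warm windowsill'.split()
masked, targets = mask_tokens(sentence)
print(' '.join(masked))
print(targets)
```

The point of the sketch is that the labels (the hidden words) are produced from the raw text, which is why no manual annotation is needed.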
\n", "👉 Unsplash released images taken by more than 200,000 photographers for free to be used as an AI dataset. Unsplash Dataset - Lite consists of 25,000 nature-themed images with 25,000 keywords.
\n", "\n", "
| | photo_id | image_path | photo_image_url | photo_description | ai_description |
|---|---|---|---|---|---|
| 0 | XMyPniM9LF0 | thanosql-dataset/unsplash_data/XMyPniM9LF0.jpg | https://images.unsplash.com/uploads/1411949294... | Woman exploring a forest | woman walking in the middle of forest |
| 1 | rDLBArZUl1c | thanosql-dataset/unsplash_data/rDLBArZUl1c.jpg | https://images.unsplash.com/photo-141633941111... | Succulents in a terrarium | succulent plants in clear glass terrarium |
| 2 | cNDGZ2sQ3Bo | thanosql-dataset/unsplash_data/cNDGZ2sQ3Bo.jpg | https://images.unsplash.com/photo-142014251503... | Rural winter mountainside | rocky mountain under gray sky at daytime |
| 3 | iuZ_D1eoq9k | thanosql-dataset/unsplash_data/iuZ_D1eoq9k.jpg | https://images.unsplash.com/photo-141487280988... | Poppy seeds and flowers | red common poppy flower selective focus phography |
| 4 | BeD3vjQ8SI0 | thanosql-dataset/unsplash_data/BeD3vjQ8SI0.jpg | https://images.unsplash.com/photo-141700759404... | Silhouette near dark trees | trees during night time |

The unsplash_data table contains the following information.
\n", "Training a text-image model takes a long time, and the pre-trained model used here was trained on 400 million image-text pairs, so this tutorial omits the training step that would use the \"BUILD MODEL\" query. The tutorial_search_clip model named above uses the pre-trained CLIPEn model. Executing the \"CONVERT USING\" statement creates a user-defined column (default: convert_result) that holds the vectorized images. Executing the \"SEARCH IMAGE\" statement creates a user-defined column (default: search_result) that holds the similarity between each image and the search text.\n", "
\n", "\n", "
| | photo_id | image_path | photo_image_url | photo_description | ai_description | convert_result |
|---|---|---|---|---|---|---|
| 0 | XMyPniM9LF0 | thanosql-dataset/unsplash_data/XMyPniM9LF0.jpg | https://images.unsplash.com/uploads/1411949294... | Woman exploring a forest | woman walking in the middle of forest | [b'\\xf4', b'\\xc6', b'2', b'\\xbe', b'\\xb1', b'\"... |
| 1 | rDLBArZUl1c | thanosql-dataset/unsplash_data/rDLBArZUl1c.jpg | https://images.unsplash.com/photo-141633941111... | Succulents in a terrarium | succulent plants in clear glass terrarium | [b'F', b'\\x08', b'\\xbf', b'\\xbe', b'\\xc5', b'\\... |
| 2 | cNDGZ2sQ3Bo | thanosql-dataset/unsplash_data/cNDGZ2sQ3Bo.jpg | https://images.unsplash.com/photo-142014251503... | Rural winter mountainside | rocky mountain under gray sky at daytime | [b'G', b'\\x07', b'\\xb8', b'\\xbe', b'C', b'\\x93... |
| 3 | iuZ_D1eoq9k | thanosql-dataset/unsplash_data/iuZ_D1eoq9k.jpg | https://images.unsplash.com/photo-141487280988... | Poppy seeds and flowers | red common poppy flower selective focus phography | [b'H', b'\\x19', b'\\xae', b'<', b'=', b'\\xbe', ... |
| 4 | BeD3vjQ8SI0 | thanosql-dataset/unsplash_data/BeD3vjQ8SI0.jpg | https://images.unsplash.com/photo-141700759404... | Silhouette near dark trees | trees during night time | [b'\\xaa', b'\\x8c', b'\\x88', b'\\xbe', b'\\xbb', ... |
| ... | ... | ... | ... | ... | ... | ... |
| 24963 | c7OrOMxrurA | thanosql-dataset/unsplash_data/c7OrOMxrurA.jpg | https://images.unsplash.com/photo-159300793778... | None | black metal fence during daytime | [b'N', b'\\x88', b'\\n', b'\\xbe', b'p', b'\\xcf',... |
| 24964 | 15IuQ5a0Qwg | thanosql-dataset/unsplash_data/15IuQ5a0Qwg.jpg | https://images.unsplash.com/photo-159296761254... | Pearl earrings and seashells | white and brown seashell on white surface | [b':', b'/', b'\\xa1', b'\\xbe', b'\\xf4', b'\\xbb... |
| 24965 | w8nrcXz8pwk | thanosql-dataset/unsplash_data/w8nrcXz8pwk.jpg | https://images.unsplash.com/photo-159299937329... | None | leopard on brown tree trunk during daytime | [b'\\x96', b'i', b'\\x96', b'=', b'\\xb6', b'\\x96... |
| 24966 | n1jHrRhehUI | thanosql-dataset/unsplash_data/n1jHrRhehUI.jpg | https://images.unsplash.com/photo-159192792878... | Floral truck in the streets of Rome | woman in beige coat and white hat standing on ... | [b'\\x82', b'\\xf0', b'c', b'=', b'`', b'e', b'm... |
| 24967 | Ic74ACoaAX0 | thanosql-dataset/unsplash_data/Ic74ACoaAX0.jpg | https://images.unsplash.com/photo-159240763188... | None | green plants on brown rocky mountain under blu... | [b'U', b'\\x19', b'%', b'\\xbe', b'!', b'Y', b'+... |

24968 rows × 6 columns
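
The convert_result values above are displayed as lists of single bytes. One plausible reading, which is an assumption and not something this tutorial states, is that every four bytes form one little-endian 32-bit float of the image embedding. The sketch below decodes the first four bytes shown for row 0 (0xF4, 0xC6, 0x32, 0xBE) under that assumption; decode_embedding is a hypothetical helper, not part of ThanoSQL.

```python
import struct


def decode_embedding(byte_items):
    # Join single-byte items and read them as little-endian float32 values.
    # The float32 layout is an assumption about how convert_result is stored.
    raw = b''.join(byte_items)
    if len(raw) % 4 != 0:
        raise ValueError('byte length is not a multiple of 4')
    count = len(raw) // 4
    return list(struct.unpack('<%df' % count, raw))


# First four bytes displayed for row 0 of convert_result (b'2' is 0x32).
first_four = [bytes([0xF4]), bytes([0xC6]), bytes([0x32]), bytes([0xBE])]
values = decode_embedding(first_four)
print(values)
```

If the assumption holds, a full row decodes to one float per four bytes, which gives the vector that the similarity search compares against the text vector.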
\n", "\n", "
| | photo_id | image_path | photo_image_url | photo_description | ai_description | convert_result | search_result |
|---|---|---|---|---|---|---|---|
| 0 | UMyfDjQ6Ep8 | thanosql-dataset/unsplash_data/UMyfDjQ6Ep8.jpg | https://images.unsplash.com/photo-157712719502... | None | black cat | [b'[', b'Z', b'\\xfe', b'>', b'\\x94', b'\\x95', ... | 0.316560 |
| 1 | 7XJ3d0xK444 | thanosql-dataset/unsplash_data/7XJ3d0xK444.jpg | https://images.unsplash.com/photo-157217373317... | None | black cat | [b'\\x9c', b'\\xec', b'\\x80', b'>', b'#', b'j', ... | 0.311931 |
| 2 | m8HsSWh-y6E | thanosql-dataset/unsplash_data/m8HsSWh-y6E.jpg | https://images.unsplash.com/photo-156855266009... | simon the kitty. | silver tabby cat | [b'\\xff', b')', b'\\xa1', b'>', b'O', b'\\xe2', ... | 0.310819 |
| 3 | 6ST6S6i9IGM | thanosql-dataset/unsplash_data/6ST6S6i9IGM.jpg | https://images.unsplash.com/photo-1548620848-d... | The cutest black cat to wake up to on a Sunday... | close-up photography of bombay cat | [b'Z', b'`', b'x', b'>', b'\\x83', b'E', b'\\x15... | 0.310214 |
| 4 | aFyD5aWKu6k | thanosql-dataset/unsplash_data/aFyD5aWKu6k.jpg | https://images.unsplash.com/photo-157850934606... | None | black cat | [b'\\xc6', b'\\x97', b'V', b'>', b'\\x0f', b'@', ... | 0.309158 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 995 | VQ41v-gnd1M | thanosql-dataset/unsplash_data/VQ41v-gnd1M.jpg | https://images.unsplash.com/photo-158956048611... | None | purple smoke in black background | [b'\\xb7', b'\\xba', b'\\x16', b'>', b'G', b'l', ... | 0.221887 |
| 996 | AtSgtZcxZFc | thanosql-dataset/unsplash_data/AtSgtZcxZFc.jpg | https://images.unsplash.com/photo-150329107570... | In the Smoke of Thinking | None | [b'\\xa8', b'\\xa7', b'\\xb3', b'\\xbc', b'\\xd4', ... | 0.221874 |
| 997 | XzOMokbcp0Q | thanosql-dataset/unsplash_data/XzOMokbcp0Q.jpg | https://images.unsplash.com/photo-157616182589... | None | green-leafed plant during daytime | [b'\\xd8', b'\\x94', b'\\xc1', b'\\xbd', b'T', b'\\... | 0.221858 |
| 998 | aWcJuh1mUhc | thanosql-dataset/unsplash_data/aWcJuh1mUhc.jpg | https://images.unsplash.com/photo-1544460671-b... | None | brown tabby cat on bed | [b',', b'\\x9a', b'Y', b'>', b'\\xf4', b'\\x93', ... | 0.221827 |
| 999 | Zs6T2rub2zw | thanosql-dataset/unsplash_data/Zs6T2rub2zw.jpg | https://images.unsplash.com/photo-158179166724... | None | green pine trees covered with snow | [b'?', b'\\x8e', b'\\x1c', b'\\xbf', b'^', b'\\xa4... | 0.221822 |

1000 rows × 7 columns
\n", "\n", "
| | image_path | search_result |
|---|---|---|
| 0 | thanosql-dataset/unsplash_data/UMyfDjQ6Ep8.jpg | 0.316560 |
| 1 | thanosql-dataset/unsplash_data/7XJ3d0xK444.jpg | 0.311931 |
| 2 | thanosql-dataset/unsplash_data/m8HsSWh-y6E.jpg | 0.310819 |
| 3 | thanosql-dataset/unsplash_data/6ST6S6i9IGM.jpg | 0.310214 |
| 4 | thanosql-dataset/unsplash_data/aFyD5aWKu6k.jpg | 0.309158 |

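
The search_result column holds one similarity score per image, and the rows are listed from highest to lowest score. CLIP compares text and image embeddings by cosine similarity; whether search_result is computed in exactly this way is an assumption. The sketch below ranks a few made-up 3-dimensional vectors against a query vector to show the idea. The file names, vectors, and the helpers cosine_similarity and top_k are hypothetical and are not taken from the dataset or from ThanoSQL.

```python
import math


def cosine_similarity(a, b):
    # Cosine of the angle between two equal-length, non-zero vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, image_vecs, k=5):
    # Score every image against the query and keep the k best matches.
    scored = [(path, cosine_similarity(query_vec, vec))
              for path, vec in image_vecs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy embeddings; real CLIP vectors have hundreds of dimensions.
query = [0.9, 0.1, 0.0]
images = {
    'cat.jpg': [0.8, 0.2, 0.1],
    'forest.jpg': [0.1, 0.9, 0.2],
    'smoke.jpg': [0.0, 0.2, 0.9],
}
ranking = top_k(query, images, k=2)
for path, score in ranking:
    print(path, round(score, 3))
```

Sorting by score in descending order and keeping the first k rows mirrors the shape of the five-row result above.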
Together with the query above, this query consists of three steps.
\n", "If you have any difficulties creating your own model using ThanoSQL or applying it to your services, please feel free to contact us below😊
\n", "For inquiries regarding building a text-image search model: contact@smartmind.team
\n", "