Use the Visual Question Answering Model
- Tutorial Difficulty: ★☆☆☆☆
- 5 min read
- Languages: SQL (100%)
- File location: tutorial_en/thanosql_ml/question_answering/visual_question_answering.ipynb
- References: COCO Dataset, VQA, ViLT
Tutorial Introduction
Understanding Visual Question Answering
Visual Q&A is a technique for finding an appropriate answer to a question when an image (Visual) and a question about the image (Question) are given. Visual Q&A is a technology that can be used in many fields, such as children's learning and artificial intelligence assistants.
The VQA Challenge has been held every year since its inception at CVPR 2016; each edition evaluates and awards advances in VQA technology. Since 2017, the challenge has used the VQA 2.0 dataset, for which researchers collected pairs of similar images that give different answers to the same question, so that a model's effectiveness can be evaluated rigorously rather than by relying on language bias alone. For example, for the question "Who is wearing glasses?", the data is distributed so that, depending on which of two similar images is given, the answer can be either 'male' or 'female'.
The following are examples and applications of the ThanoSQL visual question answering algorithm.
- It can be used as an assistive system for the visually impaired in an online product sales service. When a customer who has difficulty checking a product image asks a question about the product's color or material, the visual Q&A model can answer on its own, reducing customer service (CS) workload.
- It can be used to create a service that finds the information people need from a set of similar images all at once, using a single question.
In This Tutorial
👉 Since this model can only be used for prediction, we will ask how many people are in each given image and check the results, using images from the COCO dataset.
Dataset Description
The COCO dataset was created for object detection, segmentation, and keypoint detection. This tutorial uses only the images in the "person" category (2,685 images) from the dataset's validation set.
0. Prepare Dataset and Model
As mentioned in the ThanoSQL Workspace, you must create an API token and run the query below before you can execute ThanoSQL queries.
%load_ext thanosql
%thanosql API_TOKEN=<Issued_API_TOKEN>
Prepare Dataset
%%thanosql
GET THANOSQL DATASET coco_person_data
OPTIONS (overwrite=True)
Success
Query Details
- "GET THANOSQL DATASET" downloads the specified dataset to the workspace.
- "OPTIONS" specifies the option values to be used for the GET THANOSQL DATASET clause.
- "overwrite": determines whether to overwrite a dataset if it already exists. If set as True, the old dataset is replaced with the new dataset (bool, optional, True|False, default: False)
%%thanosql
COPY coco_person_data
OPTIONS (if_exists='replace')
FROM 'thanosql-dataset/coco_person_data/coco_person.csv'
Success
Query Details
- "COPY" specifies the name of the dataset to be saved as a database table.
- "OPTIONS" specifies the option values to be used for the COPY clause.
- "if_exists": determines how the function should handle the case where the table already exists, it can either raise an error, append to the existing table, or replace the existing table (str, optional, 'fail'|'replace'|'append', default: 'fail')
Prepare the Model
%%thanosql
GET THANOSQL MODEL vilt
OPTIONS (
model_name='tutorial_vilt',
overwrite=True
)
Success
Query Details
- "GET THANOSQL MODEL" downloads the specified model to the workspace.
- "OPTIONS" specifies the option values to be used for the GET THANOSQL MODEL clause.
- "model_name": the model name to store a given model in the ThanoSQL workspace (str, optional)
- "overwrite": determines whether to overwrite a model if it already exists. If set as True, the old model is replaced with the new model (bool, optional, True|False, default: False)
1. Check Dataset
For this tutorial, we use the coco_person_data table stored in the ThanoSQL workspace database. Execute the query below to check the contents of the table.
%%thanosql
SELECT *
FROM coco_person_data
LIMIT 5
|   | image_path | category |
|---|---|---|
| 0 | thanosql-dataset/coco_person_data/000000398905... | person |
| 1 | thanosql-dataset/coco_person_data/000000562243... | person |
| 2 | thanosql-dataset/coco_person_data/000000376307... | person |
| 3 | thanosql-dataset/coco_person_data/000000441586... | person |
| 4 | thanosql-dataset/coco_person_data/000000007281... | person |
Understanding the Data Table
The coco_person_data table contains the following information (a simple check query is sketched after this list).
- image_path: the name of the column that stores the image path
- category: categories of images
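Because the table can be queried with standard SQL, a simple aggregate like the sketch below can confirm how many images each category contains; this check is illustrative and not part of the original tutorial steps.
%%thanosql
-- Illustrative check: count images per category
-- (this tutorial's dataset is expected to contain 2,685 'person' rows)
SELECT category, COUNT(*) AS image_count
FROM coco_person_data
GROUP BY category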
%%thanosql
PRINT IMAGE
AS
SELECT image_path
FROM coco_person_data
LIMIT 5
/home/jovyan/thanosql-dataset/coco_person_data/000000398905.jpg
/home/jovyan/thanosql-dataset/coco_person_data/000000562243.jpg
/home/jovyan/thanosql-dataset/coco_person_data/000000376307.jpg
/home/jovyan/thanosql-dataset/coco_person_data/000000441586.jpg
/home/jovyan/thanosql-dataset/coco_person_data/000000007281.jpg
2. Predict Using Pre-built Model
To predict the results using the pre-built tutorial_vilt model, run the query below.
%%thanosql
PREDICT USING tutorial_vilt
OPTIONS (
image_col='image_path',
question='How many people are there?',
result_col='predict_result'
)
AS
SELECT image_path
FROM coco_person_data
|   | image_path | predict_result |
|---|---|---|
| 0 | thanosql-dataset/coco_person_data/000000398905... | 1 |
| 1 | thanosql-dataset/coco_person_data/000000562243... | 1 |
| 2 | thanosql-dataset/coco_person_data/000000376307... | 2 |
| 3 | thanosql-dataset/coco_person_data/000000441586... | 1 |
| 4 | thanosql-dataset/coco_person_data/000000007281... | 10 |
| ... | ... | ... |
| 2680 | thanosql-dataset/coco_person_data/000000008844... | 1 |
| 2681 | thanosql-dataset/coco_person_data/000000321790... | 1 |
| 2682 | thanosql-dataset/coco_person_data/000000166478... | 1 |
| 2683 | thanosql-dataset/coco_person_data/000000122672... | 1 |
| 2684 | thanosql-dataset/coco_person_data/000000163562... | 1 |
2685 rows × 2 columns
Query Details
- "PREDICT USING" predicts the outcome using the tutorial_vilt.
- "OPTIONS" specifies the option values to be used for prediction.
- "image_col": the name of the column where the path of the image used for prediction is stored (str, default: "image_path")
- "question": the question text to be used for prediction (str)
- "result_col": defines the name of the column to contain the result (str, optional, default: "predict_result")
3. In Conclusion
In this tutorial, we used a visual question answering model to predict outcomes by asking a question about images in the COCO dataset. As this is a beginner-level tutorial, we focused on getting visible results through simple queries. If you ask a question tailored to the images you actually need, you will get results closer to what you want.
- How to Upload My Data to the ThanoSQL Workspace
- How to Create a Table Using My Data
- How to Upload My Model to the ThanoSQL Workspace
Inquiries About Deploying a Model for Your Own Service
If you have any difficulties creating your own model using ThanoSQL or applying it to your service, please feel free to contact us below😊
For inquiries regarding building a visual question-and-answer model: contact@smartmind.team