Use the Visual Question Answering Model
- Tutorial Difficulty: ★☆☆☆☆
- 5 min read
- Languages: SQL (100%)
- File location: tutorial_en/thanosql_ml/question_answering/visual_question_answering.ipynb
- References: COCO Dataset, VQA, ViLT
Tutorial Introduction
Understanding Visual Question Answering
Visual Q&A is a technique for finding an appropriate answer to a question when an image (Visual) and a question about the image (Question) are given. Visual Q&A is a technology that can be used in many fields, such as children's learning and artificial intelligence assistants.
The VQA Challenge has been held every year since its inception at CVPR 2016; each edition evaluates and awards advances in VQA technology. Since 2017, the challenge has used the VQA 2.0 dataset, for which researchers collected pairs of similar images that give different answers to the same question, so that a model's effectiveness can be evaluated rigorously rather than by relying on language bias alone. For example, for the question "Who is wearing glasses?", the data is distributed so that, depending on which of two similar images is given, the answer can be either 'male' or 'female'.
The following are examples and applications of the ThanoSQL visual question answering algorithm.
- It can be used as an assistive system for the visually impaired in an online product sales service. When a customer who has difficulty checking a product image asks a question about the product's color or material, the visual Q&A model can answer on its own, reducing customer service (CS) workload.
- It can be used to create a service that finds the information people need from a set of similar images all at once, using a single question.
In This Tutorial
👉 Since this model can only be used for prediction, we will ask how many people are in each given image and check the results, using images from the COCO dataset.
Dataset Description
The COCO dataset was created for object detection, segmentation, and keypoint detection. This tutorial uses only the images in the "person" category (2,685 images) from the dataset's validation set.
0. Prepare Dataset and Model
As mentioned in the ThanoSQL Workspace, you must create an API token and run the query below before you can execute ThanoSQL queries.
%load_ext thanosql
%thanosql API_TOKEN=<Issued_API_TOKEN>
Prepare Dataset
%%thanosql
GET THANOSQL DATASET coco_person_data
OPTIONS (overwrite=True)
Success
Query Details
- "GET THANOSQL DATASET" downloads the specified dataset to the workspace.
- "OPTIONS" specifies the option values to be used for the GET THANOSQL DATASET clause.
- "overwrite": determines whether to overwrite a dataset if it already exists. If set as True, the old dataset is replaced with the new dataset (bool, optional, True|False, default: False)
%%thanosql
COPY coco_person_data
OPTIONS (if_exists='replace')
FROM 'thanosql-dataset/coco_person_data/coco_person.csv'
Success
Query Details
- "COPY" specifies the name of the dataset to be saved as a database table.
- "OPTIONS" specifies the option values to be used for the COPY clause.
- "if_exists": determines how the function should handle the case where the table already exists, it can either raise an error, append to the existing table, or replace the existing table (str, optional, 'fail'|'replace'|'append', default: 'fail')
Prepare the Model
%%thanosql
GET THANOSQL MODEL vilt
OPTIONS (
model_name='tutorial_vilt',
overwrite=True
)
Success
Query Details
- "GET THANOSQL MODEL" downloads the specified model to the workspace.
- "OPTIONS" specifies the option values to be used for the GET THANOSQL MODEL clause.
- "model_name": the model name to store a given model in the ThanoSQL workspace (str, optional)
- "overwrite": determines whether to overwrite a model if it already exists. If set as True, the old model is replaced with the new model (bool, optional, True|False, default: False)
1. Check Dataset
For this tutorial, we use the coco_person_data table stored in the ThanoSQL workspace database. Execute the query below to check the contents of the table.
%%thanosql
SELECT *
FROM coco_person_data
LIMIT 5
|   | image_path | category |
|---|---|---|
| 0 | thanosql-dataset/coco_person_data/000000398905... | person |
| 1 | thanosql-dataset/coco_person_data/000000562243... | person |
| 2 | thanosql-dataset/coco_person_data/000000376307... | person |
| 3 | thanosql-dataset/coco_person_data/000000441586... | person |
| 4 | thanosql-dataset/coco_person_data/000000007281... | person |
Understanding the Data Table
The coco_person_data table contains the following information (a simple check query is sketched after this list).
- image_path: the name of the column that stores the image path
- category: categories of images
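Because the table can be queried with standard SQL, a simple aggregate like the sketch below can confirm how many images each category contains; this check is illustrative and not part of the original tutorial steps.
%%thanosql
-- Illustrative check: count images per category
-- (this tutorial's dataset is expected to contain 2,685 'person' rows)
SELECT category, COUNT(*) AS image_count
FROM coco_person_data
GROUP BY category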
%%thanosql
PRINT IMAGE
AS
SELECT image_path
FROM coco_person_data
LIMIT 5
/home/jovyan/thanosql-dataset/coco_person_data/000000398905.jpg
/home/jovyan/thanosql-dataset/coco_person_data/000000562243.jpg
/home/jovyan/thanosql-dataset/coco_person_data/000000376307.jpg
/home/jovyan/thanosql-dataset/coco_person_data/000000441586.jpg
/home/jovyan/thanosql-dataset/coco_person_data/000000007281.jpg
2. Predict Using Pre-built Model
To predict the results using the pre-built tutorial_vilt model, run the query below.
%%thanosql
PREDICT USING tutorial_vilt
OPTIONS (
image_col='image_path',
question='How many people are there?',
result_col='predict_result'
)
AS
SELECT image_path
FROM coco_person_data
|   | image_path | predict_result |
|---|---|---|
| 0 | thanosql-dataset/coco_person_data/000000398905... | 1 |
| 1 | thanosql-dataset/coco_person_data/000000562243... | 1 |
| 2 | thanosql-dataset/coco_person_data/000000376307... | 2 |
| 3 | thanosql-dataset/coco_person_data/000000441586... | 1 |
| 4 | thanosql-dataset/coco_person_data/000000007281... | 10 |
| ... | ... | ... |
| 2680 | thanosql-dataset/coco_person_data/000000008844... | 1 |
| 2681 | thanosql-dataset/coco_person_data/000000321790... | 1 |
| 2682 | thanosql-dataset/coco_person_data/000000166478... | 1 |
| 2683 | thanosql-dataset/coco_person_data/000000122672... | 1 |
| 2684 | thanosql-dataset/coco_person_data/000000163562... | 1 |
2685 rows × 2 columns
Query Details
- "PREDICT USING" predicts the outcome using the tutorial_vilt.
- "OPTIONS" specifies the option values to be used for prediction.
- "image_col": the name of the column where the path of the image used for prediction is stored (str, default: "image_path")
- "question": the question text to be used for prediction (str)
- "result_col": defines the name of the column to contain the result (str, optional, default: "predict_result")
3. In Conclusion
In this tutorial, we used a visual question answering model to predict outcomes by asking a question about images in the COCO dataset. As this is a beginner-level tutorial, we focused on getting visible results through simple queries. If you ask a question tailored to the images you actually need, you will get results closer to what you want.
- How to Upload My Data to the ThanoSQL Workspace
- How to Create a Table Using My Data
- How to Upload My Model to the ThanoSQL Workspace
Inquiries About Deploying a Model for Your Own Service
If you have any difficulties creating your own model using ThanoSQL or applying it to your service, please feel free to contact us below😊
For inquiries regarding building a visual question-and-answer model: contact@smartmind.team