{ "cells": [ { "attachments": {}, "cell_type": "markdown", "id": "6cfd2a8c-fdfc-4233-abd1-ece097069522", "metadata": {}, "source": [ "# __Create a Speech Recognition Model__" ] }, { "attachments": {}, "cell_type": "markdown", "id": "6520d41e", "metadata": {}, "source": [ "- Tutorial Difficulty: ★☆☆☆☆\n", "- 10 min read\n", "- Languages: [SQL](https://en.wikipedia.org/wiki/SQL) (100%)\n", "- File location: tutorial_en/thanosql_ml/audio_recognition/speech_recognition.ipynb\n", "- References: [LibriSpeech DataSet](http://www.openslr.org/12), [wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations](https://arxiv.org/abs/2006.11477)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "05f3d517", "metadata": {}, "source": [ "## Tutorial Introduction\n", "\n", "
\n", "

Understanding Speech Recognition

\n", "

Speech recognition, also called computer speech recognition or speech-to-text, allows programs to convert human speech into text. It is now used in a wide range of settings, including automobiles, medicine, and everyday devices such as AI speakers and smartphones. Recent machine-learning-based speech recognition uses algorithms that understand and process speech by integrating the grammar, syntax, structure, and composition of audio and speech signals.

\n", "
\n", "\n", "
\n", "

Speech recognition should not be confused with voice recognition, which focuses only on identifying an individual user's voice.

\n", "
\n", "\n", "Today, speech recognition technology is applied across many industries. Its uses range from automatic interpretation for simple travel conversations to high-level business meetings. It has also extended into related fields such as speech synthesis, which can act as a virtual guide, mimic the voice of a specific celebrity, or convert a predetermined fingerprint into a voice.\n", "\n", "__The following are examples and applications of the ThanoSQL speech recognition model.__\n", "\n", "- Speech recognition technology converts phone consultation data into text to enable customer sentiment analysis and consultation trend analysis. With speech recognition, customer service representatives can improve their service by quickly receiving relevant information that answers customer inquiries.\n", "In addition, customer satisfaction trends can be analyzed after a consultation by measuring satisfaction indirectly through sentiment analysis.\n", "- Using speech recognition technology, you can take notes faster than typing on a keyboard and instantly search for specific keywords even in long audio files.\n", "\n", "
\n", "

In This Tutorial

\n", "

👉 LibriSpeech [Panayotov et al. 2015] is derived from the LibriVox project, a volunteer-driven audiobook project, and is one of the most widely used large-scale English speech datasets in speech recognition research. It was created by processing approximately 1,000 hours of recorded audiobook speech sampled at 16 kHz. The target table for this tutorial consists of pre-uploaded audio file paths and their transcripts. This tutorial aims to convert the audio files to text.

\n", "
\n", "\n", "
\n", "

Tutorial Notes

\n", " \n", "
" ] }, { "attachments": {}, "cell_type": "markdown", "id": "4d5a038e-2951-433a-b5b2-2cc0cf439d2e", "metadata": {}, "source": [ "## __0. Prepare Dataset and Model__\n", "\n", "As mentioned in the [ThanoSQL Workspace](https://docs.thanosql.ai/1.5/en/getting_started/paas/workspace/lab/), you must create an API token and run the query below to execute the query of ThanoSQL. " ] }, { "cell_type": "code", "execution_count": null, "id": "9cb93ef5-7309-4842-b616-f8269965db46", "metadata": {}, "outputs": [], "source": [ "%load_ext thanosql\n", "%thanosql API_TOKEN=" ] }, { "attachments": {}, "cell_type": "markdown", "id": "073a6182", "metadata": {}, "source": [ "### __Prepare Dataset__" ] }, { "cell_type": "code", "execution_count": 2, "id": "a75c05e1", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Success\n" ] } ], "source": [ "%%thanosql\n", "GET THANOSQL DATASET librispeech_data\n", "OPTIONS (overwrite=True)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "1ec2d024", "metadata": {}, "source": [ "
\n", "

Query Details

\n", " \n", "
" ] }, { "cell_type": "code", "execution_count": 3, "id": "784d86c0-984b-4df7-a46c-b8c2294e4e99", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Success\n" ] } ], "source": [ "%%thanosql\n", "COPY librispeech_train \n", "OPTIONS (if_exists='replace')\n", "FROM 'thanosql-dataset/librispeech_data/librispeech_train.csv'" ] }, { "cell_type": "code", "execution_count": 4, "id": "ebe4e8d3-bbd6-4fcb-92f2-0f6263612af6", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Success\n" ] } ], "source": [ "%%thanosql\n", "COPY librispeech_test \n", "OPTIONS (if_exists='replace')\n", "FROM 'thanosql-dataset/librispeech_data/librispeech_test.csv'" ] }, { "attachments": {}, "cell_type": "markdown", "id": "984aefd3", "metadata": {}, "source": [ "
\n", "

Query Details

\n", " \n", "
" ] }, { "attachments": {}, "cell_type": "markdown", "id": "63f797e8", "metadata": {}, "source": [ "### __Prepare the Model__" ] }, { "cell_type": "code", "execution_count": 5, "id": "cf80d407", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Success\n" ] } ], "source": [ "%%thanosql\n", "GET THANOSQL MODEL wav2vec2\n", "OPTIONS (\n", " model_name='tutorial_audio_recognition',\n", " overwrite=True\n", " )" ] }, { "attachments": {}, "cell_type": "markdown", "id": "f38bd64f", "metadata": {}, "source": [ "
\n", "

Query Details

\n", " \n", "
" ] }, { "attachments": {}, "cell_type": "markdown", "id": "e557a156-1075-41df-b19f-ff0812a14b4c", "metadata": {}, "source": [ "## __1. Check Dataset__\n", "\n", "To create a speech recognition model, we use the __librispeech_train__ table located in the ThanoSQL workspace database. Run the query below to check the contents of the table." ] }, { "cell_type": "code", "execution_count": 6, "id": "49d801df-54d2-4809-bbed-69b2818e9cec", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
| | audio_path | text |
|---|---|---|
| 0 | thanosql-dataset/librispeech_data/000.wav | i noticed how white and well shaped his own ha... |
| 1 | thanosql-dataset/librispeech_data/001.wav | the only conflicts that occurred on irish soil... |
| 2 | thanosql-dataset/librispeech_data/002.wav | inquired shaggy in the metal forest |
| 3 | thanosql-dataset/librispeech_data/003.wav | my grandmother always spoke in a very loud ton... |
| 4 | thanosql-dataset/librispeech_data/004.wav | the poets of succeeding ages have dwelt much i... |
\n", "
" ], "text/plain": [ " audio_path \\\n", "0 thanosql-dataset/librispeech_data/000.wav \n", "1 thanosql-dataset/librispeech_data/001.wav \n", "2 thanosql-dataset/librispeech_data/002.wav \n", "3 thanosql-dataset/librispeech_data/003.wav \n", "4 thanosql-dataset/librispeech_data/004.wav \n", "\n", " text \n", "0 i noticed how white and well shaped his own ha... \n", "1 the only conflicts that occurred on irish soil... \n", "2 inquired shaggy in the metal forest \n", "3 my grandmother always spoke in a very loud ton... \n", "4 the poets of succeeding ages have dwelt much i... " ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%%thanosql\n", "SELECT *\n", "FROM librispeech_train\n", "LIMIT 5" ] }, { "attachments": {}, "cell_type": "markdown", "id": "ff864330", "metadata": {}, "source": [ "
\n", "

Understanding the Data Table

\n", "

The librispeech_train table contains the following information.

\n", " \n", "
\n" ] }, { "cell_type": "code", "execution_count": 7, "id": "41d7b7cc-e411-4e39-b443-1aa287a0adff", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/home/jovyan/thanosql-dataset/librispeech_data/000.wav\n" ] }, { "data": { "text/html": [ "\n", " \n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "/home/jovyan/thanosql-dataset/librispeech_data/001.wav\n" ] }, { "data": { "text/html": [ "\n", " \n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "/home/jovyan/thanosql-dataset/librispeech_data/002.wav\n" ] }, { "data": { "text/html": [ "\n", " \n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "%%thanosql\n", "PRINT AUDIO \n", "AS\n", "SELECT audio_path\n", "FROM librispeech_train\n", "LIMIT 3" ] }, { "attachments": {}, "cell_type": "markdown", "id": "b3251b51-e9d8-4c5c-9882-46ca1c5d58bd", "metadata": {}, "source": [ "## __2. Predict Using Pre-built Model__\n", "\n", "To predict the results using the pre-built __tutorial_audio_recognition__ model, run the query below." ] }, { "cell_type": "code", "execution_count": 8, "id": "ef603c7d-ac80-4918-bb9d-d4abe3e07d03", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
| | audio_path | text | predict_result |
|---|---|---|---|
| 0 | thanosql-dataset/librispeech_data/000.wav | i noticed how white and well shaped his own ha... | I NOTICED HOW WHITE AND WELL SHAPED HIS OWN HA... |
| 1 | thanosql-dataset/librispeech_data/001.wav | the only conflicts that occurred on irish soil... | THE ONLY CONFLICTS THAT OCCURRED ON IRISH SOIL... |
| 2 | thanosql-dataset/librispeech_data/002.wav | inquired shaggy in the metal forest | INQUIRED SHAGGY IN THE MEDAL FOREST |
| 3 | thanosql-dataset/librispeech_data/003.wav | my grandmother always spoke in a very loud ton... | MY GRANDMOTHER ALWAYS SPOKE IN A VERY LOUD TON... |
| 4 | thanosql-dataset/librispeech_data/004.wav | the poets of succeeding ages have dwelt much i... | THE POETS OF SUCCEEDING AGES HAVE DWELT MUCH I... |
| ... | ... | ... | ... |
| 75 | thanosql-dataset/librispeech_data/075.wav | we can't do anything without evidence complain | WE CAN'T DO ANYTHING WITHOUT EVIDENCE COMPLAIN |
| 76 | thanosql-dataset/librispeech_data/076.wav | when i came up he touched my shoulder and look... | WHEN I CAME UP HE TOUCHED MY SHOULDER AND LOOK... |
| 77 | thanosql-dataset/librispeech_data/077.wav | it relieved him for a while | IT RELIEVED HIM FOR A WHILE |
| 78 | thanosql-dataset/librispeech_data/078.wav | this world's thick vapours whelm your eyes unw... | THIS WORLD'S THICK VAPOURS WHELM YOUR EYES UNW... |
| 79 | thanosql-dataset/librispeech_data/079.wav | i began to enjoy the exhilarating delight of t... | I BEGAN TO ENJOY THE EXHILARATING DELIGHT OF T... |
\n", "

80 rows × 3 columns

\n", "
" ], "text/plain": [ " audio_path \\\n", "0 thanosql-dataset/librispeech_data/000.wav \n", "1 thanosql-dataset/librispeech_data/001.wav \n", "2 thanosql-dataset/librispeech_data/002.wav \n", "3 thanosql-dataset/librispeech_data/003.wav \n", "4 thanosql-dataset/librispeech_data/004.wav \n", ".. ... \n", "75 thanosql-dataset/librispeech_data/075.wav \n", "76 thanosql-dataset/librispeech_data/076.wav \n", "77 thanosql-dataset/librispeech_data/077.wav \n", "78 thanosql-dataset/librispeech_data/078.wav \n", "79 thanosql-dataset/librispeech_data/079.wav \n", "\n", " text \\\n", "0 i noticed how white and well shaped his own ha... \n", "1 the only conflicts that occurred on irish soil... \n", "2 inquired shaggy in the metal forest \n", "3 my grandmother always spoke in a very loud ton... \n", "4 the poets of succeeding ages have dwelt much i... \n", ".. ... \n", "75 we can't do anything without evidence complain \n", "76 when i came up he touched my shoulder and look... \n", "77 it relieved him for a while \n", "78 this world's thick vapours whelm your eyes unw... \n", "79 i began to enjoy the exhilarating delight of t... \n", "\n", " predict_result \n", "0 I NOTICED HOW WHITE AND WELL SHAPED HIS OWN HA... \n", "1 THE ONLY CONFLICTS THAT OCCURRED ON IRISH SOIL... \n", "2 INQUIRED SHAGGY IN THE MEDAL FOREST \n", "3 MY GRANDMOTHER ALWAYS SPOKE IN A VERY LOUD TON... \n", "4 THE POETS OF SUCCEEDING AGES HAVE DWELT MUCH I... \n", ".. ... \n", "75 WE CAN'T DO ANYTHING WITHOUT EVIDENCE COMPLAIN \n", "76 WHEN I CAME UP HE TOUCHED MY SHOULDER AND LOOK... \n", "77 IT RELIEVED HIM FOR A WHILE \n", "78 THIS WORLD'S THICK VAPOURS WHELM YOUR EYES UNW... \n", "79 I BEGAN TO ENJOY THE EXHILARATING DELIGHT OF T... 
\n", "\n", "[80 rows x 3 columns]" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%%thanosql\n", "PREDICT USING tutorial_audio_recognition\n", "OPTIONS (\n", " audio_col='audio_path',\n", " batch_size=8\n", " )\n", "AS \n", "SELECT * \n", "FROM librispeech_train" ] }, { "attachments": {}, "cell_type": "markdown", "id": "e9b4137e-75ad-4bea-bda0-0c7fcdb3b72b", "metadata": { "tags": [] }, "source": [ "## __3. Build a Speech Recognition Model__\n", "\n", "To create a speech recognition model with the name __my_speech_recognition_model__ using the __librispeech_train__ dataset from the previous step, run the following query. \n", "(Estimated duration of query execution: 1 min)" ] }, { "cell_type": "code", "execution_count": 9, "id": "8ffb0317-476e-409f-baa3-dc562d924e99", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Success\n" ] } ], "source": [ "%%thanosql\n", "BUILD MODEL my_speech_recognition_model\n", "USING Wav2Vec2En\n", "OPTIONS (\n", " audio_col='audio_path', \n", " text_col='text', \n", " max_epochs=1, \n", " batch_size=4,\n", " overwrite= True \n", " )\n", "AS\n", "SELECT *\n", "FROM librispeech_train" ] }, { "attachments": {}, "cell_type": "markdown", "id": "cc4f0cfa", "metadata": {}, "source": [ "
\n", "

Query Details

\n", "
    \n", "
  • \"BUILD MODEL\" creates and trains a model named my_speech_recognition_model.
  • \n", "
  • \"USING\" specifies Wav2Vec2En as the base model.
  • \n", "
  • \"OPTIONS\" specifies the option values used to create the model. \n", "
      \n", "
    • \"audio_col\": the name of the column containing the audio path to be used for training (str, default: 'audio_path')
    • \n", "
    • \"text_col\": the name of the column containing the audio script information (str, default: 'text')
    • \n", "
    • \"max_epochs\": number of times to train with the training dataset (int, optional, default: 5)
    • \n", "
    • \"batch_size\": the number of samples processed together in a single training step (int, optional, default: 16)
    • \n", "
    • \"overwrite\": determines whether to overwrite a model if one with the same name already exists. If set to True, the old model is replaced with the new model (bool, optional, True|False, default: False)
    • \n", "
    \n", "
  • \n", "
\n", "
\n", "\n", "
\n", "

In this example, we set “max_epochs” to 1 to train the model quickly. In general, a larger “max_epochs” value tends to improve inference accuracy at the cost of longer training time.

\n", "
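As a rough illustration of that trade-off, the sketch below counts how many batches a training run processes. It assumes one optimization step per batch and that a final partial batch is kept; both are assumptions about the training loop, not documented ThanoSQL behavior. The row count of 80 comes from the prediction output on librispeech_train shown earlier, and the helper name training_steps is hypothetical.

```python
import math


def training_steps(num_rows: int, batch_size: int, max_epochs: int) -> int:
    '''Count the batches processed over a whole run (hypothetical helper).

    Assumes one step per batch and that a final partial batch is kept.
    '''
    if num_rows < 0 or batch_size <= 0 or max_epochs < 0:
        raise ValueError('num_rows and max_epochs must be >= 0, batch_size > 0')
    return math.ceil(num_rows / batch_size) * max_epochs


# 80 training rows with the tutorial settings (batch_size=4, max_epochs=1)
print(training_steps(80, 4, 1))   # 20
# Same data with the documented defaults (batch_size=16, max_epochs=5)
print(training_steps(80, 16, 5))  # 25
```

Under these assumptions, raising max_epochs multiplies the amount of work linearly, which is why the tutorial keeps it at 1.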
" ] }, { "attachments": {}, "cell_type": "markdown", "id": "d6402263-ff45-45e8-b221-27c1ff97c556", "metadata": {}, "source": [ "## __4. Predict__\n", "\n", "To use the speech recognition model created in the previous step for prediction of __librispeech_test__, run the following query." ] }, { "cell_type": "code", "execution_count": 10, "id": "93bb46ac", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
| | audio_path | text | predict_result |
|---|---|---|---|
| 0 | thanosql-dataset/librispeech_data/080.wav | dead said doctor macklewain | DEAD SAID DOCTOR MACKELWAYNE |
| 1 | thanosql-dataset/librispeech_data/081.wav | one day when i rode over to the shimerdas i fo... | ONE DAY WHEN I RODE OVER TO THE SHIMERIDAS I F... |
| 2 | thanosql-dataset/librispeech_data/082.wav | well i don't think you should turn a guy's t v... | WELL I DON'T THINK YOU SHOULD TURN A GUISE TIV... |
| 3 | thanosql-dataset/librispeech_data/083.wav | and what allurements or what vantages upon the... | AND WHAT ALLUREMENTS OR WHAT VANTAGES UPON THE... |
| 4 | thanosql-dataset/librispeech_data/084.wav | yes how many | YES HOW MANY |
| 5 | thanosql-dataset/librispeech_data/085.wav | then i look perhaps like what i am | THEN I LOOK PERHAPS LIKE WHAT I AM |
| 6 | thanosql-dataset/librispeech_data/086.wav | i'm mister christopher from london | I'M MISTER CHRISTOPHER FROM LONDON |
| 7 | thanosql-dataset/librispeech_data/087.wav | nature a difference of fifty years had set a p... | NATURE A DIFFERENCE OF FIFTY YEARS HAD SET A P... |
| 8 | thanosql-dataset/librispeech_data/088.wav | he is just married you know is he said burgess | HE IS JUST MARRIED YOU KNOWIS HE SAID BURGIS |
| 9 | thanosql-dataset/librispeech_data/089.wav | she pointed into the gold cottonwood tree behi... | SHE POINTED IN TO THE GOLD COTTONWOOD TREE BEH... |
| 10 | thanosql-dataset/librispeech_data/090.wav | and she saw the other birds hopping about and ... | AND SHE SAW ALL THE OTHER BIRDS HOPPING ABOUT ... |
| 11 | thanosql-dataset/librispeech_data/091.wav | always but it's worse now | ALWAYS BUT IT'S WORSE NOW |
| 12 | thanosql-dataset/librispeech_data/092.wav | week followed week these two beings led a happ... | WEEK FOLLOWED WEEK THESE TWO BEINGS LED A HAPP... |
| 13 | thanosql-dataset/librispeech_data/093.wav | gwynplaine was a mountebank | GWYNPLAINE WAS A MOUNT A BANK |
| 14 | thanosql-dataset/librispeech_data/094.wav | the coals in the grate settled down with a sli... | THE COALS IN THE GRATE SETTLED DOWN WITH A SLI... |
| 15 | thanosql-dataset/librispeech_data/095.wav | i've decided to enlist in the army | I'VE DECIDED T ENLIST IN THE ARMY |
| 16 | thanosql-dataset/librispeech_data/096.wav | i also offered to help your brother to escape ... | I ALSO OFFERED TO HELP YOUR BROTHER TO ESCAPE ... |
| 17 | thanosql-dataset/librispeech_data/097.wav | well now said meekin with asperity i don't agr... | WELL NOW SAID MICON WITH ASPERITYI DON'T AGREE... |
| 18 | thanosql-dataset/librispeech_data/098.wav | little did i expect however the spectacle whic... | LITTLE DID I EXPECT HOWEVER THE SPECTACLE WHIC... |
| 19 | thanosql-dataset/librispeech_data/099.wav | i look at my watch it's a quarter to eleven | LOOK AT MY WATCHIT'S A QUARTER TO ELEVEN |
\n", "
" ], "text/plain": [ " audio_path \\\n", "0 thanosql-dataset/librispeech_data/080.wav \n", "1 thanosql-dataset/librispeech_data/081.wav \n", "2 thanosql-dataset/librispeech_data/082.wav \n", "3 thanosql-dataset/librispeech_data/083.wav \n", "4 thanosql-dataset/librispeech_data/084.wav \n", "5 thanosql-dataset/librispeech_data/085.wav \n", "6 thanosql-dataset/librispeech_data/086.wav \n", "7 thanosql-dataset/librispeech_data/087.wav \n", "8 thanosql-dataset/librispeech_data/088.wav \n", "9 thanosql-dataset/librispeech_data/089.wav \n", "10 thanosql-dataset/librispeech_data/090.wav \n", "11 thanosql-dataset/librispeech_data/091.wav \n", "12 thanosql-dataset/librispeech_data/092.wav \n", "13 thanosql-dataset/librispeech_data/093.wav \n", "14 thanosql-dataset/librispeech_data/094.wav \n", "15 thanosql-dataset/librispeech_data/095.wav \n", "16 thanosql-dataset/librispeech_data/096.wav \n", "17 thanosql-dataset/librispeech_data/097.wav \n", "18 thanosql-dataset/librispeech_data/098.wav \n", "19 thanosql-dataset/librispeech_data/099.wav \n", "\n", " text \\\n", "0 dead said doctor macklewain \n", "1 one day when i rode over to the shimerdas i fo... \n", "2 well i don't think you should turn a guy's t v... \n", "3 and what allurements or what vantages upon the... \n", "4 yes how many \n", "5 then i look perhaps like what i am \n", "6 i'm mister christopher from london \n", "7 nature a difference of fifty years had set a p... \n", "8 he is just married you know is he said burgess \n", "9 she pointed into the gold cottonwood tree behi... \n", "10 and she saw the other birds hopping about and ... \n", "11 always but it's worse now \n", "12 week followed week these two beings led a happ... \n", "13 gwynplaine was a mountebank \n", "14 the coals in the grate settled down with a sli... \n", "15 i've decided to enlist in the army \n", "16 i also offered to help your brother to escape ... \n", "17 well now said meekin with asperity i don't agr... 
\n", "18 little did i expect however the spectacle whic... \n", "19 i look at my watch it's a quarter to eleven \n", "\n", " predict_result \n", "0 DEAD SAID DOCTOR MACKELWAYNE \n", "1 ONE DAY WHEN I RODE OVER TO THE SHIMERIDAS I F... \n", "2 WELL I DON'T THINK YOU SHOULD TURN A GUISE TIV... \n", "3 AND WHAT ALLUREMENTS OR WHAT VANTAGES UPON THE... \n", "4 YES HOW MANY \n", "5 THEN I LOOK PERHAPS LIKE WHAT I AM \n", "6 I'M MISTER CHRISTOPHER FROM LONDON \n", "7 NATURE A DIFFERENCE OF FIFTY YEARS HAD SET A P... \n", "8 HE IS JUST MARRIED YOU KNOWIS HE SAID BURGIS \n", "9 SHE POINTED IN TO THE GOLD COTTONWOOD TREE BEH... \n", "10 AND SHE SAW ALL THE OTHER BIRDS HOPPING ABOUT ... \n", "11 ALWAYS BUT IT'S WORSE NOW \n", "12 WEEK FOLLOWED WEEK THESE TWO BEINGS LED A HAPP... \n", "13 GWYNPLAINE WAS A MOUNT A BANK \n", "14 THE COALS IN THE GRATE SETTLED DOWN WITH A SLI... \n", "15 I'VE DECIDED T ENLIST IN THE ARMY \n", "16 I ALSO OFFERED TO HELP YOUR BROTHER TO ESCAPE ... \n", "17 WELL NOW SAID MICON WITH ASPERITYI DON'T AGREE... \n", "18 LITTLE DID I EXPECT HOWEVER THE SPECTACLE WHIC... \n", "19 LOOK AT MY WATCHIT'S A QUARTER TO ELEVEN " ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%%thanosql\n", "PREDICT USING my_speech_recognition_model\n", "OPTIONS (\n", " audio_col='audio_path',\n", " result_col='predict_result'\n", " )\n", "AS\n", "SELECT *\n", "FROM librispeech_test" ] }, { "attachments": {}, "cell_type": "markdown", "id": "da7d9847", "metadata": {}, "source": [ "
\n", "

Query Details

\n", "
    \n", "
  • \"PREDICT USING\" predicts the outcome using the my_speech_recognition_model.\n", "
  • \"OPTIONS\" specifies the option values to be used for prediction.\n", "
      \n", "
    • \"audio_col\": the name of the column containing the audio path to be used for prediction (str, default: 'audio_path')
    • \n", "
    • \"result_col\": the name of the column that will contain the predicted results (str, optional, default: 'predict_result')
    • \n", "
    \n", "
  • \n", "
\n", "
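The prediction output is close to the reference text but not always identical (for example, mountebank is transcribed as MOUNT A BANK). One common way to quantify this is word error rate (WER): the word-level edit distance between reference and prediction, divided by the number of reference words. The sketch below is a plain-Python illustration that runs outside ThanoSQL; the function names are hypothetical, and lowercasing both strings before comparison is an assumption made because the model returns uppercase text.

```python
def word_edit_distance(ref_words: list[str], hyp_words: list[str]) -> int:
    '''Levenshtein distance over words (substitutions, insertions, deletions).'''
    previous = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,         # deletion of a reference word
                current[j - 1] + 1,      # insertion of a predicted word
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def word_error_rate(reference: str, hypothesis: str) -> float:
    '''WER = word edit distance / number of reference words.'''
    ref_words = reference.lower().split()
    hyp_words = hypothesis.lower().split()
    if not ref_words:
        raise ValueError('reference must contain at least one word')
    return word_edit_distance(ref_words, hyp_words) / len(ref_words)


# Untruncated rows copied from the prediction outputs in this tutorial
pairs = [
    ('yes how many', 'YES HOW MANY'),
    ('gwynplaine was a mountebank', 'GWYNPLAINE WAS A MOUNT A BANK'),
    ('inquired shaggy in the metal forest', 'INQUIRED SHAGGY IN THE MEDAL FOREST'),
]
for reference, hypothesis in pairs:
    print(f'{word_error_rate(reference, hypothesis):.3f}  {reference}')
```

Averaging this score over a whole result table gives a single number for comparing models, such as the pre-built model against one trained with more epochs.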
" ] }, { "attachments": {}, "cell_type": "markdown", "id": "5389cb4a", "metadata": {}, "source": [ "## __5. In Conclusion__\n", "\n", "In this tutorial, we created a speech recognition model using the LibriSpeech dataset. As this is a beginner-level tutorial, we focused on the process rather than accuracy. Speech recognition models can be improved in accuracy through fine tuning that is suitable for the user's needs. Try using your own data to train the base model and improving its performance. Create your own model and provide competitive services by combining various unstructured data(image, audio, video, etc.) and structured data with ThanoSQL.\n", "\n", "* [How to Upload My Data to the ThanoSQL Workspace](https://docs.thanosql.ai/1.5/en/getting_started/data_upload/)\n", "* [How to Create a Table Using My Data](https://docs.thanosql.ai/1.5/en/how-to_guides/ThanoSQL_query/COPY_SYNTAX/)\n", "* [How to Upload My Model to the ThanoSQL Workspace](https://docs.thanosql.ai/1.5/en/how-to_guides/ThanoSQL_query/UPLOAD_MODEL_SYNTAX/)\n", "\n", "
\n", "

Inquiries About Deploying a Model for Your Own Service

\n", "

If you have any difficulties creating your own model using ThanoSQL or applying it to your service, please feel free to contact us at the address below 😊

\n", "

For inquiries regarding building a speech recognition model: contact@smartmind.team

\n", "
" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.9 (tags/v3.10.9:1dd9be6, Dec 6 2022, 20:01:21) [MSC v.1934 64 bit (AMD64)]" }, "toc-autonumbering": false, "vscode": { "interpreter": { "hash": "54a1ec72395a4a5a649013bb47cb6c1a711fb4b3d33a07524a09f31d6d2ee0ec" } } }, "nbformat": 4, "nbformat_minor": 5 }