{ "cells": [ { "attachments": {}, "cell_type": "markdown", "id": "6cfd2a8c-fdfc-4233-abd1-ece097069522", "metadata": {}, "source": [ "# __Create a Speech Recognition Model__" ] }, { "attachments": {}, "cell_type": "markdown", "id": "6520d41e", "metadata": {}, "source": [ "- Tutorial Difficulty: ★☆☆☆☆\n", "- 10 min read\n", "- Languages: [SQL](https://en.wikipedia.org/wiki/SQL) (100%)\n", "- File location: tutorial_en/thanosql_ml/audio_recognition/speech_recognition.ipynb\n", "- References: [LibriSpeech DataSet](http://www.openslr.org/12), [wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations](https://arxiv.org/abs/2006.11477)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "05f3d517", "metadata": {}, "source": [ "## Tutorial Introduction\n", "\n", "
Speech recognition technology, also called computer speech recognition or speech-to-text, allows programs to convert human speech into text. It is now used in a wide range of fields, including automobiles, healthcare, and everyday devices such as AI speakers and smartphones. Modern machine learning-based speech recognition uses algorithms that process speech by integrating the grammar, syntax, structure, and composition of audio and speech signals.

Speech recognition should not be confused with voice recognition, which focuses only on identifying individual users by their voices.

👉 LibriSpeech [Panayotov et al. 2015] is derived from the LibriVox project, a community-driven audiobook project, and is one of the most widely used large-scale English speech datasets in speech recognition research. It was created by processing approximately 1,000 hours of audiobook recordings sampled at 16 kHz. The target table for this tutorial contains the paths of pre-uploaded audio files and their transcripts (scripts). The goal of this tutorial is to convert the audio files to text.
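LibriSpeech audio is sampled at 16 kHz. If you plan to use your own recordings, one quick sanity check is to read each WAV header and confirm its sample rate and duration. The sketch below uses Python's standard `wave` module. The `describe_wav` helper is hypothetical and is not part of ThanoSQL. An in-memory silent clip stands in for a real file such as `thanosql-dataset/librispeech_data/000.wav`.

```python
import io
import wave

EXPECTED_RATE = 16_000  # LibriSpeech recordings are sampled at 16 kHz

def describe_wav(source):
    # Hypothetical helper: read header fields from a WAV path or file-like object
    with wave.open(source, 'rb') as wav:
        rate = wav.getframerate()
        frames = wav.getnframes()
        channels = wav.getnchannels()
    seconds = frames / rate  # duration = number of frames / frames per second
    return {'rate': rate, 'channels': channels, 'seconds': seconds,
            'is_16k': rate == EXPECTED_RATE}

# Build a 2-second silent 16 kHz mono clip in memory so the example is self-contained
buffer = io.BytesIO()
with wave.open(buffer, 'wb') as out:
    out.setnchannels(1)              # mono
    out.setsampwidth(2)              # 16-bit samples (2 bytes each)
    out.setframerate(EXPECTED_RATE)
    out.writeframes(bytes(2 * EXPECTED_RATE * 2))  # 2 s * 16000 frames * 2 bytes of zeros
buffer.seek(0)

info = describe_wav(buffer)
print(info)
```

With a real file, pass its path instead of the buffer. A file that reports a different rate would typically need resampling before it matches the data described above.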

|  | audio_path | text |
|---|---|---|
| 0 | thanosql-dataset/librispeech_data/000.wav | i noticed how white and well shaped his own ha... |
| 1 | thanosql-dataset/librispeech_data/001.wav | the only conflicts that occurred on irish soil... |
| 2 | thanosql-dataset/librispeech_data/002.wav | inquired shaggy in the metal forest |
| 3 | thanosql-dataset/librispeech_data/003.wav | my grandmother always spoke in a very loud ton... |
| 4 | thanosql-dataset/librispeech_data/004.wav | the poets of succeeding ages have dwelt much i... |


The librispeech_train table contains the following information:
- audio_path: the path of each audio file
- text: the transcript (script) of each audio file

|  | audio_path | text | predict_result |
|---|---|---|---|
| 0 | thanosql-dataset/librispeech_data/000.wav | i noticed how white and well shaped his own ha... | I NOTICED HOW WHITE AND WELL SHAPED HIS OWN HA... |
| 1 | thanosql-dataset/librispeech_data/001.wav | the only conflicts that occurred on irish soil... | THE ONLY CONFLICTS THAT OCCURRED ON IRISH SOIL... |
| 2 | thanosql-dataset/librispeech_data/002.wav | inquired shaggy in the metal forest | INQUIRED SHAGGY IN THE MEDAL FOREST |
| 3 | thanosql-dataset/librispeech_data/003.wav | my grandmother always spoke in a very loud ton... | MY GRANDMOTHER ALWAYS SPOKE IN A VERY LOUD TON... |
| 4 | thanosql-dataset/librispeech_data/004.wav | the poets of succeeding ages have dwelt much i... | THE POETS OF SUCCEEDING AGES HAVE DWELT MUCH I... |
| ... | ... | ... | ... |
| 75 | thanosql-dataset/librispeech_data/075.wav | we can't do anything without evidence complain | WE CAN'T DO ANYTHING WITHOUT EVIDENCE COMPLAIN |
| 76 | thanosql-dataset/librispeech_data/076.wav | when i came up he touched my shoulder and look... | WHEN I CAME UP HE TOUCHED MY SHOULDER AND LOOK... |
| 77 | thanosql-dataset/librispeech_data/077.wav | it relieved him for a while | IT RELIEVED HIM FOR A WHILE |
| 78 | thanosql-dataset/librispeech_data/078.wav | this world's thick vapours whelm your eyes unw... | THIS WORLD'S THICK VAPOURS WHELM YOUR EYES UNW... |
| 79 | thanosql-dataset/librispeech_data/079.wav | i began to enjoy the exhilarating delight of t... | I BEGAN TO ENJOY THE EXHILARATING DELIGHT OF T... |

80 rows × 3 columns
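The predictions are uppercase while the reference transcripts are lowercase, so any comparison should normalize case first. A common way to quantify recognition errors is the word error rate (WER). WER is the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words.

The sketch below is plain Python, not a ThanoSQL function, and `word_error_rate` is a hypothetical helper name. It uses row 2, the only fully visible row with an error, because the truncated (`...`) cells cannot be scored from the display.

```python
def word_error_rate(reference, hypothesis):
    # Normalize case: references are lowercase, predictions are uppercase
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError('reference must contain at least one word')
    # dist[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,         # deletion
                             dist[i][j - 1] + 1,         # insertion
                             dist[i - 1][j - 1] + cost)  # substitution or match
    return dist[len(ref)][len(hyp)] / len(ref)

# Row 2: one substitution (metal -> medal) out of six reference words
wer = word_error_rate('inquired shaggy in the metal forest',
                      'INQUIRED SHAGGY IN THE MEDAL FOREST')
print(round(wer, 3))  # 0.167
```

Averaging this per-row value over a table gives equal weight to short and long utterances. Summing edits and reference words across all rows before dividing gives a corpus-level WER instead.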

In this example, we set “max_epochs” to 1 so that the model trains quickly. In general, a larger “max_epochs” value tends to improve inference performance at the cost of longer computation time.
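Computation grows with “max_epochs” because each epoch is one full pass over the training rows. The number of optimizer steps, and roughly the training time, therefore scales linearly with it.

This back-of-the-envelope sketch rests on two assumptions. First, the training table has 80 rows, matching the 80-row result shown above. Second, the batch size is 16, which is a hypothetical value: ThanoSQL's actual batch-size option and defaults are not covered here. The `optimizer_steps` helper is also hypothetical.

```python
import math

def optimizer_steps(num_rows, batch_size, max_epochs):
    # One epoch = one pass over every training row;
    # the last batch of an epoch may be partial, hence the ceiling
    steps_per_epoch = math.ceil(num_rows / batch_size)
    return steps_per_epoch * max_epochs

# Assumed values: 80 training rows, hypothetical batch size of 16
for epochs in (1, 5, 10):
    print(epochs, optimizer_steps(80, 16, epochs))
```

Going from 1 to 10 epochs multiplies the step count tenfold under these assumptions. This is the cost side of the trade-off described above.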

|  | audio_path | text | predict_result |
|---|---|---|---|
| 0 | thanosql-dataset/librispeech_data/080.wav | dead said doctor macklewain | DEAD SAID DOCTOR MACKELWAYNE |
| 1 | thanosql-dataset/librispeech_data/081.wav | one day when i rode over to the shimerdas i fo... | ONE DAY WHEN I RODE OVER TO THE SHIMERIDAS I F... |
| 2 | thanosql-dataset/librispeech_data/082.wav | well i don't think you should turn a guy's t v... | WELL I DON'T THINK YOU SHOULD TURN A GUISE TIV... |
| 3 | thanosql-dataset/librispeech_data/083.wav | and what allurements or what vantages upon the... | AND WHAT ALLUREMENTS OR WHAT VANTAGES UPON THE... |
| 4 | thanosql-dataset/librispeech_data/084.wav | yes how many | YES HOW MANY |
| 5 | thanosql-dataset/librispeech_data/085.wav | then i look perhaps like what i am | THEN I LOOK PERHAPS LIKE WHAT I AM |
| 6 | thanosql-dataset/librispeech_data/086.wav | i'm mister christopher from london | I'M MISTER CHRISTOPHER FROM LONDON |
| 7 | thanosql-dataset/librispeech_data/087.wav | nature a difference of fifty years had set a p... | NATURE A DIFFERENCE OF FIFTY YEARS HAD SET A P... |
| 8 | thanosql-dataset/librispeech_data/088.wav | he is just married you know is he said burgess | HE IS JUST MARRIED YOU KNOWIS HE SAID BURGIS |
| 9 | thanosql-dataset/librispeech_data/089.wav | she pointed into the gold cottonwood tree behi... | SHE POINTED IN TO THE GOLD COTTONWOOD TREE BEH... |
| 10 | thanosql-dataset/librispeech_data/090.wav | and she saw the other birds hopping about and ... | AND SHE SAW ALL THE OTHER BIRDS HOPPING ABOUT ... |
| 11 | thanosql-dataset/librispeech_data/091.wav | always but it's worse now | ALWAYS BUT IT'S WORSE NOW |
| 12 | thanosql-dataset/librispeech_data/092.wav | week followed week these two beings led a happ... | WEEK FOLLOWED WEEK THESE TWO BEINGS LED A HAPP... |
| 13 | thanosql-dataset/librispeech_data/093.wav | gwynplaine was a mountebank | GWYNPLAINE WAS A MOUNT A BANK |
| 14 | thanosql-dataset/librispeech_data/094.wav | the coals in the grate settled down with a sli... | THE COALS IN THE GRATE SETTLED DOWN WITH A SLI... |
| 15 | thanosql-dataset/librispeech_data/095.wav | i've decided to enlist in the army | I'VE DECIDED T ENLIST IN THE ARMY |
| 16 | thanosql-dataset/librispeech_data/096.wav | i also offered to help your brother to escape ... | I ALSO OFFERED TO HELP YOUR BROTHER TO ESCAPE ... |
| 17 | thanosql-dataset/librispeech_data/097.wav | well now said meekin with asperity i don't agr... | WELL NOW SAID MICON WITH ASPERITYI DON'T AGREE... |
| 18 | thanosql-dataset/librispeech_data/098.wav | little did i expect however the spectacle whic... | LITTLE DID I EXPECT HOWEVER THE SPECTACLE WHIC... |
| 19 | thanosql-dataset/librispeech_data/099.wav | i look at my watch it's a quarter to eleven | LOOK AT MY WATCHIT'S A QUARTER TO ELEVEN |

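Several errors in the prediction table are word-boundary mistakes rather than misheard sounds. For example, MOUNTEBANK is split into MOUNT A BANK, and KNOW IS is merged into KNOWIS. Python's standard `difflib.SequenceMatcher` can list such differences at the word level.

This is a sketch for error inspection only. SequenceMatcher aligns sequences heuristically and does not guarantee a minimum-edit alignment, so it is not a substitute for WER. The `word_alignment` helper is hypothetical.

```python
from difflib import SequenceMatcher

def word_alignment(reference, hypothesis):
    # Compare lowercase word lists; the model outputs uppercase text
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    errors = []
    for tag, i1, i2, j1, j2 in SequenceMatcher(None, ref, hyp).get_opcodes():
        if tag != 'equal':
            # tag is one of 'replace', 'delete', or 'insert'
            errors.append((tag, ref[i1:i2], hyp[j1:j2]))
    return errors

# Two fully visible rows (13 and 8) from the prediction table
print(word_alignment('gwynplaine was a mountebank',
                     'GWYNPLAINE WAS A MOUNT A BANK'))
print(word_alignment('he is just married you know is he said burgess',
                     'HE IS JUST MARRIED YOU KNOWIS HE SAID BURGIS'))
```

Grouping the reported `replace` spans by whether they change the word count is one simple way to separate segmentation errors from spelling-level errors such as BURGESS and BURGIS.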

If you have any difficulties creating your own model with ThanoSQL or applying it to your service, please feel free to contact us at the address below 😊

For inquiries about building a speech recognition model: contact@smartmind.team