Let’s create a new Python file for speech recognition.
We’ll need a library to convert the Japanese characters (hiragana, katakana, and kanji) to Latin/Roman (rōmaji) characters. In my experience, Pykakasi is one of the best choices. If you know a better one, let me know in the comments.
pip3 install pykakasi
Add the necessary imports:
import requests
import json
import pykakasi
from recorder import record_audio, read_audio
Then, let’s add the Wit.ai configuration:
# Wit speech API endpoint
API_ENDPOINT = 'https://api.wit.ai/'
API_FUNCTION_SPEECH = 'speech'
# Wit API token
wit_access_token = 'VPEZHKEUXSSOGT4EVCO6JXCGTJLP'
Write a function to recognize the speech:
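As a rough sketch of what such a function can look like: it posts raw WAV bytes to the speech endpoint with the token in a Bearer header and returns the parsed JSON. The `audio_bytes` and `post` parameters and the `audio/wav` content type are assumptions made for illustration, and the recording step from `recorder` is left out so the sketch stands on its own. It also assumes the endpoint answers with a single JSON object, as in the sample output later in this article.

```python
API_ENDPOINT = 'https://api.wit.ai/'
API_FUNCTION_SPEECH = 'speech'


def recognize_speech(audio_bytes, access_token, post=None):
    """Send WAV audio to Wit.ai and return the parsed JSON response.

    `post` is a hypothetical hook for swapping in a fake HTTP call in tests;
    by default the function uses requests.post.
    """
    if post is None:
        import requests  # third-party: pip3 install requests
        post = requests.post

    headers = {
        'Authorization': 'Bearer ' + access_token,
        'Content-Type': 'audio/wav',  # assumption: the recorder produces WAV
    }
    response = post(API_ENDPOINT + API_FUNCTION_SPEECH,
                    headers=headers, data=audio_bytes)
    response.raise_for_status()  # fail loudly on HTTP errors
    # Assumption: the body is a single JSON object with a 'text' field.
    return response.json()
```

With this in place, `recognize_speech(audio, wit_access_token)['text']` would give the recognized Japanese text.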
Then we need to convert the Japanese characters to rōmaji. This way, we’ll be able to compare the user input with the expected word, which is written in rōmaji.
The documentation of Pykakasi is pretty clear and concise. Here’s how to use the library:
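A minimal sketch of the conversion, assuming Pykakasi 2.x or newer, where `kakasi().convert()` returns a list of dictionaries with a `'hepburn'` key for each segment of the text. The `to_romaji` name and the optional `converter` parameter are illustrative choices of mine, not part of the library.

```python
def to_romaji(text, converter=None):
    """Convert Japanese text to Hepburn rōmaji.

    `converter` is a hypothetical hook: any object with a convert(text)
    method that returns Pykakasi-style dictionaries. By default a real
    pykakasi.kakasi() instance is created.
    """
    if converter is None:
        import pykakasi  # third-party: pip3 install pykakasi
        converter = pykakasi.kakasi()

    # Each item describes one segment of the input; join the Hepburn parts.
    return ''.join(item['hepburn'] for item in converter.convert(text))
```

Passing the `'text'` value from the Wit.ai response to `to_romaji` gives a string we can compare with the expected word.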
Now, we want to evaluate the result. To do so, we’ll compare the expected word with the rōmaji text recognized from the user’s recording. If they’re equal, the pronunciation is correct.
Here’s the speech-evaluation function:
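One way to sketch the comparison: normalize both strings, check them for equality, and build the message shown to the user. The function names and the case-insensitive, whitespace-trimmed comparison are assumptions on my part; the message format follows the sample output in this article.

```python
def evaluate_pronunciation(expected_romaji, spoken_romaji):
    """Return True when the recognized word matches the expected one."""
    # Assumption: ignore case and surrounding whitespace when comparing.
    return expected_romaji.strip().lower() == spoken_romaji.strip().lower()


def format_result(spoken_romaji, is_correct):
    """Build the one-line verdict printed after each attempt."""
    verdict = 'Correct' if is_correct else 'Incorrect'
    return f'You said: {spoken_romaji} which is: {verdict}'
```

For example, `format_result('neko', evaluate_pronunciation('neko', 'Neko'))` produces the "Correct" line, while any other word yields "Incorrect".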
Finally, let’s write the main method to start the program:
Note that I have hard-coded the Japanese word “neko” (meaning cat) as a target word. This is for the sake of easier testing. Later, we’ll use a predefined word list and choose a random word.
I’ve also set the recording duration to four seconds. Shortly, we’ll create a spinner using Streamlit’s widgets to make the duration adjustable.
Test the prototype
Save the file, and start the program:
python3 japan.py
Say “neko” after the “Listening” prompt, and check the result:
Listening...
Finished recording.
{'entities': {}, 'intents': [], 'text': '猫', 'traits': {}}
You said: 猫
**************************************
猫[Neko]
You said: neko which is: Correct
- Wit.ai recognized the word and returned it as a kanji character (see the ‘text’ field of the response).
- The recognized text has been converted to rōmaji thanks to Pykakasi.
- Our input matches the expected word “neko.”
Now, say something different to test the behavior:
Listening...
Finished recording.
{'entities': {}, 'intents': [], 'text': 'テスト', 'traits': {}}
You said: テスト
**************************************
テスト[Tesuto]
You said: tesuto which is: Incorrect
Great! Our prototype is working. It’s time to build more features.