How To Build a Japanese Pronunciation Checker With Python and Wit.ai

Let’s create a new Python file for speech recognition.

We’ll need a library to convert the Japanese characters (hiragana, katakana, and kanji) to Latin/Roman (rōmaji) characters. In my experience, Pykakasi is one of the best choices. If you know a better one, let me know in the comments.

pip3 install pykakasi

Add the necessary imports:

import requests
import json
import pykakasi
from recorder import record_audio, read_audio

Then, let’s add the Wit.ai configuration:

# Wit speech API endpoint
API_ENDPOINT = 'https://api.wit.ai/'
API_FUNCTION_SPEECH = 'speech'

# Wit API token
wit_access_token = 'VPEZHKEUXSSOGT4EVCO6JXCGTJLP'

Write a function to recognize the speech:

Recognizing speech from the audio
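Here’s a minimal sketch of what such a function could look like. It assumes the audio is available as WAV bytes, that the token is sent as a Bearer header, and that Wit.ai answers with a single JSON object containing a 'text' field (as in the output shown in the test run). The function name recognize_speech, the placeholder token, and the 30-second timeout are my own choices for illustration.

```python
import requests

# Wit speech API endpoint (same values as in the configuration above)
API_ENDPOINT = 'https://api.wit.ai/'
API_FUNCTION_SPEECH = 'speech'

# Placeholder: replace with your own Wit.ai access token
wit_access_token = 'YOUR_WIT_ACCESS_TOKEN'


def recognize_speech(audio: bytes, token: str = wit_access_token) -> str:
    """Send WAV audio bytes to Wit.ai and return the recognized text."""
    headers = {
        'Authorization': 'Bearer ' + token,
        'Content-Type': 'audio/wav',
    }
    response = requests.post(
        API_ENDPOINT + API_FUNCTION_SPEECH,
        headers=headers,
        data=audio,
        timeout=30,  # assumption: fail instead of hanging on a stalled request
    )
    response.raise_for_status()

    # Assumption: the body is one JSON object with a 'text' key.
    data = response.json()
    print(data)
    return data.get('text', '')
```

Returning an empty string when 'text' is missing keeps the caller simple: an empty transcription just fails the comparison later on.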

Then we need to convert the Japanese characters to rōmaji. This way, we’ll be able to compare the user input with the expected word, which is written in rōmaji.

The documentation of Pykakasi is pretty clear and concise. Here’s how to use the library:

Converting kanji to romaji

Now, we want to evaluate the result. To do so, we’ll compare the expected word with the rōmaji form of what Wit.ai recognized in the user’s recording. If they’re equal, the pronunciation is correct.

Here’s the speech-evaluation function:

Evaluating the user input
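A minimal sketch of that comparison might look like this. The function name evaluate_speech and the normalization step (trimming whitespace and lowercasing, so that “Neko” matches “neko”) are my assumptions; the printed message mirrors the output shown in the test run.

```python
def evaluate_speech(expected_romaji: str, spoken_romaji: str) -> bool:
    """Return True when the spoken word matches the expected romaji."""
    # Normalize both sides so capitalization and stray spaces don't matter.
    expected = expected_romaji.strip().lower()
    spoken = spoken_romaji.strip().lower()

    is_correct = expected == spoken
    verdict = 'Correct' if is_correct else 'Incorrect'
    print('You said: {} which is: {}'.format(spoken, verdict))
    return is_correct


if __name__ == '__main__':
    evaluate_speech('neko', 'Neko')    # matches after normalization
    evaluate_speech('neko', 'Tesuto')  # does not match
```

An exact string match is strict: any difference in the recognized word counts as incorrect, which is fine for single-word targets.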

Finally, let’s write the main method to start the program:

The main method
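The wiring of the main method can be sketched as follows. Because the signatures of record_audio and read_audio from the recorder module aren’t shown here, this sketch takes the three steps (capturing audio, recognizing speech, converting to rōmaji) as callables instead of importing them. The name run_check and its parameters are hypothetical, and the stand-in lambdas at the bottom only exist so the sketch runs without a microphone or network access.

```python
from typing import Callable

TARGET_WORD = 'neko'  # hard-coded target word for easier testing
RECORD_SECONDS = 4    # recording duration in seconds


def run_check(
    capture_audio: Callable[[int], bytes],
    recognize: Callable[[bytes], str],
    to_romaji: Callable[[str], str],
    target_word: str = TARGET_WORD,
    seconds: int = RECORD_SECONDS,
) -> bool:
    """Record, recognize, convert, and compare against the target word."""
    audio = capture_audio(seconds)
    text = recognize(audio)
    print('You said: ' + text)

    spoken = to_romaji(text).strip().lower()
    is_correct = spoken == target_word.strip().lower()
    verdict = 'Correct' if is_correct else 'Incorrect'
    print('You said: {} which is: {}'.format(spoken, verdict))
    return is_correct


if __name__ == '__main__':
    # Stand-ins for the real recorder, Wit.ai call, and Pykakasi conversion.
    run_check(
        capture_audio=lambda seconds: b'',
        recognize=lambda audio: '猫',
        to_romaji=lambda text: 'Neko',
    )
```

In the real program, you’d pass functions built on the recorder module, the Wit.ai request, and Pykakasi in place of the lambdas.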

Note that I have hard-coded the Japanese word “neko” (meaning cat) as a target word. This is for the sake of easier testing. Later, we’ll use a predefined word list and choose a random word.

I’ve also set the recording duration to four seconds. Shortly, we’ll build a spinner from Streamlit’s widgets so the duration can be adjusted.

Test the prototype

Save the file, and start the program:

python3 japan.py

Say “neko” after the “Listening” prompt, and check the result:

Listening...
Finished recording.
{'entities': {}, 'intents': [], 'text': '猫', 'traits': {}}
You said: 猫
**************************************
猫[Neko]
You said: neko which is: Correct

Wit.ai recognized the word and returned it as a kanji character (see the ‘text’ field of the response).
Pykakasi converted the recognized text to rōmaji.
The converted input matches the expected word “neko.”

Now, say something different to test the behavior:

Listening...
Finished recording.
{'entities': {}, 'intents': [], 'text': 'テスト', 'traits': {}}
You said: テスト
**************************************
テスト[Tesuto]
You said: tesuto which is: Incorrect

Great! Our prototype is working. It’s time to build more features.

