In this post, we will learn how to use the Google Cloud Text-to-Speech REST service. It converts text into natural-sounding speech through an API powered by Google’s AI technologies. It is useful in web and mobile applications that need to speak to customers with lifelike intonation, and it can also help convert PDFs into audiobooks.
The service accepts plain text or SSML (Speech Synthesis Markup Language) as input. The REST API returns the synthesised audio as base64-encoded content, which you can decode to an MP3 file.
- Set up a Google Cloud account
Step 1: Enable Cloud Text-to-Speech API
Enable the Cloud Text-to-Speech API in the Google developer console.
Step 2: Create a Google Service Account to access API
In the Google Cloud console, search for IAM & Admin and click Service Accounts. Create a service account without assigning a role; no role is required to access this service.
Generate a JSON key for the service account you created and save it as key.json.
Step 3: Access the Google Cloud shell to invoke Text-to-Speech API
In the top-right corner of the console, click the Cloud Shell icon. Cloud Shell is a Linux environment provided by Google.
Once Cloud Shell opens, upload the generated key.json.
Export the GOOGLE_APPLICATION_CREDENTIALS variable as shown below.
export GOOGLE_APPLICATION_CREDENTIALS='key.json'
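Before calling the API, it can help to confirm that the variable really points at a readable key file. The following is a minimal sketch of such a check; the function name check_credentials is hypothetical and is not part of the gcloud CLI.

```shell
# check_credentials: hypothetical helper, not part of the gcloud CLI.
# Returns 0 if GOOGLE_APPLICATION_CREDENTIALS names a readable, non-empty file.
check_credentials() {
  if [ -z "${GOOGLE_APPLICATION_CREDENTIALS:-}" ]; then
    echo "GOOGLE_APPLICATION_CREDENTIALS is not set" >&2
    return 1
  fi
  if [ ! -r "$GOOGLE_APPLICATION_CREDENTIALS" ] || [ ! -s "$GOOGLE_APPLICATION_CREDENTIALS" ]; then
    echo "Key file not readable or empty: $GOOGLE_APPLICATION_CREDENTIALS" >&2
    return 1
  fi
  echo "Using key file: $GOOGLE_APPLICATION_CREDENTIALS"
}
```

Run check_credentials right after the export. Because the path 'key.json' is relative, the check (and the later gcloud call) should be run from the directory the key was uploaded to.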
Step 4: Test the Google Text-to-Speech REST Service in Cloud Shell
Create the request.json file as follows. In this example, you convert the source text to an audio file.
{"input":{"text":"Google Cloud has many AI-powered API services. These services help organisations to provide a better customer experience."},"voice":{"languageCode":"en-GB","name":"en-GB-Standard-A","ssmlGender":"FEMALE"},"audioConfig":{"audioEncoding":"MP3"}}
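Since the service also accepts SSML, the same request can be written with an "ssml" input instead of "text". The sketch below writes such a variant to a file; the file name request-ssml.json and the half-second pause are illustrative choices, not something the service requires.

```shell
# Write an SSML variant of the request body (the file name is illustrative).
# The quoted 'EOF' delimiter stops the shell from expanding anything inside.
cat > request-ssml.json <<'EOF'
{
  "input": {
    "ssml": "<speak>Google Cloud has many AI-powered API services. <break time='500ms'/> These services help organisations to provide a better customer experience.</speak>"
  },
  "voice": {
    "languageCode": "en-GB",
    "name": "en-GB-Standard-A",
    "ssmlGender": "FEMALE"
  },
  "audioConfig": {
    "audioEncoding": "MP3"
  }
}
EOF
```

To try it, pass request-ssml.json to curl in place of request.json. The break element inserts a short pause between the two sentences.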
REST API Endpoint
POST https://texttospeech.googleapis.com/v1/text:synthesize
Use the curl command to hit the Text-to-Speech API endpoint
curl -X POST -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" -H "Content-Type: application/json; charset=utf-8" -d @request.json https://texttospeech.googleapis.com/v1/text:synthesize > response.json
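If you plan to call the endpoint more than once, the command can be wrapped in a small function that stops on errors instead of silently saving an error message as the response. This is only a sketch: the function name synthesize and its two arguments are assumptions made for illustration.

```shell
# synthesize: hypothetical wrapper around the curl call to the endpoint.
# $1 = request JSON file, $2 = file to write the JSON response to.
synthesize() {
  # Stop early if no access token can be obtained.
  access_token=$(gcloud auth application-default print-access-token) || return 1
  # --fail makes curl exit non-zero on HTTP errors (status 400 and above);
  # --silent --show-error hides the progress meter but keeps error messages.
  curl --fail --silent --show-error -X POST \
    -H "Authorization: Bearer ${access_token}" \
    -H "Content-Type: application/json; charset=utf-8" \
    -d @"$1" \
    "https://texttospeech.googleapis.com/v1/text:synthesize" > "$2"
}
```

For example, synthesize request.json response.json. One trade-off to be aware of: with --fail, curl does not save the error body returned by the API, so drop that flag while debugging a failing request.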
The response, saved in response.json, contains the audio as base64-encoded text in the audioContent field.
Step 5: Decode the encoded content to an audio file
Copy the contents of the audioContent field into a new file named synthesize-output-base64.txt.
Decode the text file to an MP3 file.
base64 synthesize-output-base64.txt --decode > demo-audio.mp3
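The manual copy step can also be scripted. The sketch below pulls the audioContent value straight out of the response file and decodes it in one go; the function name extract_audio is hypothetical, and the sed pattern assumes the key and its whole value sit on a single line.

```shell
# extract_audio: hypothetical helper that pulls the base64 string out of the
# "audioContent" field and decodes it.
# $1 = response JSON file, $2 = output audio file.
extract_audio() {
  # Assumes the key and its whole value are on one line;
  # base64 text never contains a double quote, so [^"]* captures all of it.
  sed -n 's/.*"audioContent"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$1" \
    | base64 --decode > "$2"
  # Fail if nothing was decoded (for example, when the file holds an error response).
  [ -s "$2" ]
}
```

For example, extract_audio response.json demo-audio.mp3. Where jq is installed, jq -r .audioContent response.json | base64 --decode > demo-audio.mp3 is a more robust alternative, because it parses the JSON properly instead of pattern-matching it.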
Output file: demo-audio.mp3
Congratulations! You have now learned how to set up and use the Google Cloud Text-to-Speech API service.