xAI voice cloning API

xAI has just released an API to clone your voice. It is pretty simple: you read a script to create the voice, and then you can call a text-to-speech API that speaks in that voice.

Here is the Python code to use once you have cloned your voice.

			
import os
import requests

voice_id = os.environ['ANDY1_VOICE'] # my demo voice ID
text = '''this is a test demo of my voice. Be excited! 
OK, how about a list of things; one, two, three. 
Lets see where this takes us.'''

# send the text to the text-to-speech endpoint using the cloned voice
response = requests.post(
    "https://api.x.ai/v1/tts",
    headers={
        "Authorization": f"Bearer {os.environ['XAI_API_KEY']}",
        "Content-Type": "application/json",
    },
    json={
        "text": text,
        "voice_id": voice_id,
        "language": "en",
    },
)
response.raise_for_status()

# the response body is the audio itself, so write the raw bytes to disk
with open("AndyTest1.mp3", "wb") as f:
    f.write(response.content)

		

I need to figure out my audio setup a bit better (my mic setup is probably not optimal, and it produces some echo). But it does a good job imitating my boring voice right out of the box!

And here is an example for longer speech from my intro to LLMs book:

			
# intro to llm book
llm_book = '''
Large language models (LLMs) are transforming how we work. Some of these examples include using LLMs to help write computer code, using LLMs to extract out information from irregular text sources, and creating chat-bots that can interact with various data sources and documents.
Most analysts, however, do not have any experience with these tools. This book is meant to be a general introduction to realistic examples of how individuals can use these tools; either in general software applications, or to help analysts write code to create software itself. Given the rapid pace of advancement in this area, a general introduction to help individuals who work in the knowledge economy understand the capabilities of these tools I believe is in order.
Here is a simple example of using an LLM API (*Application Programming Interface* -- just a standard way to send information and get information back on the web) using the anthropic library in python to extract key information from a free text crime narrative:
'''
response = requests.post(
    "https://api.x.ai/v1/tts",
    headers={
        "Authorization": f"Bearer {os.environ['XAI_API_KEY']}",
        "Content-Type": "application/json",
    },
    json={
        "text": llm_book,
        "voice_id": voice_id,
        "language": "en",
    },
)
response.raise_for_status()
with open("AndyTest_LLMIntro.mp3", "wb") as f:
    f.write(response.content)
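
Since both requests above are identical apart from the text and the output file, the call can be wrapped in a small helper. This is only a sketch: the names `build_tts_request` and `text_to_mp3` are hypothetical (they are not part of xAI's API), and the endpoint, JSON fields, and environment variables are simply the ones used in the code above.

```python
import os

TTS_URL = "https://api.x.ai/v1/tts"  # same endpoint as the snippets above


def build_tts_request(text, voice_id, api_key, language="en"):
    """Return the headers and JSON body for the text-to-speech call."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    payload = {"text": text, "voice_id": voice_id, "language": language}
    return headers, payload


def text_to_mp3(text, out_path, voice_id=None, api_key=None):
    """Send text to the API and write the returned audio bytes to out_path."""
    import requests  # imported here so build_tts_request works without it

    # fall back to the same environment variables used earlier in the post
    voice_id = voice_id or os.environ["ANDY1_VOICE"]
    api_key = api_key or os.environ["XAI_API_KEY"]
    headers, payload = build_tts_request(text, voice_id, api_key)
    response = requests.post(TTS_URL, headers=headers, json=payload)
    response.raise_for_status()
    with open(out_path, "wb") as f:
        f.write(response.content)
    return out_path
```

With this in place, the longer example would reduce to `text_to_mp3(llm_book, "AndyTest_LLMIntro.mp3")`.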

		

The LLM intro example messed up the *Application Programming Interface* section (start listening at 50 seconds in). But otherwise it is very nice.
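
One possible culprit, and this is only a guess that has not been tested against the API, is the markdown markup in that passage: the asterisks around the term and the double hyphen. A minimal sketch of cleaning the text before sending it, using a hypothetical helper named `strip_markdown_emphasis`:

```python
import re


def strip_markdown_emphasis(text):
    """Drop *emphasis* / **strong** markers and turn ' -- ' into a comma pause."""
    # keep the words between the asterisks, discard the asterisks themselves
    text = re.sub(r"\*{1,2}([^*]+)\*{1,2}", r"\1", text)
    # a spaced double hyphen is a written aside; a comma is a plainer pause
    return text.replace(" -- ", ", ")


sample = "an LLM API (*Application Programming Interface* -- just a standard way)"
print(strip_markdown_emphasis(sample))
# an LLM API (Application Programming Interface, just a standard way)
```

Whether this actually changes how the passage is read aloud would need to be checked by regenerating the audio.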

For those worried about security, xAI did something smart here: you need to read their prompted script live into the API. You cannot use a pre-recorded audio file as input. So cloning someone else's voice is pretty hard.

Costs are around $4 per million characters for the text-to-speech API. So narrating my entire book should come in under $10, I believe.

It took me less than an hour in total to set up a voice, write the Python code, and write this blog post!
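
The arithmetic behind that estimate can be sketched as follows. The $4 rate is the approximate figure quoted above, not an exact price, and `estimate_tts_cost` is a hypothetical helper name.

```python
PRICE_PER_MILLION_CHARS = 4.00  # approximate USD rate quoted above


def estimate_tts_cost(text, price_per_million=PRICE_PER_MILLION_CHARS):
    """Estimate the cost in USD of narrating `text`, billed per character."""
    return len(text) / 1_000_000 * price_per_million


# how many characters a $10 budget covers at that rate
budget = 10.00
max_chars = budget / PRICE_PER_MILLION_CHARS * 1_000_000
print(f"${budget:.2f} covers about {max_chars:,.0f} characters")
# $10.00 covers about 2,500,000 characters
```

So at that rate, any text shorter than roughly 2.5 million characters stays under $10. Calling `estimate_tts_cost(llm_book)` on a passage gives its cost before sending it.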

Andrew Wheeler
