Since ChatGPT hit the scene in late 2022, new generative AI (artificial intelligence) programs have been popping up everywhere. One of the more distinctive types is AI voice, which lets you turn text prompts into voice clips for marketing, employee training, and more. In this post, we’ll show you how to make an AI voice using a popular program, PlayHT. Let’s dive in.
What is AI?
Artificial intelligence is a broad technology that allows computers to perform tasks that humans would normally do, often in a fraction of the time. There are several types of AI, the most common being narrow AI, which is designed to handle a specific task. The generative tools covered in this post fall into that category: they create images, voice, music, and text from a simple text prompt.
What is AI Voice Generation?
AI voices are computer-generated voices that mimic the sounds, features, and tones of human voices. They start either from text or from a recording of your own voice, which is used to create a unique human-sounding voice. Through AI’s text-to-speech technology, creators can develop voices for podcasts and voiceovers, or build assistive tools for the visually impaired.
How do AI Voice Generators Work?
AI voice generators work in four stages: text preprocessing, linguistic tagging, phonetic transcription, and voice synthesis. The first step, text preprocessing, takes the raw text and makes it neat and organized. It breaks the text down into smaller parts called tokens, expands contractions, handles special characters, and turns numbers into actual words.
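To make the preprocessing step more concrete, here is a minimal Python sketch. It is not how PlayHT or any particular engine works internally; the lookup tables and the function names `number_to_words` and `preprocess` are our own assumptions, and the sketch only spells out numbers from 0 to 99.

```python
import re

# Hypothetical, deliberately tiny lookup table (illustration only).
CONTRACTIONS = {"we'll": "we will", "let's": "let us", "it's": "it is"}

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Spell out an integer from 0 to 99."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"


def preprocess(text: str) -> list[str]:
    """Normalize raw text and split it into word and punctuation tokens."""
    # Lowercase and replace curly apostrophes with straight ones.
    text = text.lower().replace("\u2019", "'")
    # Expand the contractions we know about.
    for short, full in CONTRACTIONS.items():
        text = text.replace(short, full)
    # Spell out one- and two-digit numbers; longer numbers are left as digits.
    text = re.sub(r"\b\d{1,2}\b",
                  lambda m: number_to_words(int(m.group())), text)
    # Keep words (with inner hyphens or apostrophes) and basic punctuation.
    return re.findall(r"[a-z0-9]+(?:['-][a-z0-9]+)*|[.,!?]", text)


print(preprocess("We\u2019ll share 5 facts about dogs!"))
# ['we', 'will', 'share', 'five', 'facts', 'about', 'dogs', '!']
```

Real engines use far larger dictionaries and handle dates, currencies, and abbreviations, but the idea is the same: clean text in, tidy tokens out.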
Then, in the second step, those tokens are analyzed and given part-of-speech tags such as verb, noun, or adjective. This helps the system understand how each word is used and what it means in context.
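Why do tags matter for speech? Some words are pronounced differently depending on their role: "record" is stressed on the first syllable as a noun and on the second as a verb. The toy tagger below is only a sketch under our own assumptions (a hand-written `LEXICON` and one simple context rule); production systems learn their tags from large annotated datasets.

```python
# Hypothetical mini-lexicon; the tag names are illustrative.
LEXICON = {"we": "PRON", "will": "AUX", "a": "DET", "the": "DET",
           "new": "ADJ", "voice": "NOUN"}

# Words whose tag (and pronunciation) depends on context.
AMBIGUOUS = {"record"}


def tag(tokens):
    """Return (token, tag) pairs using a lexicon plus one context rule."""
    tagged = []
    for i, tok in enumerate(tokens):
        if tok in AMBIGUOUS:
            prev_tag = tagged[i - 1][1] if i > 0 else None
            # After a determiner or adjective we expect a noun;
            # otherwise this sketch guesses a verb.
            label = "NOUN" if prev_tag in {"DET", "ADJ"} else "VERB"
        else:
            label = LEXICON.get(tok, "UNK")
        tagged.append((tok, label))
    return tagged


print(tag(["we", "will", "record", "a", "record"]))
# [('we', 'PRON'), ('will', 'AUX'), ('record', 'VERB'),
#  ('a', 'DET'), ('record', 'NOUN')]
```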
Now, here’s where the magic happens. The text goes through phonetic analysis, which means it’s converted into a special kind of writing that captures how words should sound when spoken. This includes stress, tone, and rhythm to make the speech sound natural.
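As a rough illustration of phonetic analysis, the sketch below looks up each token in a small pronunciation table. The table is hand-written for this example and uses ARPAbet-style symbols, where the digit on a vowel marks stress (1 for primary stress, 0 for none). The pause markers and the `transcribe` function are our own assumptions, not part of any real engine.

```python
# Hypothetical pronunciation table in ARPAbet-style notation.
PRONUNCIATIONS = {
    "hello": ["HH", "AH0", "L", "OW1"],
    "dog": ["D", "AO1", "G"],
    "lovers": ["L", "AH1", "V", "ER0", "Z"],
}

# Punctuation becomes a pause marker so the rhythm survives transcription.
PAUSES = {",": "<short-pause>", "-": "<short-pause>",
          ".": "<long-pause>", "!": "<long-pause>", "?": "<long-pause>"}


def transcribe(tokens):
    """Convert word and punctuation tokens into phonemes and pause markers."""
    output = []
    for tok in tokens:
        if tok in PAUSES:
            output.append(PAUSES[tok])
        elif tok in PRONUNCIATIONS:
            output.extend(PRONUNCIATIONS[tok])
        else:
            # Real systems fall back to letter-to-sound rules here.
            output.append(f"<unk:{tok}>")
    return output


print(transcribe(["hello", ",", "dog", "lovers", "!"]))
```

Notice that the comma and the exclamation point turn into pauses. That is one reason punctuation in your script has such a noticeable effect on how the finished voice sounds.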
Finally, in the last step, all that hard work pays off. The phonetic transcriptions are synthesized into an audio waveform you can actually hear. Thanks to modern AI algorithms, the speech generated nowadays sounds way more natural and human-like than the text-to-speech engines of the past.
How to Make an AI Voice
Speech technology has come a long way since IBM demonstrated Shoebox, an early speech recognition machine, in 1962. Modern systems, such as PlayHT, offer a variety of synthetic voices with varying tones, making it possible to create realistic voices. Other tools, such as MurfAI, will allow you to adjust the pitch, tone, and speed. In this tutorial, we will use PlayHT to create an AI voice.
Step 1: Creating an Account
The first step in the process is to create a free PlayHT account. Start by navigating to the home page, then click the try for free button at the top right of the screen.
You can sign up using a Google account or enter your name and email. Use your chosen method, then click sign up to continue.
In the next step, choose whether you plan to use PlayHT as an individual or in a company setting.
PlayHT will ask questions about how you will use the software in the next few screens.
Once your selections are made and your account is created, you can begin exploring the interface.
Step 2: Exploring the Interface
Let’s get familiar with the interface before we generate our first voice:
- Create new file: This is where you’ll create your first project.
- Recent files: A list of your most recent projects.
- Files: Where all of your files are located.
- Voice cloning: You can upload audio of a voice, then clone it for use in the software.
- API access: For integrating PlayHT into other applications.
- Billing: Manage your account.
- Language selector: English is the only option right now, but other languages are in the works.
- Word count: Allows you to see how many remaining words are available.
- Generate all paragraphs: Used to generate an AI voice.
- Import video: Add voiceovers to an uploaded video.
- Text prompt: Text input for AI voice generation.
- Audio controls: Adjust the timeline of your voice, and rearrange clips.
- Export project: Save your project by paragraph or as one WAV file.
Step 3: Generating Your First AI Voice
Generating an AI voice with PlayHT is simple. You can create your own script or use an AI chatbot to assist you. In this tutorial, we’ll use ChatGPT to generate the text for our voice. Start by clicking the create new file button to create a new project for our voice.
Next, we’ll choose a synthetic voice for the project. To do this, click the voice icon directly above the text prompt.
A new popup window will appear, allowing you to sample one of more than 130 AI voices. To choose one, simply click on it (1), select the playback speed (2), choose to apply the voice to all paragraphs in your project (3), then confirm the changes (4).
We’ll choose Hudson because he sounds the most realistic and has a good narrative voice for our script.
Use ChatGPT to Generate a Video Script
Now that we have our voice selected, we’ll need to generate some text. We’ll ask ChatGPT to create a short video script providing interesting facts about dogs. We used the following prompt: Provide a brief video script to showcase 5 unknown facts about dogs.
Once the script is generated, we’ll need to input some text for our project. We’ll start by pasting the intro of our script into PlayHT: Hello, dog lovers! Today, we will uncover five fascinating and lesser-known facts about man’s best friend, dogs! So, grab a treat, sit back, and let’s dive in!
To generate the voice, click the play icon to the left of the text prompt.
PlayHT will generate the audio using Hudson’s voice. Depending on the amount of text you use, the generation process could take up to a few minutes. Once complete, you’ll be able to preview the voice by clicking the play button (1) to the right of the screen. If you aren’t satisfied with the result, you can click the regenerate button (2) to try again.
Here’s how the first paragraph sounds:
While this does sound pretty good, it could use a bit of adjusting. The breaks in the voice don’t sound quite right, so we’ll make a couple of adjustments.
PlayHT allows you to make changes to the text input, then regenerate it. This is helpful when the output isn’t up to your standards. A good tip is to add dashes between sentences to create a natural pause. With AI voice generators, there is a tendency to rush the text, creating unnatural run-on sentences. So, to correct this, we’ll change our original prompt to: Hello dog lovers! – Today, we will uncover five fascinating and lesser-known facts about man’s best friend – dogs! – So, grab a treat, sit back, and let’s dive in!
Here are the results:
Step 4: Voice Cloning
Another cool feature of PlayHT is the ability to create your own AI voices. You upload a 30-second clip, and PlayHT transforms it into a usable AI voice for your projects. To start, click the voice cloning tab in the PlayHT interface.
Next, click the create a new clone button.
Since we are using the free license, our only choice is the instant option, which creates an AI voice from a 30-second sound clip.
Next, give your voice a name (1), choose a gender (2), upload an audio file (3), confirm that you have the rights to use the clip (4), and click create (5).
Once the clone is created, it will appear underneath the create a new clone button (1). From there, you can use it (2), share it (3), or delete it (4).
To see how it sounds, add a text prompt to preview it. To be honest, we were pretty impressed with the results:
Step 5: Exporting a Project
The last step in the creation process is exporting your sound files. You can do this in one of two ways: exporting one paragraph at a time or exporting all paragraphs in one file. For most creators, it makes sense to export files separately. That way, you can add cut scenes and other effects between each one. To export your files, click the export button at the top left of the screen.
A drop-down menu will appear with two options: each paragraph separately and as a single audio file.
All files are exported as WAV audio files, which can be imported into most audio software.
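If you export paragraphs separately and later want to check or combine them outside an audio editor, Python's built-in `wave` module can read standard uncompressed WAV files. The sketch below is only an illustration: `write_tone` creates stand-in clips (a plain tone, not speech) so the example runs on its own, and the file names are hypothetical. We have not verified the exact sample rate or bit depth of PlayHT's exports, so the code reads those values from each file's header instead of assuming them.

```python
import math
import os
import struct
import tempfile
import wave


def write_tone(path, seconds=0.5, freq=440.0, rate=22050):
    """Write a mono 16-bit sine tone; a stand-in for an exported paragraph."""
    frames = b"".join(
        struct.pack("<h", int(12000 * math.sin(2 * math.pi * freq * i / rate)))
        for i in range(int(seconds * rate))
    )
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        w.setframerate(rate)
        w.writeframes(frames)


def duration_seconds(path):
    """Return the clip length in seconds, read from the WAV header."""
    with wave.open(path, "rb") as w:
        return w.getnframes() / w.getframerate()


def join_wavs(paths, out_path):
    """Concatenate WAV files that share channels, sample width, and rate."""
    with wave.open(paths[0], "rb") as first:
        params = first.getparams()
    with wave.open(out_path, "wb") as out:
        out.setparams(params)
        for path in paths:
            with wave.open(path, "rb") as w:
                if w.getparams()[:3] != params[:3]:
                    raise ValueError(f"{path} has a different audio format")
                out.writeframes(w.readframes(w.getnframes()))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        intro = os.path.join(folder, "paragraph_1.wav")
        facts = os.path.join(folder, "paragraph_2.wav")
        combined = os.path.join(folder, "combined.wav")
        write_tone(intro)
        write_tone(facts, seconds=1.0)
        join_wavs([intro, facts], combined)
        print(duration_seconds(combined))  # 1.5
```

Joining clips this way simply places them back to back. If you want music, cut scenes, or fades between paragraphs, an audio or video editor is still the better tool.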
Best Practices for AI Voice Generators
To get the most out of AI voice generators, it helps to follow a few best practices when creating clips. First, separate sentences by adding a dash (-). This lets the algorithm know there should be a pause and will typically eliminate run-on sentences. Similarly, commas and semicolons can add a natural pause between words. On the other hand, avoid hyphens between words in a sentence. For example, you would use landlocked rather than land-locked.
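If you prepare many scripts, the dash tip is easy to automate before pasting text into your voice generator. The helper below is a small sketch of our own (the name `add_pauses` and the default marker are assumptions). It splits on sentence-ending punctuation, so abbreviations such as "Dr." would be treated as the end of a sentence.

```python
import re


def add_pauses(script: str, marker: str = " - ") -> str:
    """Insert a dash between sentences to encourage a spoken pause."""
    # Split wherever whitespace follows a period, exclamation or question mark.
    sentences = re.split(r"(?<=[.!?])\s+", script.strip())
    return marker.join(s for s in sentences if s)


print(add_pauses("Hello, dog lovers! Today we dive in. Ready?"))
# Hello, dog lovers! - Today we dive in. - Ready?
```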
You should also add spaces between the letters of acronyms to help the AI understand that the letters should be spoken individually rather than read as a word. For example, instead of using AI, use A I. You can also add a period between letters in acronyms. To avoid word repetition, rephrase your text to include punctuation, such as commas, semicolons, or dashes. Another way to remedy repetition is to break sentences down into smaller ones. This prevents the AI from becoming confused, which usually ends with undesirable results.
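The acronym tip can be scripted as well. This sketch, again under our own assumptions, spaces out any standalone run of two or more capital letters. The hypothetical `keep` parameter lists acronyms that are normally read as a word and should be left alone. Note that a word typed in all caps for emphasis would also be spaced out, so review the result before generating audio.

```python
import re


def space_acronyms(script: str, keep=frozenset()) -> str:
    """Turn standalone runs of capitals (e.g. 'AI') into 'A I'."""
    def spell_out(match):
        word = match.group()
        return word if word in keep else " ".join(word)

    return re.sub(r"\b[A-Z]{2,}\b", spell_out, script)


print(space_acronyms("AI voices and the FAQ page"))
# A I voices and the F A Q page
print(space_acronyms("NASA uses AI", keep={"NASA"}))
# NASA uses A I
```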
Final Thoughts on Creating an AI Voice
AI voice generators are changing the way creators make audio. Through artificial intelligence software like PlayHT, you can create voices for podcasts, YouTube videos, marketing videos, training materials, and more. As AI technology advances, the future of voice generation holds tremendous potential, opening doors to more immersive experiences.
For those interested in exploring other AI applications, our blog has plenty of posts to help you become an AI superstar in no time.