How to use the ElevenLabs API: Python text-to-speech tutorial with examples

A beginner-friendly tutorial showing how to generate voice output with Python using the ElevenLabs API for TTS (text-to-speech). No account or API key is needed for the initial trial so you can run the code in this video for free.

Get the source code here:
https://puppycoding.com/2023/08/24/el

And see a video of all the premade ElevenLabs voice samples here:
   • ElevenLabs Voice Samples: Hear all pr…  

00:00 Intro
00:27 Install the “elevenlabs” package
00:56 Generate & play audio
04:01 Change the voice
06:09 Save as an audio file
07:25 Control how the voice sounds
10:29 Use an API key

#Python #programming #ElevenLabs


Content

0 -> If you'd like to know how to create voice output like this,
3.36 -> "Hi, I'm from the future"
5.16 -> using Python, keep watching.
7.36 -> I'm going to be showing you the ElevenLabs API in this tutorial
11.36 -> because I've found it, at the moment, to have the best quality output.
15.36 -> That may change in the future, but it's also easy to use
18.36 -> and you don't need an API key or even an account just to try it out.
22.64 -> You can do it totally free, straight from a few lines of Python code.
26.16 -> So let's see how to do that.
28.16 -> ElevenLabs have actually made a Python package which makes it very easy to use their API,
33.4 -> so we'll install that first with "pip".
35.92 -> We use pip, and on my machine I've got Python versions two and three installed, so I need to specify
40.14 -> "pip3", but you may be able to just use "pip" on its own.
44.92 -> Install and then it's called "elevenlabs".
47.04 -> Nice and easy.
48.04 -> Press Enter.
49.4 -> I've already got it installed so it's very quick but maybe wait a few seconds and it should
53.2 -> be done and we're ready to start coding.
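
Before moving on, it can help to confirm that the install worked. The following is a minimal sketch using only the standard library. `is_installed` is a helper name made up for this example, and it only checks that Python can find the package, not that the package works.

```python
import importlib.util


def is_installed(package_name):
    """Return True if Python can locate the named top-level package."""
    return importlib.util.find_spec(package_name) is not None


# Check for the package installed in the step above.
if is_installed("elevenlabs"):
    print("elevenlabs is installed")
else:
    print("elevenlabs is missing; try: pip install elevenlabs")
```

If the second message appears, the package was probably installed for a different Python version than the one running this check.
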
58.04 -> In a new Python file here, the first thing we do is import the package we just installed.
62.94 -> So "import elevenlabs". We're going to start off without an API key just to show that it can
69.08 -> be done. We can use it for free but after several attempts, it will eventually stop
73.6 -> us and say, "hey, you've had your trial now, please make an account", but I think we'll
77.8 -> still be okay. So let's start off by creating the program in two parts. The first part is
84.64 -> generating the audio and the second part is playing the audio. Let's store the
89.86 -> audio in a new variable called "audio", or call it whatever you like, and this is
94.78 -> going to come from a function within the "elevenlabs" module called "generate". So it's "elevenlabs"
102.58 -> and within that, "generate". Inside these brackets, the two arguments that we
111.38 -> need are the text and which voice you want to use. I'm going to press Enter
115.94 -> just to make it a bit cleaner. So what text shall we have? Let's have…
121.78 -> Okay and the second argument will be the voice that we want. This can be a voice
131.98 -> ID if you know it, it can be the name of a voice if you know it, or it can be a
136.94 -> voice object which can be created and I'll show you that a bit later on.
140.72 -> First of all, let's use a voice that I know is in there called Bella.
145.72 -> That's it. That's the generation part finished, but we can't hear it yet.
154.72 -> To do that, we need to play it.
156.72 -> We use the "elevenlabs" module again, and within that there is a "play" function.
165.72 -> That's so easy to understand. That's good.
167.72 -> And what do we play? Well we play the audio that we just generated.
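
Put together, the two parts described so far look roughly like this. It is a sketch that assumes the version of the `elevenlabs` package used in this video, where `generate` and `play` are available directly on the module; other releases of the package may organize these calls differently. `generation_args` and `speak` are names made up for this example, and the import sits inside the function so the file can still be loaded when the package is not installed.

```python
def generation_args(text, voice):
    """Collect the two arguments that generate() needs: the text and the voice."""
    return {"text": text, "voice": voice}


def speak(text, voice="Bella"):
    """Generate speech for the text, then play it."""
    import elevenlabs  # assumes the package version shown in the video

    # Part 1: generate the audio (no API key is set, so this relies on the free trial).
    audio = elevenlabs.generate(**generation_args(text, voice))
    # Part 2: play the audio that was just generated.
    elevenlabs.play(audio)
```

Calling `speak("Hi, I'm from the future")` corresponds to the program built in this part of the video.
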
171.52 -> And that should be it. Let's give it a try. I'll save this. I'll go over
179.4 -> to my terminal. Let's clear all that so it's easy to understand. Right, as I said
185.18 -> I've got two versions of Python installed so you may be okay just doing
188.8 -> "python". I need to actually type "python3" and then I've called this file
193.4 -> "generate_voice.py". Press Enter and wait a couple of seconds for it to generate
198.44 -> and we'll see what happens. "Hi, I'm from the future!"
203.28 -> It worked! Thank you very much and to prove that it's not a pre-recorded thing
208.4 -> that I created, let's just change this a little bit. Why don't we say "Hello I am
213.28 -> from the past." Okay save that, try it again. I'm pressing the up key on my
219.16 -> keyboard which automatically fills in the last command that I used. So...
224.92 -> "Hi, I'm from the past!"
226.92 -> Oh, are you Bella? Very interesting. So there you go. That's the easiest way to use the
231.92 -> ElevenLabs API for text-to-speech generation. Now you might be wondering how can I
236.88 -> change the voice or how do I even know what voices are available? So let's look
240.28 -> at that next. On YouTube if you search for ElevenLabs voice samples I've got this
246.32 -> video here which plays a preview of all of the ElevenLabs pre-made voice samples.
251.28 -> I think it's nearly 40 or so of them and so it's very easy to hear what they
255.64 -> sound like. And so on. If you look in the description I list the names of all of
267.76 -> them, together with the voice ID and initially the name is enough but when
273.32 -> you get into a bit more complex programming then it's better to have the voice ID as well.
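
If you keep your own list of names and voice IDs, a small lookup helper saves copying IDs by hand. This is only a sketch: `find_voice_id` and `SampleVoice` are names invented for this example, and the IDs below are obvious placeholders, not real ElevenLabs voice IDs. It works with any objects that have `name` and `voice_id` attributes.

```python
from collections import namedtuple

# Stand-in for real voice records; the IDs are made-up placeholders.
SampleVoice = namedtuple("SampleVoice", ["name", "voice_id"])

sample = [
    SampleVoice("Bella", "id-for-bella"),
    SampleVoice("Freya", "id-for-freya"),
]


def find_voice_id(voice_list, name):
    """Return the voice_id of the first voice whose name matches, or None."""
    for voice in voice_list:
        if voice.name.lower() == name.lower():
            return voice.voice_id
    return None


print(find_voice_id(sample, "freya"))  # prints id-for-freya
```

The lookup ignores upper and lower case, so "freya" and "Freya" find the same entry.
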
278.92 -> Anyway, let's pick something from here… Freya. I can't remember what kind of voice that is,
284.52 -> but we'll try Freya. So back over to our program and we'll change this to Freya. I'm going to save that,
292.12 -> run it again. Oh wow, interesting voice there. And let's demonstrate with a voice ID. So I'm going
301.4 -> to go back over to YouTube and copy James. Here we are. The voice ID for
307.34 -> James and back over to our program. I'll put that in instead of the name so you
313.94 -> can use either. Save that over in the terminal, run again. Very good. So those are
323.08 -> the two minimal things you need, the text and the voice. You can also add
328.28 -> a model, and the model can be monolingual, so by default English, or it can be multilingual.
335.16 -> And they have a few European voices I think at the moment, some others, I can't remember the full
341 -> list, but if you want to do something that's not in English, then you would add the model like this.
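
As a rough sketch of that extra argument, the helper below picks between the two model names used in this video: `eleven_monolingual_v1` (the default, for English) and `eleven_multilingual_v1`. `choose_model` and `speak_in` are names made up for this example, and the code assumes the package version shown in the video, where `generate` accepts a `model` argument.

```python
MONOLINGUAL = "eleven_monolingual_v1"    # default model, English
MULTILINGUAL = "eleven_multilingual_v1"  # for languages other than English


def choose_model(english_only=True):
    """Return the model name to pass to generate()."""
    return MONOLINGUAL if english_only else MULTILINGUAL


def speak_in(text, voice, english_only=True):
    """Generate and play speech with an explicitly chosen model."""
    import elevenlabs  # assumes the package version shown in the video

    audio = elevenlabs.generate(
        text=text,
        voice=voice,
        model=choose_model(english_only),
    )
    elevenlabs.play(audio)
```
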
346.28 -> So a new line, "model =" and then it's "eleven_monolingual_v1". So "monolingual" is the default
356.2 -> and it's "v1". If you wanted to do languages other than English, then you would change this to
361.8 -> "multilingual". That's it, very easy. If you're just dealing with English, you don't even need to say
366.76 -> this, so you can just ignore that whole line altogether. The next thing you might want to do
371.88 -> is save the file, not just listen to it, but save it, and we can easily do that. Instead of the "play"
377.32 -> function, we just use the "save" function, so let's try that. I'll hide that and write "save". And this takes
385.16 -> two arguments – the actual audio that we generated and then the file name, so a comma and we can call
391.64 -> it whatever you like – "audio.mp3". It's in MP3 format by the way, and that's it. Let's change
399 -> this message a bit shall we? "Hi, I'm from outer space." Why not? Alright, we'll save this and run again.
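
The saving step can be sketched as follows, assuming the package version used in this video, where `save` takes the generated audio followed by a file name. `mp3_filename` and `save_speech` are helper names made up for this example; the first one simply makes sure the file name carries the `.mp3` extension, since the audio is in MP3 format.

```python
def mp3_filename(name):
    """Return the name with a .mp3 extension, adding one if it is missing."""
    return name if name.lower().endswith(".mp3") else name + ".mp3"


def save_speech(text, voice, filename="audio.mp3"):
    """Generate speech and write it to an MP3 file instead of playing it."""
    import elevenlabs  # assumes the package version shown in the video

    audio = elevenlabs.generate(text=text, voice=voice)
    # save() takes the generated audio first, then the file name.
    elevenlabs.save(audio, mp3_filename(filename))
```

Nothing is played when this runs; the only result is the new file next to the Python script.
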
407.8 -> Okay, so there's nothing that we can hear there which is fine because all we did was save the file.
414.44 -> Well, let's look for it and play it.
417 -> So in my file list here, I've got the Python file I was working with,
419.84 -> and there is a newly generated "audio.mp3" file.
423.24 -> If we play it, it should be...
424.84 -> "Hi, I'm from outer space."
426.4 -> Hey, it's James saying he's from outer space!
429.08 -> "Hi, I'm from outer space."
431 -> And he sounds very excited about it.
433.72 -> Good. So there we go,
434.68 -> we've seen how to play audio and how to save audio.
439.12 -> There's another thing that we can do as well,
440.8 -> which is to tweak the settings a bit
442.56 -> to kind of control how the voice sounds.
445.44 -> First of all, I'm going to go back to just playing the audio rather than saving it as a file,
451.2 -> and then what we want to do is we want to change the voice ID here into a voice object.
457.76 -> So let's use a variable for that called "voice" and we need to prepare this variable before we
464.96 -> generate it. So the voice is going to equal, and this is something that is created by an "elevenlabs"
471.84 -> class called "Voice". So we need to start off with the "elevenlabs" module again. Within that there is
480.16 -> a "Voice" class like this. This has at least two arguments and the two that we need are "voice_id"
487.28 -> and "settings". I'm going to press Enter, create a bit of space there. So the "voice_id" and let's use what we used
496.24 -> before, James, so I'm going to paste that in. "settings" is the next one and this will be
505.76 -> another "elevenlabs" class called "VoiceSettings". So we can do the same thing again from the "elevenlabs"
512.8 -> module. Within that we've got "VoiceSettings". There are two settings we can control in here
520.16 -> so I'm going to press Enter again, make it nice and pretty, and the two settings are "stability"
525.6 -> and "similarity_boost". Let's start with "stability", and this can go up to 1 for a very stable voice,
534.32 -> which actually can sound a bit boring. And it can go down to 0, which is very unstable,
539.76 -> very expressive, almost emotional. Let's start with 1 and see what that sounds like. The next one
546.32 -> is "similarity_boost" and I have to say, I haven't really heard any difference when I've been changing
552.4 -> this. The default is 0.75. I think it has the biggest effect when you clone voices and you can
558.16 -> make the voice more similar with a higher number here but there might be a few weird sounds in it.
563.36 -> As I say, I haven't heard any difference. Keep it at 0.75 and I would recommend concentrating on
568.88 -> stability for the biggest effect. So let's try with stability 1 and see what James sounds like.
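
Here is a sketch of the voice object described above, assuming the package version used in this video, where `Voice` and `VoiceSettings` are classes on the `elevenlabs` module. `settings_args` and `build_voice` are names invented for this example. The video only states the 0 to 1 range for `stability`; applying the same range to `similarity_boost` is an assumption.

```python
def settings_args(stability, similarity_boost=0.75):
    """Check the two settings and return them as keyword arguments."""
    # The 0 to 1 range for similarity_boost is an assumption, not from the video.
    for label, value in (("stability", stability), ("similarity_boost", similarity_boost)):
        if not 0 <= value <= 1:
            raise ValueError(f"{label} must be between 0 and 1, got {value}")
    return {"stability": stability, "similarity_boost": similarity_boost}


def build_voice(voice_id, stability):
    """Create a voice object with custom settings for use with generate()."""
    import elevenlabs  # assumes the package version shown in the video

    return elevenlabs.Voice(
        voice_id=voice_id,
        settings=elevenlabs.VoiceSettings(**settings_args(stability)),
    )
```

The returned object would then be passed as the `voice` argument of `generate`, in place of the name or voice ID used earlier.
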
575.2 -> Saving that file, over to the terminal and running again.
578.96 -> "Hi, I'm from outer space."
581.84 -> OK, not so boring.
583.2 -> Yeah, kind of normal.
585.68 -> Let's try with stability of 0 and see what he sounds like.
589.12 -> Save again, run again.
591.56 -> "Hi, I'm from outer space."
594.32 -> Oh, OK. Yeah.
595.36 -> Are you really James?
596.32 -> Well, good for you.
597.64 -> OK, so he did sound a bit more expressive there.
600.76 -> It makes a big difference when you have a longer piece of text.
604.64 -> Speaking of longer pieces of text, this totally free way of doing it here is restricted to a
610.64 -> certain number of characters. Officially, I think it's supposed to be 2,500 characters,
615.52 -> although I've been getting warnings when I've gone above 300 or so. So you will need to
620.64 -> get an API key if you want to continue using this or have long chunks of text. Let's finally look at
626.88 -> how to use an API key in this program.
630.12 -> Firstly, to get an API key,
633.46 -> you need to sign up for ElevenLabs, go to your Profile,
637.98 -> and then if you click the eye icon,
639.96 -> you can see it and then copy it
642.22 -> so we can use it in our program.
644.6 -> Once you've got your API key,
646.58 -> we'll add one more line to this program
648.94 -> and go back to the "elevenlabs" module.
651.86 -> Within that, there is a function called "set_api_key".
656.26 -> It takes one argument.
657.58 -> You can probably guess what it is.
659.02 -> It's your API key.
660.62 -> I'm not going to type mine directly here
662.82 -> because obviously everybody can see it,
664.46 -> and I recommend in general
665.74 -> not putting your API key directly in your program,
668.7 -> just in case you accidentally share it on GitHub
670.9 -> or make a video like me and put it on YouTube.
673.94 -> There is a better way, another way,
676.14 -> of including API keys in your programs
678.18 -> that other people can't see.
679.34 -> I've made another video about that
680.72 -> so I recommend checking that out
682.14 -> if you want to code like this with your API key.
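
One common way to keep a key out of the source code is an environment variable. The sketch below shows that approach; it is not necessarily the method from the other video. The variable name `ELEVENLABS_API_KEY` and the helpers `read_api_key` and `configure_elevenlabs` are assumptions made for this example, and the final call assumes the `set_api_key` function from the package version shown in this video.

```python
import os

API_KEY_VARIABLE = "ELEVENLABS_API_KEY"  # hypothetical name; choose your own


def read_api_key(environment=None):
    """Return the API key from the environment, or None if it is not set."""
    if environment is None:
        environment = os.environ
    key = environment.get(API_KEY_VARIABLE, "").strip()
    return key or None


def configure_elevenlabs():
    """Pass the key from the environment to the elevenlabs module."""
    import elevenlabs  # assumes the package version shown in the video

    key = read_api_key()
    if key is None:
        raise RuntimeError(f"Set the {API_KEY_VARIABLE} environment variable first")
    elevenlabs.set_api_key(key)
```

With this in place the key lives in your shell or system settings, so sharing the Python file does not share the key.
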
685.7 -> Anyway, that is how you use the ElevenLabs API to create voices, to create text-to-speech.
691.9 -> The only thing left for me to say is…
693.78 -> "Thank you for watching!"

Source: https://www.youtube.com/watch?v=1t9FhxQcDiw