Free Text-to-speech technologies

Here is a short review of freely available (open source or not) « text-to-speech » technologies. I dug into this topic because I wanted to check whether anyone had invented a software package that turns my RSS aggregator into a personalized radio. More precisely, while I am doing some other task (feeding one of my kids, brushing my teeth, having my breakfast, …) I would like to be able to check my favorite blogs for news without having to read anything. My conclusion : two packages come close to the expected result.

Regarding features, the most advanced one is NewsAloud from nextup.com. It acts as a simple and limited news aggregator combined with a text-to-speech engine that reads selected newsfeeds aloud. But it still lacks some important features (loading my OPML subscription file so that I don’t have to enter my favorite RSS feeds one by one, displaying a scrolling text as it is read, …) and worse : it is NOT open source.

The second nice-looking package going in the expected direction is just a nice hack called BlogTalker, which enables any IBlogExtension-compatible .Net aggregator (RSSBandit, NewsGator…) to read any blog entry aloud. But it is just a proof-of-concept since it cannot be set up to read a whole newsfeed or a set of newsfeeds. It seems to me that adding TTS abilities to existing news aggregators is the way to go (compared to NewsAloud, which comes from TTS technologies and tries to build a news aggregator from there). And BlogTalker successfully passes the « is it open source ? » test.

Both packages depend on third party text-to-speech engines (the « voices » you install on your system), so their output can only be as good as the underlying TTS engine. For example, if you are a Windows user, you can freely use some Microsoft voices (Mike, Mary, Sam, robot voices, …) or Lernout & Hauspie voices or many other freely available TTS engines that support the Microsoft Speech API version 4 or 5 (or above ?). The problem is that these voices do not sound good enough to me. As a native French speaker, I am comfortable with the LH Pierre or LH Veronique French voices even if they still sound robotic. But for listening to English newsfeeds over long periods, the MS, LH or other voices are not good enough. Fortunately, AT&T invented its « natural voices » which sound extremely … natural according to the samples provided online. Unfortunately, you have to purchase them. I will wait for this new kind of natural voices to become commoditized.

Meanwhile, I have to admit that TTS-enabled news aggregators are not ready for end-users. You can assemble a nice proof-of-concept but the quality is still lacking because of three issues : aggregators are not fully mature (from a usability point-of-view), high-quality TTS engines are still rare, and nobody has managed to integrate the two well yet. With the maturation of audio streaming technologies, I expect some hacker some day to TTS-enable my favorite CMS : Plone. With the help of some of the Plone aggregation modules (CMFFeed, CMFSin, …), it would be able to stream a personalized audio newsfeed directly to WinAmp… Does it sound like a dream ? Not sure…
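
As a rough illustration of the « personalized radio » idea, the first step is turning a newsfeed into plain text that a TTS engine can read. The sketch below is only an assumption about how that could look: it handles a simple RSS 2.0 document with the Python standard library, and the sample feed, `strip_markup` and `feed_to_script` are hypothetical names invented for this example, not part of any package reviewed here.

```python
import html
import re
import xml.etree.ElementTree as ET

# Hypothetical sample feed, used only to make the sketch self-contained.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example blog</title>
    <item>
      <title>First post</title>
      <description>&lt;p&gt;Hello &lt;b&gt;world&lt;/b&gt; &amp;amp; friends.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Second post</title>
      <description>Plain text entry.</description>
    </item>
  </channel>
</rss>"""

TAG_RE = re.compile(r"<[^>]+>")


def strip_markup(fragment):
    """Remove HTML tags, decode entities and collapse whitespace."""
    text = TAG_RE.sub(" ", fragment)
    text = html.unescape(text)
    return " ".join(text.split())


def feed_to_script(rss_xml):
    """Turn an RSS 2.0 document into one line of speakable text per item."""
    channel = ET.fromstring(rss_xml).find("channel")
    lines = ["News from " + channel.findtext("title", default="unknown feed") + "."]
    for item in channel.findall("item"):
        title = strip_markup(item.findtext("title", default=""))
        body = strip_markup(item.findtext("description", default=""))
        lines.append(title + ". " + body)
    return "\n".join(lines)


print(feed_to_script(SAMPLE_RSS))
```

The resulting text could then be handed to whatever TTS engine is installed; real feeds (Atom, namespaced RSS, malformed HTML) would need a more robust parser than this.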

During my tests, I encountered several other TTS utilities that are open source (or free or included in Windows) :

  • Windows Narrator is a nice feature that reads any Windows message box for more accessibility. It seems to be bundled in all the recent Windows releases. Windows TTS features are also delivered with the help of the friendly-but-useless Microsoft Agents.
  • Speakerdaemon‘s concept is simple : it monitors any set of local files or URLs and it speaks a predefined message at any change in the local or remote resource (« Your favorite weblog has been updated ! »). Too bad it cannot read the content or excerpts (think regular expressions) of these resources.
  • SayzMe sits in your icon tray and reads any text that is pasted by Windows into the clipboard. Limited but easy.
  • Clip2Speech offers the same simple set of features as SayzMe plus it allows you to convert text to .WAV files.
  • Voxx Open Source is somewhat ambitious. It offers both TTS features (read any highlighted text when you hit Ctrl-3, read message boxes, read any text file, convert text to .WAV or .MP3, …) and speech recognition. Once again, it is « just » a packaging (front-end) of third party speech recognition engines. As such, it uses by default the Microsoft Speech recognizer, which is not available in French (only in U.S. English, Chinese and Japanese if I remember correctly). I have yet to try it in U.S. English with a headset microphone since my laptop microphone catches too much noise for it to be usable. The speech recognition feature allows the user to dictate a text or to command Voxx or Windows by voice. So it is an open source competitor to IBM ViaVoice or ScanSoft Dragon NaturallySpeaking.
  • PhantomSpeech is middleware that plugs into TTS engines and allows application developers to add TTS capabilities to their applications. It is said to be distributed with add-ins for Office 2000. Indeed, I could display a PhantomSpeech toolbar in Word 2003. It could read a text but only using the female Microsoft voice. And this toolbar showed unexpected behaviors and errors within Office, so it is not reliable as a front-end application. Anyway, the use and configuration of speech engines is really a mess. The result is that PhantomSpeech does not really seem intended for end-users but rather for developers.
  • CHIPSpeaking is a nice utility for « the vocally disabled » (people who cannot speak). It allows the user to dictate sentences with a virtual keyboard and to record predefined sentences that are read aloud with one click.
  • ReadPlease (the free version) is just a nice simple text reader made by developers who played too much with Warcraft (click on the faces and you’ll see why). The word being read is highlighted. Simple options allow users to change the voices with one click (which is cool when you switch between several languages) or to customize the size of the text, …
  • Spacejock’s yRead is another text reader that includes a pronunciation editor (read km as « kilometers » please) and also allows the download of public domain texts available from Project Gutenberg. The phrase being read is highlighted, and you can easily switch from one voice (and language) to another. Too bad its window always steals the focus when it reads a new phrase.
  • For the *nix-inclined people, I should also mention the famous Festival suite of TTS components (Festival, FLite, Festvox). For the java-inclined people, don’t miss the FreeTTS engine (that is based on Festival Lite !) and the associated end-user applications. An example of an end-user application based on Festival is the CMU Communicator, see its sample conversation as a demo.
  • Last but not least, do not miss Euler and the underlying MBROLA package. Euler is a simple open source reading machine based on MBROLA, which implements a huge number of voices in many languages ; these voices can include very natural intonations and vocal stresses. Euler + MBROLA were produced by an academic research program. They are free for non-commercial use and their source code is available (BTW, it is said that MBROLA could not be distributed under an open source license because of a France Telecom software patent !). Beware : the installation of MBROLA may be quite tricky. First, download the MBROLATools Windows binaries package, download patch #1 and read the instructions included (I had problems when trying patch #2 so I did not use it), download as many MBROLA voices as you want (wow ! that many languages supported !), then download Win Euler (or any other MBROLA compatible TTS engine from third parties ; note that MBROLA is supported by Festival).
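
The Speakerdaemon item above regrets that the tool cannot read excerpts of the monitored resource (« think regular expressions »). Here is a minimal sketch of what such a feature could look like. It is not how Speakerdaemon works : `ResourceWatcher` is a hypothetical class, fetching the file or URL is left to the caller, and the pattern is assumed to contain one capturing group.

```python
import hashlib
import re


class ResourceWatcher:
    """Detects content changes and builds a message to speak (hypothetical helper)."""

    def __init__(self, excerpt_pattern):
        # The pattern must contain one capturing group: the excerpt to read aloud.
        self.excerpt_re = re.compile(excerpt_pattern, re.DOTALL)
        self.last_digest = {}

    def check(self, name, content):
        """Return the text to speak if `content` changed since the last check, else None."""
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        previous = self.last_digest.get(name)
        self.last_digest[name] = digest
        # The first observation only records a baseline; identical content is not a change.
        if previous is None or previous == digest:
            return None
        match = self.excerpt_re.search(content)
        if match:
            return name + " has been updated: " + match.group(1).strip()
        return name + " has been updated."


watcher = ResourceWatcher(r"<title>(.*?)</title>")
watcher.check("My favorite weblog", "<title>Old post</title>")
print(watcher.check("My favorite weblog", "<title>New post</title>"))
```

The returned string would then be passed to a TTS engine ; comparing digests rather than full copies keeps the memory footprint small when many resources are monitored.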

Further ranting about TTS engines : I feel like the ecosystem of speech engines is really not mature enough. Sure, several vendors provide speech engines. But they are not uniformly supported by the O.S. There was a Microsoft S.A.P.I. version 4 (SDK available here) which is now at version 5.1, but people even mention a v.6 (included in Office 2003 U.S. ?) and a v.7 to be included in LongHorn (note that there is also another TTS API : the Java Speech API 1.0 – JSAPI – as implemented by FreeTTS… bridgeable with MS SAPI ?). But as with any Microsoft standard, these APIs are … not that standardized (i.e. they seem to be Microsoft-specific). Even worse : they seem rather unstable since the installation of various speech engines gives strange results : some software detects most of the installed TTS engines, other software only detects SOME of the SAPI v.4 TTS engines, and yet other software displays a mix of some of your SAPI4 TTS engines and some of your SAPI5 TTS engines…. In order to be able to use SAPI5 engines and the control panel I had to install Microsoft Reader and to TTS-enable it (additional download). What a mess ! The result is that you cannot easily control which voices you will be using on your computer (which one will be supported or not ?). As a further example, I could not install and use the free CMU Kal Diphone voice and I still don’t know why. Is it the API’s fault ? The engine’s fault ? Don’t know… Last remark on this point : Festival seems to be the main open source stream in the field of TTS technologies but it does not seem to be fully mature ; and the end-user applications based on it seem to be quite rare. Let’s wait some more years before it becomes a mainstream, user-friendly and free technology.

More precisely, the TTS puzzle seems to be made with the following parts :

  • a TTS engine made with three parts :
    • a text processing system that takes a text as input and produces phonetic and prosodic (duration of phonemes and a piecewise linear description of pitch) commands
    • a speech synthesizer that transforms phonemes plus a prosody (think « speech melody ») into speech
    • a « voice » that is a reference database that allows the speech to be synthesized according to the tone, characteristics and accent of a given voice
  • an A.P.I. that hosts this engine and publishes its features toward end-user applications and may provide some general features such as a control panel
  • an end-user application (a reading machine, a file monitor with audio alerts, an audio news aggregator, …) that invokes the dedicated speech API

You can get more detailed information from the MBROLA project site.
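
To make the hand-off between the text processing system and the synthesizer more concrete, here is a small sketch that writes phonetic and prosodic commands in the line-oriented style of MBROLA « .pho » input : a phoneme name, a duration in milliseconds, then optional pairs of (position in % of the duration, pitch in Hz) describing the piecewise linear pitch curve. The phoneme symbols and values below are purely illustrative and have not been checked against any particular voice database ; `pho_line` and `to_pho` are hypothetical helper names.

```python
def pho_line(phoneme, duration_ms, pitch_points=()):
    """Format one phoneme: name, duration (ms), then (position %, pitch Hz) pairs."""
    parts = [phoneme, str(duration_ms)]
    for position_percent, pitch_hz in pitch_points:
        parts.append(str(position_percent))
        parts.append(str(pitch_hz))
    return " ".join(parts)


def to_pho(segments):
    """Join phoneme segments into the text a synthesizer would take as input."""
    return "\n".join(pho_line(*segment) for segment in segments) + "\n"


# Illustrative segments only: silence, two phonemes with a falling pitch, silence.
segments = [
    ("_", 100),
    ("b", 60, [(0, 120)]),
    ("o~", 200, [(0, 120), (100, 90)]),
    ("_", 100),
]

print(to_pho(segments), end="")
```

In this picture, the « voice » is the database the synthesizer is pointed at when it renders such a file, which is why the same phonetic commands can be spoken with different timbres.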

These were my notes and ranting about text-to-speech technologies. Please drop me a comment if you feel like my explanations were wrong or biased as I don’t know this field in detail and I may have made a lot of errors here. Thanks !

69 thoughts on « Free Text-to-speech technologies »

  1. Sig

    A reader asks me by email (hey, you should post here instead ! even in French !) how to generate MP3 or WAV files on the fly server-side from the content of a database, in a *AMP (Apache + MySQL + PHP) environment. The MP3 or WAV files would then be played from a SWF animation for example. Here is my answer :

    Unfortunately, I did not notice any easy-to-use, out-of-the-box TTS PHP framework. I assume you run Linux. Your best option is probably Festival. You need to install Festival on your server. Then you just call Festival from PHP (or a cronned shell script ?) with a shell command line. See http://www.cstr.ed.ac.uk/projects/festival/manual/festival_28.html#SEC128 for help. There is a Festival « text2wav » command available. Note that you will certainly need some tweaking with Festival in order to use a MBROLA database corresponding to your language.

    Also note that if you prefer Java, FreeTTS looks like a nice Java TTS package.

    Please drop me a mail as soon as your server is up and running so that I can test it !

  2. kp

    Hi,

    Is there any free PHP based application (which is a open source) which can convert text to speech on a website? If so please let me know.

    Thanks,
    KP

  3. Sig

    KP, as I said in the comment above, I don’t know of any pure PHP application that would convert text to speech on a website. In order to convert text to speech on a PHP website, you will have to install Festival on the hosting system (or have it installed by your admin) and then you should be able to call the Festival text2wav utility from a PHP script. This utility will convert your text into .WAV files.
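
The « call text2wav from a script » approach described above might look roughly like the following. The sketch is written in Python rather than PHP, but the shell-out pattern is the same. It assumes Festival is installed and that its `text2wav` script is on the PATH ; `text2wav_command` and `synthesize` are hypothetical helper names, and the UTF-8 encoding of the temporary file is an assumption that may not suit every Festival voice.

```python
import subprocess
import tempfile
from pathlib import Path


def text2wav_command(text_path, wav_path):
    """Build the argument list for Festival's text2wav script."""
    return ["text2wav", str(text_path), "-o", str(wav_path)]


def synthesize(text, wav_path):
    """Write `text` to a temporary file and ask Festival to render it as a WAV file."""
    with tempfile.NamedTemporaryFile(
        "w", suffix=".txt", delete=False, encoding="utf-8"
    ) as handle:
        handle.write(text)
        text_path = Path(handle.name)
    try:
        # check=True raises CalledProcessError if text2wav exits with an error.
        subprocess.run(text2wav_command(text_path, wav_path), check=True)
    finally:
        text_path.unlink()
    return Path(wav_path)
```

Passing the command as a list (instead of one shell string) avoids shell quoting problems when the text or file names come from a web form or a database.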

  4. issac

    i need help to write Program for convert farsi text to speech.

  5. Sig

    In order to write a program that converts text into speech, you first have to choose a text-to-speech engine and feed it with a voice implementing the pronunciation of the language you want to deal with (Farsi).
    There is an Iranian voice available with the MBROLA engine. It means you can download and use the MBROLA engine and utilities to convert some Farsi text into speech (it is more convenient to use the MBROLA engine with the help of the Euler application). Then you may be able to do this programmatically (with Festival ?). Or you may ask an expert. Good luck and keep us informed of your success or failure !

  6. A mir Javan

    I want to use a program to speak text files written in Persian (Farsi), I would be so thankful if somebody could help me.
    thanks a lot!

    Amir Javan

  7. Eleanor

    Hi,
    I too am looking for a Farsi TTS program. I followed your steps and on my computer MBROLA works, but Euler does not. I am using XP. When I try to use Euler, it just says « CSystem initialisation failed: section [SPEAKER] not found ». I am sorry to bother you, but do you know what the problem might be? Apart from MBROLA, I cannot find any other Farsi (Persian) TTS program. I would be really grateful for any help!

    Thank you,

    Eleanor

  8. Sig

    Hi Eleanor,

    There seems to be a whole bunch of people trying to get Farsi Text-to-Speech capabilities ! Anyway…

    The error message you got is documented as a symptom of an improper installation of MBROLA tools (see also that). So I suggest that you check your MBROLA installation. I read that you said that MBROLA works on your computer but maybe there is still a component missing. If it still does not work after this, I suggest that you get in touch by e-mail with the Euler team and explain your problem to them. I am quite sure they will help you, especially if you mention the fact that many people here are trying to use their program for Farsi text-to-speech.
    Please also consider posting here a follow-up of your experiment. Please tell us if it works and how you fixed your installation. It will certainly be valuable for other readers.

    Oh, I was about to forget your last point : unfortunately, I do not know any other free Farsi TTS package.

  9. Max

    Thank you everyone for those cool resources, i have started developing TTS PHP script for me and my client.

  10. Bill

    I’ve been searching all over for instructions on how to add voices to the two supplied with MSReader (LHMichael and LHMichelle). You are the closest site so far to even understanding the nature of the problem. There are other sites that have supplied additional voices called « speech text voices » to the Microsoft SAPI… Any idea on how it’s done or where to look or ask?

  11. Bill Raim

    I’m just learning about text to speech and have come across SAPI 5.1
    with L&H (Lernout & Hauspie) voices Mary, Mike and Sam. In my explorations
    this site has the most reliable files
    http://www.geocities.com/lhcsoft2004/Product.htm

    I’m looking for more – free – SAPI 5.1 voices.
    Can anyone point me to some?

  12. Sig (author of the post)

    If you’ve got some bucks to spend, you should really try AT&T Natural voices. Their quality is really astounding.
    Regarding free voices, I don’t know many more of them.

  13. Rita Lee

    we are a Chinese speech technology company, i am looking for Farsi TTS engine, do you have?

  14. Sig (author of the post)

    No. I don’t. I’m really impressed by the number of people interested in Farsi TTS engines ! Do governments have such an urgent need for automating the analysis of phone conversations wiretapped from Iran or what ? (this was my conspiracy-moment-of-the-year). Why are you so interested in Farsi TTS engines ?

  15. Anonyme

    Greetings Very good web site. I loved it. Found invaluable information. Just what I was looking for :-)

  16. Anonyme

    Thank you for opening a wonderfully new sight..I wish you the best of luck with your new venture.

  17. Raghu

    I am having problems in creating a sapi object from a PHP script.
    Here is the sample:
    <?php
    // …
    $voiceobject->Volume=100;
    echo $voiceobject->Volume;
    $returntype=$voiceobject->Speak("hi this is speech test");
    if($returntype==0)
    {
    echo "no sound played";
    }
    else
    {
    echo "Sound played";
    }
    }

    //com_set ( resource com_object, string property, mixed value)
    ?>

  18. RamyaLN.

    Sir, i am in need of a java code that would convert the given text into a .wav file. presently i am using the freetts for the text to speech conversion but not knowing how to convert the text into a .wav file. could u please suggest some ideas regarding the same.

    thanks and regards,
    RamyaLN.

  19. Catalin

    Is Festival compatible with SAPI4 and SAPI5? Are voices in Festival and Cloudgarden JSAPI compatible?

  20. anonymous

    hello
    i m need of the code to convert a text file into a .wav file using freetts….could u help me out with it…
    thanks in advance and regards.

  21. Sig (author of the post)

    Raghu, I’m sorry but my PHP skills are really weak. But maybe some other reader can help ?

  22. Sig (author of the post)

    RamyaLN,

    The option for outputting a wav file from freetts is « dumpAudio ». The command line would be something like this :

    java -Xmx128m -jar freetts.jar -file input.txt -dumpAudio output.wav

    Then you can use an MP3 encoder like LAME if you need.

    Tell me if ever you find nice open source voices for FreeTTS.
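
For those who want to script the two steps above (FreeTTS to WAV, then LAME to MP3), here is a minimal sketch that simply wraps the same command lines. It assumes `java` and `lame` are on the PATH and that `freetts.jar` sits in the current directory ; the function names are hypothetical, and the sketch drives the command-line tools rather than calling the FreeTTS Java API directly.

```python
import subprocess


def freetts_command(text_file, wav_file, jar="freetts.jar"):
    """Same FreeTTS invocation as the command line above, as an argument list."""
    return ["java", "-Xmx128m", "-jar", jar, "-file", text_file, "-dumpAudio", wav_file]


def lame_command(wav_file, mp3_file):
    """Basic LAME invocation: input WAV file, output MP3 file."""
    return ["lame", wav_file, mp3_file]


def text_file_to_mp3(text_file, wav_file, mp3_file, jar="freetts.jar"):
    """Render a text file to WAV with FreeTTS, then encode the WAV to MP3 with LAME."""
    subprocess.run(freetts_command(text_file, wav_file, jar), check=True)
    subprocess.run(lame_command(wav_file, mp3_file), check=True)
    return mp3_file
```

A call such as `text_file_to_mp3("input.txt", "output.wav", "output.mp3")` would run both tools in sequence and stop at the first one that fails.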

  23. Sig (author of the post)

    Catalin,

    In my understanding of JSAPI (which understanding is quite limited), the voices themselves are not the components that have or don’t have a JSAPI compatibility. The engine that uses the voice can or cannot be JSAPI compatible. More precisely, the engine itself may or may not implement the JSAPI. Same is true for MS SAPI4 and SAPI5.

    Now regarding Festival, it does not implement any of these APIs as far as I know. Regarding Cloudgarden, as far as I understand its main feature is that it implements the Java Speech API (JSAPI) and encapsulates MS SAPI4 or MS SAPI5 compatible speech engines.

    Note that you can also import some Festival voices into FreeTTS and FreeTTS implements a big part of JSAPI. It means that you can access Festival voices from JSAPI through FreeTTS.

  24. Xavier

    Hi Sig,
    First of all, thanks for all this information. I am developing a news reader in Java, with the option of enabling TTS. I use FreeTTS with the MBROLA voices ; unfortunately there are no French voices by default (only 3 US voices). It seems that FreeTTS has to be recompiled (with the appropriate Java code) in order to use other MBROLA voices (and more precisely the French voices).
    Do you know exactly what needs to be done ? Have you already done it ? If so, I am interested in the code. Otherwise, do you know where I can find information ?

    Thanks in advance for your answer.

    Xavier

  25. Sig (author of the post)

    Xavier, like you, I noticed that the French MBROLA voices cannot be imported into FreeTTS without significantly modifying the FreeTTS code. I have not done it because I do not know how, and I think it would take quite some time to understand the FreeTTS code before being able to modify it. Moreover, the MBROLA voices come with an important constraint : they can only be used for personal purposes or non-commercial research (see the license).

    In my view, the information is first and foremost in the FreeTTS source code. And maybe here. There is also information to dig up in the FreeTTS forums. One can read there, for example, that FreeTTS is missing a big piece before it can speak French : it does not know how to analyze French grammar and phonetics. Other people have tried to add other languages to FreeTTS but may have come unstuck. You will have to ask them.

    The work done around FranFest, the attempt at a French adaptation of Festival, may also help you.

    Also worth noting : for TTS with the French MBROLA voices, there are also LIA_PHONE and TTS-French in Perl. It is because these voices are not open source that many people had to give up…

    Is Lliaphon (Light-Lia-Phon) the best solution for free (libre) TTS in French ?

  26. SoundStudents

    Why did IBM withdraw its SDK ?
    Who can provide us with this IBMJS (IBM SDK), even an evaluation version ?
    Thanks in advance.

  27. gazab

    Hey.
    Great info. But do you know of any windows program that reads what’s supplied through the command line? I want to no fancy dialogs or anything like that. Just a small program that’s configurable through the command line.
    Thanks.

  28. naresh

    sir,
    i want an overview about IVRS , and how it can be done using java.

    how to proceed in coding is my problem?
    Please guide.

  29. Sig (author of the post)

    SoundStudents: I don’t know.

    Gazab: you can do this at least with Festival and from FreeTTS but I assume the setup procedure may be too complicated for you. BTW, you can also do this simply if you use the Microsoft Speech API from any proper wrapper software (from your favorite language). In other words: I don’t have in mind any « simple » command-line program that does it « out of the box ». But I think it is probably somewhat easy to program that.

    Naresh: I don’t know what IVRS is. I assume it is Interactive Voice Response System? In my opinion, the most difficult problem would be the speech recognition part. But I did not give a look into open source speech recognition engines. This is another story.
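
Regarding Gazab’s question, a bare-bones command-line reader on top of the Microsoft Speech API could be sketched as follows. This is only an illustration under several assumptions : it needs Windows with SAPI 5 installed and the third-party pywin32 package (which provides `win32com`), it uses whatever default voice is configured, and `text_from_args` and `speak` are hypothetical names.

```python
import sys


def text_from_args(args):
    """Join the command-line arguments into the sentence to speak."""
    return " ".join(args).strip()


def speak(text):
    """Speak `text` with the default SAPI 5 voice (Windows only)."""
    # Imported here so the module can still be loaded on systems without pywin32.
    import win32com.client

    voice = win32com.client.Dispatch("SAPI.SpVoice")
    voice.Speak(text)


if __name__ == "__main__" and len(sys.argv) > 1:
    speak(text_from_args(sys.argv[1:]))
```

Saved as, say, `say.py`, it would be invoked as `python say.py Your favorite weblog has been updated` with no dialog at all ; choosing a voice or a speaking rate from the command line is left out of this sketch.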

  30. Denis

    Need help!
    I find TTS SAPI4 engine on source. Only interface example, how to regist.
    I read it in SAPI SDK 4, it is not work. I wont writ free open source Delphi TTS Engine

  31. Sunny Night

    We are trying to create an all lesbian radio, whose goal will be that of counteract this paranoia world male dominated society with a movement of good willed women for love and understanding. We are interested in how texts can be turned into sounds because, as you said, sometimes we have to do chores and do many things at the same time and reading blogs are difficult sometimes, and listening to them could simplify things, we women sometimes looke like octopus at home and at work.

  32. Ping : Keks? » Blog Archiv » Herumspielen mit Sprachausgabe…

  33. gus_

    I’m searching for some webservice providing tts, do you know any?

  34. Sig (author of the post)

    I don’t know any. But I’m quite sure there are many of these.

  35. Tommy

    here you can find how text-to-speech server side synthesis (php+apache+tts) works.

  36. Craig

    Hi – Great resources here, but just wondering if anyone did find a LAMP solution? Thanks!

  37. ashish

    hello, please tell me java code which can save text as a wav file. It is possible through the dos prompt with the following command
    java -jar lib/freetts.jar -dumpAudio hello.wav -text Hello World
    but i want to do it through java code ..how is it possible plz tell me……..

  38. ashish

    hello friend i am working text to voice conversion through freetts and jsapi.
    now i want to save my text as a wav file. i can do it through the dos prompt

    java -jar lib/freetts.jar -dumpAudio hello.wav -text Hello World
    but i want to do it through java code.
    plz help me i really need it.
    my mail id is
    ashishjain.mits@gmail.com

  39. Julio Costa

    Hello, I have written a speech application using selectable SAPI 4 voices which are of varying quality, but at least free. One issue I remember having is that the order of installation makes a difference in whether the voices work or not. I am happy to say that the application is now working perfectly. Taking into consideration the price of commercially available voices, and the size of the required software, the SAPI4 English (American) speech engines were definitely suitable.

  40. saket

    hey

    is there any symbian based tts engine available

    am trying to develop an sms reader on symbian based phones

  41. maria

    Hi Max!, Plis can you help me, I really need the TTS PHP script, but i don´t have a clue of how to call festival from the script and how to convert the text from my webpage. HELP plis!!

  42. maria

    i need help for creating the script in php that calls FESTIVAL.

Comments are closed.