Simon: Open-Source Speech Recognition

Wednesday, 24 February 2010

Benefit project

On the 1st of February 2010 the friendly society simon listens started to work on a new project. For the next one and a half years, the simon listens team will investigate ways and means to make the simon speech recognition solution even more usable – especially for the elderly.

Abstract:
Using simon's voice control with terms from everyday language, the project will create useful scenarios and areas of application that make new communication technologies such as the internet, telephone and multimedia applications easy to use for elderly people. It can also provide additional safety, for example by reminding the user to take medication.

In the course of this project we will join forces with the Signal Processing and Speech Communication Laboratory of the Graz University of Technology, the HTBLA Kaindorf/Sulm, the Rehabilitation Clinic Maria-Theresia, the KFU Research Center for Austrian German and the Huminatis Graz to ensure that we have the necessary expertise to tackle such an ambitious project.

The solution created in this project will be released under the GPL license. All code will be freely available to the community.

Thanks to the bmvit (Federal Ministry of Transport, Innovation and Technology) of Austria and the FFG (Austrian Research Promotion Agency) for their generous support in making this possible!

Tuesday, 23 February 2010

Model Compilation Adapter

In simon 0.2 we introduced some mechanisms to catch common errors during the compilation of the model and display nicer error messages that explain how the user can solve the issue manually. In simon 0.3, however, simon will automatically repair some common mistakes without the user even noticing.

To explain what I am talking about, I first have to talk about simon's architecture a bit, so bear with me...

During normal operation, the simon client gathers the instructions (words, grammar, etc.) that will then be sent to simond. simond in turn compiles the model out of the given input files. To do that, simond first converts them to a format usable by the underlying tools (HTK, Julius). This conversion step was not needed in 0.2 because simon 0.2 only used the raw file formats of HTK / Julius. However, in simon 0.3 we need more control over the model and also want to give the user some advanced features that were not possible with just the information contained in those raw formats.

In simon 0.3 we introduced a new step between gathering the data and compiling it to a usable model: Adapting the input files.

This sounds like a boring but necessary conversion and indeed it is.

But what makes it interesting is that at the point of adaption we have all the input data that will be turned into a model, in a format that is easy to parse. This makes it an ideal place to do some last-minute optimizations on the temporary files that are then used to generate the model.

The model adaption manager will, for example, automatically remove words from the lexicon that have no training data associated with them. It will also clean the grammar of sentences that have no associated words. It will even remove samples containing words that are not in your dictionary. Basically, simon should be able to automatically handle a lot of cases that would have caused an error in simon 0.2.
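The three clean-up rules above can be sketched in a few lines of Python. This is only an illustration, not simon's actual implementation: the function name `adapt_input`, the in-memory data layout (a dict for the lexicon, lists of terminals for the grammar, word lists for the samples) and the order in which the rules are applied are all assumptions made for this example.

```python
def adapt_input(lexicon, grammar, samples):
    """Hypothetical sketch of the clean-up rules described above.

    lexicon: dict mapping word -> terminal (word category)
    grammar: list of sentences, each a list of terminals
    samples: dict mapping sample id -> list of transcribed words
    """
    # Rule: drop samples that contain words missing from the dictionary.
    kept_samples = {
        sample_id: words
        for sample_id, words in samples.items()
        if all(word in lexicon for word in words)
    }

    # Rule: drop lexicon words that have no training data left.
    trained_words = {word for words in kept_samples.values() for word in words}
    kept_lexicon = {
        word: terminal
        for word, terminal in lexicon.items()
        if word in trained_words
    }

    # Rule: drop grammar sentences that use a terminal without any words.
    used_terminals = set(kept_lexicon.values())
    kept_grammar = [
        sentence
        for sentence in grammar
        if all(terminal in used_terminals for terminal in sentence)
    ]
    return kept_lexicon, kept_grammar, kept_samples


# Made-up example data, not taken from a real simon model.
lexicon = {"open": "Verb", "computer": "Noun", "firefox": "Noun", "quickly": "Adverb"}
grammar = [["Verb", "Noun"], ["Verb", "Noun", "Adverb"]]
samples = {
    "s1": ["open", "computer"],
    "s2": ["open", "firefox"],
    "s3": ["close", "firefox"],  # "close" is not in the lexicon
}

clean_lexicon, clean_grammar, clean_samples = adapt_input(lexicon, grammar, samples)
```

In this example the sample "s3" is dropped because of the unknown word "close", "quickly" is dropped because nothing trains it, and the sentence that needs an "Adverb" is dropped because no adverbs remain.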

Sunday, 24 January 2010

Git

Like all the cool kids, simon moved to git.

So far, while the syntax seems a bit strange at first, it works really, really well.

The cheap branches are great and svn2git made the transition painless.

Repository URL (read-only):

git://speech2text.git.sourceforge.net/gitroot/speech2text/speech2text

The old svn repository is still available and up to date, but if everything works well enough it will be removed in the next couple of days.

Sunday, 17 January 2010

Model adaption

Keeping in line with the last couple of blog posts that all were breakthroughs on their own, this one is definitely up there as well.

Since revision 1117, simon supports using static models or adapting speaker-independent models to your own voice, in addition to building a new, speaker-dependent model from scratch (which is still the default, obviously). This means that new users can set up a complete, working speech recognition system literally in seconds. Pick the scenarios you want, point simon to the voxforge speech model, press "Connect" and start talking.

Of course, this only works if you have a fairly "standard" voice, and the voxforge model is still not perfect. So if you want a slightly higher recognition rate, go ahead, train a few samples and tell simon to adapt the voxforge base model with them. As little as one minute of speech will yield visible results (you will still need to install the HTK for this, though).

The only user interaction needed is to click a radio button - simon will do all the work for you.

While I was at it, I also improved the julius error reporting: the recognition process now writes a log file (~/.kde/share/apps/simond/models/<user>/active/julius.log) so that you can easily debug low recognition rates, mic troubles etc. When the recognition fails completely, simon will display the log along with a short description of what simon thinks happened.

Of course all of this is completely untested and will most likely contain bugs so try it at your own risk. By the way: Current trunk needs KDE 4.4 to compile.

Saturday, 26 December 2009

SSC: Large scale sample acquisition

Two new applications have joined the simon application suite: ssc and sscd.

ssc stands for Simon Sample Collector and is specifically designed for large scale sample acquisition.

sscd is the central data server, which stores its data in a MySQL database (no graphical frontend).

ssc is the graphical client that records the samples, manages the users and institutions and uploads the data to the server.

We use the software to collect samples in various medical institutions (mostly rehabilitation clinics at the moment) for further analysis. Every user of the software has a quite extensive profile which stores birth year, education, mother tongue, diagnosis and other relevant factors that may influence speech.

The system allows the user to create "institutions" (a clinic, for example) and associate users with zero or more institutions.

ssc users are not interchangeable with simon users, and ssc / sscd make no attempt to create speech models. They are simple data-acquisition tools and probably only useful if you are a speech researcher who wants to gather data from a lot of speakers. In that case, however, ssc is a neat little utility that already works very well for our uses.
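The relationship between speaker profiles and institutions can be pictured with a small Python sketch. It is purely illustrative: the class names, the id fields and the helper `speakers_at` are hypothetical, and the real sscd MySQL schema is not described in this post. Only the profile fields (birth year, education, mother tongue, diagnosis) come from the description above.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Institution:
    """A place where samples are recorded, e.g. a clinic (hypothetical model)."""
    institution_id: int
    name: str


@dataclass
class SpeakerProfile:
    """Profile data that may influence speech (field names are assumptions)."""
    user_id: int
    birth_year: int
    education: str
    mother_tongue: str
    diagnosis: str
    # A speaker may belong to zero or more institutions.
    institutions: set = field(default_factory=set)


def speakers_at(profiles, institution):
    """Return all profiles associated with the given institution."""
    return [p for p in profiles if institution in p.institutions]


# Made-up example data.
clinic = Institution(1, "Example Rehabilitation Clinic")
anna = SpeakerProfile(1, 1942, "secondary school", "German", "stroke", {clinic})
bert = SpeakerProfile(2, 1950, "university", "German", "none")  # no institution

at_clinic = speakers_at([anna, bert], clinic)
```

Modelling the association as a set on the profile keeps the "zero or more" rule explicit; a relational database would typically use a separate link table for the same purpose.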

simon scenarios

First of all: Merry Christmas to everybody!

Some of you might still remember that back when simon 0.2 was released, I talked about modularizing the recognition into smaller "packages" or "scenarios". Well, this feature has now been implemented and is almost ready.

The idea is to provide pre-defined application packages (like default values on steroids) as well as the ability to create such packages and share them with the community easily. This should significantly improve our biggest weakness: The time it takes a new user to get started with simon.

To show you what I mean, I created a little screencast that shows the current development version of simon in action:

Wednesday, 16 September 2009

Automatic BOMP import

Disclaimer: This blog post is probably only interesting if you want to use simon for the German language.

If you do, then you most likely already know the problem with the BOMP dictionary. It is a huge, high-quality dictionary that even contains terminal information. In short: It is the best shadow dictionary for simon there is - and I am not saying the best for the German language; it's the best. Period.

Only problem: The licence. It doesn't permit free distribution. Instead, you had to send an e-mail to a specific address at the University of Bonn and wait for a reply that contained the dictionary as an attachment. This process could take a couple of days.

So we contacted the team that holds the copyright and they granted us an exception to the licence. We are now allowed to distribute the dictionary to our users ourselves, as long as we still collect their names and e-mail addresses (for statistical purposes) and they still accept the BOMP licence terms. So we integrated this process directly into simon. When importing a HADIFIX dictionary, you can now choose either to specify a file manually or to download and import the HADIFIX BOMP automatically.