Simon: Open-Source Speech Recognition

Friday, 28 August 2009

Calculator Plugin and Keyboard Plugin

Thanks to the "Österreichische Forschungsförderungsgesellschaft" (literal trans.: 'Austrian Researchfundingassociation'), the SIMON listens team has been expanded by two summer interns, Mario Strametz and Dominik Neumeister.

After some general testing and getting to know the system, they are now working on two promising command plugins: a calculator plugin and a keyboard plugin.

The calculator plugin is a natural extension of the existing input-number-plugin.

As seen, it is still quite basic but already usable to a certain extent. However, it is under heavy development and we expect the first stable versions by the end of next week.

The calculator is - besides the obvious - also targeted at school kids doing their math homework. Upon pressing OK, it therefore offers the option to write out not only the result but also the calculation leading up to it (e.g.: "1+1=2" instead of just "2"). The finished version will also include formatting options such as formatting the output as an amount of money.
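To make the output options concrete, here is a minimal sketch in Python. It is not the plugin's actual code: the function name, the mode names and the two-decimal money format are assumptions made purely for illustration.

```python
from decimal import Decimal, ROUND_HALF_UP


def _plain(value: Decimal) -> str:
    # Write 2 or 2.0 as "2" and 100 as "100" (no exponent notation).
    return format(value.normalize(), "f")


def format_output(expression: str, result: Decimal, mode: str = "result") -> str:
    """Return the text a calculator plugin might type out.

    The mode names are hypothetical:
      "result"      -> just the result, e.g. "2"
      "calculation" -> the calculation leading up to it, e.g. "1+1=2"
      "money"       -> the result rounded to two decimal places
    """
    if mode == "result":
        return _plain(result)
    if mode == "calculation":
        return f"{expression}={_plain(result)}"
    if mode == "money":
        return str(result.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP))
    raise ValueError(f"unknown mode: {mode}")


print(format_output("1+1", Decimal("2"), "calculation"))
```

Keeping the expression and the result separate until the very last step is what makes it cheap to offer several output styles.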

At the same time, the two are working on a keyboard plugin (no screenshot yet as development has just started). However, our "keyboard" will not be just a regular on-screen keyboard.

The keyboard plugin will not have a fixed number of fields (keys), nor will their values be fixed to those of a qwerty keyboard.

Instead, users will be able to configure them as they like in configuration sets (sensible defaults will of course be provided) and even spread the keys out across multiple tabs.

While this may seem overly complicated on paper, it makes advanced configurations possible: for example, a text-snippet tab that collects the user's most frequently used snippets, or a tab that puts special characters that matter to them (e.g. currency symbols for an accountant) right where they want them.
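As a rough illustration of the idea, a configuration set with tabs and freely defined keys could be modelled like this in Python. The class names, fields and the example set are all made up for this sketch; the real plugin is still being designed.

```python
from dataclasses import dataclass, field


@dataclass
class Key:
    label: str   # what is shown on the key (and said to trigger it)
    output: str  # what gets typed: one character or a whole text snippet


@dataclass
class Tab:
    name: str
    keys: list[Key] = field(default_factory=list)


@dataclass
class KeyboardSet:
    name: str
    tabs: list[Tab] = field(default_factory=list)

    def lookup(self, tab_name: str, label: str) -> str:
        """Return the text to type for the key `label` on tab `tab_name`."""
        for tab in self.tabs:
            if tab.name == tab_name:
                for key in tab.keys:
                    if key.label == label:
                        return key.output
        raise KeyError((tab_name, label))


# Hypothetical set for an accountant: currency symbols plus a snippet tab.
accountant = KeyboardSet(
    name="Accounting",
    tabs=[
        Tab("Currency", [Key("euro", "€"), Key("dollar", "$"), Key("pound", "£")]),
        Tab("Snippets", [Key("greeting", "Dear Sir or Madam,")]),
    ],
)

print(accountant.lookup("Currency", "euro"))
```

Because a key's output is just a string, the same structure covers single characters and whole text snippets alike.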

I will update this blog as the development progresses so check back!

Publicity

Hi fellow readers! Long time no see!

As some of you might have seen, there was an article about simon on the dot. Thanks to Troy Unrau for making that happen!

The article spawned a lot of discussion and interest, and several sites picked it up, most notably lwn, where the discussion focused on the license issues. The article also hit digg (50 diggs), osnews, several twitter/identi.ca feeds and a lot of blogs everywhere.

Of course this also showed in our download statistics. We had more downloads in the last week than in the whole month before that! The forum has also been noticeably busier than usual, but the low number of support requests showed that the extensive documentation of simon 0.2 really helps a lot.

The simon homepage runs google analytics, so we have some interesting data about our (newly found) user base:

55% of all visitors were running GNU/Linux (Windows: 39%; Mac: 5%)

Our 5000 hits were spread across 106 countries and 75 different languages; the most used languages were English (2500), German (1000), French (500) and Chinese (200).

In the open source scene, firefox rules the browser battle (58%)

More people are using konqueror (9%) than Internet Explorer (7%) (of course this is because of the KDE-specific audience this month but I still found it interesting; konqueror was actually in 2nd place after firefox)

Thursday, 6 August 2009

sam

I already mentioned it in the last post: A new application has been added to the simon application suite: sam.

sam is targeted towards power users who want to tweak and improve their acoustic model manually to improve recognition rates even further.

sam will include a sophisticated testing framework that gives immediate feedback on changes in the model configuration. In fact, while optimizing models manually, I realized that IMHO a well-working, automated model testing framework is the most essential part of manual optimization as it makes the impact of changes immediately visible.

In contrast to simon, sam will not hide any of the internal workings from the user (due to the different target group) so the logs of both the building and the testing of the model are displayed and the whole operation can be double-checked for errors or warnings.

An initial, working version is already available through SVN.

Selecting the input files:

Building the model:

Testing the model:

Test results:

As you can see, simon will run the recognition with the generated models on the training samples to see if simon correctly recognizes their contents. The algorithm already recognizes and considers the confidence scores of the recognition results, which is why in the screenshot you can see the recognition rate of e.g. "NULL" not being 100% even though every instance of it was recognized correctly (5/5).
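To see how a word can be recognized correctly every time and still end up below 100%, here is one plausible way to fold confidence scores into the rate, sketched in Python. This is an assumption for illustration only: the exact formula used by sam is not described here, and the numbers are made up.

```python
def recognition_rate(results):
    """Confidence-weighted recognition rate.

    `results` is a list of (recognized_correctly, confidence) pairs with
    confidence in [0, 1]. A correct result only counts with its confidence,
    a wrong result counts as 0. This weighting is an assumed formula.
    """
    if not results:
        return 0.0
    score = sum(confidence for correct, confidence in results if correct)
    return score / len(results)


# Five samples of one word, all recognized correctly (5/5),
# but not all with full confidence (made-up values).
samples = [(True, 0.98), (True, 0.95), (True, 1.0), (True, 0.91), (True, 0.99)]
rate = recognition_rate(samples)
print(f"{rate:.1%}")
```

With a plain correct/incorrect count these five samples would score 100%; weighting by confidence pulls the rate slightly below that.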

Btw: This is a well trained, rather small model which really works very well in practice so don't be alarmed by the very high recognition rate...

Greetings,
Peter

Thursday, 30 July 2009

Look out - cool stuff coming your way!

OK, I have way too little time at the moment for simon development, let alone regular blog updates.

However here is a quick overview of the latest updates:

simon can now import dictionaries to the active lexicon. While you obviously would not want the whole BOMP or Voxforge dictionary in your active dictionary, it is a small step towards easy export and import of the speech model.

The URL to the BOMP has been corrected - they had moved.

simon can now import prompts files through the import training data wizard.

simon can now be launched through the ksimond context menu.

Some phoneme segmentation issues have been fixed.

And finally: A new application has been added to the simon suite: "sam".

sam stands for simon acoustic modeller and is an application targeted towards power users to tweak and test their speech models. Of course sam is nowhere near usable right now but the first lines of code have been written so I thought I should mention it here.

Monday, 20 July 2009

simon 0.3: One Week In

About a week ago, I announced the simon 0.2 stable release. Fueled by this milestone and a lot of positive feedback all around, simon 0.3 development has already started ... and is already showing results!

I'll start small: simon now supports a "Power Training" mode which starts the recording immediately when the text to say is shown. Upon proceeding to the next page, the recording is automatically stopped and saved, and the next one starts. This simple change really makes training of large texts a lot faster!

OK, but that alone is not blog-worthy, right? Right! One of the most awaited features has made its appearance: Confidence scores.

The recognition server now provides information about how confident it was in the recognition result.

Moreover, it provides simon not only with the most likely result but with the ten most probable ones. simon now ranks them based on the recognition confidence and can ignore them if the recognition was just not sure enough (with a configurable threshold).

Now the cool part: If two results (or more) are very likely and simon cannot determine which one you meant, simon will simply display a nice list from which you can select (of course with your voice) what you meant.

This looks like this:

The feature is already quite stable and works well in combination with other plugins. There are of course safeguards in place to prevent recursive "did-you-mean-popups".
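The decision logic described above (rank the candidates, drop the unsure ones, ask if several remain close together) can be sketched roughly as follows in Python. The names, the threshold value and the "ambiguity margin" are assumptions for illustration; simon's actual rules may differ.

```python
from dataclasses import dataclass


@dataclass
class Result:
    sentence: str
    confidence: float  # 0.0 (no idea) to 1.0 (certain)


def decide(results, min_confidence=0.7, ambiguity_margin=0.05):
    """Return ("reject", []), ("accept", [best]) or ("ask", [candidates]).

    Both parameters are hypothetical: results below `min_confidence` are
    ignored, and results within `ambiguity_margin` of the best one are
    treated as equally likely.
    """
    ranked = sorted(
        (r for r in results if r.confidence >= min_confidence),
        key=lambda r: r.confidence,
        reverse=True,
    )
    if not ranked:
        return ("reject", [])
    best = ranked[0]
    close = [r for r in ranked if best.confidence - r.confidence <= ambiguity_margin]
    if len(close) > 1:
        return ("ask", close)  # show the "did you mean" list
    return ("accept", [best])


action, candidates = decide([Result("open firefox", 0.91), Result("open fire", 0.89)])
print(action, [c.sentence for c in candidates])
```

In this example both candidates pass the threshold and lie within the margin of each other, so the sketch would ask the user to choose.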

Of course the confidence scores of the results are also relayed to the plugins and if they want to they can even retrieve the whole list of recognition results including the phonetic transcription of the result. This brings even more flexibility to the plugin developers without making plugin development more complicated (the base classes have appropriate implementations that you don't need to overwrite if you don't want the additional information).
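The "default implementation you don't need to overwrite" idea looks roughly like this, sketched in Python rather than in simon's own code. All class and method names here are invented for the example.

```python
class CommandPlugin:
    """Hypothetical base class for command plugins."""

    def trigger(self, sentence: str) -> bool:
        """Handle a plain recognition result; return True if it was consumed."""
        return False

    def trigger_with_details(self, results) -> bool:
        """Handle the full result list.

        `results` is a list of (sentence, phonetic_transcription, confidence)
        tuples, best first. The default ignores the extra information and
        passes the most likely sentence on to trigger().
        """
        if not results:
            return False
        sentence, _transcription, _confidence = results[0]
        return self.trigger(sentence)


class SimplePlugin(CommandPlugin):
    """Only cares about the text, so it overrides trigger() and nothing else."""

    def __init__(self):
        self.seen = []

    def trigger(self, sentence: str) -> bool:
        self.seen.append(sentence)
        return True


class PickyPlugin(CommandPlugin):
    """Uses the extra information: only reacts to very confident results."""

    def trigger_with_details(self, results) -> bool:
        return bool(results) and results[0][2] >= 0.9


simple = SimplePlugin()
handled = simple.trigger_with_details([("open", "ow p ah n", 0.95)])
print(handled, simple.seen)
```

A plugin that only cares about the recognized text keeps overriding the simple method, while one that wants confidence scores or transcriptions overrides the detailed one.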

If you are running an SVN snapshot and are upgrading: You will need to manually copy the julius.jconf file from `kde4-config --prefix`/share/apps/simond/default.jconf to ~/.kde/share/apps/simond/models//active/julius.jconf (overwriting the old one) as simon(d) will not do that automatically.
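If you would rather script that copy, a small Python helper could look like the sketch below. The function names are made up; the only external command used is `kde4-config --prefix`, as in the instructions above, and you have to pass in your own `active` model directory yourself.

```python
import shutil
import subprocess
from pathlib import Path


def kde_prefix() -> Path:
    """Ask KDE for its installation prefix (runs `kde4-config --prefix`)."""
    out = subprocess.run(
        ["kde4-config", "--prefix"],
        check=True, capture_output=True, text=True,
    ).stdout
    return Path(out.strip())


def install_default_jconf(prefix: Path, active_model_dir: Path) -> Path:
    """Copy the shipped default.jconf over the active julius.jconf.

    `active_model_dir` is the "active" directory of your model below
    ~/.kde/share/apps/simond/models/. An existing julius.jconf there
    is overwritten.
    """
    source = prefix / "share" / "apps" / "simond" / "default.jconf"
    target = active_model_dir / "julius.jconf"
    shutil.copyfile(source, target)
    return target
```

Calling `install_default_jconf(kde_prefix(), your_active_dir)` then does the same as the manual copy.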

Friday, 10 July 2009

simon 0.2 released

Almost three years after the start of the development, the first stable version of the open source speech recognition suite simon has finally been released: simon 0.2 is ready for download.

With simon you can control your computer with your voice. You can open programs, URLs, type configurable text snippets, simulate shortcuts, control the mouse and more.

Because of simon's architecture, it is not bound to a specific language and can be used with any dialect. It is also specifically designed to handle speech impairments, which makes simon a viable alternative to conventional input methods for physically disabled people.

simon 0.2 is based on the open source Julius speech recognition engine and the HTK (which - due to licensing restrictions - has to be installed separately).

In comparison to the 0.1 series that never made it past alpha quality, simon 0.2 does not only bring stability improvements.

simon 0.2 is now based on KDE 4 and thus perfectly integrates in every KDE setup. This move also brings KIO to simon which allows for network transparency, transparent compression and more.

The separate Juliusd application has been discontinued and replaced by the much more advanced simond, which features network audio streaming, centralized model management with automatic backups and more. simond is a command line application, which makes it easy to set up a central simon server without the heavy X dependencies. For users of graphical environments the front-end ksimond has been introduced.

Moreover, the command architecture has been completely overhauled and now uses a much more flexible plugin architecture and supports individual triggers per plugin. New plugins include the list plugin (which can be used to display options), the composite plugin (similar to "macros"), a number input plugin and an artificial intelligence. Combined with the improved commands of previous simon versions this makes a total of 10 command plugins out of the box!

The import of the shadow dictionary now also supports PLS and SPHINX dictionaries which opens the door for dictionaries like the German GPL dictionary from Voxforge.

Because of the growing user base simon has been translated to English, German and French and also partly to Spanish, Dutch and Czech.

simon 0.2 is also the first version of simon ever to ship complete with an extensive user manual - available in English and German.

In addition to the source package, the release is also available as convenient binary packages for 32-bit and 64-bit users of both GNU/Linux (Ubuntu and OpenSUSE) and Microsoft Windows, and can be downloaded from the sourceforge project page.

Thursday, 9 July 2009

Two Final Issues

The last round of testing of the simon 0.2 codebase turned up only two bugs.

The first one is quite annoying in that it essentially limits simon functionality. The HTK does not like words that start with the character "'". That makes "words" like "'em" (short version of "them") fail during the model compilation with a confusing error message.

As I really don't want to mess with the wordlist code (we would have to escape special characters under certain conditions) so late in the development process, I delayed that fix until the 0.3 series. In the meantime, just stay away from 's at the beginning of a word, please. Words like "that's" are no problem, though (as the "'" is not at the beginning of the word).
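Until the fix lands, you can check a word list for affected entries before compiling the model. A tiny Python sketch (the function name is made up, and it only checks the one condition described above):

```python
def words_htk_rejects(words):
    """Return the words that start with an apostrophe (e.g. "'em")."""
    return [word for word in words if word.startswith("'")]


print(words_htk_rejects(["'em", "that's", "them"]))
```

Only "'em" is flagged here; "that's" passes because its apostrophe is not the first character.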

The second bug was a rather strange one: Some people reported that over time, the recognition became slower and slower for them. All of the users that reported that bug were using Windows. During testing, I found out that using the pseudo device called "SoundMapper" (or similar) caused this - when using the hardware device everything was working. So if you experience this issue, please check that you use the appropriate hardware device instead of meta-devices.

For users that don't read the blog, I added entries for both problems in the troubleshooting guide on our wiki.

And yes, I know that these are hardly the last two bugs in the 0.2 code - but they are the last to be fixed before the stable release which makes them kinda special ... for me anyways :)