Tuesday, 21 December 2010

simon at the CeBIT OpenSource 2011

Thanks to the generous support of Linux New Media AG, simon listens e.V. will have a stand at this year's CeBIT Open Source event.

CeBIT is the biggest IT event worldwide, with more than 400,000 visitors last year, so you can imagine that I'm pretty excited about this :)

I'm definitely looking forward to seeing some KDE folk there as well!

On a related note: Does anyone know a cheap place to stay during the event? Sadly, the superlatives of the event seem to extend to the hotel rates during the exhibition...

Wednesday, 8 December 2010

Robotics: Research Laboratories Tour

A couple of days ago, we received a rather intriguing mail: our project partner on the current ECHORD EU project told us that, to get an overview of current developments, they would tour the most promising robotics research laboratories in America and Asia.

Now, I'm not exactly an expert when it comes to robotics but maybe you are?

Maybe you know a really great lab that works on cutting edge technologies related to robotics (especially human / robot interaction)? Maybe you even work at one?

If so, please post in the comment section or simply drop us a line at office (ate) simon-listens.org. Thanks!

Thursday, 25 November 2010

CeBIT Open Source 2011

Just a tiny update: this year's CeBIT again features a special section called "CeBIT Open Source". This dedicated section offers funded exhibition stands for selected open source projects.

I have just sent out the application form for a simon stand at the CeBIT 2011 :)

Wish us luck!

Tuesday, 23 November 2010

Integrating speech recognition with other applications

As many of you will probably already know, we pay a lot of attention to making speech recognition actually usable by integrating it with existing applications. We do this by simulating conventional interaction patterns (mainly mouse and keyboard input) through our command infrastructure.

simon 0.4, however, will also allow application developers to use the speech recognition much more effectively by providing plugins to call D-Bus and JSON functions.

If the application to be controlled exposes either of those interfaces, you can use these new command plugins to write simon scenarios that call methods in the application through the IPC layer. This way you can execute code directly with voice commands, which makes the system much more robust and powerful than, for example, using global shortcuts for the same purpose.

Moreover, simon 0.4 provides a D-Bus interface of its own, so third-party applications can execute simon commands directly as well.
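
To give you an idea of what such an integration boils down to on the application side, here is a minimal Qt sketch of calling a method over D-Bus - the service, path, interface and method names are made up for the example and are not simon's actual D-Bus API:

    #include <QCoreApplication>
    #include <QDBusInterface>
    #include <QDBusReply>
    #include <QDebug>

    int main(int argc, char **argv)
    {
        QCoreApplication app(argc, argv);

        // Hypothetical service, path and interface of the application you
        // want to control -- replace with whatever it really exports.
        QDBusInterface player("org.example.MediaPlayer", "/Player",
                              "org.example.MediaPlayer");
        if (!player.isValid()) {
            qWarning() << "Target application not running or no such interface";
            return 1;
        }

        // A voice command mapped to a D-Bus command plugin essentially ends
        // up as a direct method call like this one.
        QDBusReply<void> reply = player.call("Pause");
        if (!reply.isValid())
            qWarning() << "Call failed:" << reply.error().message();
        return 0;
    }

The same mechanism works in the other direction: a third-party application can obtain an interface to simon's D-Bus service and trigger commands there.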

Sunday, 7 November 2010

How simon learned to talk

I've finally found the time for a long overdue blog update :). As promised back in September when I blogged about the dialog system, I want to write a bit about simon's text-to-speech infrastructure.

Because the next version of simon will be able to interact with the user through dialogs, we wanted to enable simon to actually "talk" to the user by means of text-to-speech systems.

Of course we didn't reinvent the wheel but rather looked at the available open source solutions. We needed something cross-platform that works at least with English, German and Italian.

Naturally, Jovie (formerly KTTSD, KDE's text-to-speech system) was the obvious choice, but it is not yet cross-platform as it relies on Speech Dispatcher, which only works on Linux. It also wasn't very stable when I tried it and had quite a few rough edges and missing features.

Furthermore, the best (open) German voices I could find were HTS voices developed with and for the OpenMARY framework. They should theoretically also work with Festival, so they could be used with Jovie as well if someone wrote a Festival configuration set for them. OpenMARY is cross-platform and provides very high quality synthesis, but it is a big, heavy Java dependency that needs a lot of resources and is quite slow - even on current hardware (synthesizing a paragraph of text takes around 10 seconds on a nettop).

So we decided to do what we always do and leave the final choice to the end user:

simon's TTS framework now allows you to use Jovie (the default), a generic web service (like OpenMARY), or to record sound snippets yourself.
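
To illustrate the web service option, here is a rough sketch of how a client could ask an OpenMARY server for synthesized audio over HTTP. It assumes a MARY server running locally on its default port; the parameter names follow the MARY 4.x HTTP interface as far as I recall it, and simon's actual web service backend may build its requests differently:

    #include <QCoreApplication>
    #include <QNetworkAccessManager>
    #include <QNetworkRequest>
    #include <QNetworkReply>
    #include <QEventLoop>
    #include <QFile>
    #include <QUrl>

    int main(int argc, char **argv)
    {
        QCoreApplication app(argc, argv);

        // Assumed: a local MARY server on its default port 59125.
        QUrl url("http://localhost:59125/process");
        url.addQueryItem("INPUT_TEXT", "Did you already take your medication?");
        url.addQueryItem("INPUT_TYPE", "TEXT");
        url.addQueryItem("OUTPUT_TYPE", "AUDIO");
        url.addQueryItem("AUDIO", "WAVE_FILE");
        url.addQueryItem("LOCALE", "en_US");

        QNetworkAccessManager manager;
        QNetworkReply *reply = manager.get(QNetworkRequest(url));

        // Block until the synthesized wave data has arrived.
        QEventLoop loop;
        QObject::connect(reply, SIGNAL(finished()), &loop, SLOT(quit()));
        loop.exec();

        QFile out("synthesized.wav");
        out.open(QIODevice::WriteOnly);
        out.write(reply->readAll()); // hand this buffer to the playback layer
        return 0;
    }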

The last option is especially helpful if you are dealing with languages for which no good open voices exist yet, or with users who have trouble understanding synthesized speech.


Simply create a new TTS set for your speaker (the person recording the sound bites) and record the needed texts with him or her. When recording texts, simon shows you a list of recently synthesized texts so you can record whole dialogs quite quickly. Instead of using Jovie or OpenMARY to synthesize the text, simon will then play back these recordings.

These TTS sets can be exported and imported so you can share your sound snippets with others - for example accompanying the scenario containing the dialog which uses them.

Multiple TTS backends can be used simultaneously, which means that you can use pre-recorded sound bites primarily but fall back to a TTS system for dialog paths you have not (yet) recorded.
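
Conceptually the fallback works like a priority list: each backend is asked in turn whether it can say the text, and the first one that can wins. A minimal sketch of that idea (with made-up interface names, not simon's real classes):

    #include <QList>
    #include <QString>
    #include <QtDebug>

    // Illustrative interface only.
    class TTSBackend {
    public:
        virtual ~TTSBackend() {}
        virtual bool canSay(const QString &text) const = 0;
        virtual void say(const QString &text) = 0;
    };

    void speak(const QList<TTSBackend*> &backendsByPriority, const QString &text)
    {
        // e.g. pre-recorded snippets first, a synthesizer like Jovie last
        foreach (TTSBackend *backend, backendsByPriority) {
            if (backend->canSay(text)) {
                backend->say(text);
                return;
            }
        }
        qWarning() << "No TTS backend could handle:" << text;
    }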

You can find an online demonstration of the OpenMARY voices on their homepage and a demonstration of simon's dialog system using Jovie on YouTube.

Sunday, 24 October 2010

openSUSE conference 2010

This year's openSUSE conference has sadly already ended. It was a nice event with lots of cool people.

As planned we arrived just in time for the conference party which of course was a lot of fun - I even got a genuine Jos Hug (tm).

People tried to steal the big plush Geekos all night long. The above attempt was - surprisingly - not successful :)

Although we were only there for a couple of days, we got to see a bit of Nürnberg and it's a very nice city. Especially the K4 "Kulturzentrum" (literally: "culture centre"), which hosted Friday's open movie night, was very interesting.


The simon talk yesterday was also very well received (we actually took about twice as long as scheduled because apparently nobody needed the room) and quite a few people jotted down our contact details. Hopefully I'll be hearing from you soon!

Wednesday, 6 October 2010

simon at the openSUSE conference

Just a quick update: I'm proud to say that I will give a talk about simon at this year's openSUSE conference (20-23 October in Nürnberg).

To make up for being the only simon listens guy at Akademy, I'm going to bring a colleague from simon listens e.V. this time: Mathias Stieger.

The talk is scheduled for Saturday but we'll try to make it there as soon as possible - probably Thursday. After all we're not going to miss the party, right?

See you all in Nürnberg!

openSUSE Conference 2010

Wednesday, 29 September 2010

Monolog++

simon 0.3.0 was released about two weeks ago, and this means that even though the temperatures outside disagree, it's once again summer in trunk!

Our newest addition is actually one that has been in the works for quite some time (in a separate branch) and presents itself to the user as a new plugin: The Dialog plugin.

While simon 0.3.0 is ideally suited to silently execute the commands you tell him, the next version of simon will talk back.

The basic idea is quite simple: the dialog system lets the user define an arbitrary number of states, each of which has transitions to other states of the dialog. Each transition can also execute other simon commands if configured to do so.

An example use case could look like this:
Every day at 10 am the system displays a dialog that asks the user if he has already taken his medication. Yes -> "Great!"; No -> "Do you need help?" -> etc.

But you could also create quite complex menus like this:
You: "Computer!"
Computer: "Hi! What do you want to do? Say any of the following options: Read e-Mails; Browse the web; Check calendar; Close"
You: "Check calendar"
Computer: "Alright. These are your upcoming events: Birthday at Susies Place in Graz"
You: "Where am I?"
Computer: "You are in Graz, Austria."
You: "Thanks"

Sounds ridiculous, right? Well, let's look at this example in a little bit more detail...

States, Options and Transitions
The above dialog is a fairly simple state-based dialog. You can see three states: "Welcome", "Calendar" and "Location". We can define them in the dialog configuration.
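
Under the hood you can think of such a dialog as a plain state/transition data structure. The following is only an illustrative sketch of that model, not simon's actual classes:

    #include <QList>
    #include <QString>

    struct DialogOption {
        QString trigger;     // what the user says, e.g. "Check calendar"
        int     nextState;   // index of the state to move on to
        QString command;     // optional simon command to execute on transition
    };

    struct DialogState {
        QString name;                // e.g. "Welcome", "Calendar", "Location"
        QString text;                // what the computer says (template text)
        QList<DialogOption> options; // transitions leaving this state
    };

    // Pick the follow-up state for a recognized utterance (-1: not understood).
    int nextState(const DialogState &state, const QString &heard)
    {
        foreach (const DialogOption &option, state.options)
            if (heard.compare(option.trigger, Qt::CaseInsensitive) == 0)
                return option.nextState;
        return -1;
    }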



You can see that dialog options to continue to other states can be added there as well. The text of the state actually goes through a templating engine, so you can define parameters for that on the "Template options" page.

More interesting, though, is probably the "Bound values" page. There you can define variables and bind them to values. Those values can be static, determined at runtime through QtScript (JavaScript), or taken from Plasma data engines.

For example, you can bind $currentTime$ to the Local/Time value of the date and time Plasma data engine. And because there are already lots of great Plasma data engines, the dialog system is already quite powerful.
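
As a rough illustration of the QtScript-backed bound values, this is roughly what evaluating a user-supplied script at render time looks like with Qt's scripting engine (the wiring shown here is simplified and hypothetical, not simon's actual code):

    #include <QCoreApplication>
    #include <QScriptEngine>
    #include <QDebug>

    int main(int argc, char **argv)
    {
        QCoreApplication app(argc, argv);

        // The user-supplied script is evaluated when the dialog text is
        // rendered, so a bound value like $currentTime$ is always up to date.
        QScriptEngine engine;
        QScriptValue value = engine.evaluate("new Date().toLocaleTimeString()");

        qDebug() << QString("It is now %1.").arg(value.toString());
        return 0;
    }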



TTS
Remember how I said that simon will talk back to you in simon 0.4? Yes, we also have an all-new TTS layer, but I'll cover that in a separate blog post as this one is already too long :)



Demo
So to show off the current state of development, I created a very short demo video displaying the dialog above. While the code is not production ready, I didn't cheat: both the upcoming events and the location are determined dynamically, at runtime, through Plasma data engines (the upcoming events use the calendar data engine to get data from your Akonadi calendar).



For RSS readers, here is a direct link.

Sunday, 19 September 2010

simon at the AAL Forum 2010: Again

The whole simon listens team attended this year's AAL Forum in Odense, Denmark.


The AAL Forum is a platform for projects of the Ambient Assisted Living Joint Programme of the European Union and related projects. After having attended Akademy this year, it was quite interesting to see the other side of software development, with almost all the projects there being quite well funded :).

Despite the very steep attendance fee (€450), the exhibition was quite active. In just three days we collected more than 30 business cards from interested people - many of them looking for partners for their next projects.

All in all, it was very interesting to see related projects and to discover similarities and potential synergies. Cooperation across projects - even within the same call - is still far too rare in my opinion.

Quite a few people were surprised to find out that simon is open source and completely free ("Where are the hidden costs?"), so we also got to introduce some people to the concept of free software.

On Thursday we then got to see DJ Ruth Flowers at the networking dinner. And it was just awesome seeing so many suits dance :P.


I really wouldn't have thought that such a formal event could be turned into a wild party just with a good DJ. Suffice to say: the booths were quite empty the following day.

We then spent the last evening before flying home in Copenhagen, which, as it turns out, is a great city - with great cocktail bars as well :)

Monday, 13 September 2010

Application centric speech recognition for your desktop: simon 0.3.0 released

The new version 0.3.0 of the open source speech recognition software simon has been released. It boasts the all-new scenario system, allowing you to build your own customized speech recognition setup with just a few mouse clicks.

With simon you can control your computer with your voice. You can open programs and URLs, type configurable text snippets, simulate shortcuts, control the mouse and keyboard, and much more.

Because of simon's architecture, it is not bound to a specific language and can be used with any dialect. It is also specifically designed to handle speech impairments, which makes simon a viable alternative to conventional input methods, especially for physically disabled people and senior citizens.

simon is based on the open source large vocabulary continuous speech recognition engine Julius.

New in simon 0.3

simon 0.3 introduces an application-centric approach to speech recognition by packaging use cases of the speech recognition into so-called "scenarios". A scenario contains the complete configuration for one specific task, like controlling Firefox or using the voice-controlled on-screen keyboard. These scenarios can then be shared with other simon users and are collected in a central online repository that can be accessed directly from within the application.

Besides the scenario system, the new version not only lets users create their own model through training but also lets them use an existing acoustic model (base model) to get started even quicker - entirely without training. If the user wants more control or would like to improve recognition accuracy, personalized training is possible through the optional HTK (not included in simon due to license restrictions). simon then offers to adapt the base model to your own voice or to create a new model entirely from scratch.

Additionally, we have been working hard to make simon even easier to use. Some of the more notable results of these efforts are the new introductory wizard that guides you through the initial setup, as well as the speech model generation adapter that automatically fixes a wide variety of common beginner's mistakes for you.

Furthermore, simon 0.3 introduces three new applications to the suite. Sam, an acoustic modeling tool, is geared towards professionals who want to tinker with their speech model and get the best recognition out of it. It is also a great tool to create and test large models, which can then be distributed as base models for other simon users. To create base models you also need a lot of speech data, which can easily be collected through the newly introduced combo of ssc and sscd. ssc stands for simon sample collector and is the client to the sscd server. Together they provide a powerful, cross-platform tool to collect samples from lots of different speakers - even allowing you to record with multiple microphones and/or sound cards simultaneously.

Demonstration




Readers of the RSS feed: Watch it on Youtube

Download

You can download simon 0.3 as a source archive, but there are also packages available for Windows, openSUSE and Ubuntu on our SourceForge page. Up-to-date installation instructions are available on the simon listens wiki.

Sunday, 12 September 2010

simon at the AAL Forum 2010

I'm happy to announce that the simon listens team will attend this year's Ambient Assisted Living Forum in Odense, Denmark!

The AAL Forum (15th-17th of September) is an annual event that is part of the Ambient Assisted Living Joint Programme of the European Union. The main goal of this programme is to improve everyday life for healthy seniors.

While this might not sound like the most exciting topic at first, it is a fast-moving, exciting field of research that covers everything from home automation to assistive robotics.

We will be represented with a booth in the exhibition hall, and Franz Stieger, our chairman, will give both a short introductory talk about simon listens and another one about the project in the context of robotics-enabled assisted living.

It isn't all work and no play, though. In the spirit of the conference, an internationally acclaimed 69-year-old DJ called Ruth Flowers will apparently rock the social event of the conference. I can't wait to see that :)

Saturday, 17 July 2010

Astromobile Kickoff Meeting or Creating a Terminator powered by KDE

It's a very busy time for simon :). On Saturday I came back from a week's worth of hacking and socializing at this year's Akademy in Finland, and just a couple of days later the whole simon team set off on our next adventure: from Tuesday to Thursday we attended the kick-off meeting of our Astromobile project at the facilities of our project partner, the ARTS Lab of the Scuola Superiore Sant'Anna in Pisa.

In the Astromobile project we are trying to help seniors who are well able to take care of their daily routine but would benefit from a little assistance here and there.

Often these people don't want a full-time caregiver staying with them, both for financial reasons and to preserve their autonomy.

The Astromobile project tries to address this issue and bridge the gap between living autonomously and full-on assisted living by creating a special robot that can, for example, remind seniors to take their medication, provide a way to call for help in case of an emergency, and use modern technologies like video chat to set up a communication link with family and friends.

In this project, simon listens is responsible for the touch screen and the voice interface to the robot using our KDE4 based simon system.

Just after the introduction of the Scuola Superiore Sant'Anna by one of their professors, our robot platform - the SCITOS G5 from a German company called MetraLabs - arrived.


After coordinating the most important steps, we received a short introduction to the internal workings of the robot platform and were pleasantly surprised when we booted it up for the first time.


Yes you are seeing the KDM login screen of Fedora 12 :)

(Full disclosure: The robot contains both GNOME and KDE and actually logs into GNOME by default - but still :)

We then got a quick walkthrough of how to talk to the many sensors, how to use the integrated path planning and mapping features, and the scripting language, AngelScript, that is used to communicate with the robot platform.

During our stay we also got to see the living lab of the ARTS Lab, which is basically a 200 square meter apartment containing lots of smart home prototypes. This is also the place where we will be testing our solution with elderly people.


While we were in the area, we also visited a pilot project that deployed automatic garbage disposal robots called "Dustbots", which were also developed by the Scuola Superiore Sant'Anna.


 They are running Ubuntu, btw :)

simon says: Hello Planet KDE!

This is the first blog post that is going to be aggregated on the planet, and as such I feel that a short introduction is in order.

My name is Peter Grasch and for the past couple of years I have been working on an open source speech recognition software called simon.

With simon you can control your computer with voice commands.

simon uses the KDE 4 libraries, Julius and the HTK, and is developed under the GPL license. You can find more information on our SourceForge page and in an interview on the Dot.

Together with Franz Stieger, Mathias Stieger and Alexander Breznik, I am also chairing the non-profit research organization "simon listens", which uses the simon software and other open source projects to research speech recognition and its applications through multiple research projects funded by the Austrian benefit program and the European Union.

Through one of these projects I have also been lucky enough to attend this year's Akademy, where I gave a talk and held a workshop about simon.

In Tampere I got the opportunity to meet, among many other interesting people, some of the KDE accessibility guys: Jeremy Whiting and Gunnar Schmidt. We discussed the current state of accessibility in KDE SC, the most pressing problems, and how they could be tackled.

There is definitely a lot of work lying ahead of us, but there are also some highly motivated people (yes Jeremy, I am looking at you :) working on this.

So let's join forces, buckle down and try to make KDE SC accessible to everybody!

Tuesday, 6 July 2010

Day 4 at Akademy

Well, the conference part of Akademy already ended on Sunday, so for the past two days I've been hacking at Demola.

So far:

  • I met Jeremy Whiting, a KDE accessibility guy, and had a very interesting discussion about KTTSD and Speech Dispatcher.

  • I attended the KDE Accessibility BoF, which was sadly more like a list of things KDE desperately needs to do. To all KDE developers out there: we need to do more!

  • I became a "Nokia Certified Qt Developer"! Yay :)

  • I broke my entire desktop because after seeing all those cool features in the current KDE trunk I just had to try it :P



And of course I am already working on new features in the simon suite (ssc at the moment, actually).

Also, my presentation is already online.

To everyone at Akademy or somewhere near Tampere: The simon Workshop is tomorrow from 9:30 to 11:00 in Area 2. Hopefully I'll see you there!

Sunday, 4 July 2010

simon at Akademy

Ok first of all: Akademy is awesome.

But today, for me, it kinda got serious: My first international talk :)

Well, everything went fine - more or less - and even though we were a bit pressed for time (aseigo's keynote ran a bit long, but it was well worth it), I think I got the most important ideas behind simon across.

The presentations were all recorded, so those who were not able to be there in person can watch them online soon.

So far the feedback has been overwhelmingly positive and has resulted in many fruitful discussions.

In other news (or rather: common knowledge), KDE devs are really nice and exceptionally smart people, so I am really looking forward to the coming days and the upcoming hacking sessions, workshops and BoFs.

Saturday, 3 July 2010

simon 0.3 alpha 3

Just a short post because I am actually sitting in an Akademy presentation right now: simon 0.3 alpha 3 has been released; it contains some critical fixes (no new features).

You can download it on sourceforge.

Thursday, 24 June 2010

New test version: simon 0.3 alpha 2

Well, Qt 4.6.3 has been released and as promised, so has simon 0.3 alpha 2.

What's new with this release? Well: A lot.

The most visible change is probably the completely revamped sound stack, which is something I had been meaning to do for a long time. Because this would usually have been as dull as it sounds, we managed to spice it up with some serious features:
  • An internal sound server takes care of multiple recordings / playbacks at once; a specially developed priority system tells simon whether other streams should be stopped or are supposed to run in parallel.
  • simon can handle more than one device simultaneously! The UI and logic have been adapted as well, and with the current version you can (see the sketch after this list):
    • Record your training samples with two (or more) microphones at once
    • Use multiple devices for recognition (for example in different rooms)
    • Use multiple playback devices, for example to drive speakers and headphones at the same time
  • Volume calibration helps you determine the optimal recording volume to get the most out of your training samples.
  • Pause notifications make sure that you don't cut off important information in your training data, ensuring the best results with as little data as possible.
  • Even the sample collection engine and sam were adapted, and the SSC / SSCD system now stores information about the used recording devices in its database. This way you can reconstruct which device recorded which samples, possibly finding faulty / bad quality samples in large databases even faster.
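
As a rough idea of what recording from several devices looks like with QtMultimedia (Qt 4.6), here is a minimal sketch - the 16 kHz mono format is the one typically used for speech recognition, and the code is illustrative rather than simon's actual sound layer:

    #include <QCoreApplication>
    #include <QAudioDeviceInfo>
    #include <QAudioFormat>
    #include <QAudioInput>
    #include <QList>
    #include <QDebug>

    int main(int argc, char **argv)
    {
        QCoreApplication app(argc, argv);

        // 16 kHz, 16 bit, mono PCM.
        QAudioFormat format;
        format.setFrequency(16000);
        format.setChannels(1);
        format.setSampleSize(16);
        format.setCodec("audio/pcm");
        format.setByteOrder(QAudioFormat::LittleEndian);
        format.setSampleType(QAudioFormat::SignedInt);

        // One QAudioInput per capture device -- this is essentially what
        // recording with two (or more) microphones at once boils down to.
        QList<QAudioInput*> inputs;
        foreach (const QAudioDeviceInfo &device,
                 QAudioDeviceInfo::availableDevices(QAudio::AudioInput)) {
            qDebug() << "Found capture device:" << device.deviceName();
            if (device.isFormatSupported(format))
                inputs << new QAudioInput(device, format);
        }

        // Each input can now be start()ed with its own target QIODevice.
        return 0;
    }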


But don't think we "only" worked on sound handling...
We removed any trace of Julius on the client and replaced its adin system with our own sound streaming, including our own level-based voice activity detection, which is of course fully configurable through the graphical interface.
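
The core of level-based voice activity detection is simple: a chunk of audio counts as speech if its level exceeds a configurable threshold. A minimal sketch of that check on 16 bit PCM data (the threshold value is just an example, not simon's default):

    #include <QByteArray>
    #include <QtGlobal>

    bool containsSpeech(const QByteArray &pcmChunk, int threshold = 2000)
    {
        const qint16 *samples =
            reinterpret_cast<const qint16*>(pcmChunk.constData());
        const int count = pcmChunk.size() / int(sizeof(qint16));

        // Compare the peak level of the chunk against the threshold.
        int peak = 0;
        for (int i = 0; i < count; ++i)
            peak = qMax(peak, qAbs(static_cast<int>(samples[i])));

        return peak > threshold;
    }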

The recognizer interface (simond) now only needs to be able to recognize wave files (which I guess most recognizers are able to do) so replacing Julius gets even easier.

Handling of alternate keyboard layouts was improved, and simon is now able to "type", for example, Greek letters.

This release also contains the announced welcome wizard to make the initial configuration even easier.

Packages are currently being built and uploaded.

Happy testing!

Saturday, 15 May 2010

What happened to 0.3 alpha 2?

It's been a whole month since the last release of the 0.3 series.

The reason is not that nothing has happened since then (quite the contrary) but rather that we are waiting for the next patch-level Qt release.

For 0.3 alpha 2 we finally switched from PortAudio to QtMultimedia, which fixed some long-standing issues on Ubuntu. However, it also introduced a few new issues because Qt 4.6.2 still has some nasty bugs in this area.

Luckily, the Qt software team is incredibly responsive and the most important bugs have already been fixed in their current development version.

Because two of these bugs are blockers for simon (QtMultimedia always uses the default output device on Linux and does not support 16 kHz recordings on Windows), we sadly have to wait for Qt 4.6.3 to be released before we can release a new test version.

Wednesday, 12 May 2010

simon at the Akademy 2010

The conference program for this year's Akademy is online, and it looks like we got a pretty sweet time slot.

Link

Scenarios coming to life

With simon 0.3 we introduce the scenario system: a package-based recognition architecture that allows users to simply choose, from a vast online repository, which use cases they want the speech recognition to handle.

That's the idea. The problematic part is that "vast online repository". We obviously can't anticipate all use cases of simon, nor do we have the necessary manpower to design all of those scenarios. But what we can do - and what we have already done - is make it very easy to create and share scenarios from within simon.

Now we depend on the community to pick up the concept and start creating / uploading scenarios.

This is why I was excited to see the first user-contributed scenario uploaded to kde-files.org: the day before yesterday, Ken MacLean (of VoxForge fame) created and uploaded a scenario to control the music player Rhythmbox.

Considering that a stable version of simon 0.3 hasn't even been released yet, I am looking forward to many more user-contributed scenarios being uploaded soon!

Want to get involved? Contact us at support ate simon-listens dot org to find out more!

Wednesday, 5 May 2010

simon at the LinuxWochen 2010

This Saturday, the 8th of May, we will present the current prototype of simon at the biggest IT event in Austria - the "Linuxwochen" in Vienna (program).

Admission is free. Link

Akademy

Our talk proposal for the Akademy 2010 was accepted!

Still waiting for the "official" banner to insert at this point but:
I'm going to Akademy!

Saturday, 17 April 2010

Goodbye Portaudio! Long live QtMultimedia!

simon's sound stack was long a source of many issues. This was mainly because it relied on PortAudio, which sadly isn't supported that well by the sound configuration of e.g. Ubuntu, as it interferes with their PulseAudio setup. Long story short: Ubuntu users often had completely unusable simon installations because simon crashed often and seemingly at random. Because those crashes happened in PortAudio's code and not in simon's, there was little we could do.

Over the last week, I finally found some time, threw out all the old sound handling code and replaced it with a completely new, QtMultimedia-based system. QtMultimedia is still a very young library and has issues of its own, but I suspect those will get fixed pretty quickly.

While I was at it, I also implemented a much cleaner way to stream audio to simond. Older versions used Julius' libsent to do this because of its voice activity detection implementation. We have now implemented a similar system (configurable, level-based voice activity detection) in simon and thus have complete control over the audio stream. Thanks to the new implementation, I also added the option to keep recognition samples - complete with their recognition results - on the server. This could, for example, be used to gather training data during normal usage: all you'd need to do is check that the words were correctly recognized and add them to the model.

Because all sound input/output is now handled through a central point, I implemented a fairly primitive sound server that handles multiple simultaneous streams correctly. Recording while simon is activated now works much faster (because the sound device handle simply stays open) and is of course completely stable. You even get automatic pausing / unpausing of interrupted streams (if, for example, you start to record one sample and, while recording it, start to record another one, the first recording pauses until you are done with the second).
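
The pausing behaviour can be pictured as a simple stack of streams: starting a new recording pauses the one on top, and finishing it resumes the previous one. A toy sketch of that idea (the real sound server obviously manages actual audio streams, not strings):

    #include <QStack>
    #include <QString>
    #include <QtDebug>

    class ToySoundServer {
    public:
        void startRecording(const QString &id) {
            if (!m_active.isEmpty())
                qDebug() << "Pausing" << m_active.top();
            m_active.push(id);
            qDebug() << "Recording" << id;
        }
        void finishRecording() {
            if (m_active.isEmpty())
                return;
            qDebug() << "Finished" << m_active.pop();
            if (!m_active.isEmpty())
                qDebug() << "Resuming" << m_active.top();
        }
    private:
        QStack<QString> m_active;
    };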

The new implementation also has a much better level meter integrated into the recording widget so you can check your current microphone volume while you record. If you start to clip, simon will now automatically display a warning message telling you to re-record the sample.

Btw, QtMultimedia also works e.g. on Symbian devices so a simond client on a mobile phone should be trivial now.

All this has already been merged to the master branch and works very well in my tests. However, just like any new code it might contain bugs so try it at your own risk :).

Sunday, 11 April 2010

Usability

Considering that simon was designed to be as easy to use as possible, someone who just downloads and installs it might say that we failed.

To many new users the concepts behind simon are - at first - too complicated and simply getting the recognition to work seems needlessly hard.

However, those who stick with it seem to "get" the UI pretty quickly, and it proves very powerful for expert users.

This is why one of the goals for 0.3 was to make the initial learning curve as flat as possible and get simon up and running quickly.

After I released the first alpha of 0.3 about a month ago, I posted a review request to the KDE Usability mailing list asking for ideas on how to improve our interface. I got great feedback (thanks!), and it quickly became clear that it would be best if simon provided an assistant on first start to guide new users through the initial setup.

Some users might remember this concept from 0.1. Back then we had a first-run wizard, but it took ages to complete because it consisted of dozens of pages with quite complicated instructions.

In 0.3, however, with the introduction of scenarios and base models, we designed another such wizard. It now consists of just 5 pages (including the welcome and finish pages) and gives short but precise instructions for every step. If users follow it through, they will be rewarded with a completely functional simon within minutes.



We are still fine-tuning the wizard (updating the descriptions; everything is already fully functional), which is why it still resides in its own branch ("hci") for those of you who want to check it out.

The wizard will be included in the next release.

Thanks again to the KDE usability team for their valuable input!

Tuesday, 9 March 2010

simon 0.3: First alpha released

The first alpha of simon 0.3 was just released.

simon 0.3 is not officially feature complete but this is basically what you will get when it is released.

simon 0.3 alpha 1 does not replace the current stable version (0.2) but should only be installed by interested testers.

simon 0.3 alpha 1 at sourceforge
Changelog

Cheers!

Wednesday, 24 February 2010

Benefit project

On the 1st of February 2010, the non-profit association simon listens started work on a new project. For the next one and a half years, the simon listens team will investigate ways and means to make the simon speech recognition solution even more usable - especially for the elderly.

Abstract:
With the help of the voice control provided by simon, using terms of everyday language, useful scenarios and areas of application shall be created to make new communication technologies such as the internet, telephone and multimedia applications easy for elderly people to use. Moreover, additional security can be provided, for example through a reminder for the user to take their medication.


In the course of this project we will join forces with the Signal Processing and Speech Communication Laboratory of the Graz University of Technology, the HTBLA Kaindorf/Sulm, the Rehabilitation Clinic Maria-Theresia, the KFU Research Center for Austrian German and the Huminatis Graz to ensure that we have the necessary expertise to tackle such an ambitious project.

The solution created in this project will be released under the GPL license. All code will be freely available to the community.

Thanks to the generous support of the bmvit (the Austrian Federal Ministry for Transport, Innovation and Technology) and the FFG (Austrian Research Promotion Agency) for making this possible!

Tuesday, 23 February 2010

Model Compilation Adapter

In simon 0.2 we introduced some mechanisms to catch common errors during model compilation and display nicer error messages to the user, explaining how to solve the issue manually. In simon 0.3, however, simon will automatically repair some common mistakes without the user even noticing.


To explain what I am talking about, I first have to talk about simon's architecture a bit, so bear with me...


During normal operation, the simon client gathers the instructions (words, grammar, etc.) that will then be sent to simond. simond in turn compiles the model out of the given input files. To do that, simond first converts them to a format usable by the underlying tools (HTK, Julius). This conversion step was not needed in 0.2 because simon 0.2 only used the raw file formats of HTK / Julius. However, in simon 0.3 we need more control over the model and also want to give the user some advanced features that were not possible with just the information contained in those raw formats.


In simon 0.3 we introduced a new step between gathering the data and compiling it to a usable model: Adapting the input files.


This sounds like a boring but necessary conversion step, and indeed it is.


But what makes it interesting is that at the point of adaption we have all the input data that will be turned into a model, in a format that is easily parsable. This makes it an ideal place to do some last-minute optimizations on the temporary files that are then used to generate the model.


The model adaption manager will, for example, automatically remove words from the lexicon that have no associated training data. It will also clean the grammar of sentences that have no associated words. It will even remove samples containing words that are not in your dictionary. Basically, simon should automatically handle a lot of cases that would have caused an error in simon 0.2.
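
In simplified form, this clean-up amounts to a couple of filtering passes over the input data before it is handed to HTK/Julius. A sketch of the idea (illustrative only; the real adapter works on the full lexicon, grammar and prompts structures):

    #include <QSet>
    #include <QString>
    #include <QStringList>

    // Keep only lexicon words that actually have training samples.
    QStringList pruneLexicon(const QStringList &lexiconWords,
                             const QSet<QString> &trainedWords)
    {
        QStringList kept;
        foreach (const QString &word, lexiconWords)
            if (trainedWords.contains(word))
                kept << word;
        return kept;
    }

    // Drop grammar sentences that reference terminals with no remaining words.
    QStringList pruneGrammar(const QStringList &sentences,
                             const QSet<QString> &availableTerminals)
    {
        QStringList kept;
        foreach (const QString &sentence, sentences) {
            bool complete = true;
            foreach (const QString &terminal,
                     sentence.split(' ', QString::SkipEmptyParts))
                if (!availableTerminals.contains(terminal))
                    complete = false;
            if (complete)
                kept << sentence;
        }
        return kept;
    }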

Sunday, 24 January 2010

Git

Like all the cool kids, simon moved to git.

And so far, while the syntax is a bit strange at first it works really, really well.

The cheap branches are great and svn2git made the transition painless.

Repository URL (read-only):

git://speech2text.git.sourceforge.net/gitroot/speech2text/speech2text

The old SVN repository is still available and up to date, but if everything works well enough it will be removed in the next couple of days.

Sunday, 17 January 2010

Model adaption

In keeping with the last couple of blog posts, which all covered breakthroughs of their own, this one is definitely up there as well.

Since revision 1117, simon supports using static models or adapting speaker-independent models to your own voice, in addition to building a new, speaker-dependent model from scratch (which is obviously still the default). This means that new users can set up a complete working speech recognition system literally in seconds: pick the scenarios you want, point simon to the VoxForge speech model, press "Connect" and start talking.

Of course this only works if you have a fairly "standard" voice, and the VoxForge model is still not perfect. So if you want a somewhat higher recognition rate, go ahead, train a few samples and tell simon to adapt the VoxForge base model with them. As little as one minute of speech will yield visible results (you will still need to install the HTK for this, though).

The only user interaction needed is to click a radio button - simon will do all the work for you.



While I was at it, I also improved the Julius error reporting: the recognition process now writes a log file (~/.kde/share/apps/simond/models/<user>/active/julius.log) so you can easily debug low recognition rates, microphone troubles, etc. When recognition fails completely, simon will display the log along with a short description of what simon thinks happened.

Of course, all of this is completely untested and will most likely contain bugs, so try it at your own risk. By the way: current trunk needs KDE 4.4 to compile.