Simon: Open-Source Speech Recognition: 2009

Saturday, 26 December 2009

SSC: Large scale sample acquisition

Two new applications have joined the simon application suite: ssc and sscd.

ssc stands for Simon Sample Collector and is specifically designed for large scale sample acquisition.

sscd is the central data server, which stores its data in a MySQL database (no graphical frontend).

ssc is the graphical client that records the samples, manages the users and institutions and uploads the data to the server.

We use the software to collect samples in various medical institutions (mostly rehabilitation clinics at the moment) for further analysis. Every user of the software has a quite extensive profile which stores birth year, education, mother tongue, diagnosis and other relevant factors that may influence the speech.

The system allows the user to create "institutions" (the clinic for example) and associate users with zero or more institutions.

ssc users are not interchangeable with simon users (more information) and there is no attempt to create speech models with ssc / sscd. These tools are simple data-acquisition tools and probably only useful if you are a speech researcher who wants to gather data from a lot of speakers. In that case, however, it is a neat little utility that already works very well for our uses.

simon scenarios

First of all: Merry Christmas to everybody!

Some of you might still remember that back when simon 0.2 was released, I talked about modularizing the recognition into smaller "packages" or "scenarios". Well, this feature has now been implemented and is almost ready.

The idea is to provide pre-defined application packages (like default values on steroids) as well as the ability to create such packages and share them with the community easily. This should significantly improve our biggest weakness: The time it takes a new user to get started with simon.

To show you what I mean, I created a little screencast that shows the current development version of simon in action:

Wednesday, 16 September 2009

Automatic BOMP import

Disclaimer: This Blog post is probably only interesting if you want to use simon for the German language.

If you do, then you most likely already know the problem with the BOMP dictionary. It is a great quality, huge dictionary that even contains terminal information. In short: It is the best shadow dictionary for simon there is - and I am not saying the best for the German language; it's the best. Period.

Only problem: The licence. It doesn't permit free distribution. Instead, you had to send an e-mail to a specific address at the University of Bonn and wait for a reply that contained the dictionary as an attachment. This process could take a couple of days.

So we contacted the team that holds the copyright and they granted us an exception to the licence. We are now allowed to distribute the dictionary to our users ourselves as long as we still gather their names and e-mail addresses (for statistical purposes) and they still accept the BOMP licence terms. So we integrated this process directly into simon. When importing a HADIFIX dictionary you can now choose to either specify a file manually or download and import the HADIFIX BOMP automatically.

Friday, 28 August 2009

Calculator Plugin and Keyboard Plugin

Thanks to the "Österreichische Forschungsförderungsgesellschaft" (literal trans.: 'Austrian Researchfundingassociation') the SIMON listens team has been expanded by two summer interns, Mario Strametz and Dominik Neumeister.

After some general testing and getting to know the system, they are now working on two promising command plugins: A calculator plugin and a keyboard plugin.

The calculator plugin is a natural extension of the existing input-number-plugin.

As seen it is still quite basic but already usable to a certain extent. However, it is under heavy development and we expect the first stable versions by the end of next week.

The calculator is - besides the obvious - also targeted at school kids doing their math homework, so upon pressing ok it provides the option to write out not only the result but also the calculation leading up to it (e.g.: "1+1=2" instead of just "2"). The finished version will also include formatting options like formatting the output as an amount of money, etc.
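To make the two output modes concrete, here is a minimal Python sketch. The function name `format_output` and its parameters are invented for illustration and are not taken from the plugin's code; the money format (two decimal places, no currency symbol) is likewise an assumption.

```python
def format_output(calculation: str, result: float,
                  include_calculation: bool = False,
                  as_money: bool = False) -> str:
    """Return the text that would be typed out (hypothetical helper)."""
    if as_money:
        # Assumed money format: always two decimal places.
        result_text = f"{result:.2f}"
    elif float(result).is_integer():
        # Whole numbers are written without a trailing ".0".
        result_text = str(int(result))
    else:
        result_text = str(result)

    if include_calculation:
        # Homework mode: show the calculation leading up to the result.
        return f"{calculation}={result_text}"
    return result_text


print(format_output("1+1", 2))                             # result only
print(format_output("1+1", 2, include_calculation=True))   # calculation and result
```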

At the same time, the two are working on a keyboard plugin (no screenshot there yet as development has just started). However, our "keyboard" will not only be a regular on screen keyboard.

The keyboard plugin will not have a fixed number of fields (keys), nor will their values be fixed to those of a qwerty keyboard.

Instead, the user will be able to configure them as he likes in configuration sets (sensible defaults will of course be provided) and even spread the keys out across multiple tabs.

While this may seem overly complicated on paper, it makes advanced configurations possible, e.g. a text-snippet tab that collects the user's most often used text snippets, or a tab that puts special characters that are important to him (e.g. currency symbols for an accountant) right where he wants them.
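Since development has only just started, the following is merely a sketch of how such a configuration set could be modelled: a set contains tabs, and each tab contains freely defined keys. All class and field names here are hypothetical and do not come from the plugin.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Key:
    label: str    # what the key is called on screen
    output: str   # the text that is typed when the key is activated


@dataclass
class Tab:
    name: str
    keys: List[Key] = field(default_factory=list)


@dataclass
class KeyboardSet:
    name: str
    tabs: List[Tab] = field(default_factory=list)

    def output_for(self, tab_name: str, label: str) -> Optional[str]:
        """Look up what a key on a given tab would type, or None if missing."""
        for tab in self.tabs:
            if tab.name == tab_name:
                for key in tab.keys:
                    if key.label == label:
                        return key.output
        return None


# Example set for the accountant mentioned above (made-up contents).
accountant = KeyboardSet("accountant", [
    Tab("Currency", [Key("Euro", "€"), Key("Dollar", "$")]),
    Tab("Snippets", [Key("Greeting", "Dear Sir or Madam,")]),
])
print(accountant.output_for("Currency", "Euro"))
```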

I will update this blog as the development progresses so check back!

Publicity

Hi fellow readers! Long time no see!

As some of you might have seen, there was an article about simon on the dot. Thanks to Troy Unrau for making that happen!

The article spawned a lot of discussion and interest and several sites picked it up, most notably lwn, where the discussion focused on the license issues. The article also hit digg (50 diggs), osnews, several twitter/identi.ca feeds and a lot of blogs everywhere.

Of course this also showed on our download statistics. We had more downloads in the last week than we had in the whole month before that! The forum has also been noticeably busier than usual but the low number of support requests showed that the extensive documentation of simon 0.2 really helps a lot.

The simon homepage runs google analytics so there has been quite some interesting data about our (newly found) user base:

55% of all visitors were running GNU/Linux (Windows: 39%; Mac: 5%)

Our 5000 hits were spread out to 106 countries using 75 different languages; The most used languages were English (2500), German (1000), French (500), Chinese (200).

In the open source scene, firefox rules the browser battle (58%)

More people are using konqueror (9%) than Internet Explorer (7%) (of course this is because of the KDE-specific audience this month but I still found it interesting; konqueror was actually in 2nd place after firefox)

Thursday, 6 August 2009

sam

I already mentioned it in the last post: A new application has been added to the simon application suite: sam.

sam is targeted towards power users who want to tweak and improve their acoustic model manually to improve recognition rates even further.

sam will include a sophisticated testing framework to immediately receive feedback on changes in the model configuration. In fact, while optimizing models manually, I realized that IMHO a well working, automated model testing framework is the most essential part of manual optimization as it makes the impact of changes immediately visible.

In contrast to simon, sam will not hide any of the internal workings from the user (due to the different target group) so the logs of both the building and the testing of the model are displayed and the whole operation can be double-checked for errors or warnings.

An initial, working version is already available through SVN.

Selecting the input files:

Building the model:

Testing the model:

Test results:

As you can see, simon will run the recognition with the generated models on the training samples to see if simon correctly recognizes their contents. The algorithm already takes the confidence scores of the recognition results into account, which is why in the screenshot you can see the recognition rate of e.g. "NULL" not being 100% even though every instance of it was recognized correctly (5/5).
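To show how a word can score below 100% although every instance was recognized, here is a small Python sketch. The formula is an assumption made for illustration (each correct result contributes its confidence score instead of a full point); the post does not spell out the exact calculation sam uses.

```python
def recognition_rate(results):
    """Confidence-weighted recognition rate for one word.

    `results` is a list of (was_correct, confidence) pairs, one per test
    sample, with confidence between 0.0 and 1.0. Assumed formula: a correct
    result counts with its confidence, an incorrect one counts as 0.
    """
    if not results:
        return 0.0
    return sum(conf for ok, conf in results if ok) / len(results)


# Five samples, all recognized correctly (5/5), but not all with full confidence.
null_results = [(True, 1.0), (True, 0.9), (True, 1.0), (True, 0.8), (True, 0.8)]
print(f"{recognition_rate(null_results):.0%}")
```

With these made-up confidence values the rate ends up below 100% even though no sample was misrecognized.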

Btw: This is a well trained, rather small model which really works very well in practice so don't be alarmed by the very high recognition rate...

Greetings,
Peter

Thursday, 30 July 2009

Look out - cool stuff coming your way!

Ok I have way too little time at the moment for simon development, let alone regular blog updates.

However here is a quick overview of the latest updates:

simon can now import dictionaries to the active lexicon. While you obviously do not want the whole BOMP or Voxforge dictionary in your active dictionary, it is a little step towards easy export and import of the speech model.

The URL to the BOMP has been corrected - they had moved.

simon can now import prompts files through the import training data wizard.

simon can now be launched through the ksimond context menu.

Some phoneme segmentation issues have been fixed.

And finally: A new application has been added to the simon suite: "sam".

sam stands for simon acoustic modeller and is an application targeted towards power users to tweak and test their speech models. Of course sam is nowhere near usable right now but the first lines of code have been written so I thought I should mention it here.

Monday, 20 July 2009

simon 0.3: One Week In

About a week ago, I announced the simon 0.2 stable release. Fueled by this milestone and a lot of positive feedback all around, simon 0.3 development has already started ... and is already showing results!

I'll start small: simon now supports a "Power Training" mode which starts the recording immediately when the text to say is shown. Upon proceeding to the next page, the recording is automatically stopped and saved, and the next one starts. This simple change really makes training of large texts a lot faster!

Ok but that alone is not blog worthy, right? Right! One of the most awaited features has made its appearance: Confidence scores.

The recognition server now provides information about how confident it was in the recognition result.

Moreover, it now provides simon not only with the most likely result but with the ten most probable ones. simon ranks them based on the recognition confidence and can ignore them if the recognition was just not sure enough (with a configurable threshold).

Now the cool part: If two results (or more) are very likely and simon cannot determine which one you meant, simon will simply display a nice list from which you can select (of course with your voice) what you meant.

This looks like this:

The feature is already quite stable and works well in combination with other plugins. There are of course safeguards in place to prevent recursive "did-you-mean-popups".
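The decision logic described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name, the default threshold and the `ambiguity_margin` parameter (how close two scores have to be to count as "both very likely") are invented here and are not taken from simon's code.

```python
from typing import List, Tuple


def select_results(results: List[Tuple[str, float]],
                   threshold: float = 0.5,
                   ambiguity_margin: float = 0.1):
    """Decide what to do with a list of (text, confidence) results.

    Returns ("reject", []) if nothing is confident enough,
            ("accept", [best]) if there is one clear winner,
            ("ask", candidates) if several results are about equally likely.
    """
    # Rank by confidence, best first, and drop everything below the threshold.
    ranked = sorted(results, key=lambda r: r[1], reverse=True)
    confident = [r for r in ranked if r[1] >= threshold]
    if not confident:
        return ("reject", [])

    # Collect every result whose score is close to the best one.
    best_score = confident[0][1]
    close = [text for text, score in confident
             if best_score - score <= ambiguity_margin]
    if len(close) > 1:
        return ("ask", close)      # show the "did you mean" list
    return ("accept", close)


print(select_results([("open firefox", 0.92), ("open files", 0.40)]))
print(select_results([("page up", 0.80), ("page down", 0.75), ("pause", 0.30)]))
```

The first call has a single clear winner, the second one has two candidates close enough together that the user would be asked.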

Of course the confidence scores of the results are also relayed to the plugins and, if they want to, they can even retrieve the whole list of recognition results including the phonetic transcription of each result. This brings even more flexibility to the plugin developers without making plugin development more complicated (the base classes have appropriate implementations that you don't need to override if you don't want the additional information).

If you are running a svn snapshot and are upgrading: You will need to manually copy the julius.jconf file from `kde4-config --prefix`/share/apps/simond/default.jconf to ~/.kde/share/apps/simond/models//active/julius.jconf (overwriting the old one) as simon(d) will not do that automatically.

Friday, 10 July 2009

simon 0.2 released

Almost three years after the start of the development, the first stable version of the open source speech recognition suite simon has finally been released: simon 0.2 is ready for download.

With simon you can control your computer with your voice. You can open programs, URLs, type configurable text snippets, simulate shortcuts, control the mouse and more.

Because of simon's architecture, it is not bound to a specific language and can be used with any dialect. It is also specifically designed to handle speech impairments which makes simon a viable alternative to conventional input methods for physically disabled people.

simon 0.2 is based on the open source Julius speech recognition engine and the HTK (which - due to licensing restrictions - has to be installed separately).

In comparison to the 0.1 series that never made it past alpha quality, simon 0.2 does not only bring stability improvements.

simon 0.2 is now based on KDE 4 and thus perfectly integrates in every KDE setup. This move also brings KIO to simon which allows for network transparency, transparent compression and more.

The separate Juliusd application has been discontinued and replaced by the much more advanced simond which features network audio streaming, centralized model management with automatic backups and more. simond is a command line application which makes it easy to set up a central simon server without the heavy X dependencies. For users of graphical environments the front-end ksimond has been introduced.

Moreover, the command architecture has been completely overhauled and now uses a much more flexible plugin architecture and supports individual triggers per plugin. New plugins include the list plugin (which can be used to display options), the composite plugin (similar to "macros"), a number input plugin and an artificial intelligence. Combined with the improved commands of previous simon versions this makes a total of 10 command plugins out of the box!

The import of the shadow dictionary now also supports PLS and SPHINX dictionaries which opens the door for dictionaries like the German GPL dictionary from Voxforge.

Because of the growing user base simon has been translated to English, German and French and also partly to Spanish, Dutch and Czech.

simon 0.2 is also the first version of simon ever to ship complete with an extensive user manual - available in English and German.

Next to the source package, the release is also available in convenient binary packages for 32-bit and 64-bit users of both GNU/Linux (Ubuntu and OpenSUSE) as well as Microsoft Windows operating systems and can be downloaded from the sourceforge project page.

Thursday, 9 July 2009

Two Final Issues

The last round of testing of the simon 0.2 codebase only resulted in two found bugs.

The first one is quite annoying in that it essentially limits simon functionality. The HTK does not like words that start with the character "'". That makes "words" like "'em" (short version of "them") fail during the model compilation with a confusing error message.

As I really don't want to mess with the wordlist code (we would have to escape special characters under certain conditions) so late in the development process, I delayed that fix to the 0.3 series. In the meantime just stay away from 's at the beginning of the word, please. Words like "that's" are no problem, though (as the "'" is not at the beginning of the word).
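If you want to check a word list for this problem before compiling the model, a quick Python sketch like the following will do. It is not part of simon and only looks for the one issue described here (a leading apostrophe); the function name is made up.

```python
def find_leading_apostrophe_words(words):
    """Return the words that start with an apostrophe (the problematic case)."""
    return [word for word in words if word.startswith("'")]


wordlist = ["'em", "that's", "them", "'cause"]
print(find_leading_apostrophe_words(wordlist))
```

Words like "that's" pass the check because the apostrophe is not the first character.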

The second bug was a rather strange one: Some people reported that over time, the recognition became slower and slower for them. All of the users that reported that bug were using Windows. During testing, I found out that using the pseudo device called "SoundMapper" (or similar) caused this - when using the hardware device everything was working. So if you experience this issue, please check that you use the appropriate hardware device instead of meta-devices.

For users that don't read the blog, I added entries for both problems in the troubleshooting guide on our wiki.

And yes, I know that these are hardly the last two bugs in the 0.2 code - but they are the last to be fixed before the stable release which makes them kinda special ... for me anyways :)

Sunday, 14 June 2009

simond: Thread Termination

Some users told us that when simon crashed / closed under certain conditions, the simond would be trapped in an infinite loop.

To accept any new connections, simond had to be restarted.

This was caused by the implementation of the TCP/IP adin. Once a client was connected (data connection to synchronize the model), a new socket was opened for the audio stream. This was implemented using the accept() method which blocks until a connection is made.

As this ran in a different thread than the main event loop, this was no problem most of the time. However, when the client connected through the data channel but didn't establish an audio connection, this socket would keep blocking in accept(). When the client exited, the thread should have been terminated by the main event loop, but the thread would still block waiting for a connection.

This resulted in the debug output "QThread::start: Thread termination error" which was printed over and over again.

Because I really don't want to modify too much of Julius to keep it synchronized to the SVN version I added a little workaround in the stop() routine: I simply connected to the socket and disconnected immediately.
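For readers who want to see the trick in isolation, here is a minimal Python sketch of the same idea: a thread sits in a blocking accept() and stop() wakes it up by connecting to the listening socket and disconnecting right away. This is an analogue for illustration only; the real code lives in simond (C++) and Julius, and the class and attribute names below are invented.

```python
import socket
import threading


class AudioListener:
    """Waits for one audio connection in a background thread."""

    def __init__(self):
        self._server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        self._server.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
        self._server.listen(1)
        self.port = self._server.getsockname()[1]
        self._stopping = False
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def _run(self):
        conn, _addr = self._server.accept()   # blocks until someone connects
        if self._stopping:
            conn.close()                      # this was only the wake-up call
            return
        # ... a real server would read the audio stream from conn here ...
        conn.close()

    def stop(self):
        """Unblock accept() by connecting and disconnecting immediately."""
        self._stopping = True
        with socket.create_connection(("127.0.0.1", self.port), timeout=2):
            pass
        self._thread.join(timeout=2)
        self._server.close()
        return not self._thread.is_alive()


listener = AudioListener()
listener.start()
print("thread finished:", listener.stop())
```

The nice property of this workaround is that the blocking code itself stays untouched; only the shutdown path changes.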

So starting with revision 862 simond should not hang any longer and should be more stable in general.

Saturday, 13 June 2009

Manuals

The manuals for simon, simond and ksimond are now also available as PDF files at our wiki.

Also, I managed to get khelpcenter on windows working. Sadly this really blows up our windows package as this adds the whole khtml library to it... Anyways: Say hello to F1 in the simon 0.2 final!

Friday, 12 June 2009

Overcoming Limitations

Remember the post a couple of days ago about the limitations of the event simulation?

Well they are history now :)

To fix the problem with the dead keys, I added a hash table to the CoreEvents class (the platform-independent part of the event simulation backends) containing the unicode characters ('â', 'é', etc.) and the characters each of them consists of (in case of 'â': '^' and 'a').
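The lookup can be illustrated with a short Python sketch. Note that it swaps in a different technique: instead of a hand-written hash table for every character (as in CoreEvents), it derives the base character from Unicode's canonical decomposition and only keeps a small table from combining mark to dead key. The table is an assumed subset and the function name is made up.

```python
import unicodedata

# Combining mark -> character of the dead key that produces it (assumed subset).
DEAD_KEY_FOR_MARK = {
    "\u0302": "^",   # circumflex
    "\u0301": "´",   # acute
    "\u0300": "`",   # grave
    "\u0303": "~",   # tilde
    "\u0308": "¨",   # diaeresis
}


def dead_key_sequence(char):
    """Return (dead_key, base_char) for a composed character, else None."""
    decomposed = unicodedata.normalize("NFD", char)
    if len(decomposed) != 2:
        return None          # plain character, or more than one accent
    base, mark = decomposed
    dead_key = DEAD_KEY_FOR_MARK.get(mark)
    if dead_key is None:
        return None          # accent not covered by the table above
    return dead_key, base


# 'â' is typed as: press and release '^', then press 'a'.
print(dead_key_sequence("â"))
```

An explicit table has the advantage that it can also cover characters that need a special case, which is presumably why the list of supported characters below is spelled out.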

After implementing this and finding all the common dead keys (there are a lot of them!) I had to restructure things anyway in the XEvents backend (to allow for AltGr+Shift at the same time) so the event simulation on X11 should now work better and faster too!

Long story short, simon now supports the following dead keys:

ŵêẑûîôâŝĝĥĵŷĉ

ŴÊẐÛÎÔÂŜĜĤĴŶĈ

ẃéŕźúíóṕǘáśǵḱĺýćńḿ

ẂÉŔŹÚÍÓṔǗÁŚǴḰĹÝĆŃḾ

ẁèùìòǜàỳǹ

ẀÈÙÌÒǛÀỲǸ

ȩŗţşḑģḩķļçņ

ȨŖŢŞḐĢḨĶĻÇŅ

ẽũĩõãỹṽñ

ẼŨĨÕÃỸṼÑ

ẉẹṛṭẓụịọạṣḍḥḳḷỵṿḅṇ

ẈẸṚṬẒỤỊỌẠṢḌḤḲḶỴṾḄṆṂ

ẇėṙṫżıȯṗȧṡḋḟġḣẏẋċḃṅṁ

ẆĖṘṪŻIȮṖȦṠḊḞĠḢẎẊĊḂṄṀ

ẅëẗüïöäḧÿẍ

ẄËTÜÏÖÄḦŸẌ

ẘůåẙ

ŮÅ

ēūīōāḡȳ

ĒŪĪŌĀḠȲ

ěřťžǔǐǒǎšďȟǰǩľčň

ĚŘŤŽǓǏǑŠĎȞǨĽČŇ

űő

ŰŐ

ęųįǫą

ĘŲĮǪĄ

I told you there are a lot of them...

Tuesday, 9 June 2009

French Translation

Quick update: The user yanncantin translated simon to French.

Thanks!

The translations are not yet checked in but I hope to get that done by tonight.

Limitations of the Event Simulation

simon provides the possibility to simulate user input like keystrokes or mouse clicks.

This functionality is provided by the simoneventsimulation library. The library internally uses platform-dependent backends for X11 (xtst) and Microsoft Windows (WinAPI).

The backends need to provide a common sendKey(unsigned int /*unicode*/) method mandated by the shared interface. Since neither Xtst nor WinAPI provide a way to "write" such a character directly we first need to find out how the user would generate such a key.

Internally both methods use a simple switch / case to first determine if this is a special key (like the "Home" key) and, if not, try to determine the keycode by extracting the "base character" (for example: The base character of € is e) and then trying to find out which modifiers (shift, altgr, control) were pressed to get to the given keycode.
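As a rough illustration of that lookup, here is a Python sketch. Everything in it is an assumption made for the example: the special key codes are placeholders (not real key codes), the tiny layout table mimics a German keyboard where € sits on AltGr+e, and the real backends query X11 or the WinAPI instead of using a fixed table.

```python
# Placeholder codes for special keys (hypothetical values, not real key codes).
SPECIAL_KEYS = {0xE000: "Home", 0xE001: "End"}

# Character -> (base character, modifiers) for a made-up layout excerpt.
LAYOUT = {
    "€": ("e", ("AltGr",)),
    "@": ("q", ("AltGr",)),
}


def resolve_key(code):
    """Return (key, modifiers) describing how the user would type `code`."""
    # Step 1: special keys are handled before anything else.
    if code in SPECIAL_KEYS:
        return SPECIAL_KEYS[code], ()

    # Step 2: find the base character and the modifiers needed to reach it.
    char = chr(code)
    if char in LAYOUT:
        return LAYOUT[char]
    if char.isupper() and char.lower() != char:
        return char.lower(), ("Shift",)
    return char, ()


print(resolve_key(ord("€")))
print(resolve_key(ord("A")))
```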

This works fine for "normal" characters and special characters like €. However one group of "keys" is missing: The dead keys.

Dead keys are used to generate characters like â or é. They are generated by first pressing one key like ^, releasing it, and then pressing the next key. This is substantially different from modifier keys as there the modifier key is not released until the combination is complete.

This is why simon in the current form does not know how to handle characters that need dead keys. I hope to get this integrated before simon goes stable as this is a substantial feature for languages like French.

Friday, 5 June 2009

Portaudio and Pulseaudio

So as I already said yesterday, I seemed to have missed the gamechanging "make-portaudio-work-with-pulseaudio" patch. So I went back, updated the sources and created new packages for the portaudio snapshot of yesterday.

Guess what? It didn't help a bit. simon still crashes seemingly at random and out of nowhere (no backtraces) when you use portaudio and pulseaudio is active.

I give up. There is apparently no real way to have portaudio and pulseaudio coexist peacefully.

So if you are using pulseaudio (which is the default of Ubuntu and apparently also on Fedora) expect ugly crashes from simon.

The only workaround I could find is to launch simon over padsp and select OSS devices in simon. That seemed to work fine (no crashes there).

Using pasuspender doesn't help at all.

If you do experience a crash due to pulseaudio it is usually bad enough (catastrophic) that it somehow manages to get simond in an infinite loop so you will want to restart that too if simon crashes.

I am sorry for the inconvenience but sadly there is little that I can do within a realistic timeframe. I don't want to get involved in either portaudio or pulseaudio development so I keep hoping for phonon to develop a recording API.

Maybe for simon 0.3.

Thursday, 4 June 2009

Bad Timing

Ok this has to be bad karma or something.

I just finished packaging a portaudio snapshot because the stock Ubuntu-Version of portaudio is just too old.

So I upload them to the ubuntu bugtracker after careful testing on Kubuntu 32-bit and Kubuntu 64-bit. Everything works fine.

Yesterday I tested the whole setup (simon and the portaudio snapshot) on Ubuntu 32-bit and with it the Pulseaudio soundserver. simon crashes instantly when I deactivate it (which happens to happen all the time).

Back to square one. First I try to debug the problem but even when building every package I can find with debug information I get no valid backtraces at all. Valgrind doesn't help. Pasuspender? Crash. OSS devices? Crash. Nothing. The only thing that kind-of-works is using padsp and then using OSS devices in simon. But that's not good enough IMO.

So I start to look around and after quite some discouraging "there was some effort but now it is dead" posts on mailing lists and a lot of pushing the blame around (portaudio support in pulseaudio or pulseaudio support in portaudio?) I finally find light at the end of the tunnel. Apparently the problem annoyed enough audacity users that they integrated a patch into the audacity fork of portaudio that allows the use of non-mmap devices, which is apparently needed by pulseaudio.

David, It works!!!!!!!!!

Good enough for me. So I download the patch and a new portaudio snapshot and try to apply it. It fails. Of course.

So after going through it line by line and double checking the portaudio code I realize that the patch was already integrated. So why doesn't it work? Is the Audacity fork of portaudio just so much more advanced?

But then I look at the portaudio svn log and see this:

r1412 | aknudsen | 2009-05-24 18:54:22 +0200 (Son, 24. Mai 2009) | 2 Zeilen
Apply Kevin Kofler's non-mmap patch

Checking back to the date of the snapshot I used for packaging: 2009-05-19. I made the packages over the course of the last 3 days but used the "old" code, mind you.

So in an effort to update the stock version that is 2 years too old, I updated it with packages of a newer portaudio version that is ... 5 days too old. Oh the irony.

Ok I am off packaging portaudio again. And this time I am using the SVN version.

Tuesday, 2 June 2009

simon 0.2 and KDE 4.1

simon and KDE backward compatibility has always been an issue. Because I use KDE/trunk to develop simon, I tend to use new classes as they appear.

However, OpenSUSE still has KDE 4.2 only in the factory repository and *buntu 8.10 only has it in its backports repository from jaunty.

So while we will not support 4.0 (Ubuntu Hardy), we decided to support KDE >= 4.1. This means that users of all major distributions should have no problem installing simon 0.2.

simon 0.2 on KDE 4.1

Portaudio Snapshot Packages

After I realized that a lot of the sound issues on *buntu can be traced back to the very old portaudio version used, I packaged a snapshot from a couple of days ago and uploaded it to the Launchpad bugtracker.

Just uninstall the libportaudio2 and portaudio19-dev packages if they are already installed from the universe repository and install the new packages. This should fix a lot of sound issues (all of them if you are running Kubuntu and thus don't use pulseaudio) instantly.

Wednesday, 20 May 2009

Ubuntu 9.04 and simon

Ubuntu Jaunty Jackalope and simon are not exactly best friends. If you tried it, you most likely experienced some very strange sound issues.

simon hangs, something else hangs, everything hangs...

I installed Kubuntu 9.04 today and had the very same issues.

Turns out the problem is a pretty old portaudio version.
The current snapshot fixed all those problems for me. You can get it on the portaudio website. Download the pa_snapshot.tgz, compile and install.

I haven't yet found a pre-packaged debian package to spare you the compiling part. If you know one, please let me know in the comments.

Also, it worked for me but obviously YMMV. Still issues? Again: let me know in the comments.

Friday, 15 May 2009

XML - Again...

Ok it seems my last blog post has triggered quite a reply.

However, I think there still seems to be a bit of confusion. Let's try again...

Why am I so into XML-based standards? Because I understand them.

Even if this was not your intention, this is a bit misleading as it suggests that I don't understand XML files and that, if I did, I would share your point of view.

I can assure you that I understand the principles of XML files very well. Hell, a lot of parts of simon even use XML files (take a look at how the commands are stored for example).

But at the moment it just doesn't make sense to use a custom PLS modification (adding terminal tags). We need the lexicon to be readable by Julius and HTK. This implies that we have to store the lexicon in that format (or add support to Julius; HTK is essentially closed-source so we would still have to keep a separate HTK lexicon around). It essentially does not matter what I believe is the better format for the job.

So the only possible way to incorporate an XML-based dictionary storage format would be to add an additional layer. This, however, means that the features supported can only be the smallest common denominator of both formats. So no fancy IPA (no support in HTK), no nice multiple graphemes per word (HTK could be compared to 1NF if you are familiar with database normalization), etc. In the end this additional layer would bring nothing beneficial to the table because we can't use its nice features as long as we have to stay HTK compliant too. All it would do is introduce another source of errors.

All these considerations are irrelevant when we take Julius and HTK out of the equation. Then, adopting and modifying PLS is not such a bad idea (although I would like to store commands and the dictionary in the same file for the upcoming package-based structure). Removing the dependency on HTK is something I would like very much but it doesn't seem feasible right now or in the near future.
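To illustrate the "smallest common denominator" point, here is a small Python sketch that reads a PLS lexicon and flattens it into one row per word and pronunciation, which is the flat shape a word-per-line dictionary needs. The sample lexicon and the function name are made up for this example; this is not simon's import code.

```python
import xml.etree.ElementTree as ET

# Namespace of the W3C Pronunciation Lexicon Specification (PLS) 1.0.
PLS_NS = {"pls": "http://www.w3.org/2005/01/pronunciation-lexicon"}

SAMPLE = """<lexicon version="1.0"
    xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
    alphabet="x-sampa" xml:lang="en">
  <lexeme>
    <grapheme>colour</grapheme>
    <grapheme>color</grapheme>
    <phoneme>k V l @</phoneme>
  </lexeme>
</lexicon>"""


def flatten_pls(xml_text):
    """Return a list of (word, pronunciation) rows, one per combination."""
    root = ET.fromstring(xml_text)
    rows = []
    for lexeme in root.findall("pls:lexeme", PLS_NS):
        graphemes = [(g.text or "").strip()
                     for g in lexeme.findall("pls:grapheme", PLS_NS)]
        phonemes = [(p.text or "").strip()
                    for p in lexeme.findall("pls:phoneme", PLS_NS)]
        # A lexeme with several graphemes becomes several independent rows;
        # the information that they belong together is lost.
        for grapheme in graphemes:
            for phoneme in phonemes:
                rows.append((grapheme, phoneme))
    return rows


for word, pronunciation in flatten_pls(SAMPLE):
    print(word, pronunciation)
```

Going back from the flat rows to the original lexeme is not possible without guessing, which is exactly why the extra layer could not offer more than the flat format does.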

And, by the way: I don’t like to read SAMPA. I prefer the IPA when editing the pronouncing dictionary.

The HTK does not support UTF-8. However, I would prefer using the SAMPA even if it did. I find that it is much easier to read and learn the SAMPA (especially if you speak German). Also, I prefer to be able to transcribe my words with the keyboard instead of using sign-tables to pick out the symbols.

As the IPA and X-SAMPA can be converted to and from each other without losing anything I don't really see a problem there.
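A tiny Python sketch shows why the conversion is harmless: it is a one-to-one symbol mapping, so a round trip returns the original transcription. The table below is only a small excerpt of the X-SAMPA mapping, and the sketch works on lists of already separated symbols to sidestep tokenization.

```python
# Small excerpt of the IPA <-> X-SAMPA symbol table.
IPA_TO_XSAMPA = {
    "ʃ": "S",
    "ŋ": "N",
    "ə": "@",
    "ɛ": "E",
    "ɪ": "I",
    "ʊ": "U",
    "ː": ":",
}
XSAMPA_TO_IPA = {xsampa: ipa for ipa, xsampa in IPA_TO_XSAMPA.items()}


def ipa_to_xsampa(symbols):
    """Convert a list of IPA symbols; unknown symbols are passed through."""
    return [IPA_TO_XSAMPA.get(symbol, symbol) for symbol in symbols]


def xsampa_to_ipa(symbols):
    """Convert a list of X-SAMPA symbols; unknown symbols are passed through."""
    return [XSAMPA_TO_IPA.get(symbol, symbol) for symbol in symbols]


ipa = ["ʃ", "ɪ", "ŋ"]
xsampa = ipa_to_xsampa(ipa)
print(xsampa, xsampa_to_ipa(xsampa) == ipa)
```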

Sometimes, I ask myself the question: why don’t they switch from SAMPA to the IPA? Why don’t they switch their homepage from ISO-8895-1 to UTF-8?

This somehow confused me a bit. Our homepage is UTF-8 encoded? So are all the files produced by simon (except where it is not possible because of third party products that don't support it)...

Then I saw that you linked to the SPHINX homepage and not to our homepage...

export functionality is a low priority feature

OK. From my point of view, Voxforge needs an export functionality.

This is especially confusing as I was talking about export functionality of simon. You talk about an export functionality from Voxforge which would be an import feature from simon's point of view. And as I stated in my previous blog post this is something that I am indeed very interested in.

I don’t know about the exotic BOMP standard, I couldn’t find an entry in the Wikipedia. So I assume that BOMP is not a relevant standard.

BOMP is no standard at all. It is a dictionary following the HADIFIX "standard". HADIFIX is a speech synthesis project that uses phonetic dictionaries to know how to pronounce the words. Those dictionaries have to follow a specific format which could be called the "HADIFIX standard" (I have not found a definition of it anywhere).
The import functionality was implemented because a very large, high quality phonetic dictionary (the "BOMP" dictionary) exists following that format.

Simon allows me to record just single words, not utterances. I am not convinced by that concept.

I wouldn't be either. Fortunately, this is not true. Take a look at the Training module. You can easily import "normal" Texts. Try to input a text file containing this: "I am an utterance. And here comes another.". Even the standard examples shipped with simon contain sentences and not individual words, btw.

You see, there are several aspects. The world is not just about simon. It is about Voxforge, too.

Please don't lecture me. It is disrespectful and unnecessary. I am very grateful for the effort that Ken and all the contributors put into Voxforge and actively promote participation when people ask me about dictation with simon.

I am also investigating how to best use the voxforge model with simon and have stated on several occasions that I intend to integrate the possibility to contribute to voxforge from within simon.

Followup 16.05.2009
Today ralfherzog contacted me by e-mail and explained the misunderstanding.

Thursday, 14 May 2009

XML Standards: Clarification

ralfherzog, one of the largest contributors to the German voxforge acoustic model and one of the main contributors to the German GPL lexicon, keeps posting about simon's (missing) import / export functionalities in his "testing simon"-blog. Normally I answer him directly by mail but I think this warrants a blog entry as this might be interesting to other readers as well.

First off some facts:

simon does support importing PLS dictionaries

simon does not support any explicit export functionalities what-so-ever. There are no export functions for the training data, the lexicon, vocabulary or anything.

simon does not support the import of training data based on a supplied prompts file - be that in plain text or XML.

None of those missing features are due to ideological reasons but mostly due to time constraints. However, I am not as convinced as ralfherzog that they are that essential.

As far as I know, simon is the only application using PLS dictionaries so an export functionality is a low priority feature. The same goes for the training data. An integration with voxforge is planned for the future which would in my opinion be the only practical use case for export features right now anyways.

Some might wonder why we don't use PLS as the default dictionary format in the first place but the answer is very simple. The PLS standard does not allow for any terminal information to be stored with the dictionary. The current storage format is a standard Julius vocabulary file and an accompanying HTK dictionary. Those are the respective file formats of the underlying components and as they are not (yet) exchangeable I see no reason to introduce new file formats.

The import of training data is included in simon 0.2, but only in a very basic form. In its current state it is usable if you have training data gathered by a previous simon installation; everything else is not yet supported. I would personally like to see an import of a "normal" HTK prompts file but don't see the advantage of SSML. SSML is not designed for that particular use and just introduces unnecessary overhead. Yes, content validation is a nice thing that makes XML a very good choice for many, many things, but prompts are imho not one of them. So we might see an import function for SSML-formatted prompts, for data that has already been gathered and stored in that format, but making it the primary storage format for prompts in simon is probably not going to happen anytime soon. It's the same as with PLS: HTK expects the prompts in its own format, so why introduce an additional source of errors by adding another conversion step?
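To show how little structure a plain prompts file needs, here is a minimal Python sketch of a reader for one. It assumes the common layout of one recording per line, with the sample identifier first and the spoken words after it, separated by whitespace. The function name and the sample identifiers are made up for illustration; this is not simon's import code.

```python
def parse_prompts(text):
    """Map each sample identifier to the list of words spoken in it.

    Assumed layout (one recording per line):
        <sample id> <WORD> <WORD> ...
    Blank lines are skipped.
    """
    prompts = {}
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue  # ignore empty lines
        sample_id, words = parts[0], parts[1:]
        prompts[sample_id] = words
    return prompts


# Hypothetical file content, purely for illustration.
example = "sample1 DIAL ONE TWO\n\nsample2 CALL ANNA\n"
print(parse_prompts(example))
```

A format this simple leaves little to validate, which fits the argument above that an XML wrapper would mostly add a conversion step.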

Homepage

We finally decided to bring our homepage up to speed.

Much of the information on it was (and still is) outdated and sometimes even plain wrong. So we decided to restructure it a bit to make the core information more accessible to new users and to get rid of the outdated content.

Obviously, there is still a lot to do. But even just after the new menu was implemented, it is already much easier to find what you are looking for.

Personally, I don't like the external links to the wiki. The howtos, tips & tricks, etc. are still on the simon wiki but are linked from the main homepage. This of course is a bit confusing when you click a link in the sidebar and suddenly end up on a completely different homepage. However, we don't want to miss the advantages of having the (ever changing) content on our wiki as it is much easier to update.

So dear lazyweb: Is there a convenient solution (like a mediawiki plugin) for embedding wiki content in a typo3 page?

Ideas and of course feedback for the homepage would be much appreciated!

Microblogging

Yes I finally gave in to all that peer pressure :)

You can now follow me on identi.ca/bedahr.

Montag, 27. April 2009

Linuxtage 09

On Saturday we gave a presentation at this year's Linuxtage in Graz.

In short: It was awesome :)

We had the largest of the three lecture halls and it was nearly full which means we had about 30 - 40 people listening. The presentation of the software worked out perfectly which just goes to show that the current version is very usable already.

Sadly we could not get into all the discussion points raised during the presentation (after all, we had just 45 minutes), so we continued the discussion in the cafeteria afterwards.

All in all, it was a great way to get people interested in the project and it was a pleasure to meet and talk to other free software enthusiasts.

On a related note: One of the people I talked to at the conference, Sven Guckes, pointed me to an Austrian phonetic database called Adaba. This database also comes with a lot of sound samples spoken by professional speakers, so it could even be a starting point for a speaker-independent German acoustic model. However, the licence is commercial, so we have to see what we can do...

Correction (11.05.2009)
Seems like I mixed up two conversations I had. Astrid Gruber was the one who pointed me to Adaba. Thank you very much!

Freitag, 24. April 2009

A little fun with the AI Plugin

I just set up a little conversation model which allows for a handful of simple sentences to be recognized. I then loaded the AI Plugin, turned up the speakers and had a little conversation with my notebook :)

This is how it went:

Me: Hello
Computer: Hi there!
Me: What is your name
Computer: My name is alice.
Me: My name is Peter
Computer: Your name is Peter?
Me: Yes
Computer: That's cool.
Me: Yes
Computer: Pardon me?
Me: Yes
Computer: Ah.
Me: What are you
Computer: I am alice. The female chat robot-electronic brain...
Me: Cool
Computer: Who is the best robot?
Me: you are the best
Computer: Thank you very much human...
Me: you are welcome
Computer: The pleasure was all mine.
Me: Bye
Computer: Thanks for chatting, Peter!

While certainly not very helpful - this is pretty cool :)

Dienstag, 21. April 2009

A little Story about Accidents, Releases and Conferences

Ok the update schedule on this blog is getting a bit ridiculous :)

However, I have an excuse this time: About a month and a half ago I had a pretty bad accident (no cars or other persons involved). I broke both legs and am currently sitting in a wheelchair. As a result, simon development has been a bit lower on my todo list(s) than usual. I put some explanations up on the sourceforge page and on the German Ubuntu forum (actually while still in the ICU) but somehow forgot the blog. For those of you who thought that the project might have been halted: I am sorry.

I am expected to learn to walk and subsequently begin to work full time again in July.

On the plus side: We still managed to release the first release candidate of simon!
It is pretty much just a packaged version of what we had already finished before my accident, but it is still pretty stable and definitely a lot better than beta 3.
There are still two issues (which are pretty bad: one crash and one bug that just stops the recognition without any visual indication), but both are related to julius and will hopefully be fixed soon. Anyway, they are non-blockers - the problems only occur when restarting the recognition, which doesn't happen once you have set up your model.

For our readers from Austria: We are presenting the current prototype at the Grazer Linuxtage 09 in Graz. If you live near Graz: Come by and take a look - admission is free!

Montag, 2. März 2009

Practical Tests

While this blog is definitely not updated nearly often enough, things have still progressed at an amazing pace. The software is maturing quickly, and although the Windows version still crashes from time to time, those crashes seem to be completely random and mostly happen when opening menus and such, so I get to blame KDE or Qt for that ;). Everything else seems alright and works really well in practical tests.

Speaking of practical tests: The current prototype has been tested with two teenagers suffering from spastic paralysis. One of them had only a mild speech impairment and, after just one hour of training, already reached an astonishing recognition rate of 97 % on a lexicon of 26 words. Those 26 words are enough to control MediaPortal, Firefox and Skype as well as to send pre-defined text snippets as e-mails.

The other test person had a severe speech impairment, which meant that even as a native speaker I couldn't understand him myself at first. However, after just a few training sessions we already reached an impressive 73 % recognition rate on the same 26 words.

This proves that simon really is usable for people who could never use commercial offerings because of their speech impairment.

Tomorrow we have an important presentation to doctors and therapists to demonstrate simon's usefulness for handicapped people and its potential as a therapeutic tool.

Wish us luck!

Mittwoch, 4. Februar 2009

Documentation Update

Yes! I did it! I managed to devote a whole day to documentation - getting to know docbook, writing installation instructions on the wiki and generally updating and adding documentation all around.

But I am not done yet: We actually have plans for a real manual. With pages and content and everything :)

On a less cheery note: Someone reported problems with our binary packages. Both the Debian package and the generic binary package apparently have some issues. I have installed a clean Ubuntu 8.10 in a virtual machine and am going to double-check everything tomorrow.

Mittwoch, 28. Januar 2009

simon in action

As promised, I created a little teaser video that shows what you can do with simon 0.2 beta 3 and a little training.

The video is available for download here.

Btw, I use the Mouseless Browsing add-on to control Firefox.

Montag, 26. Januar 2009

simon 0.2 beta 3 released

Finally. Unlike the second beta, which wasn't really that exciting, this new release is IMHO again a huge leap forward.

Why? Well three new command plugins and one old one that actually works for the first time in this release? In my book that effectively doubles our plugin-count.

Composites
Composites take other simon commands as their arguments and execute them in order. Moreover you can introduce delays between them. A typical use-case for such a command would be:
Open a Chat with Anna.
This takes three actions: Open Kopete, type "Anna" to find her in the contact list and press return.
Those three actions have already been possible with old versions of simon but they had to be executed by the user. With the new composite plugin you can create a command that launches the executable command "kopete", the text-macro command to type "Anna" and the shortcut command to press return and associate a meaningful trigger to the combination.
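The "Open a chat with Anna" example can be modelled in a few lines. The following Python sketch is only an illustration of the idea, not simon's actual plugin code: the class names, the millisecond delays and the injectable `sleep` function are all assumptions made for the example, and the three actions merely write to a log instead of really launching Kopete or pressing keys.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Command:
    """A single voice command: a trigger phrase plus the action it runs."""
    trigger: str
    action: Callable[[], None]

    def execute(self) -> None:
        self.action()


@dataclass
class CompositeCommand:
    """Runs other commands in order, pausing after each one if requested."""
    trigger: str
    steps: List[Tuple[object, int]]  # (command, delay in ms after it)
    sleep: Callable[[float], None] = time.sleep  # replaceable for the demo

    def execute(self) -> None:
        for command, delay_ms in self.steps:
            command.execute()
            if delay_ms > 0:
                self.sleep(delay_ms / 1000)


# Demo: the actions and the waiting are only logged (hypothetical stand-ins).
log = []
chat_with_anna = CompositeCommand(
    trigger="Open a chat with Anna",
    steps=[
        (Command("kopete", lambda: log.append("launch kopete")), 500),
        (Command("type Anna", lambda: log.append("type Anna")), 200),
        (Command("return", lambda: log.append("press Return")), 0),
    ],
    sleep=lambda seconds: log.append(f"wait {seconds:.1f}s"),
)
chat_with_anna.execute()
print(log)
```

Because a composite exposes the same `execute()` method as a plain command, it can itself be used as a step of another composite.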

Lists
List commands also take other commands as their parameters. However, instead of executing them one by one, a list - when triggered - shows the user a small pop-up dialog that presents the commands that make up the list and lets the user choose the command they want with the numbers 1-9. Of course lists can contain other lists, so you can easily build sub-menus if you want to. Composites in lists are no problem either.
This enables simon users to limit their vocabulary much further if they don't want to invest much time in training or just can't speak very well.
An example of a combination of those lists and composites would be a "Contacts" list-command which contains place commands to e-Mail addresses (mailto:bla@blub.com works great as place-command) and a composite plug-in to start a chat with Anna (see the composite part).
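A "Contacts" list like the one described above could be sketched as follows in Python. This is an illustration under assumptions, not simon's implementation: the class and function names are invented, the pop-up dialog is reduced to a list of strings, and the chat composite is replaced by a plain stand-in command.

```python
class Command:
    """A single command: a trigger phrase plus the action it runs."""

    def __init__(self, trigger, action):
        self.trigger = trigger
        self.action = action

    def execute(self):
        self.action()


class ListCommand:
    """Offers up to nine commands that are selected by their number."""

    def __init__(self, trigger, entries):
        if not 1 <= len(entries) <= 9:
            raise ValueError("a list holds between 1 and 9 entries")
        self.trigger = trigger
        self.entries = list(entries)

    def menu(self):
        """Return the numbered lines a pop-up dialog would show."""
        return [f"{n}: {e.trigger}" for n, e in enumerate(self.entries, start=1)]

    def choose(self, number):
        """Run entry `number`; return a nested list so it can be shown next."""
        if not 1 <= number <= len(self.entries):
            raise ValueError(f"no entry number {number}")
        entry = self.entries[number - 1]
        if isinstance(entry, ListCommand):
            return entry  # sub-menu: the caller displays it next
        entry.execute()
        return None


def run(menu, spoken_numbers):
    """Follow a sequence of spoken numbers through nested lists."""
    current = menu
    for number in spoken_numbers:
        current = current.choose(number)
        if current is None:
            break


# Demo with hypothetical entries; the actions only record what they would open.
opened = []
chats = ListCommand("Chats", [Command("Anna", lambda: opened.append("chat with Anna"))])
contacts = ListCommand("Contacts", [
    Command("Mail", lambda: opened.append("mailto:bla@blub.com")),
    chats,
])
print(contacts.menu())
run(contacts, [2, 1])
print(opened)
```

The point of the design is that the only words the user has to train are the list's trigger and the numbers 1-9, no matter how many commands hide behind the menus.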

Number Input
While not as exciting as lists and composites, this plug-in was important too: It lets you input complex numbers easily by providing a calculator-like interface.

Desktop Grid
The desktop grid was even part of the 0.1 series but it was never controllable by voice and just included as a teaser for... well this version :)
Using the desktop grid, the user can click any point on the desktop by saying a combination of numbers. When the desktop grid is activated, it divides the desktop into nine areas. Saying a number from 1-9 will then divide the chosen area into nine parts again, and so on.
That way the user can precisely click any point on the screen - again using only the numbers 1-9.
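The arithmetic behind this repeated subdivision is short enough to sketch in Python. Note the assumptions: the cells are numbered row by row from the top left (1-3 in the top row, 7-9 in the bottom row), and the click lands in the centre of the last chosen cell. Neither detail is stated above, and the function name is made up.

```python
def grid_click_point(width, height, digits):
    """Return the centre (x, y) of the cell reached by following `digits`.

    Each digit from 1 to 9 picks one cell of a 3x3 grid laid over the
    current region. Assumed numbering: row by row, starting top left.
    """
    left, top = 0.0, 0.0
    w, h = float(width), float(height)
    for digit in digits:
        if not 1 <= digit <= 9:
            raise ValueError("grid digits must be between 1 and 9")
        row, col = divmod(digit - 1, 3)
        w /= 3  # the chosen cell is a third as wide ...
        h /= 3  # ... and a third as high as the current region
        left += col * w
        top += row * h
    return (left + w / 2, top + h / 2)


# On a 900x900 desktop: bottom-right cell first, then its top-left sub-cell.
print(grid_click_point(900, 900, [9, 1]))  # (650.0, 650.0)
```

Every spoken digit shrinks the region by a factor of three in each direction, so on a 1920 pixel wide screen four digits already narrow the target down to a cell about 24 pixels wide.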

Of course there have been other advancements too. For example the recording widget now displays the current "loudness" using the progress bar while recording and a lot of bugs have been fixed.

All in all 0.2 beta 3 is a really cool release and, if all goes well, might just be our last beta release...

UPDATE 27.01.2009
I seem to have forgotten one final commit which, sadly, fixed some critical bugs (ksimond crash on startup and some plugin-loading stuff). I committed the patch now and am rebuilding the packages as I write.
Anyone who downloaded the release yesterday will have to download and install again. Don't forget to uninstall any previously installed 0.2 versions first!
Sorry for the inconvenience!

Freitag, 23. Januar 2009

A Tale of Poor Recognition Rates

Some of our German-speaking users who tried the 0.2 series of simon might have noticed that the recognition rates are extremely poor. No matter how many training samples you record, simon doesn't recognize a word.

While I never had that problem myself, I saw it happen on Mathias' notebook. Interestingly, using the "real" julius on the same model worked very well. The problem had to be in simon somewhere. And so the digging began...

The first thing I did was to compare the julius log produced by simond with the one generated by the "real" julius. The only difference between them was the decimal separator: Julius used "." and simond used "," (which is correct, as this was a German Windows XP). Well, that can't be it, can it?

After a ~~bit~~ lot of fiddling, I gave up and changed the locale to English/USA. And just like that things worked fine.

It turns out that Julius respects the locale's decimal point even when parsing the HMM model files. And as the HTK uses "." as its separator and julius expects ",", the model is parsed incorrectly. That never happened to me, as my system locale is en_US.
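The effect can be reproduced without touching the system locale. The Python sketch below imitates, in simplified form, how a C-style number parser such as `strtod` behaves: it reads the longest prefix that is a valid number for the given decimal point and silently ignores the rest. That Julius reads its model values this way is an assumption; the function is made up for illustration and is not taken from Julius.

```python
import re


def parse_like_strtod(token, decimal_point):
    """Parse the longest numeric prefix of `token`, C-style (simplified).

    Only `decimal_point` is accepted between the integer and fractional
    digits; parsing simply stops at the first character that does not fit.
    """
    pattern = re.compile(
        r"[+-]?\d+(?:" + re.escape(decimal_point) + r"\d+)?(?:[eE][+-]?\d+)?"
    )
    match = pattern.match(token)
    if match is None:
        return 0.0  # nothing numeric at the start
    return float(match.group(0).replace(decimal_point, "."))


value = "-1.234e+00"  # written with "." as the decimal point
print(parse_like_strtod(value, "."))  # -1.234
print(parse_like_strtod(value, ","))  # -1.0
```

With "," as the expected decimal point, every value loses its fractional part without any error being raised, which would explain why the model loads fine and still recognizes nothing.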

So if you use any version of simon 0.2 with a system locale whose decimal separator is anything other than ".", you will have mediocre recognition rates unless you either open the hmmdefs file (Windows: %appdata%\.kde\share\apps\simond\models\\active\hmmdefs; KDE: ~/.kde/share/apps/simond/models//active/hmmdefs) and replace "." with your locale's decimal point, or change your locale to English.
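If you go for the search-and-replace route, it is safer to touch only the numbers than every "." in the file. A minimal Python sketch of such a replacement follows. The function name is made up, the sample text is a simplified stand-in rather than a real hmmdefs file, and it assumes the file is plain text; back up the original before trying anything like this.

```python
import re

# A number with a fractional part, e.g. -1.234, 2.5 or 2.5e-01,
# that is not glued to other word characters or dots.
_DECIMAL = re.compile(r"(?<![\w.])([+-]?\d+)\.(\d+(?:[eE][+-]?\d+)?)(?![\w.])")


def localize_decimals(text, decimal_point=","):
    """Replace the "." inside decimal numbers with `decimal_point`."""
    return _DECIMAL.sub(
        lambda m: f"{m.group(1)}{decimal_point}{m.group(2)}", text
    )


# Simplified, made-up snippet in the spirit of a model definition file.
sample = '~h "a"\n<MEAN> 3\n -1.234e+00 2.5 0.001\n'
print(localize_decimals(sample))
```

Names, tags and integers are left alone; only tokens that look like decimal numbers are rewritten.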

Donnerstag, 15. Januar 2009

Dutch translation

Quick update: Thanks to Daniël Heres, the third beta of simon will also be available in Dutch!

Second Beta

Well, on Monday I pushed out the second beta release of 0.2.

I really didn't want to make a lot of fuss about it which is also why I didn't put announcements on all the "cool" places (like kde-announce). The second beta is only a bug-fix release (Changelog) and as such does not introduce any new features.

To be honest, the main reason for releasing the second beta was that we got a couple of new testers on Monday and I wanted them to test the latest version of simon. As there was no reason to keep the compiled binaries to myself, I put them on sourceforge.

Thanks to everyone who reported bugs since the first beta! Keep 'em coming...

Freitag, 2. Januar 2009

Happy New Year

Well I guess I am a bit late but still ... Happy new year to everyone!

As some of you might know, I stayed with an old friend over New Year's, so my responses to forum posts, opened bugs etc. might have been a bit slower.

But I am back and - as a sign of good will - already fixed some of the open bugs on the way back home (at least the hours on the train were useful for something). The fixes are not yet available through SVN but I will commit them tomorrow after some testing.

For now I just want to thank all the users and supporters of simon who helped to make simon what it is today.

Looking forward to another great year!

ps.: Besides some bug-fixing I started to work on new command plugins that might *cough* pop up in the near future - stay tuned...