Friday, January 23, 2009

A Tale of Poor Recognition Rates

Some of our German-speaking users who tried the 0.2 series of simon might have noticed that the recognition rates are extremely poor. No matter how many training samples are recorded, simon doesn't recognize a word.

While I never had that problem myself, I saw it first-hand on Mathias' notebook. Interestingly, using the "real" julius with the same model worked very well. The problem had to be somewhere in simon. And so the digging began...

The first thing I did was compare the julius log produced by simond with the one generated by the "real" julius. The only difference between them was the decimal separator: Julius used "." and simond used "," (which is correct, as this was a German Windows XP). Well, that can't be it, can it?

After quite a bit of fiddling, I gave up and changed the locale to English/USA. And just like that, things worked fine.

It turns out that Julius respects the locale's decimal point even when parsing the HMM model files. Since the HTK writes "." as its decimal separator and julius, running under a German locale, expects ",", the model is parsed incorrectly. That never happened to me because my system locale is en_US.
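To make the failure mode concrete, here is a small Python sketch of how a locale-aware number parser can mangle such a file. It is only a simulation: strtod_like is a hypothetical helper that imitates the way C's strtod reads the longest number it can and then stops, it is not Julius' actual parsing code, and the sample values are made up.

```python
import re


def strtod_like(text: str, decimal_point: str) -> float:
    """Parse the longest leading number in `text`, stopping at the first
    character that does not fit, similar to how C's strtod behaves.

    `decimal_point` stands in for the decimal point of the active locale.
    Hypothetical helper for illustration only, not taken from Julius.
    """
    dp = re.escape(decimal_point)
    pattern = rf"\s*[-+]?(?:\d+(?:{dp}\d*)?|{dp}\d+)(?:[eE][-+]?\d+)?"
    match = re.match(pattern, text)
    if match is None:
        return 0.0  # nothing numeric at the start of the text
    return float(match.group(0).replace(decimal_point, "."))


# Made-up values written with "." as the decimal separator.
samples = ["-1.234567e+00", "2.500000e-01", "0.731"]

for token in samples:
    english = strtod_like(token, ".")  # parser expects "."
    german = strtod_like(token, ",")   # parser expects ","
    print(f"{token:>14}  '.' -> {english:<10}  ',' -> {german}")
```

With "," as the expected decimal point, parsing stops at the first "." and only the integer part survives, so 2.500000e-01 is read as 2.0 and 0.731 as 0.0. A model full of such values would explain recognition rates this bad.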

So if you use any version of simon 0.2 with a system locale whose decimal separator is anything other than ".", you will get poor recognition rates unless you either open the hmmdefs file (Windows: %appdata%\.kde\share\apps\simond\models\\active\hmmdefs; KDE: ~/.kde/share/apps/simond/models//active/hmmdefs) and replace "." with your locale's decimal point, or change your locale to English.
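If you would rather not do the replacement by hand, something along these lines might help. It is a rough sketch, not a tested tool: localize_decimals is a hypothetical helper, and it assumes that the numbers in hmmdefs look like 1.234567e+00 (digits, a dot, digits, an optional exponent). Unlike a plain search and replace, it only touches dots inside such number tokens.

```python
import re

# A number token: optional sign, digits, ".", digits, optional exponent.
# Assumption: this covers the way numbers are written in the hmmdefs file.
NUMBER = re.compile(r"(?<![\w.\"])([-+]?\d+)\.(\d+(?:[eE][-+]?\d+)?)(?![\w.])")


def localize_decimals(text: str, decimal_point: str) -> str:
    """Return `text` with the "." in every number token replaced by
    `decimal_point`. Hypothetical helper for illustration only."""
    return NUMBER.sub(
        lambda m: f"{m.group(1)}{decimal_point}{m.group(2)}", text
    )


# Made-up snippet in the style of a model file.
snippet = '~h "sil"\n<MEAN> 2\n -1.234567e+00 2.500000e-01\n'
print(localize_decimals(snippet, ","))
```

To use it on the real file, read hmmdefs into a string, pass it through localize_decimals with your locale's decimal point, and write the result back. Keep a copy of the original file first, since simon may regenerate or depend on it.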

