I'll start small: simon now supports a "Power Training" mode which starts the recording immediately when the text to say is shown. When you proceed to the next page, the recording is automatically stopped and saved, and the next one starts. This simple change makes training large texts a lot faster!
OK, but that alone is not blog-worthy, right? Right! One of the most awaited features has made its appearance: confidence scores.
The recognition server now provides information about how confident it was in the recognition result.
Moreover, it not only provides simon with the most likely result but with the ten most probable ones. simon ranks them based on the recognition confidence and can ignore results the recognition was just not sure enough about (the threshold is configurable).
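To give an idea of what simon does with those scores, here is a minimal sketch in plain C++ of ranking the hypotheses, dropping everything below the threshold, and detecting when two results are too close to call. The structure and function names are made up for this example and are not simon's actual API:

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical result structure; names are illustrative only.
struct RecognitionResult {
    std::string sentence;  // recognized text
    double confidence;     // score reported by the recognition server (0..1)
};

// Keep only results above the configurable threshold and sort them so the
// most confident hypothesis comes first.
std::vector<RecognitionResult> rankResults(std::vector<RecognitionResult> results,
                                           double minimumConfidence)
{
    results.erase(std::remove_if(results.begin(), results.end(),
                      [minimumConfidence](const RecognitionResult& r) {
                          return r.confidence < minimumConfidence;
                      }),
                  results.end());

    std::sort(results.begin(), results.end(),
              [](const RecognitionResult& a, const RecognitionResult& b) {
                  return a.confidence > b.confidence;
              });
    return results;
}

// Two (or more) surviving hypotheses whose scores are very close are treated
// as ambiguous; in that case the user can be asked which one was meant.
bool isAmbiguous(const std::vector<RecognitionResult>& ranked, double maximumGap)
{
    return ranked.size() > 1 &&
           (ranked[0].confidence - ranked[1].confidence) < maximumGap;
}
```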
Now the cool part: if two (or more) results are very likely and simon cannot determine which one you meant, it will simply display a nice list from which you can select (with your voice, of course) what you meant.
It looks like this:
The feature is already quite stable and works well in combination with other plugins. There are of course safeguards in place to prevent recursive "did you mean" popups.
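The exact safeguard is internal to simon, but a minimal sketch of the general idea could look like this (class and method names are invented for illustration and are not simon's actual implementation):

```cpp
#include <string>
#include <vector>

// While a "did you mean" popup is already open, a second ambiguous result
// (e.g. the recognized answer to the popup itself) never opens another popup;
// the best-ranked hypothesis is used directly instead.
class DidYouMeanResolver {
public:
    std::string resolve(const std::vector<std::string>& rankedSentences)
    {
        if (rankedSentences.empty())
            return std::string();
        if (popupActive)                  // already asking: don't recurse
            return rankedSentences.front();

        popupActive = true;
        std::string choice = askUser(rankedSentences); // show the selection list
        popupActive = false;
        return choice;
    }

private:
    // Stand-in for the real popup; here we just pick the top entry.
    std::string askUser(const std::vector<std::string>& rankedSentences)
    {
        return rankedSentences.front();
    }

    bool popupActive = false;
};
```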
Of course the confidence scores of the results are also relayed to the plugins, and if they want to, they can even retrieve the whole list of recognition results including the phonetic transcriptions. This gives plugin developers even more flexibility without making plugin development more complicated (the base classes provide appropriate default implementations that you don't need to override if you don't want the additional information).
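As a rough sketch of what such an interface could look like (class and method names are invented here and do not correspond 1:1 to simon's real plugin API):

```cpp
#include <string>
#include <vector>

// Hypothetical types for illustration only.
struct Hypothesis {
    std::string sentence;       // recognized text
    std::string transcription;  // phonetic transcription
    double confidence;
};

class CommandPlugin {
public:
    virtual ~CommandPlugin() = default;

    // Every plugin handles the best hypothesis together with its confidence.
    virtual bool trigger(const std::string& sentence, double confidence) = 0;

    // Plugins that want more detail can override this to see the whole list
    // of hypotheses; the default implementation just forwards the best one,
    // so simple plugins do not have to care about the extra information.
    virtual bool triggerWithAlternatives(const std::vector<Hypothesis>& hypotheses)
    {
        if (hypotheses.empty())
            return false;
        return trigger(hypotheses.front().sentence, hypotheses.front().confidence);
    }
};
```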
If you are running an SVN snapshot and are upgrading: you will need to manually copy the julius.jconf file from `kde4-config --prefix`/share/apps/simond/default.jconf to ~/.kde/share/apps/simond/models/.
4 comments:
Cool!
Does this also give feedback to the recognition system? For example: when one of the results is chosen, does simon give it a higher confidence the next time the same thing is recognized?
Sorry, not at the moment.
This would be a very, very major change, as we would have to record the samples and store them for recompiling the model.
And with the few samples that make up a typical simon-based model, wrong recognition results would screw things up pretty quickly...
@Peter
Couldn't you store it without recompiling the model? And use it when the same phonemes and results appear?
For what? Only changing the model will influence the recognition.
And to answer your question: not really. Well, I could hack something together, but I would prefer to do it the clean way, which makes it not much less work than the "real thing".
Btw: I had to change the comment style as I couldn't post any comments myself (neither with Firefox nor Konqueror). I hope this commenting mode works better...