Yes, reading the manual is sometimes required, but I still feel comfortable saying that users don't need in-depth knowledge of speech recognition to build their own speech models with Simon - and that's something we've always been proud of.
However, the initial learning curve is undoubtedly a bit steep. So let's look at the interface that so often left new users baffled.
Analyzing Simon's Interface
After the first-run wizard (which, sadly, many new users seem to skip entirely), the following was the first screen shown to new users.
While very pretty (thanks to the Oxygen team), it only provided links to resources where users could find further help. The interface afforded absolutely no interaction pattern and left users stranded.
|Simon 0.3.75: Main Screen|
Even if the user loaded scenarios in the first-run wizard, all those tabs were completely empty. That's because the user was looking at the "Standard" scenario - an empty default scenario. To change this, users were supposed to use the unlabeled drop-down in the toolbar.
This weird interaction pattern existed mainly because scenarios are a recent addition to Simon: they were only introduced in Simon 0.3, and while there was a huge amount of internal refactoring associated with that, the UI always felt a bit "tacked on".
So over the last month I re-evaluated parts of Simon's interface to make it more intuitive for new users.
First of all, I identified some principles I wanted to convey to the user and then designed the new interface around them:
- Scenarios are opaque. Users can of course edit them if they want but the average user will probably never touch their components. In any case there is a strict hierarchy that must be maintained at all times: Scenario A (containing Components A), Scenario B (containing Components B), etc.
- Base models are the easiest way to get started. If setting up Simon to use a static base model requires users to search for an archive on a wiki, download and extract it, and then point Simon to individual files with cryptic names like "hmmdefs" or "tiedlist", then the interface has clearly failed. It must be easy and intuitive for users to create, share and use base models.
- Around half of all recognition problems are microphone-related. For the voice activity detection (the part of Simon that separates "Speech" from "Silence") to work, the volume must be set correctly. Especially with ALSA forgetting volume levels, this is often a source of problems whose only symptom is that the recognition simply doesn't work.
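To illustrate why a wrong microphone volume fails so silently, here is a minimal sketch of level-based voice activity detection. It is not Simon's actual implementation: the function names, the RMS measure, and the threshold values are all assumptions chosen for illustration. The point is only that when the input level never crosses the threshold, every frame is classified as "silence" and nothing ever reaches the recognizer.

```python
import math
from typing import List, Sequence


def rms_level(frame: Sequence[float]) -> float:
    """Root-mean-square level of one frame of samples in [-1.0, 1.0]."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def classify_frames(frames: Sequence[Sequence[float]],
                    threshold: float = 0.1) -> List[str]:
    """Label each frame as "speech" or "silence" (hypothetical threshold)."""
    return ["speech" if rms_level(f) >= threshold else "silence"
            for f in frames]


def volume_hint(frames: Sequence[Sequence[float]],
                threshold: float = 0.1,
                clip_level: float = 0.99) -> str:
    """Rough calibration feedback: "too loud", "too quiet" or "ok"."""
    peak = max((abs(s) for f in frames for s in f), default=0.0)
    if peak >= clip_level:
        return "too loud"   # samples hit the ceiling, the signal clips
    if all(rms_level(f) < threshold for f in frames):
        return "too quiet"  # nothing would ever be treated as speech
    return "ok"


if __name__ == "__main__":
    quiet = [[0.01, -0.01, 0.02, -0.02]] * 3
    loud = [[0.5, -0.5, 0.5, -0.5]]
    print(classify_frames(quiet + loud))
    print(volume_hint(quiet))
```

With a microphone that is turned down too far, `classify_frames` never reports "speech", which matches the symptom described above: the recognition appears to do nothing at all.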
The Result
The screenshot below shows the new Simon main screen.
|Simon 0.3.80: Welcome Screen|
Scenarios
There is now a prominent list of your currently used scenarios in the main screen.
The tabs showing the components of the scenario are gone and have been replaced with a little "Open <scenario name>" button.
Clicking it opens the scenario for editing. While in "edit mode", the overview is hidden. The "Back to overview" bar drops down with a smooth animation to draw the user's attention.
|Simon 0.3.80: Wordlist|
Training
Next to the scenario list, Simon's main screen now also shows a list of all available training texts of the loaded scenarios. Clicking "Start Training" will start the standard training wizard without opening the "edit mode" of the scenario.
Selecting a training text on the right also selects the scenario it belongs to on the left. This is done both as a visualization of which scenario will benefit the most from the training and as a matter of convenience: if the user wants to remove or add another related training text (which would mean they'd need to "open" the scenario), the correct scenario is already selected.
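The linked selection can be pictured with a small data model. This is only a sketch under the assumption that every training text belongs to exactly one scenario; the `Scenario` class and both helper functions are hypothetical names, not Simon's real classes.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Scenario:
    """A scenario that owns its training texts (hypothetical model)."""
    name: str
    training_texts: List[str] = field(default_factory=list)


def all_training_texts(scenarios: List[Scenario]) -> List[Tuple[str, str]]:
    """Flat list for the right-hand pane: (scenario name, text title)."""
    return [(s.name, t) for s in scenarios for t in s.training_texts]


def scenario_for_text(scenarios: List[Scenario],
                      text_title: str) -> Optional[Scenario]:
    """Scenario to highlight on the left when a text is selected."""
    for scenario in scenarios:
        if text_title in scenario.training_texts:
            return scenario
    return None


if __name__ == "__main__":
    loaded = [
        Scenario("Scenario A", ["Text A1", "Text A2"]),
        Scenario("Scenario B", ["Text B1"]),
    ]
    selected = scenario_for_text(loaded, "Text B1")
    print(selected.name if selected else "no scenario")
```

Because the flat list is derived from the scenarios rather than stored separately, the hierarchy of "scenario contains components" stays intact even though the main screen shows the texts side by side.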
Speech models
Speech models are now packaged into .sbm files ("Simon Base Model"). The package contains all the required model files as well as some metadata (name, model type and build date).
The welcome page shows information about the active model and, if available, the base model in use.
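The idea of bundling model files with metadata can be sketched as follows. The real on-disk layout of an .sbm file is not described in this post, so everything here is an assumption: the gzip-compressed tar container, the `metadata.json` entry, the field names, and the example values are purely illustrative.

```python
import io
import json
import tarfile
from typing import Dict, List


def pack_model(name: str, model_type: str, build_date: str,
               files: Dict[str, bytes]) -> bytes:
    """Bundle model files and a metadata entry into one archive.

    The archive layout is hypothetical, not the real .sbm format.
    """
    metadata = {"name": name, "type": model_type, "build_date": build_date}
    entries = {"metadata.json": json.dumps(metadata).encode("utf-8")}
    entries.update(files)

    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for entry_name, payload in entries.items():
            info = tarfile.TarInfo(name=entry_name)
            info.size = len(payload)
            archive.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


def read_metadata(package: bytes) -> Dict[str, str]:
    """Read only the metadata entry, e.g. to show it on a welcome page."""
    with tarfile.open(fileobj=io.BytesIO(package), mode="r:gz") as archive:
        member = archive.extractfile("metadata.json")
        if member is None:
            raise ValueError("metadata.json is not a regular file")
        return json.loads(member.read().decode("utf-8"))


def list_files(package: bytes) -> List[str]:
    """Names of all entries contained in the package."""
    with tarfile.open(fileobj=io.BytesIO(package), mode="r:gz") as archive:
        return archive.getnames()


if __name__ == "__main__":
    package = pack_model("Example model", "static", "2012-01-01",
                         {"hmmdefs": b"...", "tiedlist": b"..."})
    print(read_metadata(package))
    print(list_files(package))
```

The benefit of a single package is that the user only ever handles one file, while files like "hmmdefs" or "tiedlist" stay an internal detail.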
|Simon 0.3.80: Base Model Settings|
Additionally, I've already put in a request to add a new category to kde-files.org and am planning to enable speech model sharing through GHNS (Get Hot New Stuff).
This package abstraction was also a big step towards supporting other backends besides HTK / Julius, but I'll elaborate on that in a different blog post.
Recognition
Last but not least, the Simon main screen now permanently displays the current microphone volume.
The volume calibration widget has been improved to integrate the voice activity parameters and no longer requires the user to tell it that the volume has been adjusted.
|Simon 0.3.80: No applicable command for recognition result|
The last recognized command is also displayed. If the command didn't trigger any action, Simon will now display a small note next to the recognized sentence to help scenario developers track down problems.
Final Words
I am not a usability expert by any means. Having spent so much time with the interface, I wouldn't have noticed a lot of the issues had it not been for the valuable feedback from the community. I especially want to thank Frederik Gladhorn and Bjoern Balzaks for their input.
The interface is of course still far from perfect. However, I'm quite happy with how the recent refactoring has turned out and am looking forward to more improvements in the future.
Have a suggestion or some feedback? Let me know in the comments!