Simon: Open-Source Speech Recognition: Simon: Usability

Tuesday, May 1, 2012

Simon: Usability

One of the most important and, at the same time, most challenging tasks for me has always been to keep Simon usable for the "average" user.

Yes, reading the manual is sometimes required, but I still feel comfortable saying that users don't need in-depth knowledge of speech recognition to build their own speech models with Simon - and that's something we've always been proud of.

However, the initial learning curve is undoubtedly a bit steep. So let's look at the interface that so often left new users baffled.

Analyzing Simon's Interface

After the initial first-run wizard (which, sadly, many new users seem to skip entirely), the following was the first screen shown to new users.

While very pretty (thanks to the Oxygen team), it only provided links to resources where users can find further help. The interface afforded absolutely no interaction pattern and left users stranded.


Simon 0.3.75: Main Screen

After a bit of looking around, the user would probably notice the "Wordlist", "Grammar", etc. tabs containing the components of the currently loaded scenario.
However, even if the user had loaded scenarios in the first-run wizard, all those tabs would be completely empty. That's because the user was looking at the "Standard" scenario - an empty default scenario. To change this, users were supposed to use the unlabeled drop-down in the toolbar.

The main reason for this weird interaction pattern was that scenarios are a recent addition to Simon: they were only introduced in Simon 0.3, and while there was a huge amount of internal refactoring associated with that, the UI always felt a bit "tacked on".

So during the last month I was re-evaluating parts of Simon's interface to make it more intuitive for new users.

First of all, I identified some principles I wanted to convey to the user and then designed the new interface around them:

Scenarios are opaque. Users can of course edit them if they want, but the average user will probably never touch their components. In any case, there is a strict hierarchy that must be maintained at all times: Scenario A (containing Components A), Scenario B (containing Components B), etc.
Base models are the easiest way to get started. If setting up Simon to use a static base model requires users to search for an archive on a wiki, download and extract it, and then point Simon to individual files with cryptic names like "hmmdefs" or "tiedlist", then the interface has clearly failed. It must be easy and intuitive for users to create, share and use base models.
Around half of all recognition problems are microphone-related. For the voice activity detection (the part of Simon that separates "Speech" from "Silence") to work, the volume must be set correctly. Especially with ALSA forgetting volume levels, this is often a source of problems whose only symptom is that the recognition simply doesn't work.
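
To illustrate why the input volume matters so much for voice activity detection, here is a minimal sketch of an energy-based detector in Python. This is not Simon's actual implementation: the function names (`rms_level`, `is_speech`, `make_tone`), the threshold of 0.1 and the frame format (a list of samples between -1.0 and 1.0) are all assumptions chosen for illustration.

```python
import math


def rms_level(frame):
    """Root-mean-square level of one frame of samples in [-1.0, 1.0]."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def is_speech(frame, threshold=0.1):
    """Classify a frame as speech if its level reaches the (assumed) threshold."""
    return rms_level(frame) >= threshold


def make_tone(amplitude, n=160):
    """Hypothetical test signal: a sine wave standing in for a spoken frame."""
    return [amplitude * math.sin(2 * math.pi * i / 16) for i in range(n)]


loud = make_tone(0.5)    # microphone volume set sensibly
quiet = make_tone(0.05)  # same "utterance" with the volume turned far down

print(is_speech(loud), is_speech(quiet))
```

With the volume turned down, the very same signal stays below the threshold and is treated as silence, so nothing ever reaches the recognizer. From the user's point of view, the recognition simply doesn't work.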

Obviously, the interface needed a major revamp. So over the last month I have been working on and off on some tweaks for what will become Simon 0.4.

The Result

The screenshot below shows the new Simon main screen.

Simon 0.3.80: Welcome Screen

But let's look at the changes individually.

Scenarios

There is now a prominent list of your currently used scenarios in the main screen.

The tabs showing the components of the scenario are gone and have been replaced with a little "Open <scenario name>" button.

Clicking it opens the scenario for editing. While in "edit mode", the overview is hidden. The "Back to overview" bar drops down with a smooth animation to draw the user's attention.

Simon 0.3.80: Wordlist

Training

Next to the scenario list, Simon's main screen now also shows a list of all available training texts of the loaded scenarios. Clicking "Start Training" will start the standard training wizard without opening the "edit mode" of the scenario.

Selecting a training text on the right also selects the scenario it belongs to on the left. This is done both as a visualization of which scenario will benefit the most from the training and as a matter of convenience: if the user wants to remove or add another related training text (which would mean they'd need to "open" the scenario), the correct scenario is already selected.

Speech models

Speech models are now packaged into .sbm files ("Simon Base Model"). The package contains all the required model files as well as some metadata (name, model type and build date).
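
The general idea of such a package can be sketched in a few lines of Python. Note that this does not reproduce the real .sbm layout, which isn't described in this post: the archive type (a gzipped tar), the `metadata.json` entry, the field names and the helper functions `pack_model` and `read_metadata` are hypothetical and only meant to show how model files and metadata can travel together in a single file.

```python
import io
import json
import tarfile


def pack_model(files, name, model_type, build_date):
    """Bundle model files (dict of filename -> bytes) and metadata into one archive."""
    meta = {"name": name, "model_type": model_type, "build_date": build_date}
    entries = dict(files)
    # Hypothetical metadata entry; the real .sbm format may store this differently.
    entries["metadata.json"] = json.dumps(meta).encode("utf-8")

    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for filename, data in entries.items():
            info = tarfile.TarInfo(name=filename)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def read_metadata(package):
    """Read the metadata back without unpacking the model files."""
    with tarfile.open(fileobj=io.BytesIO(package), mode="r:gz") as tar:
        return json.load(tar.extractfile("metadata.json"))


package = pack_model(
    {"hmmdefs": b"placeholder", "tiedlist": b"placeholder"},
    name="Example model",
    model_type="static",
    build_date="2012-05-01",
)
print(read_metadata(package))
```

The point of the abstraction is that the user only ever handles one file, while the interface can still show the name, type and build date without knowing anything about the individual model files inside.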

The welcome page shows information about the active model and, if available, the used base model.

Simon 0.3.80: Base Model Settings

The base model settings page provides a way to create the new .sbm files from HTK model files ("Create from model files"). The currently active model can also be exported as an .sbm container to share or archive created models.

Additionally, I've already put in a request to add a new category to kde-files.org and am planning to enable speech model sharing through Get Hot New Stuff (GHNS).

This package abstraction was also a big step towards supporting other backends next to HTK / Julius but I'll elaborate on that in a different blog post.

Recognition

Last but not least, the Simon main screen now permanently displays the current microphone volume.

The volume calibration widget has been improved to integrate the voice activity parameters and no longer requires the user to tell it that the volume has been adjusted.

Simon 0.3.80: No applicable command for recognition result

The last recognized command is also displayed. If the command didn't trigger any action, Simon will now display a small note next to the recognized sentence to help scenario developers track down problems.

Final Words

I am not a usability expert by any means. Having spent so much time with the interface, I wouldn't have noticed a lot of the issues had it not been for the valuable feedback from the community. I especially want to thank Frederik Gladhorn and Bjoern Balzaks for their input.

The interface is of course still far from perfect. However, I'm quite happy with how the recent refactoring has turned out and am looking forward to more improvements in the future.

Have a suggestion or some feedback? Let me know in the comments!

4 comments:

ahiemstra said…: It's a good start. :)

One of the first things I notice however, is that you have two levels of tabs on the "Wordlist" page. This is generally considered to be bad usability. Might I suggest to remove the topmost tab bar and replace it with a vertical list similar to that used in the settings dialog? It is a pattern that is used more often in other applications as well and I personally consider it to be quite nice.

I would personally also try and reduce the amount of noise on the start page. Right now, it looks rather busy. Try changing the group boxes to flat for example and reducing the amount of text on the page. Also, try to get the alignment of items similar. For example, in the recognition group, the "Last recognition result" text is left aligned, whereas the Device: text is centre-aligned, creating an unbalanced group. The same applies to the "Training" group where the text and table are left-aligned but the button is right-aligned.

Speaking of which, why is the Training group separate from the Scenarios group? It seems to me they could be merged, which would avoid repeating information (the scenario name). Also, I would suggest to use a two-line list approach rather than a table, with the name and icon on the first line and the "Pages" and "Relevance" numbers on the second line. You could move the acoustic model group to the place where now the Training group is and make it possible to have a longer list of scenarios by extending the scenarios group down into the spot where the acoustic model group used to be.

Anyway, just some suggestions. For the rest, keep up the good work, nobody ever does UIs perfect in one go. :); May 1, 2012 at 16:22
Unknown said…: Hi!

> Might I suggest to remove the topmost tab bar and replace it with a vertical list
Good idea! -> On my todo list.

> Try changing the group boxes to flat for example and reducing the amount of text on the page.
Flat group boxes do look better, you're right. There is no separation from the headline ("Welcome...") to the content then, however. I removed that and now it does really look a lot cleaner.

Other than that, I'm afraid there is no text I can safely remove without sacrificing either discoverability or functionality.

And yes, the program looks *very* busy as the volume calibration is active (and therefore constantly moving). But that's intentional: During "normal" operation, Simon is always hidden in the system tray. The only time it's visible is when the user changes his configuration.

> For example, in the recognition group, the "Last recognition result" text is left aligned, whereas the Device: text is centre-aligned, creating an unbalanced group.
That was actually just triple-spacing. Fixed.

> The same applies to the "Training" group where the text and table are left-aligned but the button is right-aligned.
Actually, I like it that way. It's like a "Next >" button in a wizard - and that's exactly how it should be interpreted.

> Speaking of which, why is the Training group separate from the Scenarios group?
That's actually not the scenario name but the name of the text. Scenarios can provide more than one trainings-text and they can be named arbitrarily. There is no 1:1 for Scenario <> Text.

Thanks for all the suggestions! That's what the welcome page looks like now: http://wstaw.org/m/2012/05/02/plasma-desktopsa4123.png

Best regards,
Peter; May 1, 2012 at 23:44
ahiemstra said…: You are welcome and I'm glad you could use some of them at least. :)

> And yes, the program looks *very* busy as the volume calibration is active (and therefore constantly moving). But that's
> intentional: During "normal" operation, Simon is always hidden in the system tray. The only time it's visible is when the user
> changes his configuration.

Which is of course fine, just keep in mind that you don't want to overload that page with information, even if it is not open too often. :) You are right that there is not a lot of unnecessary text there and with the changes to the group boxes it looks a lot cleaner.

> That's actually not the scenario name but the name of the text. Scenarios can provide more than one trainings-text
> and they can be named arbitrarily. There is no 1:1 for Scenario <> Text.

Ah, right, I did not realise that. Then it makes sense to keep them separate indeed.

> Actually, I like it that way. It's like a "Next >" button in a wizard - and that's exactly how it should be interpreted.

Hmm, that works yes. One comment then would be that it might be a good idea to swap the "Open Scenario" and "Manage Scenarios" buttons. The Open button has a direct relation to the content above it, whereas the Manage button does not. Looking at it from a dialog point of view, personally I would expect "Next" to do something with the content.

Speaking of which, I notice there is a slight difference in the language used on the buttons in the top row in comparison with those on the bottom row. The buttons on the top row imply something will happen when you push them whereas the bottom ones do not. Using "Configure Acoustic Model" and "Configure Audio" would match the top row and also match better with KDE's language, for example the actions in many a Settings menu.

> That's what the welcome page looks like now: http://wstaw.org/m/2012/05/02/plasma-desktopsa4123.png

Alright, one last nitpick: There seems to be some additional spacing around the four central boxes which causes them to be indented compared to the menu and the status bar, quite noticeable on the right-bottom corner. Just something you might want to look at. :)

P.S.: Sorry if I sound a little nitpicky. :); May 3, 2012 at 16:09
Unknown said…: I don't know why, but I only got your last comment as an email notification. Something may be wrong with Blogger, but on the off chance you deleted it for some reason, I don't want to repost it without your consent, so here is just my reply:
I reworded and rearranged the buttons, thanks for the suggestion.

And about the spacing around the widgets: The welcome screen is just a page in a tab widget (the one which is then filled up with wordlist / grammar / etc). The spacing issue should pretty much disappear once I refactor the tab widget into something nicer :)

And I love your nitpicking. I'm sure you'll find more stuff to fix in Simon. You know where to find the code, right? Go ahead!

Best regards,
Peter; May 4, 2012 at 11:55
