Simon: Open-Source Speech Recognition: simon meets MeeGo

Friday, 14 October 2011

simon meets MeeGo

I'm happy to report that since August, I can now officially call myself a Qt Ambassador!

As an Ambassador, I had the opportunity to apply for a loaned Nokia N950 to develop and port applications to MeeGo/Harmattan. I took Nokia up on their offer, and the result is simone: a trimmed-down, mobile version of simon. In other words: "simon embedded" or "simone".

The client features push-to-talk or automatic voice activity detection (configurable) and, because of simon's client/server architecture, uses little power on the device itself. Even with voice activity detection running, you should get many hours of continuous speech recognition out of a single charge.
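To give a rough idea of what voice activity detection does, here is a minimal sketch in Python. It is not simone's actual implementation (the client is written in Qt/QML, and its detection logic is not shown in this post). It assumes a simple energy threshold: a frame counts as speech when its RMS level exceeds a limit, and a short "hangover" keeps the detector open across brief pauses. The names `frame_rms`, `detect_speech`, `threshold` and `hangover` are made up for this illustration.

```python
import math
from typing import Iterable, List, Sequence


def frame_rms(frame: Sequence[float]) -> float:
    """Root-mean-square level of one audio frame (samples in [-1.0, 1.0])."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def detect_speech(
    frames: Iterable[Sequence[float]],
    threshold: float = 0.1,  # hypothetical RMS level separating speech from silence
    hangover: int = 2,       # hypothetical number of quiet frames still kept as speech
) -> List[bool]:
    """Return one flag per frame: True for speech, False for silence."""
    flags: List[bool] = []
    remaining = 0  # quiet frames that are still treated as part of the utterance
    for frame in frames:
        if frame_rms(frame) > threshold:
            remaining = hangover
            flags.append(True)
        elif remaining > 0:
            remaining -= 1
            flags.append(True)
        else:
            flags.append(False)
    return flags


if __name__ == "__main__":
    quiet = [0.0, 0.0, 0.0, 0.0]
    loud = [0.5, -0.5, 0.5, -0.5]
    print(detect_speech([quiet, loud, quiet, quiet, quiet], hangover=1))
```

In a client/server setup like the one described above, only the frames flagged as speech would need to be sent to the server, which is one plausible reason the device itself has little work to do.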

simone can be used to replace the headset of a "full" simon installation, but it also includes a couple of default actions on the device. For example, you can use a voice-controlled quick dial feature or start and stop turn-by-turn navigation.

For more information and a live demo, have a look at the YouTube demonstration:

If you can't see the embedded video, try this direct link.

24 comments:

Anonymous said…: Very nice!

I assume it is written in Qt? Would it be hard to port it to other Nokia devices, like the N8? I am sure many users would appreciate simone.

Thanks; 15 October 2011 at 04:54
Unknown said…: Yes, it's written in Qt (the interface is QML).

It should be fairly trivial to port it to a Symbian device...

You can find the code on our repository:
http://speech2text.git.sourceforge.net/git/gitweb.cgi?p=speech2text/speech2text;a=tree;f=simone;h=721c40801c714bc1f5c600031ee73709655f9680;hb=refs/heads/sound

Best regards,
Peter; 15 October 2011 at 11:45
Shantanu said…: Cool, I'll try to check out the code and install it on my laptop today.
Also, I'll talk to the Plasma Active team about using Simon with Plasma Active on tablets :); 20 October 2011 at 21:13
Unknown said…: @Shantanu: Yes, I heard someone talking about simon on Plasma Active at the Desktop Summit as well (although that was mostly joking, I guess). If you do bring it up, please keep me in the loop, I'm quite interested in this!; 22 October 2011 at 10:09
James (Jeffrey) T Wang said…: Hi Peter,

I don't suppose you know how this compares to Siri, built into the iPhone 4S?

Is it potentially as sophisticated, or is it miles from reaching parity?

I noticed the project's been around for a few years now.

Thanks for any time you can spare.
All the best.; 6 November 2011 at 01:37
Anonymous said…: Wow, this is great ... can I get a .deb somewhere, or do you have a repo? I would love to try it :); 10 November 2011 at 21:51
Unknown said…: @Jed: Siri and simon pursue different goals. The commands on the phone are really just a little extra for the client - mostly it's going to be used as an input node for a larger simon setup.

@Anonymous: There is no released deb yet, as there are still some bugs left and I'm swamped with university right now. But if you want to try it out, I can simply send you a test build to use at your own risk. Just e-mail us at support at simon-listens.org; 16 November 2011 at 01:03
James (Jeffrey) T Wang said…: Hi Peter,

Thanks for the feed-back!

I don't understand your explanation about the differences.
Could you possibly explain in more detail?

Sorry for the delay in my response.
Unfortunately I didn't get an email once you replied.

All the best.; 25 November 2011 at 04:00
Unknown said…: @Jed: The main difference is that simon allows (and to a certain extent expects) the user to train / adapt the model for his own use. As such, simon is a more personal solution and - because of this - doesn't allow recognition of such a vast array of words (the more generalized a model is, the more training data you have, which makes it easier to recognize more words).

Because the recognition is not able to recognize "free" text (or at least a large enough vocabulary that it looks like it), we can't offer the same natural interaction patterns that Siri can.

We can, however, offer language and dialect independent recognition.

Keep in mind that most of our target audience has speech impairments and / or pronunciation that differs from the norm (often the case for e.g. elderly people).

I hope that answers your question. If you'd like to find out more about simon, feel free to also get in touch with us by mail at support at simon-listens.org.

Best regards,
Peter; 2 December 2011 at 00:06
James (Jeffrey) T Wang said…: @Peter,

Thank-you so much for the in-depth explanation!
That's unfortunate...

I was very excited when I saw the beta version for Maemo/MeeGo.
I was really hoping we might be getting something like Siri.

Do you know of any F/OSS projects that are more like Siri?
I searched everywhere, but all I could find was Simon.

Also....

When do you expect to finish Simon for Maemo6x?*
Do you intend to follow the SwipeUX guidelines?
Will be as functional as the desktop version?

Thanks again!
*meego-harmattan; 3 December 2011 at 09:55
Unknown said…: On Saturday 03 December 2011 17:55:52 you wrote:
> Do you know of any F/OSS projects that are more like Siri?
No, sorry. Creating a general, large vocabulary speech model that'd be required for such a project is very, very time consuming and costly.

You can use Google's API (and their internal model) but that's hardly F/OSS...

> When do you expect to finish Simon for Maemo6x?
It really just requires some bug fixing and polishing but as it depends on the git version of simon it's not really "ready" until the next simon version is released. That's also the reason why it's not in the store (even marked as Beta or something).

So really it's a matter of asking when the next simon version is going to be ready and that's hard to answer. As soon as the features that are currently being developed (context dependence mostly) are in, I really, really want to get a release out the door. But that requires a lot of testing, documenting, etc. so it'll take a while, I'm afraid.

If you want to try it right now, I can provide you with a deb, though. But be warned: it requires experimental software and is still fairly experimental itself :)

> Do you intend to follow the SwipeUX guidelines?
I've read and tried to adhere to Nokia's UI guidelines, yes (for example the switch / checkbox distinction). If you spot something that's inconsistent, please let me know!

> Will be as functional as the desktop version?
No. What you saw in the video is pretty much all that's going to be available on the device. I don't think it makes a lot of sense to provide model training and grammar configuration on a smartphone...

Best regards,
Peter; 5 December 2011 at 06:06
James (Jeffrey) T Wang said…: ["No, sorry. Creating a general, large vocabulary speech model that'd be required for such a project is very, very time consuming and costly."]

Aw, bummer man! ;-P

["You can use Google's API (and their internal model) but that's hardly F/OSS…"]

What's it called? I'll look into it a bit more…

["So really it's a matter of asking when the next simon version is going to be ready and that's hard to answer. As soon as the features that are currently being developed (context dependence mostly) are in, I really, really want to get a release out the door. But that requires a lot of testing, documenting, etc. so it'll take a while, I'm afraid.
if you want to try it right now, I can provide you a deb, though."]

I will gladly test it, just as soon as the White N9 is available in Australia.
It's like some kind of damn rare unicorn right now!
It's trickling out to stores in Finland now, so hopefully other countries will follow very soon.
It better be here by xmas, or I won't be having a white one (I'm in Qld Australia, so that's rare).

["I've read and tried to adhere to Nokia's UI guidelines, yes (for example the switch / checkbox distinction).
If you spot something that's inconsistent, please let me know!"]

I will, just as soon as I can compare a White 64GB to a Black 64GB, then I'll buy, & then the testing/hacking will begin!

["No. What you saw in the video is pretty much all that's going to be available on the device."]

Plus the "context dependence" you're currently finishing for Simon?

["I don't think it makes a lot of sense to provide model training & grammar configuration on a smartphone…"]

LOL, I don't even know what that is, so I guess it shouldn't bother me!

Thanks mate!; 8 December 2011 at 09:21
Unknown said…: Hi Jed!

It looks like the voice API of Google's Android isn't really public or intended for general use. Sorry for getting your hopes up. I only had a quick look, though, so I might have missed something - and even then I found a couple of unofficial hacks to use it through the HTML5 element in Chrome. Still, probably not the best way to go...

If you get your N9 and we still haven't had time to release it by then, feel free to ask for a deb through support at simon-listens.org!

The context dependence is intended for the desktop version only at this point. As the grammar won't be large enough I don't think it really makes sense on the device. If we start to see larger applications on the phone that would like to take advantage of that (speech recognition as a service), it'd be of course entirely possible to add that later on.

Best regards,
Peter; 14 December 2011 at 09:15
James (Jeffrey) T Wang said…: Thanks Peter, I will be in touch!
Could be a while....
The White N9 is like a unicorn in Australia :(

Season's well wishes.; 15 December 2011 at 12:12
James (Jeffrey) T Wang said…: @Peter

How's SIMONE progressing nowadays?
Much more advanced now I imagine!!

Have you started a thread at TMO/FMC/N9-apps etc. so more users are aware & can test?
Have you placed builds on apps4meego so users can enable beta testing repo in the client, & provide regular feedback?

I've had a N9 for quite some time now & hopefully will have time to provide regular feedback soon.

Cheers!; 12 June 2012 at 23:46
Unknown said…: Hi Jed,

To be entirely honest, Simone has been a bit neglected lately. That was mainly because a release simply did not make sense:
To make Simone work as it should, we needed to change some stuff in the backend protocol to the Simond server.
Because users are largely expected to set up their own server, we didn't want to release Simone before the accompanying Simond version was released.

This new version of Simond (included in Simon 0.4) is now finally set for release this December.

After this we can push Simone to the appropriate Store, etc.

Best regards,
Peter; 11 November 2012 at 02:52
James (Jeffrey) T Wang said…: @Peter

Oh so SIMONE is still on the radar...
But you won't have Beta builds for testing until January+ 2013?
Are you saying users will also need a server on their LAN to even use SIMONE?
As soon as you have builds ready for testing, make sure you start a thread at talk.maemo.org.*
Plenty of very technically competent people there.

Cheers
*If you're not sure where exactly, I can tell you.; 11 November 2012 at 04:59
Unknown said…: Well yeah, I'd like to publish a final version once the required Simond is public.

And yes, you need a server *somewhere*. It technically doesn't need to be a local one (the amount of data that's sent over this link is pretty small) but we don't have a public server available.

I don't think I'll do a big publicity campaign or anything (or even just announce it in talk.maemo.org) just because the scope of Simone is really limited. The commands on the device itself are pretty much only for demonstration purposes and Simon itself is not widespread enough that I can reasonably foresee a substantial community for Simone. Realistically, there are about a dozen or so people *really* interested in this at the moment - no need for marketing, they'll find it :)

Best regards,
Peter; 12 November 2012 at 00:40
James (Jeffrey) T Wang said…: A thread at TMO is hardly marketing, & people won't just "find it".
You'll bring it to the attn of far more of your target audience, just by the simple act of starting a thread & engaging a little bit w/that community.
Through that process, there's also a high probability you'll get a user or two who starts to regularly contribute to simond and/or simone.; 12 November 2012 at 00:49
Unknown said…: Hi Jed,

This is very much a question of what comprises the target audience. In this case, imho, it's people that use Simon already, know how it works and are looking for a sensible way to integrate relatively cheap mobile nodes.

As long as we haven't mastered dictation, the functionality on the device itself is just not powerful enough to compete with other offerings for the platform like the google speech app (a quick lookup shows "Quick Voice Input Keyboard" in the OVI store but I remember there to be a free app as well).

I agree that marketing is very important but I think the better approach here is to treat this not as a separate, stand-alone product, but as an accessory to Simon.

Best regards,
Peter; 12 November 2012 at 01:13
James (Jeffrey) T Wang said…: Hi Peter,

I appreciate your point(s) but you still seem to be missing mine.
Engaging with the Maemo/MeeGo community* isn't a marketing exercise.
It's a prudent measure to connect w/a larger pool of users/devs who "on avg" are more technically competent than Android/iOS et al users.
If you get no contributors out of the whole exercise, at the very least you'll get some very enthusiastic users/testers.

Cheers.
*& arguably the single-best way to do so is via TMO.; 14 November 2012 at 01:32
James (Jeffrey) T Wang said…: Anyway, I look forward to your final release of simond, & subsequently simone.
Once ready for wider testing (hopefully Q1 2013?) I hope to be among the 1st to "tinker" with it!

All the best.; 14 November 2012 at 01:38
Unknown said…: Jed, Simone can now be found in the OVI store.

Best regards,
Peter; 19 May 2013 at 11:31
James (Jeffrey) T Wang said…: Thanks for letting me know Peter...

Alas, being in the Nokia Store does little if the community* isn't engaged; the only way for that to occur is to be active @TMO.
Being occasionally active in the relevant mailing lists & IRC channels can't hurt either.

I'll try it out soon & perhaps report my findings in 1 or more of those mediums.

All the best.
*Maemo/MeeGo/Nemo/Plasma/Sailfish/Qt etc; 29 August 2013 at 09:14
