Friday, 14 October 2011

simon meets MeeGo

I'm happy to report that, as of August, I can officially call myself a Qt Ambassador!

As an Ambassador, I had the opportunity to apply for a loaned Nokia N950 to develop / port applications to MeeGo/Harmattan. I took Nokia up on their offer and the result is simone, a trimmed-down, mobile version of simon. In other words: "simon embedded" or "simone".

The client features push-to-talk or automatic voice activity detection (configurable) and, because of simon's client / server architecture, uses little power on the device itself. Even with voice activity detection running, you should get many hours of continuous speech recognition out of a single charge.

simone can be used to replace the headset of a "full" simon installation, but it also includes a couple of default actions on the device. For example, you can use a voice-controlled quick dial feature or start / stop turn-by-turn navigation.

For more information and a live demo, have a look at the YouTube demonstration:


If you can't see the embedded video, try this direct link.

15 comments:

Anonymous said…

Very nice!

I assume it is written in Qt? Would it be hard to port it to other Nokia devices, like the N8? I am sure many users would appreciate simone.

Thanks

Peter Grasch said…

Yes, it's written in Qt (the interface is QML).

It should be fairly trivial to port it to a Symbian device...

You can find the code on our repository:
http://speech2text.git.sourceforge.net/git/gitweb.cgi?p=speech2text/speech2text;a=tree;f=simone;h=721c40801c714bc1f5c600031ee73709655f9680;hb=refs/heads/sound

Best regards,
Peter

J@mBeL said…

Great work as always! Thumbs up Peter

Shantanu said…

Cool, I'll try to check out the code and install it on my laptop today.
Also, I'll talk to the Plasma Active team about using Simon with Plasma Active on tablets :)

Peter Grasch said…

@Shantanu: Yes, I heard someone talking about simon on Plasma Active at the Desktop Summit as well (although that was mostly joking, I guess). If you do bring it up, please keep me in the loop. I'm quite interested in this!

Jed said…

Hi Peter,

I don't suppose you know how this compares to Siri built into iP4S?

Is it potentially as sophisticated, or is it miles from reaching parity?

I noticed the project's been around for a few years now.

Thanks for any time you can spare.
All the best.

Anonymous said…

Wow, this is great ... can I get a .deb somewhere, or do you have a repo ... I would love to try it :)

Peter Grasch said…

@Jed: Siri and simon pursue different goals. The commands on the phone are really just a little extra for the client; mostly it's going to be used as an input node for a larger simon setup.

@Anonymous: There is no released deb yet, as there are still some bugs left and I'm swamped with university right now. But if you want to try it out, I can simply send you a test build to use at your own risk. Just e-mail us at support at simon-listens.org

Jed said…

Hi Peter,

Thanks for the feedback!

I don't understand your explanation about the differences.
Could you possibly explain in more detail?

Sorry for the delay in my response.
Unfortunately I didn't get an email once you replied.

All the best.

Peter Grasch said…

@Jed: The main difference is that simon allows (and to a certain extent expects) the user to train / adapt the model for their own use. As such, simon is a more personal solution and, because of this, doesn't allow recognition of such a vast array of words (the more generalized a model is, the more training data you have, which makes it easier to recognize more words).

Because simon cannot recognize "free" text (or at least a vocabulary large enough that it looks like it), we can't offer the same natural interaction patterns that Siri can.

We can, however, offer language- and dialect-independent recognition.

Keep in mind that most of our target audience has speech impairments and / or pronunciation that differs from the norm (often the case for e.g. elderly people).

I hope that answers your question. If you'd like to find out more about simon, feel free to also get in touch with us by e-mail at support simon-listens°org.

Best regards,
Peter

Jed said…

@Peter,

Thank you so much for the in-depth explanation!
That's unfortunate...

I was very excited when I saw the beta version for Maemo/MeeGo.
I was really hoping we might be getting something like Siri.

Do you know of any F/OSS projects that are more like Siri?
I searched everywhere, but all I could find was Simon.

Also....

When do you expect to finish Simon for Maemo6x?*
Do you intend to follow the SwipeUX guidelines?
Will it be as functional as the desktop version?

Thanks again!
*meego-harmattan

Peter Grasch said…

On Saturday 03 December 2011 17:55:52 you wrote:
> Do you know of any F/OSS projects that are more like Siri?
No, sorry. Creating a general, large vocabulary speech model that'd be required for such a project is very, very time consuming and costly.

You can use Google's API (and their internal model) but that's hardly F/OSS...

> When do you expect to finish Simon for Maemo6x?
It really just requires some bug fixing and polishing, but as it depends on the git version of simon, it's not really "ready" until the next simon version is released. That's also the reason why it's not in the store (even marked as Beta or something).

So really it's a matter of asking when the next simon version is going to be ready and that's hard to answer. As soon as the features that are currently being developed (context dependence mostly) are in, I really, really want to get a release out the door. But that requires a lot of testing, documenting, etc. so it'll take a while, I'm afraid.

If you want to try it right now, I can provide you with a deb, though. But be warned: It requires experimental software and is still fairly experimental itself :)

> Do you intend to follow the SwipeUX guidelines?
I've read and tried to adhere to Nokia's UI guidelines, yes (for example the switch / checkbox distinction). If you spot something that's inconsistent, please let me know!

> Will it be as functional as the desktop version?
No. What you saw in the video is pretty much all that's going to be available on the device. I don't think it makes a lot of sense to provide model training and grammar configuration on a smartphone...

Best regards,
Peter

Jed said…

["No, sorry. Creating a general, large vocabulary speech model that'd be required for such a project is very, very time consuming and costly."]

Aw, bummer man! ;-P

["You can use Google's API (and their internal model) but that's hardly F/OSS…"]

What's it called? I'll look into it a bit more…

["So really it's a matter of asking when the next simon version is going to be ready and that's hard to answer. As soon as the features that are currently being developed (context dependence mostly) are in, I really, really want to get a release out the door. But that requires a lot of testing, documenting, etc. so it'll take a while, I'm afraid.
if you want to try it right now, I can provide you a deb, though."]

I will gladly test it, just as soon as the White N9 is available in Australia.
It's like some kind of damn rare unicorn right now!
It's trickling out to stores in Finland now, so hopefully other countries will follow very soon.
It better be here by xmas, or I won't be having a white one (I'm in Qld Australia, so that's rare).

["I've read and tried to adhere to Nokia's UI guidelines, yes (for example the switch / checkbox distinction).
If you spot something that's inconsistent, please let me know!"]

I will, just as soon as I can compare a White 64GB to a Black 64GB, then I'll buy, & then the testing/hacking will begin!

["No. What you saw in the video is pretty much all that's going to be available on the device."]

Plus the "context dependence" you're currently finishing for Simon?

["I don't think it makes a lot of sense to provide model training & grammar configuration on a smartphone…"]

LOL, I don't even know what that is, so I guess it shouldn't bother me!

Thanks mate!

Peter Grasch said…

Hi Jed!

It looks like the voice API of Google's Android isn't really public or intended for general use. Sorry for getting your hopes up. I only had a quick look, though, so I might have missed something, and even then I found a couple of unofficial hacks to use it through the HTML5 element in Chrome. Still, probably not the best way to go...

If you get your N9 and we still haven't had time to release it then, feel free to ask for a deb through support simon-listens.org!

The context dependence is intended for the desktop version only at this point. As the grammar won't be large enough, I don't think it really makes sense on the device. If we start to see larger applications on the phone that would like to take advantage of that (speech recognition as a service), it would of course be entirely possible to add that later on.

Best regards,
Peter

Jed said…

Thanks Peter, I will be in touch!
Could be a while....
The White N9 is like a unicorn in Australia :(

Season's well wishes.