Simon: Open-Source Speech Recognition: Astromobile: Introducing simontouch

Wednesday, February 29, 2012

Some of you might remember the announcement of the Astromobile project a while back.

Part of the project was voice- and touchscreen-controlled kiosk software running on the robot.

Initially, we considered continuing our XBMC-based solution but soon decided to start from scratch.
XBMC is a great media center, but it didn't fit well with the rest of our solution.

So, more out of necessity than grand aspirations, we decided to write a small, purpose-built application called Simontouch that, among other features, combines simple multimedia playback with communication features (phone and email).

Simontouch (found in the simon-tools repository) uses a QML user interface, Phonon-powered audio and video playback, voice and video calling provided by Skype, and a simple email client powered by Akonadi and Nepomuk.

Direct link to video

Meanwhile, our colleagues at the Scuola Superiore Sant'Anna have been working on top-notch localization and navigation as well as a great design for the robot:

Direct link to video

Our next trip to Pisa is scheduled for mid-March, and we're planning to bring all this technology together into a state-of-the-art assistive robot, powered by KDE.

By the way, we are planning to take part in Google Summer of Code (GSoC) again this year. If you have any cool ideas regarding Simon or KDE Accessibility in general, check out the ideas page!

2 comments:

eliasp said…: How does Simon distinguish between you commanding it by saying something like "Up" and simply using "up" in the middle of a sentence?

Does it only accept commands after you haven't said anything for a certain amount of time?

Besides that: simply impressive!

IMHO, the main interface would be nicer if you used a single icon for each category instead of the current "icon crowd".; February 29, 2012 at 12:51
Unknown said…: Hi Kevin!
To be honest, I wasn't going to submit a talk for GLT this year (though I've spoken there for the last three years). Doing one specifically about the Astromobile project might be a good idea, though. We'll see :)

Hi eliasp!
Basically, the audio stream is segmented before it reaches the recognizer (and subsequently the actions/commands). This is done with a relatively simple level-based algorithm. So yes: short pauses between utterances are necessary (the default is around 500 ms, but that can be configured).
I don't know, I kinda like the "icon crowd" :). It's quite hard to find concise icons for some sections. Still, I'll pass the suggestion along to the rest of the team and ask them what they think. Thanks for the constructive feedback, though!; March 2, 2012 at 09:20
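To illustrate the kind of level-based segmentation described in the reply, here is a minimal sketch, not Simon's actual code. It assumes mono samples normalized to [-1, 1], uses the peak amplitude of 10 ms frames as the "level", and treats the speech threshold (0.1), the frame length, and the function names `frame_levels` and `segment_utterances` as hypothetical choices. Only the idea of a configurable pause of about 500 ms comes from the reply.

```python
"""Hypothetical sketch of level-based utterance segmentation.

Not Simon's implementation: threshold, frame size and names are
illustrative assumptions.
"""
from typing import List, Optional, Sequence, Tuple


def frame_levels(samples: Sequence[float], frame_len: int) -> List[float]:
    """Return the peak absolute amplitude of each consecutive frame."""
    return [
        max((abs(s) for s in samples[i:i + frame_len]), default=0.0)
        for i in range(0, len(samples), frame_len)
    ]


def segment_utterances(
    samples: Sequence[float],
    sample_rate: int,
    level_threshold: float = 0.1,  # assumed level above which a frame counts as speech
    min_pause_ms: int = 500,       # silence that ends an utterance (~500 ms default per the reply)
    frame_ms: int = 10,            # assumed analysis frame length
) -> List[Tuple[int, int]]:
    """Split audio into (start, end) sample ranges separated by pauses.

    An utterance starts at the first loud frame and ends once the level
    has stayed below the threshold for at least min_pause_ms.
    """
    frame_len = max(1, sample_rate * frame_ms // 1000)
    pause_frames = max(1, min_pause_ms // frame_ms)
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None      # frame where the current utterance began
    last_loud: Optional[int] = None  # most recent frame above the threshold

    for idx, level in enumerate(frame_levels(samples, frame_len)):
        if level >= level_threshold:
            if start is None:
                start = idx
            last_loud = idx
        elif start is not None and idx - last_loud >= pause_frames:
            # Silence lasted long enough: close the utterance at the last loud frame.
            end = min((last_loud + 1) * frame_len, len(samples))
            segments.append((start * frame_len, end))
            start = None

    if start is not None:
        # Audio ended mid-utterance: flush the open segment.
        end = min((last_loud + 1) * frame_len, len(samples))
        segments.append((start * frame_len, end))
    return segments


if __name__ == "__main__":
    # 1 kHz toy signal: speech, 300 ms gap, speech, 600 ms gap, speech.
    signal = [0.5] * 200 + [0.0] * 300 + [0.5] * 200 + [0.0] * 600 + [0.5] * 100
    print(segment_utterances(signal, 1000))  # → [(0, 700), (1300, 1400)]
```

With the 500 ms pause, the 300 ms gap does not split the first two bursts, so a word like "up" spoken mid-sentence stays inside one utterance. Only the longer 600 ms pause starts a new segment. Lowering `min_pause_ms` makes the segmenter split more eagerly.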
