The Google Summer of Code application period for students closes in a couple of days, and I still have one last simon idea for any student who is still looking for a project: Ubiquitous Speech Recognition.
Some of you might already know that simon supports recording (and recognizing) from multiple microphones simultaneously. Sound cards and microphones are comparatively cheap, and the server/client architecture of simon would even allow input from mobile phones, other PCs, etc.
Gadgets and home appliances are also getting smarter and smarter every year. KNX is increasingly popular, is already included in many new electrical installations and allows home automation at a very fair price.
Voice control is an intuitive way to interact with all kinds of devices and, compared to alternatives like touch screens, also quite cheap. simon already has more than enough interfaces to connect to your favorite home automation controllers and hardware interfaces, something that people are already doing.
However, speech recognition has traditionally relied on controlled environments. False positives are still a major issue, and recognition accuracy depends on the system being optimized for a specific situation.
Still, adapting the recognition to specific situations is already part of another GSoC idea (which fortunately already has a very promising student attached to it), so that leaves voice activity detection as the remaining hassle.
Voice activity detection (VAD for short) tells the system when to listen to the user and tries to distinguish background noise from user input. Normally this is just a comparatively minor part of a speech recognition system, but when your whole apartment (or at least parts of it) is listening for voice input, it becomes kind of important :).
The current system in simon simply compares the "loudness" of the incoming signal to a configurable threshold. This is fine for headset users but almost useless in the above scenario.
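To make that concrete, here is a minimal sketch of such an energy-threshold VAD, written in Python rather than simon's actual C++ code; the frame size and threshold are illustrative values, not simon's defaults:

```python
import numpy as np

def energy_vad(samples, frame_size=1024, threshold=0.02):
    """Flag each frame as speech or silence based purely on its energy.

    samples: mono audio as a NumPy array of floats in [-1.0, 1.0]
    Returns one boolean per frame.
    """
    flags = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        rms = np.sqrt(np.mean(frame ** 2))  # the frame's "loudness"
        flags.append(rms > threshold)       # louder than the threshold -> assume speech
    return flags
```

A barking dog, clinking dishes or a loud TV will cross that threshold just as easily as an actual command, which is exactly why this breaks down outside the headset scenario.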
And this is where you get to be creative: try to find a novel approach to separating voice commands from background noise.
For example: use webcams and computer vision algorithms to determine whether the user is even near a microphone at the time the "command" is heard.
You could also define "eye contact" with a camera as the signal to activate the recognition. Or maybe you could deactivate the system unless the user raises a hand before speaking?
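Just to show how simple a first version of the camera idea could be, here is a sketch that uses OpenCV's stock frontal-face detector as a stand-in for real eye-contact detection; the cascade path and camera index are assumptions that depend on your installation, and none of this is part of simon today:

```python
import cv2

# OpenCV ships a pre-trained frontal-face cascade; the exact path depends on
# your installation.
CASCADE = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"

def user_is_facing_camera(device=0):
    """Return True if a roughly frontal face is visible on the webcam.

    The recognizer could be gated on this: only forward audio while someone
    is actually looking towards the device.
    """
    detector = cv2.CascadeClassifier(CASCADE)
    cam = cv2.VideoCapture(device)
    ok, frame = cam.read()
    cam.release()
    if not ok:
        return False
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = detector.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5)
    return len(faces) > 0
```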
Another idea would be to let different microphones work together and subtract what they have in common, to filter out global noise (see the sketch below).
You could also use noise conditioning to automatically remove the music playing over the PC speakers from the input signal.
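Both of these last two ideas amount to having a reference for the noise, either a second microphone or the audio that is being sent to the speakers anyway, and subtracting an adapted copy of it from the input. Here is a rough sketch of the textbook LMS adaptive filter doing exactly that; the filter length and step size are made-up numbers, and a real implementation would want something sturdier such as NLMS:

```python
import numpy as np

def lms_cancel(primary, reference, taps=64, mu=0.01):
    """Subtract an adaptively filtered copy of `reference` from `primary`.

    primary:   NumPy array with the microphone signal (speech + noise/music)
    reference: NumPy array with the noise source (a second microphone, or the
               stream being played over the speakers)
    Returns the cleaned signal.
    """
    weights = np.zeros(taps)
    cleaned = np.zeros(len(primary))
    for n in range(taps, len(primary)):
        window = reference[n - taps:n][::-1]  # most recent reference samples first
        estimate = np.dot(weights, window)    # predicted noise at sample n
        error = primary[n] - estimate         # what remains is (hopefully) speech
        weights += 2 * mu * error * window    # adapt the filter towards the noise
        cleaned[n] = error
    return cleaned
```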
Or why not use the reception strength of the user's Bluetooth phone to determine which room they are currently in?
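Assuming every room has some Bluetooth receiver that can report a signal strength (RSSI) for the user's phone, the room guess itself would be trivial; how the readings are obtained is left entirely open here:

```python
def guess_room(rssi_by_room):
    """Pick the room whose receiver hears the phone the loudest.

    rssi_by_room: e.g. {"kitchen": -70, "living room": -48, "office": -81}
    RSSI values are negative dBm, so closer to zero means a stronger signal.
    """
    return max(rssi_by_room, key=rssi_by_room.get)

# simon could then only accept commands picked up by microphones in that room.
print(guess_room({"kitchen": -70, "living room": -48, "office": -81}))  # living room
```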
Bonus points for coming up with other ideas in the comment section!