Simon: Open-Source Speech Recognition: June 2011

This year, we have been given the opportunity to work with two students as part of Google's annual Summer of Code. Adam is working on context-dependent speech recognition (see below) and Alessandro is working on the Voxforge integration. In addition, another student, Saurabh, is working on the Workspace integration as part of the Season of KDE.

To kick off what will hopefully become a series of blog posts by our new contributors, I asked Adam to talk a bit about his progress and future plans for context-dependent speech recognition. This is what he wrote:

As part of the Google Summer of Code, I have been working to add
context-based activation and deactivation of scenarios in the KDE speech
recognition program simon. The simon program allows users to create or
download scenarios which, when activated, allow them to control other
programs such as web browsers, text editors, and games with speech
commands.

When the number of commands that must be considered for speech
recognition in simon becomes too large (for example, if the scenarios
that are active have a large number of possible commands), the speed and
accuracy of the speech recognition can suffer to the point of
unusability. Context-based activation and deactivation of scenarios will
allow scenarios to be deactivated when they are not needed (for example,
when the program that they control is not open, or when the program is
not the active window) so that the number of commands being considered
by speech recognition will be kept low enough to ensure accuracy and
performance.

The context gathering system has been developed so that scenarios have a
"compound condition" which is a group of conditions under which the
scenario should activate. The compound condition becomes satisfied when
all of its conditions (which gather contexts) are satisfied. When the
compound condition becomes satisfied or unsatisfied, it communicates
this to its scenario, which then indicates to the scenario manager
whether or not it should be activated.
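
The notification chain described above (condition, compound condition, scenario, scenario manager) can be sketched roughly as follows. This is only an illustrative Python sketch, not simon's actual code: the class and method names (`Condition`, `CompoundCondition`, `Scenario`, `ScenarioManager`, `set_satisfied`, `set_active`) are all assumptions made up for this example.

```python
from typing import Callable, Iterable, List, Set


class Condition:
    """Hypothetical single condition that tracks one piece of context."""

    def __init__(self) -> None:
        self._satisfied = False
        self._listeners: List[Callable[[], None]] = []

    @property
    def satisfied(self) -> bool:
        return self._satisfied

    def subscribe(self, listener: Callable[[], None]) -> None:
        self._listeners.append(listener)

    def set_satisfied(self, value: bool) -> None:
        # Notify listeners only when the state actually changes.
        if value != self._satisfied:
            self._satisfied = value
            for listener in self._listeners:
                listener()


class CompoundCondition:
    """Satisfied only while all of its member conditions are satisfied."""

    def __init__(self, conditions: Iterable[Condition],
                 on_change: Callable[[bool], None]) -> None:
        self._conditions = list(conditions)
        self._on_change = on_change
        self._satisfied = self._evaluate()
        for condition in self._conditions:
            condition.subscribe(self._recheck)

    @property
    def satisfied(self) -> bool:
        return self._satisfied

    def _evaluate(self) -> bool:
        # all() over an empty list is True.
        return all(c.satisfied for c in self._conditions)

    def _recheck(self) -> None:
        new_state = self._evaluate()
        if new_state != self._satisfied:
            self._satisfied = new_state
            self._on_change(new_state)  # tell the owning scenario


class ScenarioManager:
    """Hypothetical manager that keeps the set of active scenario names."""

    def __init__(self) -> None:
        self.active: Set[str] = set()

    def set_active(self, name: str, active: bool) -> None:
        if active:
            self.active.add(name)
        else:
            self.active.discard(name)


class Scenario:
    """Forwards changes of its compound condition to the manager."""

    def __init__(self, name: str, conditions: Iterable[Condition],
                 manager: ScenarioManager) -> None:
        self.name = name
        self._manager = manager
        self.compound = CompoundCondition(conditions, self._condition_changed)
        # Report the initial state once, so the manager starts out in sync.
        manager.set_active(name, self.compound.satisfied)

    def _condition_changed(self, satisfied: bool) -> None:
        self._manager.set_active(self.name, satisfied)
```

In this sketch a scenario with two conditions only shows up in `manager.active` once both have been switched on via `set_satisfied(True)`, and it drops out again as soon as either one is switched off.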

Compound conditions will be created with a user interface similar to
simon's command adding and editing interface. A scenario with no
conditions in its compound condition will always be active. This means
that any scenario made before this feature was added will maintain its
former functionality, but can be easily changed to (de)activate under
certain conditions.

The conditions that make up the compound condition are developed as
plugins (like the command managers in simon), so it will be easy to add
new types of conditions. For example, one of the currently developed
plugins gathers information about running processes, so a scenario can
be activated under the condition that some process is running or not
running (for example, a Rekonq scenario could have the condition
"'rekonq' is running"). The extensibility allowed by this
plugin system means that conditions such as "'Firefox' is the active
window" or "The user is connected to the internet" or "Fewer than 3
scenarios are currently active in simon" or any other type of condition
that could be determined by simon can be easily developed and used to
guide scenario activation and deactivation.
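
To make the plugin idea a bit more concrete, here is a rough Python sketch of how condition types could be registered and created by name. It does not reflect simon's real plugin interface: the registry (`CONDITION_PLUGINS`), the `register` decorator, the class names, and the injected callables `list_processes` and `count_active` are all hypothetical, and the process list is passed in from outside so the example does not depend on any particular operating system API.

```python
from typing import Callable, Dict, Iterable


class ConditionPlugin:
    """Hypothetical interface every condition plugin implements."""

    def is_satisfied(self) -> bool:
        raise NotImplementedError


# Hypothetical registry mapping a condition type name to its plugin class.
CONDITION_PLUGINS: Dict[str, type] = {}


def register(kind: str) -> Callable[[type], type]:
    """Class decorator that adds a plugin class to the registry."""
    def decorator(cls: type) -> type:
        CONDITION_PLUGINS[kind] = cls
        return cls
    return decorator


@register("process")
class ProcessCondition(ConditionPlugin):
    """Satisfied when a named process is (or is not) running."""

    def __init__(self, process_name: str,
                 list_processes: Callable[[], Iterable[str]],
                 should_be_running: bool = True) -> None:
        self.process_name = process_name
        self.should_be_running = should_be_running
        # Assumption: the caller supplies a function returning process names.
        self._list_processes = list_processes

    def is_satisfied(self) -> bool:
        running = self.process_name in set(self._list_processes())
        return running == self.should_be_running


@register("active_scenario_count")
class ScenarioCountCondition(ConditionPlugin):
    """Satisfied while fewer than `fewer_than` scenarios are active."""

    def __init__(self, fewer_than: int,
                 count_active: Callable[[], int]) -> None:
        self.fewer_than = fewer_than
        self._count_active = count_active

    def is_satisfied(self) -> bool:
        return self._count_active() < self.fewer_than


def create_condition(kind: str, **settings) -> ConditionPlugin:
    """Look up a plugin by type name and build it from its settings."""
    return CONDITION_PLUGINS[kind](**settings)
```

With something like this, the condition "'rekonq' is running" would be `create_condition("process", process_name="rekonq", list_processes=...)`, and a new condition type only needs a new class with the `register` decorator; nothing else in the sketch has to change.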

The next steps of my project include making the scenarios actually
activate and deactivate in response to conditions, making a parent/child
scenario relationship so that a single scenario can have child scenarios
with independent grammars and conditions (so that parts of the scenario
can be activated and deactivated independently), making more condition
plugins, and exploring the possibilities of what else simon would be
able to do with the contexts that it will be able to gather (for example,
switching speech models based on the microphone that is being used).

Wednesday, 8 June 2011

GSoC Guest Post: Context Detection