This is a rather different use case than what you've been thinking of
for KVM. It could mean a significant improvement in the quality of life
of disabled programmers like myself. It's difficult to convey what it's
like to try to use computers with speech recognition for something other
than writing, so bear with me when I say something is real but can't
quite prove it yet. Also, please take it as read that the only really
usable speech recognition environment out there is NaturallySpeaking,
with Google close behind in terms of accuracy but not even on the same
planet for the ability to extend it for speech-enabled applications.
I'm trying to figure out ways of making it possible to drive Linux from
Windows speech recognition (NaturallySpeaking). The goal is a system
where Windows runs in a virtual machine (Linux host), audio is passed
through from a USB headset to the Windows environment, and the output of
the recognition engine is piped, through some magic, back to the Linux
host.
The hardest part of all of this, without question, is getting clean,
uninterrupted audio from the USB device all the way through to the
Windows virtual machine. VirtualBox and VMware mostly fail at delivering
reliable audio to the virtual machine.
I expect KVM not to work right with regard to clean audio / real-time
USB, but I'm asking in case I'm wrong. If it doesn't work, or can't work
yet, what would it take to make it possible for clean audio to be passed
through to a guest?
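In case it helps frame the question, here's roughly what I'd expect to
try on the KVM side: pass the whole headset through as a raw USB device,
so the Windows driver owns the audio stream instead of going through an
emulated sound card. The vendor/product IDs below are placeholders for
whatever lsusb reports on the host:

  # find the headset's IDs on the host
  $ lsusb | grep -i audio

  # hand the raw device to the guest (IDs are placeholders)
  $ qemu-system-x86_64 -enable-kvm -m 2048 \
      -usb -device usb-host,vendorid=0x0d8c,productid=0x000c \
      windows.img

Whether that path can actually deliver isochronous audio cleanly under
load is exactly the part I don't know.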
--- Why this is important, approaches that failed, and why I think this
will work. Boring accessibility info ---
Attempts to make Windows- or DOS-based speech recognition drive Linux
have a long and tortured history. Almost all of them involve some form
of open-loop system that ignores system context and counts on the
grammar to specify the context and the subsequent keystrokes injected
into the target system.
This model fails because it amounts to speaking keyboard functions,
which wastes the majority of the power of a good grammar in a speech
recognition environment.
The most common configuration for speech recognition in a virtualized
environment today is Windows as the host, running speech recognition,
with Linux as the guest. It's just a reimplementation of the open-loop
system described above, where your dictation results are keystrokes
injected into the virtual machine console window. It sometimes works,
and sometimes drops characters.
One big failing of the Windows host / Linux guest environment is that,
in addition to dropping characters, it seems to drop segments of the
audio stream on the Windows side. This happens occasionally when running
Windows under any sort of CPU load, but it's almost guaranteed as soon
as a virtual machine starts up.
Another failing is that the only context the recognition application is
aware of is the console window. It knows nothing about the internal
context of the virtual machine (which application has focus), and
unfortunately it can't know anything more, because of the way
NaturallySpeaking uses the local Windows context.
Inverting the relationship between guest and host, so that Linux is the
host and Windows is the guest, solves at least the focus problem. In the
virtual machine you have a portal application that can control the
perception of context and tunnel the character stream from the
recognition engine into the host OS to drive it open loop. The portal
application[1] can also communicate which grammar sequence has been
parsed and what action should be taken on the host side. At this point
we have the capabilities of a closed-loop speech recognition
environment, where a grammar can read context and generate a new grammar
to fit the application's state. This means smaller utterances that can
be disambiguated, versus the more traditional large-utterance
disambiguation technique.
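To make the portal idea concrete, here is a minimal sketch of what the
host-side end of that channel could look like. Everything in it is
hypothetical (the port, the message format, the stubbed functions); it
only illustrates the open-loop keystroke path and the closed-loop
context query side by side:

  import json
  import socket

  def inject_keys(text):
      # stub: a real host would inject via xdotool / the XTest extension
      print("inject keys:", text)

  def current_focus_window():
      # stub: a real host would ask the window manager what has focus,
      # so the guest can load a grammar matching that application
      return "xterm"

  def handle(msg):
      if msg.get("type") == "keys":
          inject_keys(msg["text"])                  # open-loop path
      elif msg.get("type") == "action":
          print("grammar action:", msg["verb"])     # parsed grammar result
      elif msg.get("type") == "context?":
          return {"focus": current_focus_window()}  # closed-loop path

  srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
  srv.bind(("0.0.0.0", 5007))   # arbitrary port for this sketch
  srv.listen(1)
  conn, _ = srv.accept()
  for line in conn.makefile():
      reply = handle(json.loads(line))
      if reply is not None:
          conn.sendall((json.dumps(reply) + "\n").encode())

The guest-side portal would connect over the VM's virtual network and
send one JSON message per line; the "context?" round trip is the part
the open-loop designs can't do.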
A couple of other advantages of Windows as a guest: the guest runs only
speech recognition and the portal. There are no browsers, no Flash, no
JavaScript, no viruses, and no other "stuff" taking up resources and
distracting from speech recognition working as well as possible. The
downside is that the host running the virtual machine needs to give the
VM very high, almost real-time priority[2] so that it doesn't stall and
speech recognition works as quickly and as accurately as possible.
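For what it's worth, if the VM runs as an ordinary host process (as
with qemu-kvm), I imagine something along these lines; the priority
values are guesses, not tested numbers:

  # give the VM process real-time FIFO scheduling (use with care)
  $ sudo chrt -f 50 qemu-system-x86_64 ...

  # or, less drastic, just raise its priority
  $ sudo renice -15 -p $(pidof qemu-system-x86_64)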
Hope I didn't bore you too badly. Thank you for reading and I hope we
can make this work.
--- eric
[1] Should I call it cake?
[2] I'm looking at you, Firefox, sucking down 30% of the CPU doing
nothing.