On Wed, Mar 25, 2009 at 6:17 PM, Olivier Galibert <galibert@xxxxxxxxx> wrote:
> For speech recognition, software is only part of the problem and,
> fundamentally, the easiest one (take the algorithms, implement them,
> optimize/debug at will). The real problem is the data needed to build
> the models to feed the algorithms. There isn't as far as I know any
> reasonable set of corpus available under an open source license usable
> to build a decent speech recognizer. Which makes open source speech
> recognition something not doable yet.

There are some small databases available [1], although admittedly too
small for accurate general purpose use. There are some models
available [2], built from databases which are not themselves
redistributable. There are also a number of model-building tools
available [3-5], which may be sufficient for small command-and-control
tasks.

But you are right. For general-purpose voice recognition, we don't
have the data we need. Still, I think it may be worth putting the
software in place so that those who wish to purchase licenses to
commercial data have everything else they need, and to encourage the
production of better quality free data [6].

References:
[1] http://www.speech.cs.cmu.edu/databases/
[2] http://www.speech.cs.cmu.edu/sphinx/models/
[3] http://www.speech.sri.com/projects/srilm/
[4] http://cmusphinx.sourceforge.net/html/download.php#SphinxTrain
[5] http://cmusphinx.sourceforge.net/html/download.php/#cmulclmtk
[6] http://www.voxforge.org/
--
Jerry James
http://loganjerry.googlepages.com/

--
fedora-devel-list mailing list
fedora-devel-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/fedora-devel-list