On Tuesday 27 March 2007 17:33, Zoltán Németh wrote:
> On Tuesday, 2007-03-27 at 15:06, William Lovaton wrote:
> > Hi there,
> >
> > I am trying to implement language detection with PHP for a web site I am
> > trying to build.  The idea is to take a piece of text and try to guess
> > the language it is written in.
> >
> > I have two options but I'd like to know if you guys have a better idea.
> >
> > 1) I implemented a detector using spell checking, so if I run the text
> > through many spell checkers the one with the fewest errors is probably
> > the right language for that text.  It works quite well and I am pleased
> > with it.  The only thing I don't like is that loading many spell
> > checkers is a bit of a waste: it may require a lot of CPU and a lot of
> > memory depending on the dictionary and the number of dictionaries you
> > load.  Besides, it adds one extra module dependency (pspell).
> >
> > 2) The other option is implemented in PEAR and it's called
> > Text_LanguageDetect:
> > http://pear.php.net/package/Text_LanguageDetect
> >
> > It seems to use a very different technique called N-Gram-Based Text
> > Categorization.  I haven't tested it yet, but I will very soon to see
> > how well it works.  It says it's in alpha state, but I guess it doesn't
> > require pspell, doesn't consume a lot of memory, and should be fast.
> > The only thing I am worried about is how accurate it is...  I'll check
> > soon and post my comments later.
> >
> > 3) <Insert a very good idea here, please>
> >
> > I'd really like to hear what different alternatives all of you have for
> > this problem.
>
> I've definitely no experience with this problem, just guessing ;)
>
> What if you build some arrays of language-specific stuff and check for
> that?  I mean you could store stuff like "if it contains 's, 've, 'm many
> times it's probably English"...
> I don't really know how to store those
> rules, and I'm not sure they are good enough (or whether good enough
> rules even exist) to tell several languages apart...
>
> greets
> Zoltán Németh
>
> > Thanks a lot,
> >
> >
> > -William

Good tip!! =]

Brazilian Portuguese: ç, ã, õ, á, é, í, ó, ú, à, è, ì, ò, ù, ü

--
Davi Vidal
davividal@xxxxxxxxxxxxxxxx
davividal@xxxxxxxxx
--
Now with fortune:
"Take a lesson from the whale; the only time he gets speared is when he
raises to spout."

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
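
For what it's worth, the "arrays of language-specific stuff" idea could be
stored as a plain map from language to marker strings and scored by counting
occurrences. Below is a rough sketch of that, written in Python rather than
PHP purely for illustration. The marker lists are just the ones mentioned in
this thread, and the names MARKERS, score_languages and guess_language are
made up for the example. Several of the accented characters also occur in
Spanish, French and Italian, so this is probably only a weak hint and not a
real detector.

```python
import unicodedata

# Hypothetical marker table taken from the suggestions in this thread.
# Illustrative only: not tuned or validated on real text.
MARKERS = {
    "english": ["'s", "'ve", "'m"],
    "portuguese": ["ç", "ã", "õ", "á", "é", "í", "ó", "ú",
                   "à", "è", "ì", "ò", "ù", "ü"],
}


def score_languages(text):
    """Return, per language, how many times its markers occur in text."""
    # NFC normalisation so that "a" + combining tilde is counted as "ã".
    normalized = unicodedata.normalize("NFC", text).lower()
    return {
        lang: sum(normalized.count(marker) for marker in markers)
        for lang, markers in MARKERS.items()
    }


def guess_language(text):
    """Return the language with the highest marker count.

    Returns None when nothing matched or when the top score is tied,
    since the rules give no basis for a guess in those cases.
    """
    scores = score_languages(text)
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    best_lang, best_score = ranked[0]
    if best_score == 0:
        return None
    if len(ranked) > 1 and ranked[1][1] == best_score:
        return None
    return best_lang


if __name__ == "__main__":
    print(guess_language("I'm sure it's fine, I've checked"))
    print(guess_language("A programação não é difícil"))
```

Raw counts favour whichever language has the longest marker list, so weighting
the markers (or dividing by text length) would likely be needed before
comparing more than a couple of languages. Typographic apostrophes would also
have to be mapped to ' first.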