Re: Language detection with PHP

On Tuesday, 2007-03-27 at 15:06, William Lovaton wrote:
> Hi there,
> 
> I am trying to implement language detection with PHP for a web site I am
> building.  The idea is to take a piece of text and guess the language it
> is written in.
> 
> I have two options, but I'd like to know if you guys have a better idea.
> 
> 1) I implemented a detector using spell checking: if I run the text
> through several spell checkers, the one that reports the fewest errors
> probably matches the language of the text.  It works quite well and I
> am pleased with it.  The only thing I don't like is that loading many
> spell checkers is wasteful: it may require a lot of CPU and memory,
> depending on the dictionaries and how many of them you load.  Besides,
> it adds an extra module dependency (pspell).
> 
> 2) The other option is implemented in PEAR and it's called
> Text_LanguageDetect:
> http://pear.php.net/package/Text_LanguageDetect
> 
> It seems to use a very different technique called N-Gram-Based Text
> Categorization.  I haven't tested it yet, but I will very soon to see
> how well it works.  It says it's in alpha state, but I guess it doesn't
> require pspell, doesn't consume a lot of memory, and should be fast.
> The only thing I am worried about is how accurate it is... I'll check
> soon and post my comments later.
> 
> 3) <Insert a very good idea here, please>
> 
> I'd really like to hear what different alternatives all of you have for
> this problem.
> 
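A minimal sketch of the spell-checking idea from option 1, written in Python rather than PHP. It assumes tiny hypothetical word lists in place of real pspell dictionaries: each "dictionary" is just a set of known words, and the language whose set leaves the fewest unknown words wins. The names DICTIONARIES, tokenize, count_misspellings and detect_by_spelling are made up for illustration.

```python
import re

# Hypothetical toy word lists standing in for real spell-checker dictionaries
# (the post above uses one pspell dictionary per language instead).
DICTIONARIES = {
    "english": {"which", "language", "is", "this", "text", "written", "in", "the"},
    "spanish": {"en", "qué", "idioma", "está", "escrito", "este", "texto", "el"},
    "hungarian": {"milyen", "nyelven", "íródott", "ez", "a", "szöveg"},
}


def tokenize(text):
    """Lower-case the text and split it into word tokens (Unicode-aware)."""
    return re.findall(r"\w+", text.lower())


def count_misspellings(words, dictionary):
    """Count the words that the given dictionary does not know."""
    return sum(1 for word in words if word not in dictionary)


def detect_by_spelling(text, dictionaries=DICTIONARIES):
    """Return the language whose dictionary reports the fewest unknown words."""
    words = tokenize(text)
    if not words:
        return None
    errors = {lang: count_misspellings(words, d) for lang, d in dictionaries.items()}
    return min(errors, key=errors.get)


print(detect_by_spelling("Which language is this text written in?"))  # english
print(detect_by_spelling("¿En qué idioma está escrito este texto?"))  # spanish
print(detect_by_spelling("Milyen nyelven íródott ez a szöveg?"))      # hungarian
```

The cost William mentions shows up directly here: every candidate language needs its full dictionary loaded, and every word is looked up once per language.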

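For option 2, here is a rough sketch of the general N-gram idea: ranked character n-gram profiles compared with the "out-of-place" distance described in Cavnar and Trenkle's N-Gram-Based Text Categorization paper. It is not the code of Text_LanguageDetect, whose internals may differ. It assumes n-grams of length 1 to 3, hypothetical function names (ngram_profile, out_of_place, detect_by_ngrams), and one-sentence training samples that are far too small for real use.

```python
from collections import Counter


def ngram_profile(text, n_max=3, top_k=300):
    """Rank the top_k most frequent character n-grams (lengths 1..n_max).

    Each word is padded with "_" so that word starts and ends become n-grams.
    Ties are broken alphabetically so the ranking is deterministic.
    """
    counts = Counter()
    for word in text.lower().split():
        padded = f"_{word}_"
        for n in range(1, n_max + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [gram for gram, _ in ranked[:top_k]]


def out_of_place(doc_profile, lang_profile):
    """Sum of rank differences; n-grams missing from lang_profile get the max penalty."""
    lang_rank = {gram: rank for rank, gram in enumerate(lang_profile)}
    max_penalty = len(lang_profile)
    return sum(
        abs(rank - lang_rank[gram]) if gram in lang_rank else max_penalty
        for rank, gram in enumerate(doc_profile)
    )


def detect_by_ngrams(text, lang_profiles):
    """Return the language whose profile is closest to the text's profile."""
    doc_profile = ngram_profile(text)
    return min(lang_profiles, key=lambda lang: out_of_place(doc_profile, lang_profiles[lang]))


# Hypothetical one-sentence training samples; real profiles need far more text.
SAMPLES = {
    "english": "the cat sleeps in the house while the children are playing with their friends",
    "spanish": "el gato duerme en la casa mientras los niños juegan con sus amigos",
}
PROFILES = {lang: ngram_profile(sample) for lang, sample in SAMPLES.items()}

print(detect_by_ngrams("the children are in the house", PROFILES))
```

The language profiles only need to be built once and can be cached, so detecting a new text just means ranking its own n-grams and comparing a few hundred entries per language, with no dictionaries loaded.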
I have no experience at all with this problem, just guessing ;)

What if you build some arrays of language-specific markers and check the
text for those? I mean you could store rules like "if it contains 's, 've
or 'm many times, it's probably English"... I don't really know how best
to store those rules, and I'm not sure they are good enough (or whether
good enough rules even exist) to tell several languages apart...
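One possible way to store such rules, sketched in Python: a hypothetical MARKERS table mapping each language to a list of regular expressions for its typical markers, with the text scored by how many times each language's markers occur. The marker lists and the function names marker_scores and detect_by_markers are illustrative guesses, not tested rules.

```python
import re

# Hypothetical marker rules: regular expressions that hint at a language.
# These lists are illustrative guesses, not tuned or tested rules.
MARKERS = {
    "english": [r"'s\b", r"'ve\b", r"'m\b", r"\bthe\b", r"\band\b"],
    "spanish": [r"\bel\b", r"\blos\b", r"\bque\b", r"ñ"],
    "hungarian": [r"\baz\b", r"\begy\b", r"ő", r"ű"],
}


def marker_scores(text, markers=MARKERS):
    """Count how many times each language's markers occur in the text."""
    lowered = text.lower()
    return {
        lang: sum(len(re.findall(pattern, lowered)) for pattern in patterns)
        for lang, patterns in markers.items()
    }


def detect_by_markers(text, markers=MARKERS):
    """Return the language with the most marker hits, or None if nothing matched."""
    scores = marker_scores(text, markers)
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(detect_by_markers("I'm sure it's the one we've seen"))  # english
print(detect_by_markers("el niño y los amigos que juegan"))   # spanish
print(detect_by_markers("Egy idő után az ember elfárad"))     # hungarian
```

Whether a handful of markers can reliably separate closely related languages is exactly the open question above; this sketch simply returns None when no marker matches at all.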

Greetings,
Zoltán Németh

> Thanks a lot,
> 
> 
> -William
> 

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

