Language detection with PHP

Hi there,

I am trying to implement language detection with PHP for a web site I am
building.  The idea is to take a piece of text and guess the language it
is written in.

I have two options but I'd like to know if you guys have a better idea.

1) I implemented a detector using spell checking: I run the text through
several spell checkers, and the one that reports the fewest errors is
probably the right language for that text.  It works quite well and I am
pleased with it.  The only thing I don't like is that loading many spell
checkers is a bit of a waste: it may require a lot of CPU and memory,
depending on the size and number of dictionaries you load.  Besides, it
adds one extra module dependency (pspell).
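
A minimal sketch of the spell-checking approach might look like the code
below.  It is an illustration under assumptions, not the detector
described above: the function names guess_language_by_spelling() and
pspell_checkers() are made up, and the language codes are examples.  The
only pspell calls used are pspell_new() and pspell_check(); the checkers
are passed in as callables so the counting logic can be tried without
pspell installed.

```php
<?php
/**
 * Guess the language of $text by counting misspelled words per language.
 *
 * $checkers maps a language code to a callable(string $word): bool that
 * returns true when the word is spelled correctly in that language.
 * Returns the code with the fewest misses, or null if nothing can be checked.
 */
function guess_language_by_spelling(string $text, array $checkers): ?string
{
    // Split on anything that is not a letter (assumes valid UTF-8 input).
    $words = preg_split('/[^\p{L}]+/u', $text, -1, PREG_SPLIT_NO_EMPTY);
    if (!$words || !$checkers) {
        return null;
    }

    $best = null;
    $fewest = PHP_INT_MAX;
    foreach ($checkers as $lang => $isCorrect) {
        $misses = 0;
        foreach ($words as $word) {
            if (!$isCorrect($word)) {
                $misses++;
                // Already no better than the current best: stop early.
                if ($misses >= $fewest) {
                    break;
                }
            }
        }
        if ($misses < $fewest) {
            $fewest = $misses;
            $best = $lang;
        }
    }
    return $best;
}

/**
 * Hypothetical helper: build one checker per language on top of pspell.
 * Only call this when the pspell extension and the dictionaries exist.
 */
function pspell_checkers(array $langs): array
{
    $checkers = [];
    foreach ($langs as $lang) {
        $dict = pspell_new($lang);      // returns false if no dictionary is found
        if ($dict !== false) {
            $checkers[$lang] = function ($word) use ($dict) {
                return pspell_check($dict, $word);
            };
        }
    }
    return $checkers;
}
```

With pspell available, a call such as
guess_language_by_spelling($text, pspell_checkers(['en', 'es', 'fr']))
would load one dictionary per language, which is exactly the memory cost
mentioned above.  The early break only trims the CPU side.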

2) The other option is implemented in PEAR and it's called
Text_LanguageDetect:
http://pear.php.net/package/Text_LanguageDetect

It seems to use a very different technique called N-Gram-Based Text
Categorization.  I haven't tested it yet, but I will very soon to see how
well it works.  It says it's in alpha state, but I guess it doesn't
require pspell, doesn't consume a lot of memory, and should be fast.
The only thing I am worried about is how accurate it is... I'll check
soon and post my comments later.
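
For readers unfamiliar with n-gram categorization, the general idea can
be sketched as follows.  This is not the code of Text_LanguageDetect; it
is a small illustration of the rank-order ("out-of-place") comparison
commonly used for this technique.  The function names, the trigram size,
the profile limit of 300 and the fixed penalty of 300 for unseen n-grams
are all assumptions chosen for the example.

```php
<?php
/** Build a ranked profile of character n-grams: n-gram => rank (0 = most frequent). */
function ngram_profile(string $text, int $n = 3, int $limit = 300): array
{
    // Lowercase (full Unicode only if mbstring is loaded); assumes valid UTF-8.
    $text = function_exists('mb_strtolower') ? mb_strtolower($text, 'UTF-8') : strtolower($text);
    // Turn every run of non-letters into one underscore and pad both ends,
    // so word boundaries become part of the n-grams.
    $text = '_' . trim((string) preg_replace('/[^\p{L}]+/u', '_', $text), '_') . '_';
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY) ?: [];

    $counts = [];
    for ($i = 0, $last = count($chars) - $n; $i <= $last; $i++) {
        $gram = implode('', array_slice($chars, $i, $n));
        $counts[$gram] = ($counts[$gram] ?? 0) + 1;
    }
    arsort($counts);                                    // most frequent first
    $top = array_slice(array_keys($counts), 0, $limit); // keep the top $limit n-grams
    return array_flip($top);                            // n-gram => rank
}

/** Sum of rank differences; n-grams missing from $reference cost a fixed penalty. */
function out_of_place(array $sample, array $reference, int $missingPenalty = 300): int
{
    $distance = 0;
    foreach ($sample as $gram => $rank) {
        $distance += isset($reference[$gram])
            ? abs($rank - $reference[$gram])
            : $missingPenalty;
    }
    return $distance;
}

/** Return the language whose reference profile is closest to the text, or null. */
function guess_language_by_ngrams(string $text, array $profiles): ?string
{
    $sample = ngram_profile($text);
    if (!$sample) {
        return null;                                    // text too short to profile
    }
    $best = null;
    $bestDistance = PHP_INT_MAX;
    foreach ($profiles as $lang => $reference) {
        $distance = out_of_place($sample, $reference);
        if ($distance < $bestDistance) {
            $bestDistance = $distance;
            $best = $lang;
        }
    }
    return $best;
}
```

The reference profiles would be built once per language from sample text
with ngram_profile() and stored, so detection only needs a few small
arrays in memory rather than full dictionaries.  Accuracy depends heavily
on how much training text goes into each profile and on the length of the
text being classified.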

3) <Insert a very good idea here, please>

I'd really like to hear what different alternatives all of you have for
this problem.

Thanks a lot,


-William

-- 
PHP General Mailing List (http://www.php.net/)