
Re: fts, compound words?


 



On 12/8/05, Teodor Sigaev <teodor@xxxxxxxxx> wrote:
> > (a + foo1 + bar) | (a + foo2 + bar)
>
> That's a simple case, but what about languages such as Norwegian or German? They
> have compound words, and an ispell dictionary can split them into lexemes. But
> usually there is more than one variant of separation:
>
> forbruksvaremerkelov
>         forbruk vare merke lov
>         forbruk vare merkelov
>         forbruk varemerke lov
>         forbruk varemerkelov
>         forbruksvare merke lov
>         forbruksvare merkelov
> (notice: I don't know the translation, it's just an example. When we were working
> on compound word support we found a word which has 24 variants of separation!!)
>
> So, the query 'a + forbruksvaremerkelov' will be awful:
>
> a + ( (forbruk & vare & merke & lov) | (forbruk & vare & merkelov) | ... )
>
> Of course, that is an example just off the top of my head, but a solution for
> phrase search should work reasonably with such corner cases.
>
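
To make the blow-up in the quoted example concrete, here is a rough Python
sketch that builds the expanded query from the six splits listed above. The
names SPLITS, expand_query and term_count are made up for illustration, and
the "+", "&" and "|" operators are just the ad-hoc notation used in this
thread, not actual tsearch2 syntax.

```python
# The six splits of "forbruksvaremerkelov", copied from the quoted message.
SPLITS = [
    ["forbruk", "vare", "merke", "lov"],
    ["forbruk", "vare", "merkelov"],
    ["forbruk", "varemerke", "lov"],
    ["forbruk", "varemerkelov"],
    ["forbruksvare", "merke", "lov"],
    ["forbruksvare", "merkelov"],
]


def expand_query(prefix, splits):
    """Return 'prefix + ( (a & b) | (c & d) | ... )' in the thread's notation."""
    alternatives = ["(" + " & ".join(parts) + ")" for parts in splits]
    return prefix + " + ( " + " | ".join(alternatives) + " )"


def term_count(splits):
    """Count the lexemes that appear across all alternatives."""
    return sum(len(parts) for parts in splits)


if __name__ == "__main__":
    print(expand_query("a", SPLITS))
    print(term_count(SPLITS), "lexemes in", len(SPLITS), "alternatives")
```

With the 24-variant word mentioned above, the OR list would be four times as
long for a single query term.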

WARNING: What follows is wild, hand-waving speculation, as I don't
fully understand the implications of compound words! ;-)

My naive impression is that it would be both possible and a good idea
to stem every compound word to the variant containing the most
individual lexemes, i.e. its finest-grained split.  As an analogy, this
would be similar to transforming composed (Normalization Form C) Unicode
characters into their decomposed (Normalization Form D) versions.
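
A minimal sketch of what I have in mind, in Python. It assumes the dictionary
has already produced the list of candidate splits (here, the six from the
example above); most_decomposed is a hypothetical helper, not anything that
exists in tsearch2, and breaking ties by taking the first candidate is an
arbitrary choice on my part. The second half only illustrates the NFC/NFD
analogy using the standard unicodedata module.

```python
import unicodedata

# Candidate splits for "forbruksvaremerkelov", as listed earlier in the thread.
SPLITS = [
    ["forbruk", "vare", "merke", "lov"],
    ["forbruk", "vare", "merkelov"],
    ["forbruk", "varemerke", "lov"],
    ["forbruk", "varemerkelov"],
    ["forbruksvare", "merke", "lov"],
    ["forbruksvare", "merkelov"],
]


def most_decomposed(splits):
    """Return the split with the most lexemes (the first one wins on a tie)."""
    return max(splits, key=len)


if __name__ == "__main__":
    # One canonical form instead of six alternatives.
    print(most_decomposed(SPLITS))

    # The analogy: a composed character and its decomposed equivalent.
    composed = "\u00e5"  # LATIN SMALL LETTER A WITH RING ABOVE
    decomposed = unicodedata.normalize("NFD", composed)
    print(len(composed), len(decomposed))
```

If the same rule were applied both when indexing and when parsing the query,
the query above would collapse to a single AND group. Whether that gives
sensible matches for real compound words is exactly the part I am unsure
about.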

