Re: tsearch2 document and word limit

Sorry, but there is no way around this other than patching the tsearch2 sources.

Tsearch2 itself (not GiST) imposes these limits, mainly to save storage space and to reduce rank calculation time. Our (Oleg's and my) experience with search engines shows that full position information for long documents is of little importance to ranking.
Have you tried normalizing the rank by document length?


http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/docs/tsearch2-ref.html:
...
Both of these ranking functions take an integer normalization option that specifies whether a document's length should impact its rank. This is often desirable, since a hundred-word document with five instances of a search word is probably more relevant than a thousand-word document with five instances. The option can have the values:
* 0 (the default) ignores document length.
* 1 divides the rank by the logarithm of the length.
* 2 divides the rank by the length itself.
...
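
To make the three options concrete, here is a small Python sketch of the arithmetic they describe. It is not tsearch2 code: the function name normalize_rank is hypothetical, and the quoted text does not say which logarithm base or offset tsearch2 uses, so the natural log of (length + 1) below is an assumption chosen only to avoid dividing by zero for a one-word document.

```python
import math


def normalize_rank(raw_rank: float, doc_length: int, option: int = 0) -> float:
    """Scale a raw rank by document length (illustrative, not tsearch2 code).

    option 0: ignore document length
    option 1: divide by a logarithm of the length
    option 2: divide by the length itself
    """
    if option == 0 or doc_length <= 0:
        return raw_rank
    if option == 1:
        # Assumption: ln(length + 1); the real base and offset may differ.
        return raw_rank / math.log(doc_length + 1)
    if option == 2:
        return raw_rank / doc_length
    raise ValueError(f"unknown normalization option: {option}")


# Two documents with the same raw rank but different lengths.
short_doc = normalize_rank(0.5, 100, option=2)
long_doc = normalize_rank(0.5, 1000, option=2)
print(short_doc > long_doc)  # the shorter document ranks higher
```

With option 0 both documents keep the same rank; with options 1 and 2 the hundred-word document comes out ahead, which is the effect the reference text describes.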




David Beavan wrote:
Hi

I have been toying with the implementation of tsearch2 to index some large text documents. I have run into problems where I am up against limits:

* No more than 255 occurrences of a particular word are indexed.
* Word positions greater than 16384 are stored as position 16384 and end up as a single occurrence.


These are problematic because I need to rank based on number of word occurrences, and these limits are preventing this.
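
The effect of the two limits on occurrence counts can be sketched in a few lines of Python. This only models the behaviour as described in this message and is not tsearch2 code: the constants are the figures quoted above, the function name stored_positions is hypothetical, and keeping the first 255 positions (rather than some other subset) is an assumption.

```python
MAX_OCCURRENCES = 255   # per-word occurrence limit, as reported above
MAX_POSITION = 16384    # largest stored word position, as reported above


def stored_positions(positions):
    """Return the positions that would survive the two limits.

    Positions past MAX_POSITION collapse onto MAX_POSITION, so they
    count as one occurrence; the list is then cut to MAX_OCCURRENCES.
    """
    clamped = sorted({min(p, MAX_POSITION) for p in positions})
    # Assumption: the earliest positions are the ones that are kept.
    return clamped[:MAX_OCCURRENCES]


# A word at every 10th position of a 50,000-word document: 5000 real hits.
frequent = stored_positions(range(10, 50001, 10))
print(len(frequent))

# A word that only appears late in the document.
late = stored_positions([20000, 30000, 40000])
print(late)
```

Under these assumptions the 5000 real occurrences are reduced to 255 stored ones, and the three late occurrences collapse to a single entry, so a ranking based on raw occurrence counts cannot tell such documents apart.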

Does anybody have any suggestions as to how this could be worked around? Is the limit due to GiST? Would OpenFTS help (I'm guessing not)?

Failing that, does anybody have experience of combining another text indexing package with PostgreSQL?

Dave



-- Teodor Sigaev E-mail: teodor@xxxxxxxxx WWW: http://www.sigaev.ru/

