Search Postgresql Archives

GIN : Working with term positions

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hello,
I'm using a GIN index for a text column on a big table. I use it to rank the rows, but I also need to get the term positions for each document of a subset of documents. I assume these positions are stored in the index, because doc says positions can be used for cover density ranking and because to_tsvector function gives them :

select * from to_tsvector('I get lexemes and I get term positions.');
            to_tsvector
----------------------------------------
 'get':2,6 'lexem':3 'posit':8 'term':7

I can get the term positions with to_tsvector but only by parsing the result string, is there any more handy way ? Something like :
select * from term_and_positions('I get lexemes and I get term positions.');
 term    | positions
---------+-----------
   'get' |     {2,6}
 'lexem' |       {3}


Then, from the term positions, I need to get the character offset of these term positions. I assume it is NOT stored in the GIN index. By character offset I mean the character count from string begining to the term. For the previous example it would be : 'get' --> {2,20}.

I thought about using ts_headline to return the whole text with term tagged and then parse it to compute the character offsets from the tags. But this function is very slow, seems like it does not use the GIN index at all. And I suppose it can't because there is no way to know from a term position where its substring is in the text.

Now I think the only solution is to make my own C function parsing the text like to_tsvector does and counting terms AND characters read from the begining of the text to match them. I got a look on the code, and it does not seems easy to do because characters offset or string length are never used by the parsetext function (ts_parse.c). If you have any other suggestion, would love to hear that !

Regards, Yoann Moreau

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux