Re: Queryplan within FTS/GIN index -search.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
> "Kevin Grittner" <Kevin.Grittner@xxxxxxxxxxxx> writes:
>> Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
>>> Any sane text search application is going to try to filter out
>>> common words as stopwords; it's only the failure to do that that's
>>> making this run slow.
>  
>> I'd rather have the index used for the selective test, and apply
>> the remaining tests to the rows retrieved from the heap.
> 
> Uh, that was exactly my point.  Indexing common words is a waste.
 
Perhaps I'm missing something.  My point was that there are words
which are too common to be useful for index searches, yet uncommon
enough to usefully limit the results.  These words could typically
benefit from tsearch2 style parsing and dictionaries; so declaring
them as stop words would be bad from a functional perspective, yet
searching an index for them would be bad from a performance
perspective.
 
One solution would be for the users to rigorously identify all of
these words, include them on one stop word list but not another,
include *two* tsvector columns in the table (with and without the
"iffy" words), index only the one with the larger stop word list, and
generate two tsquery values to search the two different columns.  Best
of both worlds.  Sort of.  The staff time to create and maintain such
a list would obviously be costly and writing the queries would be
error-prone.
 
Second best would be to somehow recognize the "iffy" words and exclude
them from the index and the index search phase, but apply the check
when the row is retrieved from the heap.  I really have a hard time
seeing how the conditional exclusion from the index could be
accomplished, though.  Next best would be to let them fall into the
index, but exclude top level ANDed values from the index search,
applying them only to the recheck when the row is read from the heap. 
The seems, at least conceptually, like it could be done.
 
-Kevin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux