FTS query, statistics and planner estimations…

Pierre Ducroquet <pierre.ducroquet@xxxxxxxxxxxxxx> · Wed, 09 Nov 2016 10:22:53 +0100

Hello

I recently stumbled on a slow query in my database that showed an odd 
behaviour related to the statistics of FTS queries.
The query does a few joins «after» running a FTS query on a main table.
The FTS query returns a few thousand rows, but the estimations are wrong, 
leading the optimizer to terrible plans compared to what should happen, and 
thus creates a far higher execution time.
I managed to isolate the odd behaviour in a single query, and I would like 
your opinion about it.

I have modified the table name, columns and query to hide sensitive values, 
but the issue remain the same. The table contains about 295,000 documents, and 
all is running under PostgreSQL 9.5.

EXPLAIN ANALYZE 
SELECT COUNT(*) 
FROM documents 
WHERE 
    to_tsvector('french', subject || ' ' || body) @@ plainto_tsquery('XXX');

Of course, there is an index on to_tsvector('french', subject || ' ' || body).

That query gives me the following results for several values of XXX :

 Request                          | Estimated rows | Real rows
----------------------------------+----------------+-----------
'word1'                           | 38050          | 37500
'word1 word2'                     | 4680           | 32000
'word1 word2 word3'               | 270            | 12300
'word1 word2 word3 word4'         | 10             | 9930
'word1 word2 word3 word4 word5'   | 1              | 9930

You can see that with more words in query, the estimation falls far behind 
reality.

Is that a known limitation of the FTS indexing ? Am I missing something 
obvious, or a poor configuration ?

Thanks a lot
Attachment:
signature.asc

Description: This is a digitally signed message part.