Hello I recently stumbled on a slow query in my database that showed an odd behaviour related to the statistics of FTS queries. The query does a few joins «after» running a FTS query on a main table. The FTS query returns a few thousand rows, but the estimations are wrong, leading the optimizer to terrible plans compared to what should happen, and thus creates a far higher execution time. I managed to isolate the odd behaviour in a single query, and I would like your opinion about it. I have modified the table name, columns and query to hide sensitive values, but the issue remain the same. The table contains about 295,000 documents, and all is running under PostgreSQL 9.5. EXPLAIN ANALYZE SELECT COUNT(*) FROM documents WHERE to_tsvector('french', subject || ' ' || body) @@ plainto_tsquery('XXX'); Of course, there is an index on to_tsvector('french', subject || ' ' || body). That query gives me the following results for several values of XXX : Request | Estimated rows | Real rows ----------------------------------+----------------+----------- 'word1' | 38050 | 37500 'word1 word2' | 4680 | 32000 'word1 word2 word3' | 270 | 12300 'word1 word2 word3 word4' | 10 | 9930 'word1 word2 word3 word4 word5' | 1 | 9930 You can see that with more words in query, the estimation falls far behind reality. Is that a known limitation of the FTS indexing ? Am I missing something obvious, or a poor configuration ? Thanks a lot
Attachment:
signature.asc
Description: This is a digitally signed message part.