Search Postgresql Archives

Re: [GENERAL] FTS query, statistics and planner estimations…

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





2016-11-09 11:19 GMT+01:00 Pierre Ducroquet <pierre.ducroquet@xxxxxxxxxxxxxx>:
On Wednesday, November 9, 2016 10:40:10 AM CET Francisco Olarte wrote:
> Pierre:
>
> On Wed, Nov 9, 2016 at 10:22 AM, Pierre Ducroquet
>
> <pierre.ducroquet@people-doc.com> wrote:
> > The query does a few joins «after» running a FTS query on a main table.
> > The FTS query returns a few thousand rows, but the estimations are wrong,
> > leading the optimizer to terrible plans compared to what should happen,
> > and
> > thus creates a far higher execution time.
>
> ....
>
> > but the issue remain the same. The table contains about 295,000 documents,
> > and
> ....
>
> >  Request                          | Estimated rows | Real rows
> >
> > ----------------------------------+----------------+-----------
> > 'word1'                           | 38050          | 37500
> > 'word1 word2'                     | 4680           | 32000
> > 'word1 word2 word3'               | 270            | 12300
> > 'word1 word2 word3 word4'         | 10             | 9930
> > 'word1 word2 word3 word4 word5'   | 1              | 9930
> >
> > You can see that with more words in query, the estimation falls far behind
> > reality.
>
> I'm not really familiar with FTS but, doing a few division of
> estimations and rows it seems it estimates as uncorrelated words, and
> you real rows clearly indicate some of them are clearly correlated (
> like w1/w2 and w4/s5, and partially w3/w45 ) and very common.
>
> > Is that a known limitation of the FTS indexing ? Am I missing something
> > obvious, or a poor configuration ?
>
> Someone more familiar with it needed for that, but what I've found
> several times is FTS does not mix too well with relational queries at
> the optimizer level ( as FTS terms can have very diverse degrees of
> correlation, which is very difficult to store in the statistics a
> relational optimizer normally uses ).

Indeed the words in the query are correlated, but I do hope that the FTS
indexing is able to cope with that. Otherwise it makes it far less usable than
what one would expect since real world queries will often contain sentences or
related words. Also, PostgreSQL 9.6 introduced phrase search in FTS, and I
don't see how that would work without a working multi-words query.

The PostgreSQL statistics are not multidimensional - so bad estimation is expected :(

Regards

Pavel


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux