Re: anti-join chosen even when slower than old plan

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



12.11.10 12:56, CÃdric Villemain ÐÐÐÐÑÐÐ(ÐÐ):
I supposed it was an answer to my mail but not sure... please keep
CC'ed people, it is easier to follow threads (at least for me)
OK
2010/11/12 Vitalii Tymchyshyn<tivv00@xxxxxxxxx>:
I'd say there are two Qs here:

1) Modify costs based on information on how much of the table is in cache.
It would be great  if this can be done, but I'd prefer to have it as admin
knobs (because of plan stability). May be both admin and automatic ways can
be followed with some parallel (disableable) process modify knobs on admin
behalf. In this case different strategies to automatically modify knobs can
be applied.
OS cache is usualy stable enough to keep your plans stable too, I think.
Not if it is on edge. There are always edge cases where data fluctuates near some threshold.
2) Modify costs for part of table retrieval. Then you need to define "part".
Current ways are partitioning and partial indexes. Some similar to partial
index thing may be created, that has only "where" clause and no data. But
has statistics and knobs (and may be personal bufferspace if they are
introduced). I don't like to gather data about "last X percents" or like,
because it works only in clustering and it's hard for optimizer to decide if
it will be enough to scan only this percents for given query.
Modifying random_page_cost and sequential_page_cost thanks to
statistics about cached blocks can be improved if we know the
distribution.

It does not mean : we know we have last 15% in cache, and we are goign
to request those 15%.

You mean *_cost for the whole table, don't you? That is case (1) for me.
Case (2) is when different cost values are selected based on what portion of table is requested in the query. E.g. when we have data for the whole day in one table, data for the last hour is cached and all the other data is not. Optimizer then may use different *_cost for query that requires all the data and for query that requires only last hour data. But, as I've said, that is much more complex task then (1).

Best regards, Vitalii Tymchyshyn


--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux