On Wed, Nov 10, 2010 at 10:47:21PM -0500, Robert Haas wrote:
> On Wed, Nov 10, 2010 at 6:07 PM, Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
> > "Kevin Grittner" <Kevin.Grittner@xxxxxxxxxxxx> writes:
> >> Robert Haas <robertmhaas@xxxxxxxxx> wrote:
> >>> Unfortunately, to know how much data we're going to grovel
> >>> through, we need to know the plan; and to decide on the right
> >>> plan, we need to know how much data we're going to grovel through.
> >
> >> And that's where they've been ending.
> >
> >> The only half-sane answer I've thought of is to apply a different
> >> cost to full-table or full-index scans based on the ratio with
> >> effective cache size.
>
> Kevin, yes, good point. Bravo! Let's do that. Details TBD, but
> suppose effective_cache_size = 1GB. What we know for sure is that a
> 4 GB table is not going to be fully cached but a 4 MB table may well
> be. In fact, I think we should assume that the 4 MB table IS cached,
> because the point is that if it's used at all, it soon will be. It's
> almost certainly a bad idea to build a plan around the idea of
> minimizing reads from that 4 MB table in favor of doing a substantial
> amount of additional work somewhere else. I suppose this could break
> down if you had hundreds and hundreds of 4 MB tables all of which
> were accessed regularly, but that's an unusual situation, and anyway
> it's not clear that assuming them all uncached is going to be any
> better than assuming them all cached.
>
> > This might have some connection to some rather half-baked ideas
> > I've been having in connection with the generalized-inner-indexscan
> > problem. I don't have anything in the way of a coherent
> > presentation to make yet, but the thing I'm being forced to realize
> > is that sane modeling of a complex subplan that's on the inside of
> > a nestloop join requires treating *every* scan type as having
> > different costs "the first time" versus "during rescan". If the
> > total amount of data touched in the query is less than
> > effective_cache_size, it's not unreasonable to suppose that I/O
> > costs during rescan might be zero, even for a seqscan or a
> > non-parameterized indexscan. In fact, only parameterized indexscans
> > would be able to touch pages they'd not touched the first time, and
> > so they ought to have higher not lower rescan costs in this
> > environment. But once the total data volume exceeds
> > effective_cache_size, you have to do something different since you
> > shouldn't any longer assume the data is all cached from the first
> > scan. (This isn't quite as hard as the case you're talking about,
> > since I think the relevant data volume is the sum of the sizes of
> > the tables used in the query; which is easy to estimate at the
> > start of planning, unlike the portion of the tables that actually
> > gets touched.)
>
> Well, we don't want the costing model to have sharp edges.
> effective_cache_size can't be taken as much more than an educated
> guess, and what actually happens will depend a lot on what else is
> going on on the system. If only one query is running on a system at
> a time and it is repeatedly seq-scanning a large table, the cost of
> reading pages in will be very small until the table grows large
> enough that you can't fit the whole thing in memory at once, and then
> will abruptly go through the roof. But realistically you're not
> going to know exactly where that edge is going to be, because you
> can't predict exactly how much concurrent activity there will be, for
> example, or how much work_mem allocations will push out of the OS
> buffer cache. So I'm thinking we should start the costs at something
> like 0.05/0.05 for tables that are much smaller than
> effective_cache_size and ramp up to 4/1 for tables that are larger
> than effective_cache_size. Maybe just by linearly ramping up,
> although that has a certain feeling of being without mathematical
> soundness.
>
> > An idea that isn't even half-baked yet is that once we had a cost
> > model like that, we might be able to produce plans that are
> > well-tuned for a heavily cached environment by applying the
> > "rescan" cost model even to the first scan for a particular query.
> > So that might lead to some sort of "assume_cached" GUC parameter,
> > and perhaps Kevin could tune his reporting queries by turning that
> > off instead of messing with individual cost constants.
>
> I think the real goal here should be to try to avoid needing a GUC.
> A lot of people could benefit if the system could make some attempt
> to recognize on its own which queries are likely to be cached. We
> already have parameters you can hand-tune for each query as
> necessary. Being able to set some parameters system-wide and then get
> sensible behavior automatically would be much nicer.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>

I agree with the goal of avoiding the need for a GUC. This needs to be
as automatic as possible.

One idea I had was to compute a value for the amount of cached data in
the system by keeping a sum, or a weighted sum, of table usage across
the system. Smaller tables and indexes would contribute less to the
total, while larger tables and indexes would contribute more. Then, by
comparing this running total to effective_cache_size, the planner
could set the random and sequential page costs for a query. This would
let the case of many 4MB tables lean toward disk I/O costs rather than
memory I/O costs, even though each table on its own looks cacheable.
The weighting could be a function of the number of simultaneous users
of the table.

I know this is a bit of hand-waving, but some sort of dynamic feedback
needs to be provided to the planning process as system use increases.
To make it a little more concrete, two rough sketches follow: one of
the ramp Robert describes, and one of the weighted total that could
feed it.
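First, the ramp itself. This is purely illustrative standalone C, not
actual planner code; the 10% "much smaller" cutoff and the linear
blend between the 0.05/0.05 and 4/1 endpoints are assumptions made up
just so there is something to look at.

/*
 * Rough sketch of the cost ramp Robert described: illustrative only,
 * not PostgreSQL planner code. The cutoffs and the linear
 * interpolation are assumptions for discussion.
 */
#include <stdio.h>

typedef struct
{
    double      seq_page_cost;
    double      random_page_cost;
} PageCosts;

/*
 * Interpolate between "fully cached" costs (0.05/0.05) and "on disk"
 * costs (1.0 sequential / 4.0 random) as the relation grows relative
 * to effective_cache_size. Tables much smaller than the cache are
 * treated as cached; tables at or above the cache size get the full
 * disk costs; in between we ramp linearly, which at least keeps the
 * model free of sharp edges.
 */
static PageCosts
ramp_page_costs(double rel_bytes, double effective_cache_bytes)
{
    const double cached_cost = 0.05;
    const double disk_seq = 1.0;
    const double disk_random = 4.0;

    /* "much smaller" is taken here, arbitrarily, as <= 10% of cache */
    double      lo = 0.10 * effective_cache_bytes;
    double      hi = effective_cache_bytes;
    double      frac;
    PageCosts   c;

    if (rel_bytes <= lo)
        frac = 0.0;
    else if (rel_bytes >= hi)
        frac = 1.0;
    else
        frac = (rel_bytes - lo) / (hi - lo);

    c.seq_page_cost = cached_cost + frac * (disk_seq - cached_cost);
    c.random_page_cost = cached_cost + frac * (disk_random - cached_cost);
    return c;
}

int
main(void)
{
    double      ecs = 1024.0 * 1024.0 * 1024.0;     /* 1GB, as in the example */
    double      sizes[] = {4e6, 256e6, 512e6, 2e9}; /* 4MB .. 2GB */
    int         i;

    for (i = 0; i < 4; i++)
    {
        PageCosts   c = ramp_page_costs(sizes[i], ecs);

        printf("%8.0f MB: seq=%.3f random=%.3f\n",
               sizes[i] / 1e6, c.seq_page_cost, c.random_page_cost);
    }
    return 0;
}

With those numbers a 4 MB table comes out at 0.05/0.05 and a 2 GB
table at 1/4, with a smooth transition in between rather than a cliff
at effective_cache_size.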
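Second, the kind of weighted running total I am hand-waving about,
boiled down to a "how cached can we assume we are" fraction. Again,
the names, the weighting by user count, and the shape of the feedback
are all made up for illustration; the real accounting would have to
come from the statistics the server already tracks.

/*
 * Equally rough sketch of the weighted running total idea. The point
 * is only to show the shape of the feedback: sum up size times a
 * concurrency weight over the hot tables and indexes, compare that to
 * effective_cache_size, and let the resulting pressure pull the page
 * costs toward the disk end of the ramp even when each individual
 * table is small.
 */
#include <stdio.h>

typedef struct
{
    double      size_bytes;     /* table or index size */
    int         active_users;   /* simultaneous users touching it */
} RelUsage;

/* Weighted total of what is competing for cache right now. */
static double
weighted_cache_demand(const RelUsage *rels, int nrels)
{
    double      total = 0.0;
    int         i;

    /* the 0.25-per-user weight is arbitrary */
    for (i = 0; i < nrels; i++)
        total += rels[i].size_bytes * (1.0 + 0.25 * rels[i].active_users);
    return total;
}

/*
 * Fraction of the working set we can plausibly assume is cached:
 * 1.0 when demand fits inside effective_cache_size, shrinking toward
 * 0 as demand grows past it.
 */
static double
assumed_cached_fraction(double demand, double effective_cache_bytes)
{
    if (demand <= effective_cache_bytes)
        return 1.0;
    return effective_cache_bytes / demand;
}

int
main(void)
{
    /* two hundred 4MB tables, each with a couple of active users */
    RelUsage    rels[200];
    double      ecs = 1024.0 * 1024.0 * 1024.0;     /* 1GB */
    double      demand;
    double      frac;
    int         i;

    for (i = 0; i < 200; i++)
    {
        rels[i].size_bytes = 4.0 * 1024.0 * 1024.0;
        rels[i].active_users = 2;
    }

    demand = weighted_cache_demand(rels, 200);
    frac = assumed_cached_fraction(demand, ecs);
    printf("demand=%.0f MB, assumed cached fraction=%.2f\n",
           demand / 1e6, frac);
    return 0;
}

That fraction could then be used to pull the ramp's output toward the
disk end, so a workload of many small but busy tables is not costed as
if everything were sitting in memory.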
Regards,
Ken

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance