On Thu, Nov 11, 2010 at 2:35 PM, Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
> Robert Haas <robertmhaas@xxxxxxxxx> writes:
>> Yeah. For Kevin's case, it seems like we want the caching
>> percentage to vary not so much based on which table we're hitting
>> at the moment but on how much of it we're actually reading.
>
> Well, we could certainly take the expected number of pages to read
> and compare that to effective_cache_size. The thing that's missing
> in that equation is how much other stuff is competing for cache
> space. I've tried to avoid having the planner need to know the
> total size of the database cluster, but it's kind of hard to avoid
> that if you want to model this honestly.

I'm not sure I agree with that. I mean, you could easily have a
database that is much larger than effective_cache_size, but in which
only that much of it is hot. Or the hot portion could move around
over time. For reasons of both technical complexity and plan
stability, I don't think we want to try to model that. It seems
perfectly reasonable to say that reading 25% of effective_cache_size
will be more expensive *per page* than reading 5% of
effective_cache_size, independently of what the total cluster size
is.

> Would it be at all workable to have an estimate that so many megs
> of a table are in cache (independently of any other table), and
> then we could scale the cost based on the expected number of pages
> to read versus that number? The trick here is that DBAs really
> aren't going to want to set such a per-table number (at least, most
> of the time) so we need a formula to get to a default estimate for
> that number based on some simple system-wide parameters. I'm not
> sure if that's any easier.

That's an interesting idea. For the sake of argument, suppose we
assume that a relation which is less than 5% of effective_cache_size
will be fully cached, and that for anything larger, that much of it
is cached. Consider a 4GB machine with effective_cache_size set to
3GB. Then we'll assume that any relation smaller than 153MB is 100%
cached, a 1GB table is 15% cached, and a 3GB table is 5% cached.
That doesn't seem quite right, though: the caching percentage drops
off very quickly after you exceed the threshold.

*thinks*

I'm wondering if we could do something with a formula like

    3 * amount_of_data_to_read
      / (3 * amount_of_data_to_read + effective_cache_size)
      = percentage NOT cached

That is, if we're reading an amount of data equal to
effective_cache_size, we assume 25% caching, and plot a smooth curve
through that point. In the examples above, we would assume that a
150MB read is 87% cached, a 1GB read is 50% cached, and a 3GB read is
25% cached. (A quick sketch of this curve appears below.)

> BTW, it seems that all these variants have an implicit assumption
> that if you're reading a small part of the table it's probably part
> of the working set; which is an assumption that could be 100%
> wrong. I don't see a way around it without trying to characterize
> the data access at an unworkably fine level, though.

Me neither, but I think it will frequently be true, and I'm not sure
it will hurt very much when it isn't. I mean, if you execute the
same query repeatedly, that data will become hot soon enough. If you
execute a lot of different queries that each touch a small portion
of a big, cold table, we might underestimate the costs of the index
probes, but so what? There's probably no better strategy for
accessing that table anyway.
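For illustration only, here is a minimal C sketch of the curve
described above. This is not PostgreSQL planner code; the function
name cached_fraction and the choice of megabytes as the unit are
assumptions made up for this example.

#include <stdio.h>

/*
 * Fraction of a read assumed to be cached, per the formula above:
 * uncached = 3*read / (3*read + effective_cache_size).
 * Both arguments must be in the same units (here, megabytes).
 */
static double
cached_fraction(double read_size, double effective_cache_size)
{
    return 1.0 - (3.0 * read_size) /
                 (3.0 * read_size + effective_cache_size);
}

int
main(void)
{
    double ecs = 3072.0;    /* effective_cache_size = 3GB, in MB */

    /* Reproduces the examples above: ~87%, 50%, and 25% cached. */
    printf("150MB read: %.0f%% cached\n",
           100.0 * cached_fraction(150.0, ecs));
    printf("1GB read:   %.0f%% cached\n",
           100.0 * cached_fraction(1024.0, ecs));
    printf("3GB read:   %.0f%% cached\n",
           100.0 * cached_fraction(3072.0, ecs));
    return 0;
}

Compiling and running this prints the three percentages quoted above
(87%, 50%, and 25%).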
Perhaps you can construct an example where this underestimate affects
the join order in an undesirable fashion, but I'm having a hard time
getting worked up about that as a potential problem case. Our current
system - where we essentially assume that the caching percentage is
uniform across the board - can have the same problem in less
artificial cases.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance