---- Original message ----
>Date: Thu, 11 Nov 2010 15:29:40 -0500
>From: pgsql-performance-owner@xxxxxxxxxxxxxx (on behalf of Robert Haas <robertmhaas@xxxxxxxxx>)
>Subject: Re: anti-join chosen even when slower than old plan
>To: Tom Lane <tgl@xxxxxxxxxxxxx>
>Cc: Kevin Grittner <Kevin.Grittner@xxxxxxxxxxxx>, Mladen Gogala <mladen.gogala@xxxxxxxxxxx>, "pgsql-performance@xxxxxxxxxxxxxx" <pgsql-performance@xxxxxxxxxxxxxx>
>
>On Thu, Nov 11, 2010 at 2:35 PM, Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
>> Robert Haas <robertmhaas@xxxxxxxxx> writes:
>>> Yeah. For Kevin's case, it seems like we want the caching percentage
>>> to vary not so much based on which table we're hitting at the moment
>>> but on how much of it we're actually reading.
>>
>> Well, we could certainly take the expected number of pages to read and
>> compare that to effective_cache_size. The thing that's missing in that
>> equation is how much other stuff is competing for cache space. I've
>> tried to avoid having the planner need to know the total size of the
>> database cluster, but it's kind of hard to avoid that if you want to
>> model this honestly.
>
>I'm not sure I agree with that. I mean, you could easily have a
>database that is much larger than effective_cache_size, but only that
>much of it is hot. Or the hot portion could move around over time.
>And for reasons of both technical complexity and plan stability, I
>don't think we want to try to model that. It seems perfectly
>reasonable to say that reading 25% of effective_cache_size will be
>more expensive *per-page* than reading 5% of effective_cache_size,
>independently of what the total cluster size is.
>
>> Would it be at all workable to have an estimate that so many megs of a
>> table are in cache (independently of any other table), and then we
>> could scale the cost based on the expected number of pages to read
>> versus that number? The trick here is that DBAs really aren't going
>> to want to set such a per-table number (at least, most of the time),
>> so we need a formula to get to a default estimate for that number
>> based on some simple system-wide parameters. I'm not sure if that's
>> any easier.
>
>That's an interesting idea. For the sake of argument, suppose we
>assume that a relation smaller than 5% of effective_cache_size is
>fully cached, and that anything larger has just that much of it (5% of
>effective_cache_size) cached. Consider a 4GB machine with
>effective_cache_size set to 3GB. Then we'd assume that any relation
>smaller than 153MB is 100% cached, a 1GB table is 15% cached, and a
>3GB table is 5% cached. That doesn't seem quite right, though: the
>caching percentage drops off very quickly after you exceed the
>threshold.
>
>*thinks*
>
>I'm wondering if we could do something with a formula like
>3 * amount_of_data_to_read / (3 * amount_of_data_to_read +
>effective_cache_size) = percentage NOT cached. That is, if we're
>reading an amount of data equal to effective_cache_size, we assume 25%
>caching, and plot a smooth curve through that point. In the examples
>above, we would assume that a 150MB read is 87% cached, a 1GB read is
>50% cached, and a 3GB read is 25% cached.
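
As a sanity check on those figures, here is a small standalone C sketch
(a hypothetical illustration only, not PostgreSQL planner code) that
evaluates both the 5%-of-effective_cache_size threshold model and the
smooth-curve formula for the example sizes, assuming
effective_cache_size = 3GB:

/* Sanity check of the two caching models discussed above.
 * Hypothetical standalone sketch, not PostgreSQL planner code.
 * Sizes are in MB; effective_cache_size is assumed to be 3GB. */
#include <stdio.h>

#define ECS_MB 3072.0   /* effective_cache_size = 3GB */

/* Threshold model: a relation smaller than 5% of effective_cache_size
 * is fully cached; anything larger has only that 5% slice cached. */
static double threshold_cached_fraction(double table_mb)
{
    double cap_mb = 0.05 * ECS_MB;  /* about 153MB */
    return (table_mb <= cap_mb) ? 1.0 : cap_mb / table_mb;
}

/* Smooth curve: fraction NOT cached = 3r / (3r + effective_cache_size),
 * so fraction cached = effective_cache_size / (3r + effective_cache_size). */
static double smooth_cached_fraction(double read_mb)
{
    return ECS_MB / (3.0 * read_mb + ECS_MB);
}

int main(void)
{
    double sizes[] = {150.0, 1024.0, 3072.0};   /* the examples above */

    for (int i = 0; i < 3; i++)
        printf("%4.0f MB: threshold %5.1f%%  smooth curve %5.1f%%\n",
               sizes[i],
               100.0 * threshold_cached_fraction(sizes[i]),
               100.0 * smooth_cached_fraction(sizes[i]));
    return 0;
}

For 150MB, 1GB, and 3GB reads it prints 100%/15%/5% under the threshold
model and roughly 87%/50%/25% under the smooth curve, matching the
figures quoted above.
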
>> BTW, it seems that all these variants have an implicit assumption
>> that if you're reading a small part of the table it's probably part
>> of the working set; which is an assumption that could be 100% wrong.
>> I don't see a way around it without trying to characterize the data
>> access at an unworkably fine level, though.
>
>Me neither, but I think it will frequently be true, and I'm not sure
>it will hurt very much when it isn't. I mean, if you execute the same
>query repeatedly, that data will become hot soon enough. If you
>execute a lot of different queries that each touch a small portion of
>a big, cold table, we might underestimate the costs of the index
>probes, but so what? There's probably no better strategy for
>accessing that table anyway. Perhaps you can construct an example
>where this underestimate affects the join order in an undesirable
>fashion, but I'm having a hard time getting worked up about that as a
>potential problem case. Our current system - where we essentially
>assume that the caching percentage is uniform across the board - can
>have the same problem in less artificial cases.
>
>--
>Robert Haas
>EnterpriseDB: http://www.enterprisedb.com
>The Enterprise PostgreSQL Company

On a thread some time ago, on a similar subject, I opined that I missed
the ability to assign tables to tablespaces and buffer pools to
tablespaces, which would make it possible to pin the tables you need
(a One True Lookup Table, for example, or a Customer table) in memory
without fear of eviction. I was soundly beaten about the face and
breast. It really is an "Enterprise" way of handling the situation.

regards,
Robert

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance