Re: anti-join chosen even when slower than old plan

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Nov 11, 2010 at 1:23 PM, Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
> Robert Haas <robertmhaas@xxxxxxxxx> writes:
>> Let's back up a moment and talk about what the overall goal is, here.
>> Ideally, we would like PostgreSQL to have excellent performance at all
>> times, under all circumstances, with minimal tuning.  Therefore, we do
>> NOT want to add variables that will, by design, need constant manual
>> adjustment.  That is why I suggested that Tom's idea of an
>> assume_cached GUC is probably not what we really want to do.   On the
>> other hand, what I understand Mladen to be suggesting is something
>> completely different.  He's basically saying that, of course, he wants
>> it to work out of the box most of the time, but since there are
>> guaranteed to be cases where it doesn't, how about providing some
>> knobs that aren't intended to be routinely twaddled but which are
>> available in case of emergency?  Bravo, I say!
>
> Um ... those are exactly the same thing.  You're just making different
> assumptions about how often you will need to twiddle the setting.
> Neither assumption is based on any visible evidence, unfortunately.
>
> I was thinking of assume_cached as something that could be
> set-and-forget most of the time, and you're entirely right to criticize
> it on the grounds that maybe it wouldn't.  But to support a proposal
> that doesn't even exist yet on the grounds that it *would* be
> set-and-forget seems a tad inconsistent.  We can't make that judgment
> without a whole lot more details than have been provided yet for any
> idea in this thread.

Well, maybe I misunderstood what you were proposing.  I had the
impression that you were proposing something that would *by design*
require adjustment for each query, so evidently I missed the point.
It seems to me that random_page_cost and seq_page_cost are pretty
close to set-and-forget already.  We don't have many reports of people
needing to tune these values on a per-query basis; most people seem to
just guesstimate a cluster-wide value and call it good.  Refining the
algorithm should only make things better.

> I do think that something based around a settable-per-table caching
> percentage might be a reasonable way to proceed.  But the devil is in
> the details, and we don't have those yet.

I think one of the larger devils in the details is deciding how to
estimate the assumed caching percentage when the user hasn't specified
one.  Frankly, I suspect that if we simply added a reloption called
assumed_caching_percentage and made it default to zero, we would make
a bunch of DBAs happy; they'd knock down seq_page_cost and
random_page_cost enough to account for the general level of caching
and then bump assumed_caching_percentage up for hot tables/indexes (or
ones that they want to have become hot).  I think we can do better
than that, but the right formula isn't exactly obvious.  I feel safe
saying that if effective_cache_size=1GB and table_size=4MB, then we
ought to take the table as fully cached.  But it's far from clear what
caching percentage we should assume when table_size=400MB, and it
seems like the sort of thing that will lead to endless bikeshedding.
There's probably no perfect answer, but I feel we can likely come up
with something that is better than a constant (which would probably
still be better than what we have now).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance



[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux