On Thu, Nov 11, 2010 at 10:00 AM, Kevin Grittner <Kevin.Grittner@xxxxxxxxxxxx> wrote:
> Mladen Gogala <mladen.gogala@xxxxxxxxxxx> wrote:
>
>> create a definitive bias toward one type of the execution plan.
>
> We're talking about trying to support the exact opposite. This all
> started because a database which was tuned for good response time
> for relatively small queries against a "hot" portion of some tables
> chose a bad plan for a weekend maintenance run against the full
> tables. We're talking about the possibility of adapting the cost
> factors based on table sizes as compared to available cache, to more
> accurately model the impact of needing to do actual disk I/O for
> such queries.
>
> This also is very different from trying to adapt queries to what
> happens to be currently in cache. As already discussed on a recent
> thread, the instability in plans and the failure to get to an
> effective cache set make that a bad idea. The idea discussed here
> would maintain a stable plan for a given query, it would just help
> choose a good plan based on the likely level of caching.

Let's back up a moment and talk about what the overall goal is, here. Ideally, we would like PostgreSQL to have excellent performance at all times, under all circumstances, with minimal tuning. Therefore, we do NOT want to add variables that will, by design, need constant manual adjustment. That is why I suggested that Tom's idea of an assume_cached GUC is probably not what we really want to do.

On the other hand, what I understand Mladen to be suggesting is something completely different. He's basically saying that, of course, he wants it to work out of the box most of the time, but since there are guaranteed to be cases where it doesn't, how about providing some knobs that aren't intended to be routinely twiddled, but which are available in case of emergency? Bravo, I say!

Consider the case of whether a table is cached. Currently, we estimate that it isn't, and you can sort of twiddle that assumption globally by setting seq_page_cost and random_page_cost. In 9.0, you can twiddle it with a bit more granularity by adjusting seq_page_cost and random_page_cost on a per-tablespace basis. But that's really intended to handle the case where you have one tablespace on an SSD and another that isn't. It doesn't really model caching at all; we're just abusing it as if it did. If 90% of a table is cached, you can't simply multiply the cost of reading it by 0.1, because now many of those reads will be random I/O rather than sequential I/O. The right thing to do is to estimate what percentage of the table will be cached, then estimate how much random and sequential I/O it'll take to get the rest, and then compute the cost.

To do that, we can adopt the approach proposed upthread of comparing the size of the table to effective_cache_size. We come up with some function f, such that f(effective_cache_size, table_size) = assumed_caching_percentage, and then from there we estimate random I/Os and sequential I/Os, and from there we estimate costs. This is a good system, almost certainly better than what we have now. However, it's also guaranteed not to always work. The DBA may know, for example, that one particular table that is quite large is always fully cached because it is very heavily accessed. So why not let them pin the assumed_caching_percentage for that table to 100%? I don't see any reason at all. Most people will never need to touch that setting, but it's there in case you really, really need it.
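Just to make the shape of that calculation concrete, here's a rough sketch. Nobody has settled on what f should actually look like, so the simple ratio below, the blended-cost arithmetic, and the ALTER TABLE syntax in the comments are purely illustrative, not a design:

    -- One possible shape for f, written as a SQL function only so it's easy
    -- to read; the real thing would of course live in the planner's cost
    -- model, and the exact formula is very much up for debate.
    CREATE FUNCTION assumed_caching_fraction(table_pages numeric,
                                             cache_pages numeric)
    RETURNS numeric LANGUAGE sql IMMUTABLE AS $$
        -- $1 = table_pages, $2 = cache_pages
        SELECT LEAST(1.0, $2 / GREATEST($1, 1.0))
    $$;

    -- e.g. a 1,000,000-page table against an effective_cache_size of 393,216
    -- pages (3GB at 8kB/page) comes out to roughly 39% assumed cached:
    SELECT assumed_caching_fraction(1000000, 393216);

    -- The I/O component of scanning the table would then be blended, roughly:
    --
    --   io_cost = table_pages * (1 - cached_fraction) * page_cost
    --
    -- where page_cost is seq_page_cost or random_page_cost depending on how
    -- we expect the uncached remainder to be fetched, and the cached portion
    -- is charged CPU cost only.
    --
    -- The per-table escape hatch could then be something like this
    -- (hypothetical syntax; it does not exist today):
    --
    --   ALTER TABLE hot_table SET (assumed_caching_percentage = 100);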
We've traditionally been reluctant to do this sort of thing (as the email Tom just sent reflects), but I think we should soften up a bit. A product gets hard to use when it has knobs that MUST be tuned to make it work at all, and certainly AFAICT Oracle falls into that category. My rollback segment is full? My table is out of extents? Well, allocate some more space, then; I certainly wouldn't have imposed an arbitrary cap on the table size if I'd known I was doing it. However, that's not the same thing as having knobs that are *available* when the shit really hits the fan. By failing to provide that type of knob, we're not facilitating ease of use; we're just making it difficult for the small percentage of people who have problems to fix them, which is another kind of non-ease-of-use.

In fact, we already have a few knobs of this type. We have a statistics target which can be overridden on a per-column basis, and beginning in 9.0, you can override the planner's n_distinct estimates in the same way. Why? Because we know that it's not achievable to estimate n_distinct accurately in all cases without making ANALYZE unreasonably slow. I bet that very, VERY few people will ever use that feature, so it costs nothing in terms of "oh, another setting I have to configure". But for people who are getting bitten by inaccurate n_distinct estimates, it will be very nice to have that as an escape hatch. I see no harm, and much value, in providing similar escape hatches elsewhere.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company