Re: Why we don't want hints Was: Slow count(*) again...

Robert Klemme <shortcutter@xxxxxxxxxxxxxx> · Mon, 6 Jun 2011 10:14:43 +0200

On Thu, Feb 10, 2011 at 7:32 PM, Craig James <craig_james@xxxxxxxxxxxxxx> wrote:
> On 2/10/11 9:21 AM, Kevin Grittner wrote:
>>
>> Shaun Thomas<sthomas@xxxxxxxxx>  wrote:
>>
>>> how difficult would it be to add that syntax to the JOIN
>>> statement, for example?
>>
>> Something like this syntax?:
>>
>> JOIN WITH (correlation_factor=0.3)
>>
>> Where 1.0 might mean that for each value on the left there was only
>> one distinct value on the right, and 0.0 would mean that they were
>> entirely independent?  (Just as an off-the-cuff example -- I'm not
>> at all sure that this makes sense, let alone is the best thing to
>> specify.  I'm trying to get at *syntax* here, not particular knobs.)
>
> There are two types of problems:
>
> 1. The optimizer is imperfect and makes a sub-optimal choice.
>
> 2. There are theoretical reasons why it's hard for the optimizer. For
> example, in a table with 50 columns, there is a staggering number of
> possible correlations.  An optimizer can't possibly figure this out, but a
> human might know them from the start.  The City/Postal-code correlation is a
> good example.
>
> For #1, Postgres should never offer any sort of hint mechanism.  As many
> have pointed out, it's far better to spend the time fixing the optimizer
> than adding hacks.
>
> For #2, it might make sense to give a designer a way to tell Postgres stuff
> that it couldn't possibly figure out. But ... not until the problem is
> clearly defined.
>
> What should happen is that someone writes with an example query, and the
> community realizes that no amount of cleverness from Postgres could ever
> solve it (for solid theoretical reasons). Only then, when the problem is
> clearly defined, should we talk about solutions and SQL extensions.

I don't have such a query handy.  However, I think your posting is a
good starting point for a discussion of how to figure out what we need
and what a good solution could look like.  For example, one thing I
dislike about hints is that they go into the query.  This approach has
a few drawbacks:

- Applications need to be changed to benefit, which is not always possible.
- One important class of such applications is those that use O/R
mappers - hinting would then have to be buried in O/R mapper code or
configuration.
- Hints in the query work only for exactly that query (this might be
an advantage, depending on your point of view).

I think the solution should rather be to tell Postgres what "it
couldn't possibly figure out".  I imagine that could be some form of
description of the distribution of data in columns and / or
correlations between columns.  This would have several advantages: the
optimizer gets additional input which it can use as it sees fit (i.e.
the usage can change between releases), the information is separate
from queries (more like metadata for tables), and thus all queries
using a table augmented with this metadata would benefit.  Usage of
this metadata could be controlled by a flag per session (as well as
globally), so it would be relatively easy to find out whether the
metadata has become obsolete (because data changed or a new release of
the database is in use).
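
To make the idea a bit more concrete, here is a rough sketch in Python
(not Postgres code) using the City/Postal-code example from above.  It
compares a row estimate made under the assumption that the two columns
are independent with one that uses a designer-supplied "dependency
degree" for the pair.  The table contents, the TABLE_METADATA structure,
the use_metadata flag and the blending formula are all made up for
illustration; I am not claiming this is how the planner works or should
work.

```python
# Toy (city, postal_code) table; every postal code lies in exactly one city.
ROWS = (
    [("Berlin", "10115")] * 30
    + [("Berlin", "10117")] * 20
    + [("Hamburg", "20095")] * 25
    + [("Munich", "80331")] * 25
)

# Hypothetical per-table metadata supplied by the designer: the degree to
# which postal_code determines city (1.0 = fully, 0.0 = independent).
TABLE_METADATA = {("postal_code", "city"): 1.0}


def estimate_rows(city, postal_code, use_metadata):
    """Estimate rows matching city = ... AND postal_code = ...

    use_metadata plays the role of the per-session flag: when it is off,
    the estimate falls back to treating the columns as independent.
    """
    n = len(ROWS)
    n_city = sum(1 for c, _ in ROWS if c == city)
    n_zip = sum(1 for _, z in ROWS if z == postal_code)
    degree = TABLE_METADATA[("postal_code", "city")] if use_metadata else 0.0
    # Assumed blend: the "dependent" share of rows is not filtered again by
    # the city predicate, the remaining share is filtered as if independent.
    return n_zip * (degree + (1.0 - degree) * n_city / n)


def actual_rows(city, postal_code):
    """Count the rows that really match both predicates."""
    return sum(1 for row in ROWS if row == (city, postal_code))
```

With this toy data, estimate_rows("Berlin", "10115", use_metadata=False)
gives 15.0, while the same call with use_metadata=True gives 30.0, which
matches actual_rows("Berlin", "10115").  Switching the flag per session
and comparing estimates against reality is also how one could notice
that the metadata has gone stale.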

Kind regards

robert

-- 
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)