Re: Why we don't want hints Was: Slow count(*) again...

Tobias Brox <tobixen@xxxxxxxxx> · Thu, 10 Feb 2011 23:55:29 +0300

On 4 February 2011 04:46, Josh Berkus <josh@xxxxxxxxxxxx> wrote:
> "Optimizer hints are used to work around problems in the optimizer and
> introduce upgrade and maintenance issues. We would rather have the
> problems reported and fixed. We have discussed a more sophisticated
> system of per-class cost adjustment instead, but a specification remains
> to be developed."

I have no clue about how hints works in Oracle ... I've never been
working "enterprise level" on anything else than Postgres.  Anyway,
today I just came over an interesting problem in our production
database today - and I think it would be a benefit to be able to
explicitly tell the planner what index to use (the dev team is adding
redundant attributes and more indexes to solve the problem - which
worries me, because we will run into serious problems as soon as there
won't be enough memory for all the frequently-used indexes).

We have users and transactions, and we have transaction types.  The
transaction table is huge.  The users are able to interactively check
their transaction listings online, and they have some simple filter
options available as well.  Slightly simplified, the queries done
looks like this:

   select * from account_transaction where account_id=? order by
created desc limit 25;

   select * from account_transaction where trans_type_id in ( ...
long, hard-coded list ...) and account_id=? order by created desc
limit 25;

and we have indexes on:

   account_transaction(account_id, created)

   account_transaction(account_id, trans_type_id, created)

(At this point, someone would probably suggest to make three
single-key indexes and use bitmap index scan ... well, pulling 25 rows
from the end of an index may be orders of magnitude faster than doing
bitmap index mapping on huge indexes)

For the second query, the planner would chose the first index - and
maybe it makes sense - most of our customers have between 10-30% of
the transactions from the long list of transaction types, slim indexes
are good and by average the slimmer index would probably do the job a
bit faster.  The problem is with the corner cases - for some of our
extreme customers thousands of transaction index tuples may need to be
scanned before 25 rows with the correct transaction type is pulled
out, and if the index happens to be on disk, it may take tens of
seconds to pull out the answer.  Tens of seconds of waiting leads to
frustration, it is a lot nowadays in an interactive session.  Also, I
haven't really checked it up, but it may very well be that this is
exactly the kind of customers we want to retain.

To summarize, there are two things the planner doesn't know - it
doesn't know that there exists such corner cases where the real cost
is far larger than the estimated cost, and it doesn't know that it's
more important to keep the worst-case cost on a reasonable level than
to minimize the average cost.  In the ideal world postgres would have
sufficiently good statistics to know that for user #77777 it is better
to chose the second index, but I suppose it would be easier if I was
able to explicitly hide the account_transaction(account_id, created)
index for this query.  Well, I know of one way to do it ... but I
suppose it's not a good idea to put "drop index foo; select ...;
rollback;" into production ;-)

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance