Re: Cost estimate vs. actual - do I care?

Mark Mielke <mark@xxxxxxxxxxxxxx> · Sun, 01 Jan 2012 15:00:10 -0500

3), 2), 1).

The planner needs the right information to make the right decision. 
However, the planner rarely has perfect information, so the algorithms 
need to be able to cope with some amount of imperfection while still 
generally making the right decision. There are also a limited 
(relatively) set of possible plans and the plans often enough have 
different enough characteristics that the truly different plans (i.e. 
factor of 10 difference in terms of run time) won't be selected by 
accident, even with fairly bad estimates.

It probably depends mostly on your data set. For many data sets, the 
estimates might be possible to be off by a factor of 10 and still come 
up with the same "right" plan. For many other data sets, for example, 
the difference between an index scan or a sequential scan could result 
in performance differences by a factor of 10 or more and picking the 
"right one" is incredibly important.

If you are worried, you can test your theory with "set enable_seqscan=0" 
and other such things to see what alternative plans might be generated 
and whether they are indeed better or not. Generally, I only worry about 
it for queries that I know to be bottlenecks, and many of the times - 
the time investment I make into trying to prove that a better query is 
possible ends up only educating me on why I am wrong... :-)

The majority of the time for me, anyways, I don't find that the 
estimates are that bad or that the planner is wrong. It's usually the 
typical scenario where somebody added a query but forgot to make sure 
the query was efficient by ensuring that the indexes properly accelerate 
their query.

Good luck.

mark

On 01/01/2012 01:59 PM, Jay Levitt wrote:
I gather that a big part of making queries performant is making sure 
the planner's estimates reflect reality.

Given a random explain analyze line:

Limit  (cost=0.02..943.11 rows=50 width=16) (actual time=0.603..79.729 
rows=50 loops=1)

which is the truer statement?

1. As long as costs go up with actual time, you're fine.

2. You should try to ensure that costs go up linearly with actual time.

3. You should try to ensure that costs are as close as possible to 
actual time.

4. The number "4".

Jay Levitt

--
Mark Mielke<mark@xxxxxxxxx>

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance