Re: suggestions on improving a query

Tom Lane <tgl@xxxxxxxxxxxxx> · Tue, 13 Feb 2007 22:04:52 -0500

"Adam Rich" <adam.r@xxxxxxxxxxxxx> writes:
> This line:
> Index Scan using plp_total_idx on dockscore_plp  
> (cost=0.00..16733229.92 rows=4669988 width=80) 
> (actual time=98.323..322537.605 rows=25197 loops=1)
> Means the planner did what it did, because it estimated there would be
> nearly 5 million rows.  However, there were only 25,000.

No, you have to be careful about interpreting the numbers when
underneath a Limit node.  The rows estimate is an estimate of the total
number of rows if the plan node were run to completion ... but if the
Limit stops execution early, that's not what will happen.  The actual
rows count shows how many rows really got pulled from the node before
the Limit stopped things.

The real problem here is that the planner is guessing that it won't take
very long to find 10 rows satisfying the target = '1YC1' condition while
scanning in dockscore_plp.total order.  So it chooses a plan that would
have a long total runtime (notice the large cost estimates below the
Limit) expecting that only a small fraction of that total will actually
be expended.  The expectation seems a bit off unfortunately :-(.
I can't tell from the given data whether the problem is just an
overestimate of the frequency of target = '1YC1', or if there's an
additional effect.  For example, if that target value tended to only be
associated with larger values of dockscore_plp.total, then a plan like
this could lose big-time because it will have to scan a long way to find
those rows.

			regards, tom lane