On 28 May 2010 19:54, Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
> Thom Brown <thombrown@xxxxxxxxx> writes:
>> I get this:
>
>> Limit  (cost=0.00..316895.11 rows=400 width=211) (actual
>> time=3.880..1368.936 rows=400 loops=1)
>>   ->  GroupAggregate  (cost=0.00..41843621.95 rows=52817 width=211)
>> (actual time=3.872..1367.048 rows=400 loops=1)
>>         ->  Index Scan using "binaryID_2576_idx" on parts_2576
>> (cost=0.00..41683754.21 rows=10578624 width=211) (actual
>> time=0.284..130.756 rows=19954 loops=1)
>>               Index Cond: (("binaryID")::text >
>> '1082fa89fe499741b8271f9c92136f44'::text)
>> Total runtime: 1370.140 ms
>
>> The first thing which strikes me is how the GroupAggregate step shows
>> it got the 400 rows which matches the limit, but it estimated 52,817
>> rows.  Shouldn't it have already known it would be 400?
>
> No.  Rowcount estimates are always in terms of what the node would emit
> if allowed to run to completion.  Likewise cost.  In this case both the
> indexscan and the groupagg are terminated early once they satisfy the
> limit.  The planner is expecting this, which is why the estimated cost
> for the limit node is way less than those for its inputs.
>
> That looks like a perfectly reasonable plan from here, though it would
> probably not get chosen with a larger limit or no limit at all, since
> the ultimate costs are pretty large.  Essentially this is a fast-start
> plan rather than a lowest-total-cost plan, and that looks like the
> best bet for a small limit value.
>
>                        regards, tom lane

You're absolutely right: it's not chosen when there's no limit. I see what you mean about terminating once it has enough rows. It's a shame I can't optimise it further, though, as the real query runs with a limit of 4000, which takes a long time to complete.

Thanks

Thom
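
For anyone reconstructing the case: below is a minimal sketch of the kind of query that could produce the plan above. Only the table parts_2576, the index condition on "binaryID", and the LIMIT are visible in the plan; the SELECT list, the aggregate, and the ORDER BY are assumptions.

    -- A sketch only: count(*) and the ORDER BY are guesses; the plan
    -- shows just the table, the "binaryID" range condition, and LIMIT.
    EXPLAIN ANALYZE
    SELECT "binaryID", count(*)
    FROM parts_2576
    WHERE "binaryID" > '1082fa89fe499741b8271f9c92136f44'
    GROUP BY "binaryID"
    ORDER BY "binaryID"
    LIMIT 400;

    -- The index scan returns rows already ordered by "binaryID", so the
    -- GroupAggregate needs no separate Sort and can emit groups as soon
    -- as they complete, which is what makes early termination cheap.
    -- Dropping the LIMIT (or raising it toward 4000) changes the
    -- trade-off: the fast-start plan only wins if execution stops early,
    -- so without a small LIMIT the planner would likely prefer a plan
    -- with a lower total cost, matching Tom's explanation above.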