> Ah, I see where I was confused: in the original query plan I'd been > imagining that charlie.sierra was a unique column, but your gloss on > that as being house.district_id implies that it's highly non-unique. > And looking at the rowcounts in the original plan backs that up: > there are about 600 house rows per district row. So my thought of > having district as the outer side of a nestloop scanning the index > on house.district_id would not really work very well --- maybe it > would end up cheaper than the mergejoin plan, but it's far from a > clear-cut win. > > On the whole I'm thinking the code is operating as designed here. Well, except for the part where it's choosing a plan which takes 486 seconds over a plan which takes 4 seconds. I guess what I'm really not understanding is why it's calculating a cost of 3.7m for the index scan, and then discarding that *entire* cost and not including it in the total cost of the query? This seems wrong, especially since that index scan, in fact, ends up being 85% of the execution time of the query: Merge Join (cost=7457.670..991613.190 rows=1092168 width=4) (actual time=57.854..481062.706 rows=4514968 loops=1) Merge Cond: (charlie.sierra = four.quebec_seven) Index Scan using whiskey_delta on charlie (cost=0.000..3775171.860 rows=84904088 width=8) (actual time=0.006..459501.341 rows=20759070 loops=1) If the cost of the index scan were included in the total cost of the query plan, then the planner *would* chose the nestloop plan instead. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance