"Jeremy Haile" <jhaile@xxxxxxxxxxx> writes: > I am running a query that joins against several large tables (~5 million > rows each). The query takes an exteremely long time to run, and the > explain output is a bit beyond my level of understanding. It is an > auto-generated query, so the aliases are fairly ugly. Yah :-( > select distinct city4_.region_id as region1_29_, city4_1_.name as > name29_, city4_.state_id as state2_30_ > from registered_voters registered0_ > inner join registered_voter_addresses addresses1_ on > registered0_.registered_voter_id=addresses1_.registered_voter_id > inner join registered_voter_addresses_regions regions2_ on > addresses1_.address_id=regions2_.registered_voter_addresses_address_id > inner join regions region3_ on > regions2_.regions_region_id=region3_.region_id > inner join cities city4_ on > addresses1_.city_id=city4_.region_id > inner join regions city4_1_ on > city4_.region_id=city4_1_.region_id > where region3_.region_id='093c44e8-f3b2-4c60-8be3-2b4d148f9f5a' > order by city4_1_.name AFAICS the planner is doing about the best you can hope the machine to do --- it's not making any serious estimation errors, and the plan is pretty reasonable for the given query. The problem is that you are forming a very large join result (4918204 rows) and then doing a DISTINCT that reduces this to only 1124 rows ... but the damage of computing that huge join has already been done. The machine is not going to be able to think its way out of this one --- it's up to you to think of a better formulation of the query. Offhand I'd try something involving joining just city4_/city4_1_ (which should not need DISTINCT, I think) and then using WHERE EXISTS(SELECT ... FROM the-other-tables) to filter out the cities you don't want. The reason this can be a win is that the EXISTS formulation will stop running the sub-select as soon as it's produced a single row for the current city, rather than generating thousands of similar rows that will be thrown away by DISTINCT as you have here. This assumes that the fraction of cities passing the query is substantial, as it appears from the rowcounts in your EXPLAIN output. If only a tiny fraction of them passed, then the time wasted in failing EXISTS probes might eat up the savings. regards, tom lane