On Wednesday 17 October 2007 20:23, Tom Lane wrote: > Chris Kratz <chris.kratz@xxxxxxxxxxxxxx> writes: > > On Wednesday 17 October 2007 14:49, Tom Lane wrote: > >> Evidently it's not realizing that every row of par will have a join > >> partner, but why not? I suppose a.activityid is unique, and in most > >> cases that I've seen the code seems to get that case right. > >> > >> Would you show us the pg_stats rows for par.activity and a.activityid? > > > > Here are the pg_stats rows for par.activity and a.activityid. > > Hmm, nothing out of the ordinary there. > > I poked at this a bit and realized that what seems to be happening is > that the a.programid = 171 condition is reducing the selectivity > estimate --- that is, it knows that that will filter out X percent of > the activity rows, and it assumes that *the size of the join result will > be reduced by that same percentage*, since join partners would then be > missing for some of the par rows. The fact that the join result doesn't > actually decrease in size at all suggests that there's some hidden > correlation between the programid condition and the condition on > par.provider_lfm. Is that true? Maybe you could eliminate one of the > two conditions from the query? > > Since PG doesn't have any cross-table (or even cross-column) statistics > it's not currently possible for the optimizer to deal very well with > hidden correlations like this ... > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 6: explain analyze is your friend Yes, you are correct. Programid is a "guard" condition to make sure a user cannot look at rows outside of their program. In this particular case the par table only has rows for this agency (at one point in time, all rows were in one table), so I was able to remove the check on programid on "a". This causes my example query to run in 200ms. That's wonderful. So, to recap. We had a filter on the join clause which really didn't in this case affect the selectivity of the join table. But the optimizer assumed that the selectivity would be affected causing it to think the join would generate only a few rows. Since it thought that there would be relatively few rows returned, it used a nestloop instead of another type of join that would have been faster with larger data sets. Thanks for all your help. -Chris ---------------------------(end of broadcast)--------------------------- TIP 4: Have you searched our list archives? http://archives.postgresql.org