>>> On Wed, Feb 1, 2006 at 2:36 pm, in message <4359.1138826175@xxxxxxxxxxxxx>, Tom Lane <tgl@xxxxxxxxxxxxx> wrote: > "Kevin Grittner" <Kevin.Grittner@xxxxxxxxxxxx> writes: >> Tom Lane <tgl@xxxxxxxxxxxxx> wrote: >>> I'm interested to poke at this ... are you in a position to provide a >>> test case? > >> I can't supply the original data, since many of the tables have >> millions of rows, with some of the data (related to juvenile, paternity, >> sealed, and expunged cases) protected by law. I could try to put >> together a self- contained example, but I'm not sure the best way to do >> that, since the table sizes and value distributions may be significant >> here. Any thoughts on that? > > I think that the only aspect of the data that really matters here is the > number of distinct values, which would affect decisions about whether > HashAggregate is appropriate or not. And you could probably get the > same thing to happen with at most a few tens of thousands of rows. > > Also, all we need to worry about is the columns used in the WHERE/JOIN > conditions, which looks to be mostly case numbers, dates, and county > identification ... how much confidential info is there in that? At > worst you could translate the case numbers to some randomly generated > identifiers. OK, I could probably obliterate name, addresses, etc. in a copy of the data (those aren't significant to the query anyway) and provide a test case. However, I just found another clue. Since you were so confident it couldn't be the outer join, I went looking for what else I changed at the same time. I eliminated the code referencing that table, which contained an OR. I've seen ORs cause nasty problems with optimizers in the past. I took out the OR in the where clause, without eliminating that last outer join, and it optimized fine. I'll hold off a bit to see if you still need the test case. ;-) -Kevin