On Thu, Jan 13, 2011 at 4:49 PM, Robert Haas <robertmhaas@xxxxxxxxx> wrote:
> On Thu, Jan 13, 2011 at 5:47 PM, Andy Colson <andy@xxxxxxxxxxxxxxx> wrote:
>>>>> I don't believe there is any case where hashing each individual relation
>>>>> is a win compared to hashing them all together. If the optimizer were
>>>>> smart enough to be considering the situation as a whole, it would always
>>>>> do the latter.
>>>>
>>>> You might be right, but I'm not sure. Suppose that there are 100
>>>> inheritance children, and each has 10,000 distinct values, but none of
>>>> them are common between the tables. In that situation, de-duplicating
>>>> each individual table requires a hash table that can hold 10,000
>>>> entries. But de-duplicating everything at once requires a hash table
>>>> that can hold 1,000,000 entries.
>>>>
>>>> Or am I all wet?
>>>
>>> Yeah, I'm all wet, because you'd still have to re-de-duplicate at the
>>> end. But then why did the OP get a speedup? *scratches head*
>>
>> Because it all fit in memory and didn't swap to disk?
>
> Doesn't make sense. The re-de-duplication at the end should use the
> same amount of memory regardless of whether the individual relations
> have already been de-duplicated.

I don't believe that to be true. Assume 100 tables, each of which
produces 10,000 rows from this query. Furthermore, let's assume that
there are 3,000 duplicates per table.

Without DISTINCT: uniqify (100 * 10,000 = 1,000,000 rows)
With DISTINCT: uniqify (100 * (10,000 - 3,000) = 700,000 rows)

That difference of 300,000 rows times (say) 64 bytes/row comes to
roughly 18MB. Not a lot, but more than the work_mem of 16MB.

Or maybe *I'm* all wet?

--
Jon
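
To make the two strategies above concrete, here is a minimal SQL sketch.
The table and column names (parent_tbl, child_001, child_002, val) are
hypothetical; the row counts, 64 bytes/row, and 16MB work_mem are simply
the figures quoted in the message.

    -- De-duplicate only at the end: the final unique step is fed
    -- 100 * 10,000 = 1,000,000 rows from the inheritance children.
    SELECT DISTINCT val FROM parent_tbl;

    -- De-duplicate each inheritance child first, then re-de-duplicate:
    -- each per-child hash holds at most 10,000 entries, and the final
    -- unique step is fed only 100 * (10,000 - 3,000) = 700,000 rows.
    -- The 300,000-row difference at ~64 bytes/row is the roughly 18MB
    -- the message compares against work_mem = 16MB.
    SELECT DISTINCT val
    FROM (
        SELECT DISTINCT val FROM child_001
        UNION ALL
        SELECT DISTINCT val FROM child_002
        -- ... one UNION ALL branch per remaining child ...
    ) AS per_child;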