On Thu, Jan 13, 2011 at 5:05 PM, Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
> Robert Haas <robertmhaas@xxxxxxxxx> writes:
>> On Thu, Jan 13, 2011 at 5:26 PM, Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
>>> I don't believe there is any case where hashing each individual relation
>>> is a win compared to hashing them all together.  If the optimizer were
>>> smart enough to be considering the situation as a whole, it would always
>>> do the latter.
>
>> You might be right, but I'm not sure.  Suppose that there are 100
>> inheritance children, and each has 10,000 distinct values, but none of
>> them are common between the tables.  In that situation, de-duplicating
>> each individual table requires a hash table that can hold 10,000
>> entries.  But deduplicating everything at once requires a hash table
>> that can hold 1,000,000 entries.
>
>> Or am I all wet?
>
> If you have enough memory to de-dup them individually, you surely have
> enough to de-dup all at once.  It is not possible for a single hashtable
> to have worse memory consumption than N hashtables followed by a union
> hashtable, and in fact if there are no common values then the latter eats
> twice as much space because every value appears twice in two different
> hashtables.

If everything were available up-front, sure. However — and please correct me if I'm wrong — doesn't PostgreSQL work in a fairly linear fashion, moving from table to table and performing a series of operations on each? That seems to be what the plan indicates. Compare:

    for each table LOOP
        scan table for result rows, append to results
    END LOOP
    hash / sort + unique results

versus:

    for each table LOOP
        scan table for result rows, append to table-results
        hash / sort + unique table-results, append to results
    END LOOP
    hash / sort + unique results

In the former case, all of the result rows from all tables are appended together before the de-duplication process can start.
In the latter case, only enough memory for each table's result set is needed for de-duplication, and it need only be allocated while that table is being processed. Is that not how this works?

--
Jon
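The memory accounting in the two plans above can be checked with a toy model. This is plain Python, not PostgreSQL internals; the function names and the 100-tables-of-10,000-disjoint-values setup are illustrative, taken from Robert's example. It shows that when the tables share no values, the per-table approach's final union table still grows to the full 1,000,000 entries, which is the point Tom is making:

```python
# Toy model (NOT PostgreSQL code): compare peak hash-table size for
# "union everything, then dedup once" vs. "dedup each table, then
# union the deduped results", with 100 tables of 10,000 disjoint values.

def dedup_all_at_once(tables):
    """Append all rows, then de-duplicate with one hash table."""
    seen = set()
    for rows in tables:
        seen.update(rows)
    # Peak hash-table size is the total number of distinct values.
    return seen, len(seen)

def dedup_per_table(tables):
    """De-duplicate each table locally, then union the local results."""
    result = set()
    local_peak = 0
    for rows in tables:
        local = set(rows)                 # per-table hash table (~10,000 entries)
        local_peak = max(local_peak, len(local))
        result.update(local)              # union table still grows toward the total
    # Peak is the larger of the transient local table and the union table.
    return result, max(local_peak, len(result))

# 100 inheritance children, disjoint ranges of 10,000 distinct values each.
tables = [range(i * 10_000, (i + 1) * 10_000) for i in range(100)]

all_at_once, peak1 = dedup_all_at_once(tables)
per_table, peak2 = dedup_per_table(tables)

assert all_at_once == per_table           # both strategies give the same answer
print(peak1, peak2)                       # both peaks reach 1,000,000 here
```

With disjoint values, each element ends up in two hash tables under the per-table plan (once locally, once in the union), matching Tom's "twice as much space" observation; the per-table plan only wins if the final union step could be avoided entirely.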