On Thu, Jan 13, 2011 at 11:13 AM, Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
> Jon Nelson <jnelson+pgsql@xxxxxxxxxxx> writes:
>> In the former case, the query plan was a bitmap heap scan for each
>> table. Then those results were Appended, Sorted, Uniqued, Sorted
>> again, and then returned.
>
>> In the latter, before Appending, each table's results were run through
>> HashAggregate.
>
> Probably the reason it did that is that each individual de-duplication
> looked like it would fit in work_mem, but a single de-duplication
> didn't. Consider raising work_mem, at least for this one query.

I raised work_mem to as high as 512MB (SET LOCAL work_mem = '512MB',
within the transaction). Nice. Instead of 7-10 minutes, the query now
runs in about a minute (the same as with individual de-duplication).

Your comment that "each individual de-duplication looked like it would
fit in work_mem" doesn't quite make sense to me; maybe I'm
misunderstanding you. What I'm asking is this: can PostgreSQL apply
de-duplication to each member of a UNION (as I did with SELECT
DISTINCT) in order to reduce the total number of rows that need to be
de-duplicated once all of the rows have been Appended?

The results of the various plans/tweaks are:

Initial state (work_mem = 16MB, no DISTINCT, run time of 7-10 minutes):
Unique ( Sort ( Append ( lots of Bitmap Heap Scans ) ) )

(work_mem = 16MB, with DISTINCT, run time of ~1 minute):
HashAggregate ( Append ( lots of HashAggregate ( Bitmap Heap Scan ) ) )

(work_mem = 64kB, with DISTINCT, run time of *15+ minutes*):
Unique ( Sort ( Append ( lots of HashAggregate ( Bitmap Heap Scan ) ) ) )

So I take from this the following:

1. If the result set fits in work_mem, hash aggregate is wicked fast,
   vastly faster than Unique+Sort.
2. It would be nifty if PostgreSQL could be taught, in a UNION, to
   de-duplicate each contributory relation so as to reduce the total
   set of rows that need to be re-de-duplicated.
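For reference, here is a sketch of the rewrite and the session-level
tuning I'm describing (table and column names here are made up for
illustration; substitute your own):

```sql
-- Plain UNION: de-duplication happens only once, over the full
-- Appended row set, which may not fit in work_mem.
SELECT id FROM t1
UNION
SELECT id FROM t2;

-- Per-branch DISTINCT: each table's rows are de-duplicated first
-- (a HashAggregate per branch), shrinking the set that the final
-- de-duplication has to handle.
SELECT DISTINCT id FROM t1
UNION
SELECT DISTINCT id FROM t2;

-- Raising work_mem for just one transaction, rather than globally:
BEGIN;
SET LOCAL work_mem = '512MB';
-- ... run the query here ...
COMMIT;

-- And, conversely, disabling hash aggregation for a single session
-- when it would blow out memory:
SET enable_hashagg = off;
```

SET LOCAL reverts at transaction end, so the low global work_mem stays
in effect for everything else.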
It's extra work, true, and maybe there are some tricks here, but it
seems to make a big difference: it keeps the total result set small
enough that hash aggregate may still apply.

NOTE: I have to keep work_mem quite low as a global setting on this
machine, because other queries involving the same tables (such as
those that use UNION ALL for SUM() or GROUP BY operations) cause the
machine to run out of memory. Indeed, even with work_mem at 1MB I run
the machine out of memory if I don't explicitly disable hashagg for
some queries. Can anything be done about that?

--
Jon

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance