Re: queries with lots of UNIONed relations

On Thu, Jan 13, 2011 at 11:13 AM, Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
> Jon Nelson <jnelson+pgsql@xxxxxxxxxxx> writes:
>> In the former case, the query plan was a bitmap heap scan for each
>> table. Then those results were Appended, Sorted, Uniqued, Sorted
>> again, and then returned.
>
>> In the latter, before Appending, each table's results were run through
>> HashAggregate.
>
> Probably the reason it did that is that each individual de-duplication
> looked like it would fit in work_mem, but a single de-duplication
> didn't.  Consider raising work_mem, at least for this one query.

I raised work_mem to as high as 512MB (SET LOCAL work_mem = '512MB'
within the transaction). Nice. Instead of 7-10 minutes, the query now
runs in about a minute (the same as with individual de-duplication).
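
For reference, this is the shape of the transaction-scoped setting
(t1/t2/col below are placeholder names, not my real tables):

BEGIN;
SET LOCAL work_mem = '512MB';  -- reverts automatically at COMMIT/ROLLBACK
SELECT col FROM t1
UNION
SELECT col FROM t2;
COMMIT;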

Your comment that "each individual de-duplication looked like it would
fit in work_mem" doesn't quite make sense to me; maybe I'm
misunderstanding you.

What I'm asking is this: can PostgreSQL apply de-duplication to each
member of a UNION (as I did manually with SELECT DISTINCT) in order to
reduce the total number of rows that need to be de-duplicated once all
of the rows have been Appended?
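
To be concrete, the manual rewrite I'm describing is just this
(placeholder names again):

-- plain UNION: one big de-duplication over all of the appended rows
SELECT col FROM t1
UNION
SELECT col FROM t2;

-- per-branch DISTINCT: each input is reduced first, and then the much
-- smaller appended set is de-duplicated a second time
SELECT DISTINCT col FROM t1
UNION
SELECT DISTINCT col FROM t2;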

The results of the various plans/tweaks are:

Initial state: (work_mem = 16MB, no DISTINCT, run time of 7-10 minutes):
Unique (Sort (Append ( Lots of Bitmap Heap Scans Here ) ) )

and (work_mem = 16MB, with DISTINCT, run time of ~ 1 minute):
HashAggregate ( Append ( Lots Of HashAggregate( Bitmap Heap Scan ) ) )

and (work_mem = 64kB, DISTINCT, run time of *15+ minutes*):
Unique (Sort ( Append ( Lots Of HashAggregate( Bitmap Heap Scan ) ) ) )
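
(Each plan shape above was observed with something along these lines,
varying work_mem and the per-table DISTINCTs; table and column names
are placeholders:)

BEGIN;
SET LOCAL work_mem = '16MB';    -- also tried 64kB and 512MB
EXPLAIN ANALYZE
SELECT DISTINCT col FROM t1     -- drop the DISTINCTs for the first plan
UNION
SELECT DISTINCT col FROM t2;
ROLLBACK;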

So I take from this the following:

1. If the result set fits in work_mem, hash aggregate is wicked fast:
about a jillion times faster than Unique+Sort.

2. It would be nifty if PostgreSQL could be taught, in a UNION, to
de-duplicate each contributory relation first, so as to reduce the
total set of rows that need to be de-duplicated again afterward. It's
extra work, true, and maybe there are some tricks here, but it seems to
make a big difference: it can leave the total result set small enough
that hash aggregate still applies.

NOTE:

I have to keep the global work_mem really low on this machine because
other queries against the same tables (such as those that use UNION ALL
feeding SUM() or GROUP BY operations) otherwise run the machine out of
memory. Indeed, even with work_mem at 1MB I run the machine out of
memory if I don't explicitly disable hashagg for some queries. Can
anything be done about that?
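
For the record, the per-query workaround looks roughly like this
(placeholder names; enable_hashagg is the planner switch I toggle):

BEGIN;
SET LOCAL enable_hashagg = off;  -- fall back to Sort + GroupAggregate
SELECT col, sum(val)
FROM (
  SELECT col, val FROM t1
  UNION ALL
  SELECT col, val FROM t2
) s
GROUP BY col;
COMMIT;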


-- 
Jon

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


