Re: Hash join gets slower as work_mem increases?

Hi,

On 01/29/2016 04:17 PM, Albe Laurenz wrote:
> I have a query that runs *slower* if I increase work_mem.
>
> The execution plans are identical in both cases, except that a temp file
> is used when work_mem is smaller.
>
> The relevant lines of EXPLAIN ANALYZE output are:
>
> With work_mem='100MB':
> ->  Hash Join  (cost=46738.74..285400.61 rows=292 width=8) (actual time=4296.986..106087.683 rows=187222 loops=1)
>        Hash Cond: ("*SELECT* 1_2".postadresse_id = p.postadresse_id)
>        Buffers: shared hit=1181177 dirtied=1, temp read=7232 written=7230
>
> With work_mem='500MB':
> ->  Hash Join  (cost=46738.74..285400.61 rows=292 width=8) (actual time=3802.849..245970.049 rows=187222 loops=1)
>        Hash Cond: ("*SELECT* 1_2".postadresse_id = p.postadresse_id)
>        Buffers: shared hit=1181175 dirtied=111
>
> I ran operf on both backends, and they look quite similar, except that the
> number of samples is different (this is "opreport -c" output):
>
> CPU: Intel Sandy Bridge microarchitecture, speed 2899.8 MHz (estimated)
> Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 90000
> samples  %        image name               symbol name
> -------------------------------------------------------------------------------
>    112       0.0019  postgres                 ExecProcNode
>    3020116  49.9904  postgres                 ExecScanHashBucket
>    3021162  50.0077  postgres                 ExecHashJoin
> 3020116  92.8440  postgres                 ExecScanHashBucket
>    3020116  49.9207  postgres                 ExecScanHashBucket [self]
>    3020116  49.9207  postgres                 ExecScanHashBucket
>    8190      0.1354  vmlinux                  apic_timer_interrupt
>
> What could be an explanation for this?
> Is this known behaviour?

There are a number of possible causes for this kind of behavior, but it's impossible to say which one applies here because you have not posted the interesting parts of the explain plan. Knowing the PostgreSQL version would also be useful.

I don't think the slowdown you posted is caused by exceeding the on-CPU cache: that's only a few MB per socket, so even the smaller work_mem setting is already significantly larger than the cache.

What I'd expect to be the issue here is an under-estimate of the hash table size, resulting in too few buckets and thus long chains of tuples that have to be searched sequentially. Smaller work_mem values usually limit the length of those chains in favor of batching.
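To make that concrete, here is a toy sketch in plain C (not the actual PostgreSQL ExecScanHashBucket implementation) of probing a hash table built with far too few buckets; the tuple count is borrowed from the plan you posted, while the bucket count is just an assumed, deliberately low value:

/* Toy model of hash-join probing with an under-sized bucket array.
 * This is NOT PostgreSQL source code, only an illustration: when
 * nbuckets is much smaller than the number of inner tuples, every
 * probe has to walk a long linked chain sequentially. */
#include <stdio.h>
#include <stdlib.h>

typedef struct HashTuple
{
    int               key;
    struct HashTuple *next;        /* next entry in the bucket's chain */
} HashTuple;

static HashTuple **buckets;
static int         nbuckets;

static void
insert_tuple(int key)
{
    HashTuple *t = malloc(sizeof(HashTuple));
    int         b = key % nbuckets;

    t->key = key;
    t->next = buckets[b];          /* prepend to the bucket's chain */
    buckets[b] = t;
}

/* Walk the whole chain of one bucket; in a real hash join each entry
 * would have to be compared against the probe key, which is roughly
 * where ExecScanHashBucket spends its time in the profile above. */
static long
probe(int key)
{
    long        comparisons = 0;
    HashTuple  *t;

    for (t = buckets[key % nbuckets]; t != NULL; t = t->next)
        comparisons++;

    return comparisons;
}

int
main(void)
{
    const int ntuples = 187222;    /* row count from the Hash Join above */
    long      total = 0;

    nbuckets = 1024;               /* assumed, deliberately too low */
    buckets = calloc(nbuckets, sizeof(HashTuple *));

    for (int i = 0; i < ntuples; i++)
        insert_tuple(i);

    for (int i = 0; i < ntuples; i++)
        total += probe(i);

    printf("average chain entries scanned per probe: %ld\n",
           total / ntuples);
    return 0;
}

With nbuckets = 1024 the average chain is about 187222 / 1024 ≈ 183 entries per probe; with a bucket count sized to the actual number of tuples it drops to roughly one, which is the difference a good planner estimate is supposed to secure.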

Please post the whole explain plan, especially the information about the number of buckets/batches and the Hash node details.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


