Slow hash join performance with many batches

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



I have several databases that have the same schema but different amounts of data in it (let's categorize these as Small, Medium, and Large). We have a mammoth query with 13 CTEs that are LEFT JOINed against a main table. This query takes <30 mins on the Small database, <2 hours to run on Large, but on the Medium database it takes in the vicinity of 14 hours.

Running truss/strace on the backend process running this query on the Medium database reveals that for a big chunk of this time Postgres creates/reads/unlinks a very large quantity (millions?) of tiny files inside pgsql_tmp. I also ran an EXPLAIN ANALYZE and am attaching the most time-consuming parts of the plan (with names redacted). Although I'm not too familiar with the internals of Postgres' Hash implementation, it seems that having over 4 million hash batches could be what's causing the problem.

I'm running PostgreSQL 9.3.5, and have work_mem set to 32MB.

Is there any way I can work around this problem, other than to experiment with disabling enable_hashjoin for this query/database?

Alex
Hash Right Join  (cost=609908622207072.24..149222936608255392.00 rows=6928136791749514240 width=1223) (actual time=33401772.349..50706732.014 rows=406394 loops=1)
  Hash Cond: (cte_1.join_col = table_1.join_col)
  <13 CTEs omitted>
  ->  CTE Scan on cte_1  (cost=0.00..16515608.32 rows=825780416 width=36) (actual time=292893.037..324100.993 rows=365136 loops=1)
  ->  Hash  (cost=47862637793642.02..47862637793642.02 rows=3409566476502940 width=1219) (actual time=33056746.437..33056746.437 rows=406394 loops=1)
        Buckets: 4096  Batches: 4194304  Memory Usage: 2kB
        ->  Hash Right Join  (cost=298580771630.21..47862637793642.02 rows=3409566476502940 width=1219) (actual time=5912925.770..33032636.805 rows=406394 loops=1)
              Hash Cond: (cte_2.join_col = table_1.join_col)
              ->  CTE Scan on cte_2  (cost=0.00..16515608.32 rows=825780416 width=36) (actual time=280043.909..312222.528 rows=365136 loops=1)
              ->  Hash  (cost=23618016658.04..23618016658.04 rows=1677961031534 width=1215) (actual time=5516337.065..5516337.065 rows=406394 loops=1)
                    Buckets: 4096  Batches: 4194304  Memory Usage: 2kB
                    ->  Hash Right Join  (cost=178906627.34..23618016658.04 rows=1677961031534 width=1215) (actual time=4067949.971..5495404.748 rows=406394 loops=1)
                          Hash Cond: (cte_3.join_col = table_1.join_col)
                          ->  CTE Scan on cte_3  (cost=0.00..16515608.32 rows=825780416 width=36) (actual time=280040.022..313331.309 rows=365136 loops=1)
                          ->  Hash  (cost=43588312.14..43588312.14 rows=825780416 width=1211) (actual time=3784880.335..3784880.335 rows=406394 loops=1)
                                Buckets: 4096  Batches: 32768  Memory Usage: 9kB
                                ->  Hash Right Join  (cost=689834.06..43588312.14 rows=825780416 width=1211) (actual time=3749003.819..3782275.100 rows=406394 loops=1)
                                      Hash Cond: (cte_4.join_col = table_1.join_col)
                                      ->  CTE Scan on cte_4  (cost=0.00..16515608.32 rows=825780416 width=36) (actual time=274018.453..306236.253 rows=365136 loops=1)
                                      ->  Hash  (cost=623636.13..623636.13 rows=406394 width=1207) (actual time=3474982.429..3474982.429 rows=406394 loops=1)
                                            Buckets: 4096  Batches: 16  Memory Usage: 6985kB
                                            <9 Merge Left Joins omitted>
-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux