Re: Out of Memory errors are frustrating as heck!

On 4/23/2019 16:43, Justin Pryzby wrote:
> It wrote 40GB tempfiles - perhaps you can increase work_mem now to improve the
> query time.

I now upped my shared_buffers back from 1 to 2GB and work_mem from 4 to 16MB. I also needed to raise vm.overcommit_ratio from 50 to 75 percent (keeping vm.overcommit_memory = 2 as it is).
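
For the record, the settings now amount to this (the file placement is just how I would persist them, I haven't double-checked every line on my box):

shared_buffers = 2GB          # postgresql.conf, was 1GB
work_mem = 16MB               # postgresql.conf, was 4MB

vm.overcommit_memory = 2      # /etc/sysctl.conf, unchanged
vm.overcommit_ratio = 75      # /etc/sysctl.conf, was 50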

> We didn't address it yet, but your issue was partially caused by a misestimate.
> It's almost certainly because these conditions are correlated, or maybe
> redundant.

That may be so, but mis-estimates happen. And I am sure I can still massively improve this query logically. In fact it sticks out like a sore thumb: logically it makes no sense to churn over 100 million rows here. But the point is that PostgreSQL should hopefully run stable in such outlier situations, come back and present you with 2 hours of work time, 40 GB of temp space, or whatever, and then we users can figure out how to make it work better. The frustrating thing is to get an out-of-memory error and not know what we can possibly do about it.

From my previous attempt with the tmp_r and tmp_q tables, I also know that the Sort/Unique step is taking a lot of extra time. I can cut that out too by addressing the causes of the "repeated result" rows. But again, those are all secondary optimizations.

> > Merge Cond: (((documentinformationsubject.documentinternalid)::text = (documentinformationsubject_1.documentinternalid)::text) AND ((documentinformationsubject.documentid)::text = (documentinformationsubject_1.documentid)::text) AND ((documentinformationsubject.actinternalid)::text = (documentinformationsubject_1.actinternalid)::text))
> If they're completely redundant and you can get the same result after dropping
> one or two of those conditions, then you should.

I understand. You are saying that by reducing the number of columns in the join condition, I might be able to reduce the size of the hash temporary table?
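
I.e., hypothetically something like this, with a and b standing in for the two documentinformationsubject instances from the merge condition above:

-- all three columns, as the join is written now:
ON a.documentinternalid = b.documentinternalid
   AND a.documentid = b.documentid
   AND a.actinternalid = b.actinternalid

-- versus only one of them, if the others really are redundant:
ON a.actinternalid = b.actinternalid
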
> Alternately, if they're correlated but not redundant, you can use PG10
> "dependency" statistics (CREATE STATISTICS) on the correlated columns (and
> ANALYZE).

I think documentId and documentInternalId are 1:1; they are both primary/alternate keys. So I could go with only one of them, but since I end up needing both elsewhere inside the query, I like to throw them all into the natural join key so that I don't have to deal with the duplicate result columns.
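
For what it's worth, if I do try the dependency statistics route, I suppose it would look something like this (the statistics name is made up, and I am assuming the correlated columns live on documentinformationsubject; I have not run this yet):

CREATE STATISTICS documentinformationsubject_dep (dependencies)
    ON documentinternalid, documentid, actinternalid
    FROM documentinformationsubject;
ANALYZE documentinformationsubject;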

Now running:

integrator=# set enable_nestloop to off;
SET
integrator=# explain analyze select * from reports.v_BusinessOperation;
WARNING:  ExecHashIncreaseNumBatches: nbatch=8 spaceAllowed=16777216
WARNING:  ExecHashIncreaseNumBatches: nbatch=16 spaceAllowed=16777216
WARNING:  ExecHashIncreaseNumBatches: nbatch=32 spaceAllowed=16777216
WARNING:  ExecHashIncreaseNumBatches: nbatch=64 spaceAllowed=16777216
WARNING:  ExecHashIncreaseNumBatches: nbatch=128 spaceAllowed=16777216
WARNING:  ExecHashIncreaseNumBatches: nbatch=256 spaceAllowed=16777216
WARNING:  ExecHashIncreaseNumBatches: nbatch=512 spaceAllowed=16777216
WARNING:  ExecHashIncreaseNumBatches: nbatch=1024 spaceAllowed=25165824
WARNING:  ExecHashIncreaseNumBatches: nbatch=2048 spaceAllowed=50331648
WARNING:  ExecHashIncreaseNumBatches: nbatch=4096 spaceAllowed=100663296
WARNING:  ExecHashIncreaseNumBatches: nbatch=8192 spaceAllowed=201326592
WARNING:  ExecHashIncreaseNumBatches: nbatch=16384 spaceAllowed=402653184
WARNING:  ExecHashIncreaseNumBatches: nbatch=32768 spaceAllowed=805306368
WARNING:  ExecHashIncreaseNumBatches: nbatch=65536 spaceAllowed=1610612736

I am waiting now, probably for that Sort/Unique to finish. I think the vast majority of the time is spent in this sort:

 Unique  (cost=5551524.36..5554207.33 rows=34619 width=1197) (actual time=6150303.060..6895451.210 rows=435274 loops=1)
   ->  Sort  (cost=5551524.36..5551610.91 rows=34619 width=1197) (actual time=6150303.058..6801372.192 rows=113478386 loops=1)
         Sort Key: documentinformationsubject.documentinternalid, documentinformationsubject.is_current, documentinformationsubject.documentid, documentinformationsubject.documenttypecode, documentinformationsubject.subjectroleinternalid, documentinformationsubject.subjectentityinternalid, documentinformationsubject.subjectentityid, documentinformationsubject.subjectentityidroot, documentinformationsubject.subjectentityname, documentinformationsubject.subjectentitytel, documentinformationsubject.subjectentityemail, documentinformationsubject.otherentityinternalid, documentinformationsubject.confidentialitycode, documentinformationsubject.actinternalid, documentinformationsubject.code_code, documentinformationsubject.code_displayname, q.code_code, q.code_displayname, an.extension, an.root, documentinformationsubject_2.subjectentitycode, documentinformationsubject_2.subjectentitycodesystem, documentinformationsubject_2.effectivetime_low, documentinformationsubject_2.effectivetime_high, documentinformationsubject_2.statuscode, documentinformationsubject_2.code_code, agencyid.extension, agencyname.trivialname, documentinformationsubject_1.subjectentitycode, documentinformationsubject_1.subjectentityinternalid
         Sort Method: external merge  Disk: 40726720kB
         ->  Hash Right Join  (cost=4255031.53..5530808.71 rows=34619 width=1197) (actual time=325240.679..1044194.775 rows=113478386 loops=1)

isn't it?

Unique/Sort actual time   6,150,303.060 ms = 6,150 s <~ 2 h.
Hash Right Join actual time 325,240.679 ms.

So really, all the time is wasted in that sort; no need for you guys to worry about anything else with these 2 hours. Tomas just stated the same thing.

> Right. Chances are that with a better estimate the optimizer would pick
> merge join instead. I wonder if that would be significantly faster.

The prospect of a merge join is interesting to consider here: with the Sort/Unique step taking so long, wouldn't the Merge Join also take a lot of time? I see my disks churning most of the time in this way:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           7.50    0.00    2.50   89.50    0.00    0.50

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
nvme1n1           0.00     0.00  253.00  131.00    30.15    32.20   332.50     2.01    8.40    8.41    8.37   2.59  99.60
nvme1n1p24        0.00     0.00  253.00  131.00    30.15    32.20   332.50     2.01    8.40    8.41    8.37   2.59  99.60

I.e., ~400 IOPS at ~60 MB/s, half of it read, half of it write. During the previous steps, presumably the hash join, throughput was a lot higher, like 2,000 IOPS with 120 MB/s read or write.

But even if the Merge Join took about the same or a little more time than the Hash Join, I wonder if one could not use it to collapse the Sort/Unique step into it? It seems that after the Merge Join has completed, one should be able to read unique records without any further sorting; in that case the Merge Join would be a great advantage.
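
If I wanted to test that theory, I suppose I could nudge the planner toward the merge join with the usual enable_* settings, something like:

set enable_nestloop to off;
set enable_hashjoin to off;
explain analyze select * from reports.v_BusinessOperation;

Whether the planner could then also skip the final Sort for the Unique is exactly what I would be curious about.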

What I like about the situation now is that with that 4x bigger work_mem, the overall memory situation remains the same. I.e., we are scraping just below 1GB for this process and we see oscillation, growth and shrinkage occurring. So I consider this case closed for me. That doesn't mean I wouldn't be available if you guys want to try anything else about it.

OK, now here is the result with the 16 MB work_mem:

 Unique  (cost=5462874.86..5465557.83 rows=34619 width=1197) (actual time=6283539.282..7003311.451 rows=435274 loops=1)
   ->  Sort  (cost=5462874.86..5462961.41 rows=34619 width=1197) (actual time=6283539.280..6908879.456 rows=113478386 loops=1)
         Sort Key: documentinformationsubject.documentinternalid, documentinformationsubject.is_current, documentinformationsubject.documentid, documentinformationsubject.documenttypecode, documentinformationsubject.subjectroleinternalid, documentinformationsubject.subjectentityinternalid, documentinformationsubject.subjectentityid, documentinformationsubject.subjectentityidroot, documentinformationsubject.subjectentityname, documentinformationsubject.subjectentitytel, documentinformationsubject.subjectentityemail, documentinformationsubject.otherentityinternalid, documentinformationsubject.confidentialitycode, documentinformationsubject.actinternalid, documentinformationsubject.code_code, documentinformationsubject.code_displayname, q.code_code, q.code_displayname, an.extension, an.root, documentinformationsubject_2.subjectentitycode, documentinformationsubject_2.subjectentitycodesystem, documentinformationsubject_2.effectivetime_low, documentinformationsubject_2.effectivetime_high, documentinformationsubject_2.statuscode, documentinformationsubject_2.code_code, agencyid.extension, agencyname.trivialname, documentinformationsubject_1.subjectentitycode, documentinformationsubject_1.subjectentityinternalid
         Sort Method: external merge  Disk: 40726872kB
         ->  Hash Right Join  (cost=4168174.03..5442159.21 rows=34619 width=1197) (actual time=337057.290..1695675.896 rows=113478386 loops=1)
               Hash Cond: (((q.documentinternalid)::text = (documentinformationsubject.documentinternalid)::text) AND ((r.targetinternalid)::text = (documentinformationsubject.actinternalid)::text))
               ->  Hash Right Join  (cost=1339751.37..2608552.36 rows=13 width=341) (actual time=84109.143..84109.238 rows=236 loops=1)
                     Hash Cond: (((documentinformationsubject_2.documentinternalid)::text = (q.documentinternalid)::text) AND ((documentinformationsubject_2.actinternalid)::text = (q.actinternalid)::text))
                     ->  Gather  (cost=29501.54..1298302.52 rows=1 width=219) (actual time=43932.534..43936.888 rows=0 loops=1)
                           Workers Planned: 2
                           Workers Launched: 2
                           ->  Parallel Hash Left Join  (cost=28501.54..1297302.42 rows=1 width=219) (actual time=43925.304..43925.304 rows=0 loops=3)
                           ...
                     ->  Hash  (cost=1310249.63..1310249.63 rows=13 width=233) (actual time=40176.581..40176.581 rows=236 loops=1)
                           Buckets: 1024  Batches: 1  Memory Usage: 70kB
                           ->  Hash Right Join  (cost=829388.20..1310249.63 rows=13 width=233) (actual time=35925.031..40176.447 rows=236 loops=1)
                                 Hash Cond: ((an.actinternalid)::text = (q.actinternalid)::text)
                                 ->  Seq Scan on act_id an  (cost=0.00..425941.04 rows=14645404 width=134) (actual time=1.609..7687.986 rows=14676871 loops=1)
                                 ->  Hash  (cost=829388.19..829388.19 rows=1 width=136) (actual time=30106.123..30106.123 rows=236 loops=1)
                                       Buckets: 1024  Batches: 1  Memory Usage: 63kB
                                       ->  Gather  (cost=381928.46..829388.19 rows=1 width=136) (actual time=24786.510..30105.983 rows=236 loops=1)
                                       ...
               ->  Hash  (cost=2823846.37..2823846.37 rows=34619 width=930) (actual time=252946.367..252946.367 rows=113478127 loops=1)
                     Buckets: 32768 (originally 32768)  Batches: 65536 (originally 4)  Memory Usage: 1204250kB
                     ->  Gather Merge  (cost=2807073.90..2823846.37 rows=34619 width=930) (actual time=83891.069..153380.040 rows=113478127 loops=1)
                           Workers Planned: 2
                           Workers Launched: 2
                           ->  Merge Left Join  (cost=2806073.87..2818850.46 rows=14425 width=930) (actual time=83861.921..108022.671 rows=37826042 loops=3)
                                  Merge Cond: (((documentinformationsubject.documentinternalid)::text = (documentinformationsubject_1.documentinternalid)::text) AND ((documentinformationsubject.documentid)::text = (documentinformationsubject_1.documentid)::text) AND ((documentinformationsubject.actinternalid)::text = (documentinformationsubject_1.actinternalid)::text))
                                 ->  Sort  (cost=1295969.26..1296005.32 rows=14425 width=882) (actual time=44814.114..45535.398 rows=231207 loops=3)
                                       Sort Key: documentinformationsubject.documentinternalid, documentinformationsubject.docum...
                                  ...
 Planning Time: 2.953 ms
 Execution Time: 7004340.091 ms
(70 rows)

There isn't really any big news here. But what matters is that it works.

thanks & regards,
-Gunther Schadow




