Re: Out of Memory errors are frustrating as heck!

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 4/21/2019 23:09, Tomas Vondra wrote:
What I think might work better is the attached v2 of the patch, with a
single top-level condition, comparing the combined memory usage
(spaceUsed + BufFile) against spaceAllowed. But it also tweaks
spaceAllowed once the size needed for BufFile gets over work_mem/3.

Thanks for this, and I am trying this now.

So far it is promising.

I see the memory footprint contained under 1 GB. I see it go up, but also down again. CPU, IO, all being live.

foo=# set enable_nestloop to off;
SET
foo=# explain analyze select * from reports.v_BusinessOperation;
WARNING:  ExecHashIncreaseNumBatches: nbatch=32 spaceAllowed=4194304
WARNING:  ExecHashIncreaseNumBatches: nbatch=64 spaceAllowed=4194304
WARNING:  ExecHashIncreaseNumBatches: nbatch=128 spaceAllowed=4194304
WARNING:  ExecHashIncreaseNumBatches: nbatch=256 spaceAllowed=6291456
WARNING:  ExecHashIncreaseNumBatches: nbatch=512 spaceAllowed=12582912
WARNING:  ExecHashIncreaseNumBatches: nbatch=1024 spaceAllowed=25165824
WARNING:  ExecHashIncreaseNumBatches: nbatch=2048 spaceAllowed=50331648
WARNING:  ExecHashIncreaseNumBatches: nbatch=4096 spaceAllowed=100663296
WARNING:  ExecHashIncreaseNumBatches: nbatch=8192 spaceAllowed=201326592
WARNING:  ExecHashIncreaseNumBatches: nbatch=16384 spaceAllowed=402653184
WARNING:  ExecHashIncreaseNumBatches: nbatch=32768 spaceAllowed=805306368
WARNING:  ExecHashIncreaseNumBatches: nbatch=65536 spaceAllowed=1610612736

Aaaaaand, it's a winner!

 Unique  (cost=5551524.36..5554207.33 rows=34619 width=1197) (actual time=6150303.060..6895451.210 rows=435274 loops=1)
   ->  Sort  (cost=5551524.36..5551610.91 rows=34619 width=1197) (actual time=6150303.058..6801372.192 rows=113478386 loops=1)
         Sort Key: documentinformationsubject.documentinternalid, documentinformationsubject.is_current, documentinformationsubject.documentid, documentinformationsubject.documenttypecode, documentinformationsubject.subjectroleinternalid, documentinformationsubject.subjectentityinternalid, documentinformationsubject.subjectentityid, documentinformationsubject.subjectentityidroot, documentinformationsubject.subjectentityname, documentinformationsubject.subjectentitytel, documentinformationsubject.subjectentityemail, documentinformationsubject.otherentityinternalid, documentinformationsubject.confidentialitycode, documentinformationsubject.actinternalid, documentinformationsubject.code_code, documentinformationsubject.code_displayname, q.code_code, q.code_displayname, an.extension, an.root, documentinformationsubject_2.subjectentitycode, documentinformationsubject_2.subjectentitycodesystem, documentinformationsubject_2.effectivetime_low, documentinformationsubject_2.effectivetime_high, documentinformationsubject_2.statuscode, documentinformationsubject_2.code_code, agencyid.extension, agencyname.trivialname, documentinformationsubject_1.subjectentitycode, documentinformationsubject_1.subjectentityinternalid
         Sort Method: external merge  Disk: 40726720kB
         ->  Hash Right Join  (cost=4255031.53..5530808.71 rows=34619 width=1197) (actual time=325240.679..1044194.775 rows=113478386 loops=1)
               Hash Cond: (((q.documentinternalid)::text = (documentinformationsubject.documentinternalid)::text) AND ((r.targetinternalid)::text = (documentinformationsubject.actinternalid)::text))
               ->  Hash Right Join  (cost=1341541.37..2612134.36 rows=13 width=341) (actual time=81093.327..81093.446 rows=236 loops=1)
                     Hash Cond: (((documentinformationsubject_2.documentinternalid)::text = (q.documentinternalid)::text) AND ((documentinformationsubject_2.actinternalid)::text = (q.actinternalid)::text))
                     ->  Gather  (cost=31291.54..1301884.52 rows=1 width=219) (actual time=41920.563..41929.780 rows=0 loops=1)
                           Workers Planned: 2
                           Workers Launched: 2
                           ->  Parallel Hash Left Join  (cost=30291.54..1300884.42 rows=1 width=219) (actual time=41915.960..41915.960 rows=0 loops=3)
                                 Hash Cond: ((documentinformationsubject_2.otherentityinternalid)::text = (agencyid.entityinternalid)::text)
                                 ->  Parallel Hash Left Join  (cost=28606.13..1299199.00 rows=1 width=204) (actual time=41862.767..41862.769 rows=0 loops=3)
                                       Hash Cond: ((documentinformationsubject_2.otherentityinternalid)::text = (agencyname.entityinternalid)::text)
                                       ->  Parallel Seq Scan on documentinformationsubject documentinformationsubject_2  (cost=0.00..1268800.85 rows=1 width=177) (actual time=40805.337..40805.337 rows=0 loops=3)
                                             Filter: ((participationtypecode)::text = 'AUT'::text)
                                             Rows Removed by Filter: 2815562
                                       ->  Parallel Hash  (cost=24733.28..24733.28 rows=166628 width=64) (actual time=981.000..981.001 rows=133303 loops=3)
                                             Buckets: 65536  Batches: 16  Memory Usage: 3136kB
                                             ->  Parallel Seq Scan on bestname agencyname  (cost=0.00..24733.28 rows=166628 width=64) (actual time=0.506..916.816 rows=133303 loops=3)
                                 ->  Parallel Hash  (cost=1434.07..1434.07 rows=20107 width=89) (actual time=52.350..52.350 rows=11393 loops=3)
                                       Buckets: 65536  Batches: 1  Memory Usage: 4680kB
                                       ->  Parallel Seq Scan on entity_id agencyid  (cost=0.00..1434.07 rows=20107 width=89) (actual time=0.376..46.875 rows=11393 loops=3)
                     ->  Hash  (cost=1310249.63..1310249.63 rows=13 width=233) (actual time=39172.682..39172.682 rows=236 loops=1)
                           Buckets: 1024  Batches: 1  Memory Usage: 70kB
                           ->  Hash Right Join  (cost=829388.20..1310249.63 rows=13 width=233) (actual time=35084.850..39172.545 rows=236 loops=1)
                                 Hash Cond: ((an.actinternalid)::text = (q.actinternalid)::text)
                                 ->  Seq Scan on act_id an  (cost=0.00..425941.04 rows=14645404 width=134) (actual time=0.908..7583.123 rows=14676871 loops=1)
                                 ->  Hash  (cost=829388.19..829388.19 rows=1 width=136) (actual time=29347.614..29347.614 rows=236 loops=1)
                                       Buckets: 1024  Batches: 1  Memory Usage: 63kB
                                       ->  Gather  (cost=381928.46..829388.19 rows=1 width=136) (actual time=23902.428..29347.481 rows=236 loops=1)
                                             Workers Planned: 2
                                             Workers Launched: 2
                                             ->  Parallel Hash Join  (cost=380928.46..828388.09 rows=1 width=136) (actual time=23915.790..29336.452 rows=79 loops=3)
                                                   Hash Cond: ((q.actinternalid)::text = (r.sourceinternalid)::text)
                                                   ->  Parallel Seq Scan on documentinformation q  (cost=0.00..447271.93 rows=50050 width=99) (actual time=10055.238..15484.478 rows=87921 loops=3)
                                                         Filter: (((classcode)::text = 'CNTRCT'::text) AND ((moodcode)::text = 'EVN'::text) AND ((code_codesystem)::text = '2.16.840.1.113883.3.26.1.1'::text))
                                                         Rows Removed by Filter: 1540625
                                                   ->  Parallel Hash  (cost=380928.44..380928.44 rows=1 width=74) (actual time=13825.726..13825.726 rows=79 loops=3)
                                                         Buckets: 1024  Batches: 1  Memory Usage: 112kB
                                                         ->  Parallel Seq Scan on actrelationship r  (cost=0.00..380928.44 rows=1 width=74) (actual time=5289.948..13825.576 rows=79 loops=3)
                                                               Filter: ((typecode)::text = 'SUBJ'::text)
                                                               Rows Removed by Filter: 3433326
               ->  Hash  (cost=2908913.87..2908913.87 rows=34619 width=930) (actual time=244145.322..244145.322 rows=113478127 loops=1)
                     Buckets: 8192 (originally 8192)  Batches: 65536 (originally 16)  Memory Usage: 1204250kB
                     ->  Gather Merge  (cost=2892141.40..2908913.87 rows=34619 width=930) (actual time=75215.333..145622.427 rows=113478127 loops=1)
                           Workers Planned: 2
                           Workers Launched: 2
                           ->  Merge Left Join  (cost=2891141.37..2903917.96 rows=14425 width=930) (actual time=75132.988..99411.448 rows=37826042 loops=3)
                                 Merge Cond: (((documentinformationsubject.documentinternalid)::text = (documentinformationsubject_1.documentinternalid)::text) AND ((documentinformationsubject.documentid)::text = (documentinformationsubject_1.documentid)::text) AND ((documentinformationsubject.actinternalid)::text = (documentinformationsubject_1.actinternalid)::text))
                                 ->  Sort  (cost=1301590.26..1301626.32 rows=14425 width=882) (actual time=39801.337..40975.780 rows=231207 loops=3)
                                       Sort Key: documentinformationsubject.documentinternalid, documentinformationsubject.documentid, documentinformationsubject.actinternalidct_1.documentid, documentinformationsubject_1.actinternalid
                                             Sort Method: external merge  Disk: 169768kB
                                             Worker 0:  Sort Method: external merge  Disk: 169768kB
                                             Worker 1:  Sort Method: external merge  Disk: 169768kB
                                             ->  Seq Scan on documentinformationsubject documentinformationsubject_1  (cost=0.00..1329868.64 rows=1010585 width=159) (actual time=23401.537..31758.042 rows=1031106 loops=3)
                                                   Filter: ((participationtypecode)::text = 'PRD'::text)
                                                   Rows Removed by Filter: 7415579
 Planning Time: 40.559 ms
 Execution Time: 6896581.566 ms
(70 rows)

For the first time this query has succeeded now. Memory was bounded. The time of nearly hours is crazy, but things sometimes take that long. The important thing was not to get an out of memory error.

Thank you. Anything else you want to try, I can do it.

regards,
-Gunther


[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux