Re: Out of Memory errors are frustrating as heck!

On Tue, Apr 16, 2019 at 11:46:51PM -0500, Justin Pryzby wrote:
> I wonder if it'd be useful to compile with
> ./configure CFLAGS=-DHJDEBUG=1
> Could you try this, too ?
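
(For context: as far as I can tell, HJDEBUG simply enables plain printf calls in nodeHash.c, along these lines, so the messages go straight to the server's stdout rather than through elog:

#ifdef HJDEBUG
	printf("Hashjoin %p: initial nbatch = %d, nbuckets = %d\n",
		   hashtable, nbatch, nbuckets);
#endif

That detail about stdout will matter below.)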

OK, doing it now; here is what I'm getting in the log file. I am surprised I get so few rows here when there is so much going on:

2019-04-20 17:12:16.077 UTC [7093] LOG:  database system was shut down at 2019-04-20 17:12:15 UTC
2019-04-20 17:12:16.085 UTC [7091] LOG:  database system is ready to accept connections
Hashjoin 0x118e0c8: initial nbatch = 1, nbuckets = 1024
Hashjoin 0x118e0f8: initial nbatch = 1, nbuckets = 1024
Hashjoin 0x1194e78: initial nbatch = 1, nbuckets = 65536
Hashjoin 0x119b518: initial nbatch = 16, nbuckets = 65536
Hashjoin 0x1194e78: initial nbatch = 1, nbuckets = 65536
Hashjoin 0x119bb38: initial nbatch = 16, nbuckets = 65536
TopMemoryContext: 4347672 total in 9 blocks; 41784 free (19 chunks); 4305888 used
  HandleParallelMessages: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
  pgstat TabStatusArray lookup hash table: 8192 total in 1 blocks; 1456 free (0 chunks); 6736 used
  TopTransactionContext: 8192 total in 1 blocks; 5416 free (2 chunks); 2776 used
  Operator lookup cache: 24576 total in 2 blocks; 10760 free (3 chunks); 13816 used
  TableSpace cache: 8192 total in 1 blocks; 2096 free (0 chunks); 6096 used
  Type information cache: 24352 total in 2 blocks; 2624 free (0 chunks); 21728 used
  RowDescriptionContext: 8192 total in 1 blocks; 6896 free (0 chunks); 1296 used
  MessageContext: 32768 total in 3 blocks; 13488 free (1 chunks); 19280 used
  Operator class cache: 8192 total in 1 blocks; 560 free (0 chunks); 7632 used
  smgr relation table: 32768 total in 3 blocks; 16832 free (8 chunks); 15936 used
  TransactionAbortContext: 32768 total in 1 blocks; 32512 free (0 chunks); 256 used
  Portal hash: 8192 total in 1 blocks; 560 free (0 chunks); 7632 used
  TopPortalContext: 8192 total in 1 blocks; 7664 free (0 chunks); 528 used
    PortalHoldContext: 24632 total in 2 blocks; 7392 free (0 chunks); 17240 used
    PortalContext: 1474560 total in 183 blocks; 6152 free (8 chunks); 1468408 used:
      ExecutorState: 2234501600 total in 266274 blocks; 3696112 free (17274 chunks); 2230805488 used
        HashTableContext: 32768 total in 3 blocks; 17272 free (8 chunks); 15496 used
          HashBatchContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
        HashTableContext: 8192 total in 1 blocks; 7320 free (0 chunks); 872 used
          HashBatchContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
        HashTableContext: 8192 total in 1 blocks; 7320 free (0 chunks); 872 used
          HashBatchContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
        HashTableContext: 8192 total in 1 blocks; 7752 free (0 chunks); 440 used
          HashBatchContext: 90288 total in 4 blocks; 16072 free (6 chunks); 74216 used
        HashTableContext: 8192 total in 1 blocks; 7624 free (0 chunks); 568 used
          HashBatchContext: 90288 total in 4 blocks; 16072 free (6 chunks); 74216 used
        TupleSort main: 286912 total in 8 blocks; 246792 free (39 chunks); 40120 used
        TupleSort main: 286912 total in 8 blocks; 246792 free (39 chunks); 40120 used
        ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
        HashTableContext: 8454256 total in 6 blocks; 64848 free (32 chunks); 8389408 used
          HashBatchContext: 177003344 total in 5387 blocks; 7936 free (0 chunks); 176995408 used
        TupleSort main: 452880 total in 8 blocks; 126248 free (27 chunks); 326632 used
          Caller tuples: 1048576 total in 8 blocks; 21608 free (14 chunks); 1026968 used
        ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
        ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
        ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
        ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
        ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
        ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
        ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
        ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
        ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
        ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
        ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
        ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
        ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
        ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
        ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
        ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
        ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
        ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
        ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
        ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
        ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
        ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
        ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
        ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
        ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
        ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
        ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
  Relcache by OID: 16384 total in 2 blocks; 3512 free (2 chunks); 12872 used
  CacheMemoryContext: 1101328 total in 14 blocks; 236384 free (1 chunks); 864944 used
    index info: 2048 total in 2 blocks; 680 free (1 chunks); 1368 used: pg_toast_2619_index
    index info: 2048 total in 2 blocks; 968 free (1 chunks); 1080 used: entity_id_fkidx
    index info: 2048 total in 2 blocks; 696 free (1 chunks); 1352 used: entity_id_idx
    index info: 2048 total in 2 blocks; 968 free (1 chunks); 1080 used: act_id_fkidx
    ...
    index info: 2048 total in 2 blocks; 680 free (1 chunks); 1368 used: pg_attribute_relid_attnum_index
    index info: 2048 total in 2 blocks; 952 free (1 chunks); 1096 used: pg_class_oid_index
  WAL record construction: 49768 total in 2 blocks; 6368 free (0 chunks); 43400 used
  PrivateRefCount: 8192 total in 1 blocks; 2624 free (0 chunks); 5568 used
  MdSmgr: 8192 total in 1 blocks; 6152 free (1 chunks); 2040 used
  LOCALLOCK hash: 16384 total in 2 blocks; 4600 free (2 chunks); 11784 used
  Timezones: 104120 total in 2 blocks; 2624 free (0 chunks); 101496 used
  ErrorContext: 8192 total in 1 blocks; 7936 free (4 chunks); 256 used
Grand total: 2430295856 bytes in 272166 blocks; 5223104 free (17571 chunks); 2425072752 used
2019-04-20 17:28:56.887 UTC [7100] ERROR:  out of memory
2019-04-20 17:28:56.887 UTC [7100] DETAIL:  Failed on request of size 32800 in memory context "HashBatchContext".
2019-04-20 17:28:56.887 UTC [7100] STATEMENT:  explain analyze SELECT * FROM reports.v_BusinessOperation;
There are amazingly few entries in the log.
(gdb) break ExecHashIncreaseNumBatches
Breakpoint 1 at 0x6b7bd6: file nodeHash.c, line 884.
(gdb) cont
Continuing.

Breakpoint 1, ExecHashIncreaseNumBatches (hashtable=0x12d0818) at nodeHash.c:884
884             int                     oldnbatch = hashtable->nbatch;
(gdb) cont
Continuing.

Breakpoint 1, ExecHashIncreaseNumBatches (hashtable=0x12d0818) at nodeHash.c:884
884             int                     oldnbatch = hashtable->nbatch;
(gdb) cont
Continuing.

Breakpoint 1, ExecHashIncreaseNumBatches (hashtable=0x12d0818) at nodeHash.c:884
884             int                     oldnbatch = hashtable->nbatch;
(gdb) cont 100
Will ignore next 99 crossings of breakpoint 1.  Continuing.
But I am surprised now to find that the behavior has changed.

First, weirdly enough, now I am not getting any of the HJDEBUG messages any more. And yet my resident memory has already increased to 1.1 GB.
 7100 postgres  20   0 2797036   1.1g 188064 R  17.9 14.9  14:46.00 postgres: postgres integrator [local] EXPLAIN
 7664 postgres  20   0 1271788  16228  14408 D  17.6  0.2   0:01.96 postgres: parallel worker for PID 7100
 7665 postgres  20   0 1271788  16224  14404 R  17.6  0.2   0:01.95 postgres: parallel worker for PID 7100
So why is it all different now? Are we chasing a red herring?

Finally I got another stop, now at the HJDEBUG line:
#0  ExecHashIncreaseNumBatches (hashtable=0x12d0818) at nodeHash.c:904
#1  0x00000000006b93b1 in ExecHashTableInsert (hashtable=0x12d0818, slot=0x12bcf98, hashvalue=234960700) at nodeHash.c:1655
#2  0x00000000006bd600 in ExecHashJoinNewBatch (hjstate=0x12a4340) at nodeHashjoin.c:1051
#3  0x00000000006bc999 in ExecHashJoinImpl (pstate=0x12a4340, parallel=false) at nodeHashjoin.c:539
#4  0x00000000006bca23 in ExecHashJoin (pstate=0x12a4340) at nodeHashjoin.c:565
#5  0x00000000006a191f in ExecProcNodeInstr (node=0x12a4340) at execProcnode.c:461
#6  0x00000000006ceaad in ExecProcNode (node=0x12a4340) at ../../../src/include/executor/executor.h:247
#7  0x00000000006cebe7 in ExecSort (pstate=0x12a4230) at nodeSort.c:107
#8  0x00000000006a191f in ExecProcNodeInstr (node=0x12a4230) at execProcnode.c:461
#9  0x00000000006a18f0 in ExecProcNodeFirst (node=0x12a4230) at execProcnode.c:445
#10 0x00000000006cf25c in ExecProcNode (node=0x12a4230) at ../../../src/include/executor/executor.h:247
#11 0x00000000006cf388 in ExecUnique (pstate=0x12a4040) at nodeUnique.c:73
#12 0x00000000006a191f in ExecProcNodeInstr (node=0x12a4040) at execProcnode.c:461
#13 0x00000000006a18f0 in ExecProcNodeFirst (node=0x12a4040) at execProcnode.c:445
#14 0x000000000069728b in ExecProcNode (node=0x12a4040) at ../../../src/include/executor/executor.h:247
#15 0x0000000000699790 in ExecutePlan (estate=0x12a3da0, planstate=0x12a4040, use_parallel_mode=true, operation=CMD_SELECT,
    sendTuples=true, numberTuples=0, direction=ForwardScanDirection, dest=0xe811c0 <donothingDR>, execute_once=true)
    at execMain.c:1723
#16 0x0000000000697757 in standard_ExecutorRun (queryDesc=0x1404168, direction=ForwardScanDirection, count=0, execute_once=true)
    at execMain.c:364
#17 0x00000000006975f4 in ExecutorRun (queryDesc=0x1404168, direction=ForwardScanDirection, count=0, execute_once=true)
    at execMain.c:307
#18 0x000000000060d227 in ExplainOnePlan (plannedstmt=0x1402588, into=0x0, es=0x10b03a8,
    queryString=0x10866c0 "explain analyze SELECT * FROM reports.v_BusinessOperation;", params=0x0, queryEnv=0x0,
    planduration=0x7fff56a0df00) at explain.c:535
So now I am back to checking who calls AllocSetAlloc:

(gdb) break AllocSetAlloc if (int)strcmp(context->name, "ExecutorState") == 0
Breakpoint 4 at 0x9eab11: file aset.c, line 719.
(gdb) cont
Continuing.

Breakpoint 4, AllocSetAlloc (context=0x12a3c90, size=8272) at aset.c:719
719             AllocSet        set = (AllocSet) context;
(gdb) bt 5
#0  AllocSetAlloc (context=0x12a3c90, size=8272) at aset.c:719
#1  0x00000000009f2e47 in palloc (size=8272) at mcxt.c:938
#2  0x000000000082ae84 in makeBufFileCommon (nfiles=1) at buffile.c:116
#3  0x000000000082af14 in makeBufFile (firstfile=34029) at buffile.c:138
#4  0x000000000082b09f in BufFileCreateTemp (interXact=false) at buffile.c:201
(More stack frames follow...)
(gdb) bt 7
#0  AllocSetAlloc (context=0x12a3c90, size=8272) at aset.c:719
#1  0x00000000009f2e47 in palloc (size=8272) at mcxt.c:938
#2  0x000000000082ae84 in makeBufFileCommon (nfiles=1) at buffile.c:116
#3  0x000000000082af14 in makeBufFile (firstfile=34029) at buffile.c:138
#4  0x000000000082b09f in BufFileCreateTemp (interXact=false) at buffile.c:201
#5  0x00000000006bda31 in ExecHashJoinSaveTuple (tuple=0x86069b8, hashvalue=234960700, fileptr=0x8616a30) at nodeHashjoin.c:1220
#6  0x00000000006b80c0 in ExecHashIncreaseNumBatches (hashtable=0x12d0818) at nodeHash.c:1004
(More stack frames follow...)
(gdb) cont
Continuing.

Breakpoint 4, AllocSetAlloc (context=0x12a3c90, size=8) at aset.c:719
719             AllocSet        set = (AllocSet) context;
(gdb) bt 7
#0  AllocSetAlloc (context=0x12a3c90, size=8) at aset.c:719
#1  0x00000000009f2f5d in palloc0 (size=8) at mcxt.c:969
#2  0x000000000082aea2 in makeBufFileCommon (nfiles=1) at buffile.c:119
#3  0x000000000082af14 in makeBufFile (firstfile=34029) at buffile.c:138
#4  0x000000000082b09f in BufFileCreateTemp (interXact=false) at buffile.c:201
#5  0x00000000006bda31 in ExecHashJoinSaveTuple (tuple=0x86069b8, hashvalue=234960700, fileptr=0x8616a30) at nodeHashjoin.c:1220
#6  0x00000000006b80c0 in ExecHashIncreaseNumBatches (hashtable=0x12d0818) at nodeHash.c:1004
(More stack frames follow...)
(gdb) cont
Continuing.

Breakpoint 4, AllocSetAlloc (context=0x12a3c90, size=4) at aset.c:719
719             AllocSet        set = (AllocSet) context;
(gdb) bt 7
#0  AllocSetAlloc (context=0x12a3c90, size=4) at aset.c:719
#1  0x00000000009f2e47 in palloc (size=4) at mcxt.c:938
#2  0x000000000082af22 in makeBufFile (firstfile=34029) at buffile.c:140
#3  0x000000000082b09f in BufFileCreateTemp (interXact=false) at buffile.c:201
#4  0x00000000006bda31 in ExecHashJoinSaveTuple (tuple=0x86069b8, hashvalue=234960700, fileptr=0x8616a30) at nodeHashjoin.c:1220
#5  0x00000000006b80c0 in ExecHashIncreaseNumBatches (hashtable=0x12d0818) at nodeHash.c:1004
#6  0x00000000006b93b1 in ExecHashTableInsert (hashtable=0x12d0818, slot=0x12bcf98, hashvalue=234960700) at nodeHash.c:1655
(More stack frames follow...)
(gdb) cont
Continuing.

Breakpoint 4, AllocSetAlloc (context=0x12a3c90, size=375) at aset.c:719
719             AllocSet        set = (AllocSet) context;
(gdb) bt 7
#0  AllocSetAlloc (context=0x12a3c90, size=375) at aset.c:719
#1  0x00000000009f2e47 in palloc (size=375) at mcxt.c:938
#2  0x00000000006bdbec in ExecHashJoinGetSavedTuple (hjstate=0x12a4340, file=0x13ca418, hashvalue=0x7fff56a0da54, tupleSlot=0x12bcf98)
    at nodeHashjoin.c:1277
#3  0x00000000006bd61f in ExecHashJoinNewBatch (hjstate=0x12a4340) at nodeHashjoin.c:1042
#4  0x00000000006bc999 in ExecHashJoinImpl (pstate=0x12a4340, parallel=false) at nodeHashjoin.c:539
#5  0x00000000006bca23 in ExecHashJoin (pstate=0x12a4340) at nodeHashjoin.c:565
#6  0x00000000006a191f in ExecProcNodeInstr (node=0x12a4340) at execProcnode.c:461
(More stack frames follow...)

Now I am adding a breakpoint on AllocSetFree, and I see that it gets called right after AllocSetAlloc with the same context, and presumably with the pointer that was previously allocated.
(gdb) finish
Run till exit from #0  AllocSetAlloc (context=0x12a3c90, size=375) at aset.c:719
0x00000000009f2e47 in palloc (size=375) at mcxt.c:938
938             ret = context->methods->alloc(context, size);
Value returned is $1 = (void *) 0x11bd858
(gdb) cont
Continuing.

Breakpoint 5, AllocSetFree (context=0x12a3c90, pointer=0x11bf748) at aset.c:992
992             AllocSet        set = (AllocSet) context;
(gdb) cont
Continuing.

Breakpoint 4, AllocSetAlloc (context=0x12a3c90, size=375) at aset.c:719
719             AllocSet        set = (AllocSet) context;
(gdb) finish
Run till exit from #0  AllocSetAlloc (context=0x12a3c90, size=375) at aset.c:719
0x00000000009f2e47 in palloc (size=375) at mcxt.c:938
938             ret = context->methods->alloc(context, size);
Value returned is $2 = (void *) 0x11bf748
(gdb) cont
Continuing.

Breakpoint 5, AllocSetFree (context=0x12a3c90, pointer=0x11bd858) at aset.c:992
992             AllocSet        set = (AllocSet) context;
See, now that pointer allocated before is being freed. I can already see how I would write a memory leak detection tool: keep a cache of a fixed size, say one page, of recently allocated pointers, and when free is called, remove the pointer from the cache. Then log an allocated pointer only once it has to be evicted from the cache because another one needs to be added, so the log file doesn't fill up with a bunch of allocations that get freed relatively quickly anyway.
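
A minimal standalone sketch of what I mean (my own toy C, not PostgreSQL code; the two note_* hooks are made up and would have to be called from AllocSetAlloc/AllocSetFree):

#include <stdio.h>

#define CACHE_SLOTS 512                 /* roughly one page worth of pointers */

static void *recent[CACHE_SLOTS];
static int   next_slot = 0;

/* to be called from the alloc path */
static void
note_alloc(void *ptr)
{
	/* evicting a survivor means it was not freed quickly: log it */
	if (recent[next_slot] != NULL)
		fprintf(stderr, "suspect: %p outlived %d allocations\n",
				recent[next_slot], CACHE_SLOTS);
	recent[next_slot] = ptr;
	next_slot = (next_slot + 1) % CACHE_SLOTS;
}

/* to be called from the free path */
static void
note_free(void *ptr)
{
	for (int i = 0; i < CACHE_SLOTS; i++)
	{
		if (recent[i] == ptr)
		{
			recent[i] = NULL;       /* cancelled out: not a leak */
			return;
		}
	}
}

Short-lived chunks cancel out silently; only chunks that survive a whole cache's worth of later allocations ever get logged.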

This is very confusing. I just stepped over the HJDEBUG printf:
Breakpoint 2, ExecHashIncreaseNumBatches (hashtable=0x12d0818) at nodeHash.c:904
904             printf("Hashjoin %p: increasing nbatch to %d because space = %zu\n",
(gdb) next
908             oldcxt = MemoryContextSwitchTo(hashtable->hashCxt);
(gdb) call MemoryContextStats(TopPortalContext)
and checked my log file, and there was nothing before the call MemoryContextStats(TopPortalContext), so I don't understand where this printf output is ending up. Clearly *some* of it is in the log, but none of that "increasing nbatch" stuff is. I thought there was something wrong with that HJDEBUG stuff. Oops, my calling that MemoryContextStats really seems to help force this buffer to be flushed, because now I got more:
Hashjoin 0x118e4c8: initial nbatch = 1, nbuckets = 1024
Hashjoin 0x118e4f8: initial nbatch = 1, nbuckets = 1024
Hashjoin 0x1195278: initial nbatch = 1, nbuckets = 65536
Hashjoin 0x119b918: initial nbatch = 16, nbuckets = 65536
Hashjoin 0x1195278: initial nbatch = 1, nbuckets = 65536
Hashjoin 0x119b918: initial nbatch = 16, nbuckets = 65536
...
Hashjoin 0x13a8e68: initial nbatch = 16, nbuckets = 8192
Hashjoin 0x13a8e68: increasing nbatch to 32 because space = 4128933
Hashjoin 0x13a8e68: freed 148 of 10584 tuples, space now 4071106
Hashjoin 0x13a8e68: increasing nbatch to 64 because space = 4128826
Hashjoin 0x13a8e68: freed 544 of 10584 tuples, space now 3916296
Hashjoin 0x13a8e68: increasing nbatch to 128 because space = 4128846
Hashjoin 0x13a8e68: freed 10419 of 10585 tuples, space now 65570
Hashjoin 0x13a8e68: increasing nbatch to 256 because space = 4128829
Hashjoin 0x13a8e68: freed 10308 of 10734 tuples, space now 161815
Hashjoin 0x13a8e68: increasing nbatch to 512 because space = 4128908
Hashjoin 0x13a8e68: freed 398 of 10379 tuples, space now 3977787
Hashjoin 0x13a8e68: increasing nbatch to 1024 because space = 4129008
Hashjoin 0x13a8e68: freed 296 of 10360 tuples, space now 4013423
Hashjoin 0x13a8e68: increasing nbatch to 2048 because space = 4129133
Hashjoin 0x13a8e68: freed 154 of 10354 tuples, space now 4068786
Hashjoin 0x13a8e68: increasing nbatch to 4096 because space = 4129035
Hashjoin 0x13a8e68: freed 10167 of 10351 tuples, space now 72849
Hashjoin 0x242c9b0: initial nbatch = 1, nbuckets = 1024
Hashjoin 0x2443ee0: initial nbatch = 1, nbuckets = 1024
Hashjoin 0x2443aa0: initial nbatch = 1, nbuckets = 1024
Hashjoin 0x2443440: initial nbatch = 1, nbuckets = 65536
Hashjoin 0x2443330: initial nbatch = 16, nbuckets = 65536
Hashjoin 0x13a8e68: increasing nbatch to 8192 because space = 4128997
Hashjoin 0x12d0818: freed 10555 of 10560 tuples, space now 1983
Hashjoin 0x12d0818: increasing nbatch to 16384 because space = 4128957
Hashjoin 0x12d0818: freed 10697 of 10764 tuples, space now 25956TopPortalContext: 8192 total in 1 blocks; 7664 free (0 chunks); 528 used
...
And you see here clearly that there is a problem with this printf stdout buffer.
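
This is consistent with plain stdio behavior, as far as I understand it: printf writes to stdout, and stdout is fully buffered (not line-buffered) when it is redirected to a file, whereas the MemoryContextStats output goes to stderr, which is unbuffered. A tiny standalone demonstration (my own C, just to illustrate the stdio rule):

#include <stdio.h>
#include <unistd.h>

int
main(void)
{
	printf("via stdout\n");           /* fully buffered when redirected to a file */
	fprintf(stderr, "via stderr\n");  /* unbuffered, appears immediately */
	sleep(30);                        /* run as "./a.out > log 2>&1" and tail the
									   * log: "via stderr" shows up at once, but
									   * "via stdout" only when the process exits */
	return 0;
}

So presumably either an fflush(stdout) after each HJDEBUG printf, or a setvbuf(stdout, NULL, _IOLBF, 0) at process start, would make these messages show up promptly.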

Now finally I am at the stage where the breakpoint fires rapidly:
(gdb) cont 100
Will ignore next 99 crossings of breakpoint 1.  Continuing.

Breakpoint 1, ExecHashIncreaseNumBatches (hashtable=0x12d0818) at nodeHash.c:884
884             int                     oldnbatch = hashtable->nbatch;
(gdb) bt 7
#0  ExecHashIncreaseNumBatches (hashtable=0x12d0818) at nodeHash.c:884
#1  0x00000000006b93b1 in ExecHashTableInsert (hashtable=0x12d0818, slot=0x12bcf98, hashvalue=2161368) at nodeHash.c:1655
#2  0x00000000006bd600 in ExecHashJoinNewBatch (hjstate=0x12a4340) at nodeHashjoin.c:1051
#3  0x00000000006bc999 in ExecHashJoinImpl (pstate=0x12a4340, parallel=false) at nodeHashjoin.c:539
#4  0x00000000006bca23 in ExecHashJoin (pstate=0x12a4340) at nodeHashjoin.c:565
#5  0x00000000006a191f in ExecProcNodeInstr (node=0x12a4340) at execProcnode.c:461
#6  0x00000000006ceaad in ExecProcNode (node=0x12a4340) at ../../../src/include/executor/executor.h:247
(More stack frames follow...)
So it's not a red herring after all. But there is nothing new in the logfile from that HJDEBUG. Trying to flush stdout with this call MemoryContextStats(TopPortalContext):
...
    ExecutorState: 782712360 total in 92954 blocks; 3626888 free (3126 chunks); 779085472 used
and still nothing heard from HJDEBUG. So you really can't rely on that HJDEBUG stuff; it's not working reliably.

To summarize: the memory problem with the rapid firing of ExecHashIncreaseNumBatches is still occurring, confirmed as I reported earlier; it wasn't some red herring, but the HJDEBUG stuff doesn't work. Also, there might be benefit in creating a little resident memory leak detector that keeps track of a single page of AllocSetAlloc pointers, cancels them out on a matching AllocSetFree, and reports the overflow with a quick little stack trace.

On 4/20/2019 6:53, Justin Pryzby wrote:
>> The only problem is that I can't test that this actually would trigger the
>> memory problem, because I can't force the plan to use the right join, it
>> always reverts to the left join hashing the tmp_q:
> I think the table on the "OUTER" side is the one which needs to be iterated
> over (not hashed), in order to return each of its rows even if there are no
> join partners in the other table.  In your original query, the small table was
> being hashed and the large table iterated; maybe that's what's important.

Maybe so. I am trying to wrap my head around the RIGHT vs. LEFT outer join and why there even is a difference, though.

       ->  Hash Right Join  (cost=4255031.53..5530808.71 rows=34619 width=1197)
              Hash Cond: (((q.documentinternalid)::text = (documentinformationsubject.documentinternalid)::text) AND ((r.targetinternalid)::text = (documentinformationsubject.actinternalid)::text))
              ->  Hash Right Join  (cost=1341541.37..2612134.36 rows=13 width=341)
                    Hash Cond: (((documentinformationsubject_2.documentinternalid)::text = (q.documentinternalid)::text) AND ((documentinformationsubject_2.actinternalid)::text = (q.actinternalid)::text))
              ... from the TINY table tmp_q, the estimate of 13 rows ain't bad
              ->  Hash  (cost=2908913.87..2908913.87 rows=34619 width=930)
                   ->  Gather Merge  (cost=2892141.40..2908913.87 rows=34619 width=930)
                   ....  from the HUGE tmp_r table, with 100 million rows (estimate of 34619 is grossly wrong, but how could it know?) ...

Now in a right join, we include all rows from the right table, and only those from the left table that match the join key. I wonder why we don't transform all of those into left joins then?

Why are we not hashing only the "optional" side?

The plan here seems to tell me without a doubt that it is hashing the big table: 100 million rows get hashed into the hash join, and then we iterate over the tiny table. My mind boggles; I just don't know why there are right joins at all.

But I see the plan is building the hash over the 100 million rows.
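
If I understand it correctly, a hash right join is simply what the executor does when the preserved side is also the hashed side: it then has to remember which hash entries found a match, so that it can emit the unmatched ones at the end. Here is a standalone toy (my own C, not PostgreSQL's code, with a linear scan standing in for the hash lookup) of that mechanism:

#include <stdio.h>
#include <stdbool.h>

static int  inner_keys[] = {1, 2, 3, 4, 5};  /* preserved, hashed side (the huge tmp_r) */
static bool matched[5];                      /* one "found a partner" flag per entry */
static int  outer_keys[] = {2, 4};           /* probe side (the tiny tmp_q) */

int
main(void)
{
	/* probe phase: stream the outer side against the hashed side */
	for (int o = 0; o < 2; o++)
		for (int i = 0; i < 5; i++)
			if (outer_keys[o] == inner_keys[i])
			{
				printf("match: outer=%d inner=%d\n", outer_keys[o], inner_keys[i]);
				matched[i] = true;
			}

	/* fill phase: emit preserved rows that never matched, with NULLs
	 * standing in for the probe side's columns */
	for (int i = 0; i < 5; i++)
		if (!matched[i])
			printf("unmatched inner=%d, outer is NULL\n", inner_keys[i]);
	return 0;
}

A left join with the two inputs swapped returns the same rows, which is presumably why the planner feels free to commute them; the trouble here is only that it decided to hash the huge side based on a bad estimate.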

>> which is of course much better, but when tmp_q and tmp_r are the results of
>> complex stuff that the planner can't estimate, then it gets it wrong, and
>> then the issue gets triggered because we are hashing on the big tmp_r, not
>> tmp_q.
> I was able to get something maybe promising?  "Batches: 65536 (originally 1)"
>
> I didn't get "Out of memory" error yet, but did crash the server with this one:
> postgres=# explain analyze WITH v AS( SELECT * FROM generate_series(1,99999999)i WHERE i%10<10 AND i%11<11 AND i%12<12 AND i%13<13 AND i%14<14 AND i%15<15 AND i%16<16 AND i%17<17 AND i%18<18 AND i%19<19 AND i%20<20 AND i%21<21 ) SELECT * FROM generate_series(1,99)k JOIN v ON k=i ;
>
> Note, on pg12dev this needs to be "with v AS MATERIALIZED".
>
> postgres=# SET work_mem='128kB';SET client_min_messages =log;SET log_statement_stats=on;explain(analyze,timing off) WITH v AS( SELECT * FROM generate_series(1,999999)i WHERE i%10<10 AND i%11<11 AND i%12<12 AND i%13<13 AND i%14<14 AND i%15<15 AND i%16<16 AND i%17<17 AND i%18<18 AND i%19<19 AND i%20<20 AND i%21<21 ) SELECT * FROM generate_series(1,99)k JOIN v ON k=i ;
>  Hash Join  (cost=70.04..83.84 rows=5 width=8) (actual rows=99 loops=1)
>    Hash Cond: (k.k = v.i)
>    CTE v
>      ->  Function Scan on generate_series i  (cost=0.00..70.00 rows=1 width=4) (actual rows=999999 loops=1)
>            Filter: (((i % 10) < 10) AND ((i % 11) < 11) AND ((i % 12) < 12) AND ((i % 13) < 13) AND ((i % 14) < 14) AND ((i % 15) < 15) AND ((i % 16) < 16) AND ((i % 17) < 17) AND ((i % 18) < 18) AND ((i % 19) < 19) AND ((i % 20) < 20) AND ((i % 21) < 21))
>    ->  Function Scan on generate_series k  (cost=0.00..10.00 rows=1000 width=4) (actual rows=99 loops=1)
>    ->  Hash  (cost=0.02..0.02 rows=1 width=4) (actual rows=999999 loops=1)
>          Buckets: 4096 (originally 1024)  Batches: 512 (originally 1)  Memory Usage: 101kB
>          ->  CTE Scan on v  (cost=0.00..0.02 rows=1 width=4) (actual rows=999999 loops=1)

Yes, I thought that with CTEs and functions one might be able to generate a test case, but I am still not seeing how you can trick the planner into this peculiar jointype JOIN_RIGHT, and whether that is requisite for triggering the problem.

Finally, I have tried to make a pstate pretty printer in explain.c:

void
DumpPlanState(PlanState *pstate)
{
	/* must live in explain.c, because ExplainNode() is static there */
	ExplainState *es = NewExplainState();

	ExplainNode(pstate, NIL, NULL, NULL, es);
	puts(es->str->data);
	pfree(es->str->data);
}

but that didn't work, because unfortunately that ExplainNode function is destructive. It would be so nice to refactor this explain code such that there would be a completely conservative function that simply dumps the present pstate with all the information about its estimated and actual situation: how many iterations it has already accomplished, how many it estimates to still have to do, whether its original estimate panned out or not, etc. This would be tremendously useful for runtime debugging of queries; I think the utility of it can hardly be overstated. I mean, even for end-user applications of some data warehouse, you could probe a long-running query every 5 seconds as to where its execution is. I can hardly think of a more low-hanging-fruit useful feature. I am sure that if PostgreSQL had originally been written in Java, this feature would already exist.
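
Just to sketch the kind of read-only walk I have in mind (untested, my own code, assuming the PlanState fields lefttree, righttree, and instrument are safe to read at that point):

void
WalkPlanState(PlanState *ps, int depth)
{
	if (ps == NULL)
		return;
	/* node tag, the planner's estimate, and the actuals gathered so far */
	elog(LOG, "%*snode %d: plan_rows=%.0f ntuples=%.0f nloops=%.0f",
		 depth * 2, "", (int) nodeTag(ps->plan),
		 ps->plan->plan_rows,
		 ps->instrument ? ps->instrument->ntuples : 0.0,
		 ps->instrument ? ps->instrument->nloops : 0.0);
	WalkPlanState(ps->lefttree, depth + 1);		/* outer subtree */
	WalkPlanState(ps->righttree, depth + 1);	/* inner subtree */
}

Nothing in there modifies the tree, so it should be callable from gdb at any breakpoint.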

regards and Happy Easter,
-Gunther

