Re: Out of Memory errors are frustrating as heck!

Gunther <raj@xxxxxxxx> · Mon, 15 Apr 2019 21:49:50 -0400

    Wow, we are getting somewhere.
    Tom (BTW, your mail server rejects my direct mail, but I'm glad
      you got it through the list), you say:

      I'm pretty sure that's not the droid we're looking for.
ExecHashJoinGetSavedTuple does palloc a new tuple, but it immediately
sticks it into a TupleTableSlot that will be responsible for freeing
it (when the next tuple is stuck into the same slot).  I'd suggest
continuing a few times and looking for other code paths leading
to AllocSetAlloc in this context.

    I did continue a "few times", but few as in a dozen, it's always
      the same

    (gdb) bt 6
#0  AllocSetAlloc (context=0x1168230, size=375) at aset.c:715
#1  0x000000000084e6cd in palloc (size=375) at mcxt.c:938
#2  0x000000000061019c in ExecHashJoinGetSavedTuple (file=file@entry=0x21df688, hashvalue=hashvalue@entry=0x7fff2e4ca76c,
    tupleSlot=0x10856b8, hjstate=0x11688e0) at nodeHashjoin.c:1277
#3  0x0000000000610c83 in ExecHashJoinNewBatch (hjstate=0x11688e0) at nodeHashjoin.c:1042
#4  ExecHashJoinImpl (parallel=false, pstate=0x11688e0) at nodeHashjoin.c:539
#5  ExecHashJoin (pstate=0x11688e0) at nodeHashjoin.c:565
(More stack frames follow...)

    So I decided to just let it go until it exits the ExecHashJoin
      function:
    #6  0x00000000005fde68 in ExecProcNodeInstr (node=0x11688e0) at execProcnode.c:461
461             result = node->ExecProcNodeReal(node);
(gdb) list
456     {
457             TupleTableSlot *result;
458
459             InstrStartNode(node->instrument);
460
461             result = node->ExecProcNodeReal(node);
462
463             InstrStopNode(node->instrument, TupIsNull(result) ? 0.0 : 1.0);
464
465             return result;
(gdb) break 463
Breakpoint 3 at 0x5fde68: file execProcnode.c, line 463.
(gdb) disable 2
(gdb) cont
Continuing.

Breakpoint 3, ExecProcNodeInstr (node=0x11688e0) at execProcnode.c:463
463             InstrStopNode(node->instrument, TupIsNull(result) ? 0.0 : 1.0);

    oops, that was fast, so up further ... 

    (gdb) cont
Continuing.

Breakpoint 4, ExecSort (pstate=0x11687d0) at nodeSort.c:109
109                             if (TupIsNull(slot))
(gdb) cont
Continuing.

Breakpoint 3, ExecProcNodeInstr (node=0x11688e0) at execProcnode.c:463
463             InstrStopNode(node->instrument, TupIsNull(result) ? 0.0 : 1.0);
(gdb) cont
Continuing.

Breakpoint 4, ExecSort (pstate=0x11687d0) at nodeSort.c:109
109                             if (TupIsNull(slot))
(gdb) up
#1  0x00000000005fde68 in ExecProcNodeInstr (node=0x11687d0) at execProcnode.c:461
461             result = node->ExecProcNodeReal(node);
(gdb) up
#2  0x000000000061d2c4 in ExecProcNode (node=0x11687d0) at ../../../src/include/executor/executor.h:247
247             return node->ExecProcNode(node);
(gdb) up
#3  ExecUnique (pstate=0x11685e0) at nodeUnique.c:73
73                      slot = ExecProcNode(outerPlan);
(gdb) list
68              for (;;)
69              {
70                      /*
71                       * fetch a tuple from the outer subplan
72                       */
73                      slot = ExecProcNode(outerPlan);
74                      if (TupIsNull(slot))
75                      {
76                              /* end of subplan, so we're done */
77                              ExecClearTuple(resultTupleSlot);

    ... but whatever I do, ultimately I get to that allocation
      routine through the same path.
    Since that is the bulk of the activity, and memory was still
      growing while we come through this path, I assume that this is it.

      My first thought on noticing the SELECT DISTINCT was that you might be
hitting the grouping-function-related leak that Andres fixed in 9cf37a527;
but that fix did make it into 11.2 (by just a couple of days...).  Still,
maybe there's another issue in the same area.

    I don't know about that one, I only know that I am running 11.2
      freshly compiled.
    The change suggested by Alvaro Herrera wasn't applicable. 

    Jeff Janes had more

            Breakpoint 2, AllocSetAlloc (context=0x1168230, size=8272) at aset.c:715
715     {
(gdb) p context->name
$8 = 0x96ce5b "ExecutorState"

        I think that the above one might have been the one you
          wanted.

      Not sure how you could tell that? It's the same place as
      everything else. If we can find out what you're looking for, may
      be we can set a break point earlier up the call chain? 

            I
              guess I should run this for a little longer. So I disable
              my breakpoints 

            it
              went up pretty quick from 1.2 GB to 1.5 GB, but then it
              stopped growing fast, so now back to gdb and break:

        Unfortunately, I think this means you missed your
          opportunity and are now getting backtraces of the innocent
          bystanders.

      But why? If I see the memory still go up insanely fast, isn't that
      a sign for the leak?

        Particularly since you report that the version using nested
          loops rather than hash joins also leaked, so it is probably
          not the hash-join specific code that is doing it.

      How about it's in the DISTINCT? I noticed while peeking up the
      call chain, that it was already in the UNIQUE sort thing also.  I
      guess it's streaming the results from the hash join right into the
      unique sort step.

        What I've done before is compile with the comments removed
          from 
        src/backend/utils/mmgr/aset.c:/* #define HAVE_ALLOCINFO */

      I have just done that and it creates an insane amount of output
      from all the processes, I'm afraid there will be no way to keep
      that stuff separated. If there was a way of turning that one and
      off for one process only, then we could probably get more info...
    Everything is also extremely slow that way. Like in a half hour
      the memory didn't even reach 100 MB.

        and then look for allocations sizes which are getting
          allocated but not freed, and then you can go back to gdb to
          look for allocations of those specific sizes.  

      I guess I should look for both, address and size to match it
      better.

        This generates a massive amount of output, and it bypasses
          the logging configuration and goes directly to stderr--so it
          might not end up where you expect.

      Yes, massive, like I said. Impossible to use. File system fills up
      rapidly. I made it so that it can be turned on and off, with the
      debugger.
    int _alloc_info = 0;
#ifdef HAVE_ALLOCINFO
#define AllocFreeInfo(_cxt, _chunk) \
        if(_alloc_info) \
            fprintf(stderr, "AllocFree: %s: %p, %zu\n", \
                (_cxt)->header.name, (_chunk), (_chunk)->size)
#define AllocAllocInfo(_cxt, _chunk) \
        if(_alloc_info) \
            fprintf(stderr, "AllocAlloc: %s: %p, %zu\n", \
                (_cxt)->header.name, (_chunk), (_chunk)->size)
#else
#define AllocFreeInfo(_cxt, _chunk)
#define AllocAllocInfo(_cxt, _chunk)
#endif

    so with this I do
    (gdb) b AllocSetAlloc
(gdb) cont
(gdb) set _alloc_info=1
(gdb) disable
(gdb) cont

    then I wait, ... until it crashes again ... no, it's too much. It
      fills up my filesystem in no time with the logs.  It produced 3 GB
      in just a minute of run time.

    And also, I doubt we can find anything specifically by allocation
      size. It's just going to be 512 or whatever. 

    Isn't there some other way? 

    I'm going to try without that DISTINCT step, or perhaps by
      dismantling this query until it works without this excessive
      memory growth.
    -Gunther