On 4/14/2019 23:24, Tom Lane wrote:
> ExecutorState: 2234123384 total in 266261 blocks; 3782328 free (17244 chunks); 2230341056 used
> Oooh, that looks like a memory leak right enough. The ExecutorState
> should not get that big for any reasonable query.
2.2 GB is massive, yes.
> Your error and stack trace show a failure in HashBatchContext,
> which is probably the last of these four:
> HashBatchContext: 57432 total in 3 blocks; 16072 free (6 chunks); 41360 used
> HashBatchContext: 90288 total in 4 blocks; 16072 free (6 chunks); 74216 used
> HashBatchContext: 90288 total in 4 blocks; 16072 free (6 chunks); 74216 used
> HashBatchContext: 100711712 total in 3065 blocks; 7936 free (0 chunks); 100703776 used
> Perhaps that's more than it should be, but it's silly to obsess over 100M
> when there's a 2.2G problem elsewhere.
Yes.
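A minimal sketch for ranking the contexts in such a dump by bytes used, so the largest consumer stands out at a glance. It assumes every line of interest has the "Name: N total in B blocks; F free (C chunks); U used" shape of the lines quoted above; the function names and the regular expression are illustrative, not part of PostgreSQL.

```python
import re

# Assumed line shape, taken from the dump lines quoted above:
# "<name>: <total> total in <blocks> blocks; <free> free (<chunks> chunks); <used> used"
CONTEXT_LINE = re.compile(
    r"^\s*(?P<name>[^:]+):\s+(?P<total>\d+) total in (?P<blocks>\d+) blocks; "
    r"(?P<free>\d+) free \((?P<chunks>\d+) chunks\); (?P<used>\d+) used"
)


def parse_contexts(dump_text):
    """Return (name, total_bytes, used_bytes) for every line matching the assumed shape."""
    rows = []
    for line in dump_text.splitlines():
        m = CONTEXT_LINE.match(line)
        if m:
            rows.append((m.group("name").strip(),
                         int(m.group("total")),
                         int(m.group("used"))))
    return rows


def rank_by_used(dump_text):
    """Sort the parsed contexts by bytes used, largest first."""
    return sorted(parse_contexts(dump_text), key=lambda row: row[2], reverse=True)


# The five context lines quoted in this message.
SAMPLE = """\
ExecutorState: 2234123384 total in 266261 blocks; 3782328 free (17244 chunks); 2230341056 used
HashBatchContext: 57432 total in 3 blocks; 16072 free (6 chunks); 41360 used
HashBatchContext: 90288 total in 4 blocks; 16072 free (6 chunks); 74216 used
HashBatchContext: 90288 total in 4 blocks; 16072 free (6 chunks); 74216 used
HashBatchContext: 100711712 total in 3065 blocks; 7936 free (0 chunks); 100703776 used
"""

if __name__ == "__main__":
    for name, total, used in rank_by_used(SAMPLE):
        print(f"{name:20s} {used / 2**20:10.1f} MiB used of {total / 2**20:10.1f} MiB")
```

Fed the whole memory context dump instead of SAMPLE, the same ranking would list any other large contexts next to ExecutorState; lines that do not match the assumed shape are skipped.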
> I think it's likely that it was
> just coincidence that the failure happened right there. Unfortunately,
> that leaves us with no info about where the actual leak is coming from.
Strange, though, that the vmstat tracking never showed the cache-allocated
memory going much below 6 GB. Even if this 2.2 GB memory leak
is there, and even if I had 2 GB of shared_buffers, I would still have
enough for the OS to give me.
Is there any suspicion that this might be a problem with Linux? Because if
you want, I can whip out a FreeBSD machine, compile pgsql, attach
the same disk, and try it there. I am longing to have a reason to move
back to FreeBSD anyway. But I have tons of stuff to do, so if you do not
have reason to suspect Linux of doing something wrong here, I prefer
skipping that futile attempt.
> The memory map shows that there were three sorts and four hashes going
> on, so I'm not sure I believe that this corresponds to the query plan
> you showed us before.
Like I said, the first explain was not using the same constraints (no
NL). What I sent last should all be consistent: memory dump,
explain plan, and gdb backtrace.
> Any chance of extracting a self-contained test case that reproduces this?
With 18 million rows involved in the base tables, hardly.
But I am ready to try some other things with the debugger that you want
me to try. If we have a memory leak issue, we might just as well try to
plug it!
I could even give one of you access to the system that runs this.
thanks,
-Gunther