Tomas Vondra <tomas.vondra@xxxxxxxxxxxxxxx> writes:
> On Sun, Apr 21, 2019 at 10:36:43AM -0400, Tom Lane wrote:
>> Jeff Janes <jeff.janes@xxxxxxxxx> writes:
>>> If a single 32 bit hash value has enough tuples by itself to not fit in
>>> work_mem, then it will keep splitting until that value is in a batch by
>>> itself before shutting off

>> Right, that's the code's intention.  If that's not good enough for this
>> case, we'll need to understand the details a bit better before we can
>> design a better(?) heuristic.

> I think we only disable growing when there are no other values in the
> batch, but that seems rather easy to defeat - all you need is a single
> tuple with a hash that falls into the same batch, and it's over.  Maybe
> we should make this a bit less accurate - say, if less than 5% memory
> gets freed, don't add more batches.

Yeah, something like that, but it's hard to design it without seeing
some concrete misbehaving examples.

I think though that this is somewhat independent of the problem that
we're not including the I/O buffers in our reasoning about memory
consumption.

> An alternative would be spilling the extra tuples into a special
> overflow file, as I explained earlier.  That would actually enforce
> work_mem I think.

Well, no, it won't.  If you have umpteen gigabytes of RHS tuples with
the exact same hash code, nothing we can do here is going to prevent
you from having to process those in a single table load.  (If it's a
plain inner join, maybe you could break that into subsections anyway
... but that won't work for left or full joins where you need per-tuple
match status.)

I think our ambition here should just be to not have the code go crazy
trying to keep its memory consumption under work_mem when it's
ultimately going to fail to do so anyhow.

			regards, tom lane
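
PS: just to make the idea concrete, here is a standalone toy program
(not a patch against nodeHash.c; the names nfreed/ninmemory are only
borrowed from ExecHashIncreaseNumBatches for readability, and the 5%
cutoff is the number Tomas floated, not a tuned value) showing the
kind of guard under discussion:

/*
 * Toy sketch, compiled on its own: after splitting a batch, decide
 * whether further batch growth should stay enabled.  "ninmemory" is
 * how many tuples the batch held before the split, "nfreed" is how
 * many the split actually moved out to other batches.
 */
#include <stdbool.h>
#include <stdio.h>

#define MIN_FREED_FRACTION 0.05		/* hypothetical 5% threshold */

static bool
keep_growing(long ninmemory, long nfreed)
{
	/* Current rule: stop only if the split freed nothing or everything. */
	if (nfreed == 0 || nfreed == ninmemory)
		return false;

	/* Proposed extra rule: also stop if it freed less than 5% of tuples. */
	if ((double) nfreed < MIN_FREED_FRACTION * (double) ninmemory)
		return false;

	return true;
}

int
main(void)
{
	/* Skewed batch: 1,000,000 tuples, the split moved only 10 of them. */
	printf("keep growing? %s\n",
		   keep_growing(1000000, 10) ? "yes" : "no");
	return 0;
}

With only the existing freed-nothing/freed-everything test the skewed
example above would answer "yes" and we'd keep doubling nbatch; the
extra fraction test makes it answer "no" after one nearly-useless split.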