Tomas Vondra <tomas.vondra@xxxxxxxxxxxxxxx> writes:
> On Sun, Apr 21, 2019 at 10:36:43AM -0400, Tom Lane wrote:
>> Jeff Janes <jeff.janes@xxxxxxxxx> writes:
>>> If a single 32 bit hash value has enough tuples by itself to not fit in
>>> work_mem, then it will keep splitting until that value is in a batch by
>>> itself before shutting off

>> Right, that's the code's intention.  If that's not good enough for this
>> case, we'll need to understand the details a bit better before we can
>> design a better(?) heuristic.

> I think we only disable growing when there are no other values in the
> batch, but that seems rather easy to defeat - all you need is a single
> tuple with a hash that falls into the same batch, and it's over.  Maybe
> we should make this a bit less accurate - say, if less than 5% memory
> gets freed, don't add more batches.

Yeah, something like that, but it's hard to design it without seeing
some concrete misbehaving examples.

I think though that this is somewhat independent of the problem that
we're not including the I/O buffers in our reasoning about memory
consumption.

> An alternative would be spilling the extra tuples into a special
> overflow file, as I explained earlier.  That would actually enforce
> work_mem I think.

Well, no, it won't.  If you have umpteen gigabytes of RHS tuples with
the exact same hash code, nothing we can do here is going to prevent
you from having to process those in a single table load.  (If it's a
plain inner join, maybe you could break that into subsections anyway
... but that won't work for left or full joins where you need per-tuple
match status.)

I think our ambition here should just be to not have the code go crazy
trying to keep its memory consumption under work_mem when it's
ultimately going to fail to do so anyhow.

			regards, tom lane
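
PS: just to make the idea concrete, here is a standalone toy program
(not a patch against nodeHash.c; the names nfreed/ninmemory are only
borrowed from ExecHashIncreaseNumBatches for readability, and the 5%
cutoff is the number Tomas floated, not a tuned value) showing the
kind of guard under discussion:

/*
 * Toy sketch, compiled on its own: after splitting a batch, decide
 * whether further batch growth should stay enabled.  "ninmemory" is
 * how many tuples the batch held before the split, "nfreed" is how
 * many the split actually moved out to other batches.
 */
#include <stdbool.h>
#include <stdio.h>

#define MIN_FREED_FRACTION 0.05		/* hypothetical 5% threshold */

static bool
keep_growing(long ninmemory, long nfreed)
{
	/* Current rule: stop only if the split freed nothing or everything. */
	if (nfreed == 0 || nfreed == ninmemory)
		return false;

	/* Proposed extra rule: also stop if it freed less than 5% of tuples. */
	if ((double) nfreed < MIN_FREED_FRACTION * (double) ninmemory)
		return false;

	return true;
}

int
main(void)
{
	/* Skewed batch: 1,000,000 tuples, the split moved only 10 of them. */
	printf("keep growing? %s\n",
		   keep_growing(1000000, 10) ? "yes" : "no");
	return 0;
}

With only the existing freed-nothing/freed-everything test the skewed
example above would answer "yes" and we'd keep doubling nbatch; the
extra fraction test makes it answer "no" after one nearly-useless split.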