Re: Out of Memory errors are frustrating as heck!

On Sat, Apr 20, 2019 at 06:20:15PM -0400, Tom Lane wrote:
> Justin Pryzby <pryzby@xxxxxxxxxxxxx> writes:
> > On Sat, Apr 20, 2019 at 04:46:03PM -0400, Tom Lane wrote:
> > > Maybe we just need to account for the per-batch buffers while estimating
> > > the amount of memory used during planning.  That would force this case
> > > into a mergejoin instead, given that work_mem is set so small.
> >
> > Do you mean by adding disable_cost if work_mem is so small that it's
> > estimated to be exceeded?
>
> No, my point is that ExecChooseHashTableSize fails to consider the
> I/O buffers at all while estimating hash table size.  It's not
> immediately obvious how to factor that in, but we should.
>
> If Tomas is right that there's also an underestimate of the number
> of rows here, that might not solve Gunther's immediate problem; but
> it seems like a clear costing oversight.
>
> There's also the angle that the runtime code acts as though increasing
> the number of batches is free, while it clearly isn't when you think
> about the I/O buffers.  So at some point we should probably stop
> increasing the number of batches on the grounds of needing too many
> buffers.

Yes. I think it might be partially because the cost is hidden elsewhere.
The hashjoin code only really deals with the array of pointers to BufFile,
not with the BufFiles themselves. That memory might have looked
insignificant for common cases, but for these corner cases it clearly
matters quite a bit.
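
Just to make the hidden cost concrete, here is a rough sketch - made-up
field layout and illustrative numbers, not the real structs from
buffile.c / hashjoin.h:

#include <stdio.h>

#define BLCKSZ 8192                 /* PostgreSQL disk block size */

/* simplified sketch, not the actual definitions */
typedef struct BufFile
{
    /* ... file bookkeeping ... */
    char        buffer[BLCKSZ];     /* every BufFile carries its own 8kB buffer */
} BufFile;

typedef struct HashJoinTableData
{
    int         nbatch;             /* number of batches */
    BufFile   **innerBatchFile;     /* nbatch pointers per side - this is */
    BufFile   **outerBatchFile;     /* all the hashjoin code really "sees" */
} HashJoinTableData;

int
main(void)
{
    int         nbatch = 262144;    /* a pathological case */

    /* visible cost: just the two pointer arrays */
    printf("pointer arrays:  %zu MB\n",
           2 * nbatch * sizeof(BufFile *) / (1024 * 1024));

    /* hidden cost: the 8kB buffer inside each open BufFile */
    printf("BufFile buffers: %zu MB\n",
           (size_t) 2 * nbatch * BLCKSZ / (1024 * 1024));

    return 0;
}

At 256k batches the pointer arrays are a visible 4MB, while the buffers
behind them are 4GB that no accounting ever looks at.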

So yes, ExecChooseHashTableSize() needs to consider this memory and check
if doubling the number of batches has any chance of actually improving
things, because at some point the BufFile memory starts to dominate and
would just force us to do more and more batches.
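
Very roughly, the check could look like this (hash_total_mem and
doubling_helps are hypothetical names, not anything that exists in
ExecChooseHashTableSize today):

#include <stdbool.h>
#include <stddef.h>

#define BLCKSZ 8192

/* hypothetical helper: total memory for a given batch count, counting
 * both the one in-memory batch and the per-batch BufFile buffers */
static size_t
hash_total_mem(size_t inner_rel_bytes, size_t nbatch)
{
    size_t      hashtable = inner_rel_bytes / nbatch;   /* one batch in memory */
    size_t      buffers = nbatch * 2 * BLCKSZ;          /* inner + outer files */

    return hashtable + buffers;
}

/* doubling nbatch halves the hash table but adds 2 * nbatch * BLCKSZ of
 * new buffers - past the crossover it makes things strictly worse */
static bool
doubling_helps(size_t inner_rel_bytes, size_t nbatch)
{
    return hash_total_mem(inner_rel_bytes, nbatch * 2)
           < hash_total_mem(inner_rel_bytes, nbatch);
}

With a 4GB inner relation, for example, going from 16k to 32k batches
shrinks the in-memory batch from 256kB to 128kB but adds 256MB of
buffers - an obviously losing trade.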

But I think we should also consider this before even creating the hash
join path - check whether the expected number of batches has any chance
of fitting into work_mem, and if not, simply not create the path at all,
just like we do for hash aggregate. It's not going to solve cases like
this one (with underestimates), but it seems like a reasonable sanity
check. Although maybe we'd never actually use such paths in practice,
because merge join would come out cheaper anyway? Not sure.
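
In pseudo-C, the guard could be as simple as this
(hashjoin_path_feasible is a made-up name, purely to illustrate the
idea):

#include <stdbool.h>
#include <stddef.h>

#define BLCKSZ 8192

/* hypothetical planner-side guard, not actual planner code: refuse the
 * hashjoin path when even the estimated BufFile buffers can't fit into
 * work_mem */
static bool
hashjoin_path_feasible(size_t inner_rel_bytes, size_t work_mem_bytes)
{
    size_t      nbatch = 1;

    /* batches needed so a single inner batch fits into work_mem */
    while (inner_rel_bytes / nbatch > work_mem_bytes)
        nbatch *= 2;

    /* ... and the buffers for those batches have to fit as well */
    return nbatch * 2 * BLCKSZ <= work_mem_bytes;
}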

Also, I wonder if we really need 8kB buffers here. Would it make sense
to allow smaller buffers in some cases? Say, 1kB. It's not going to save
us by itself, but it's still 8x better than what we have now.
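
For a sense of scale (numbers picked only for illustration): a join that
degenerates to 256k batches needs 256k * 2 * 8kB = 4GB of buffers today;
with 1kB buffers that drops to 512MB. Still hopeless against a small
work_mem, but a much gentler failure mode.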

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





