On Sat, Apr 20, 2019 at 04:26:34PM -0400, Tom Lane wrote:
> Tomas Vondra <tomas.vondra@xxxxxxxxxxxxxxx> writes:
>> Considering how rare this issue likely is, we need to be looking for a
>> solution that does not break the common case.
> Agreed. What I think we need to focus on next is why the code keeps
> increasing the number of batches. It seems like there must be an undue
> amount of data all falling into the same bucket ... but if it were simply
> a matter of a lot of duplicate hash keys, the growEnabled shutoff
> heuristic ought to trigger.
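
For context, the shutoff heuristic you mean lives in
ExecHashIncreaseNumBatches: when re-splitting the current batch moves either
none or all of its in-memory tuples, doubling nbatch further can't help, so
growth gets disabled. A minimal sketch of that check, with a hypothetical
helper name rather than the actual nodeHash.c code:

#include <stdbool.h>

/*
 * Hypothetical helper, loosely based on the growEnabled shutoff: after
 * nbatch is doubled and the current batch re-hashed, a split that frees
 * no tuples (or frees all of them) means the batch is dominated by a
 * single hash value, so splitting further is pointless and growth stops.
 */
bool
batch_growth_useful(long ninmemory, long nfreed)
{
    return (nfreed != 0 && nfreed != ninmemory);
}
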
I think it's really a matter of an underestimate, which convinces the planner
to hash the larger table. In this case the table is 42GB, so it's possible
it actually works as expected. With work_mem = 4MB I've seen 32k batches,
and that's not that far off, I'd say. Maybe there are some more common
values, but it does not seem like a very contrived data set.
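
FWIW a trivial back-of-the-envelope check (my own sketch, not the planner's
actual costing) says 32k is roughly the expected number here: nbatch is kept
at a power of two, and 42GB / 4MB already forces ~10k batches before any
runtime doubling.

#include <stdio.h>

int
main(void)
{
    double  inner_bytes = 42.0 * 1024 * 1024 * 1024;    /* ~42GB inner side */
    double  work_mem = 4.0 * 1024 * 1024;               /* work_mem = 4MB */
    long    minbatch = (long) (inner_bytes / work_mem);  /* 10752 */
    long    nbatch;

    /* nbatch is always a power of two, so round up */
    for (nbatch = 1; nbatch < minbatch; nbatch <<= 1)
        ;

    /* prints: min 10752, rounded 16384, one more doubling 32768 */
    printf("min %ld, rounded %ld, one more doubling %ld\n",
           minbatch, nbatch, nbatch * 2);

    return 0;
}

So a single runtime doubling on top of the power-of-two rounding already
lands on the 32k batches I observed.
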
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services