Thomas Munro <thomas.munro@xxxxxxxxxxxxxxxx> writes:
> On Wed, Nov 22, 2017 at 7:04 AM, Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
>> Now, there's definitely something busted here; it should not have gone
>> as far as 2 million batches before giving up on splitting.

> I had been meaning to discuss this.  We only give up when we reach the
> point where a batch is entirely kept or entirely sent to a new batch
> (i.e., splitting the batch resulted in one batch with the whole
> contents and another empty batch).  If you have about 2 million evenly
> distributed keys and an ideal hash function, and then you also have 42
> billion keys that are all the same (and exceed work_mem), we won't
> detect extreme skew until the 2 million well-behaved keys have been
> spread so thin that the 42 billion keys are isolated in a batch of
> their own, which we should expect to happen somewhere around 2 million
> batches.

Yeah, I suspected it was something like that, but hadn't dug into the
code yet.

> I have wondered if our extreme skew detector needs to go off sooner.
> I don't have a specific suggestion, but it could just be something
> like 'you threw out or kept more than X% of the tuples'.

Doing this, with some threshold like 95% or 99%, sounds plausible to me.
I'd like to reproduce Cory's disk-space issue before we monkey with the
related logic, though; fixing the part we understand might obscure the
part we still don't.

			regards, tom lane
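
For illustration, here is a minimal sketch of what such a thresholded
give-up test might look like, modeled on the all-or-nothing check that
currently disables batch growth in ExecHashIncreaseNumBatches
(nodeHash.c).  The ninmemory/nfreed counters and the growEnabled flag
exist there today; SKEW_GIVEUP_FRACTION is an invented name standing in
for the "X%" above, not anything in the tree:

    /*
     * Sketch only, not a patch.  After repartitioning, ninmemory is
     * the number of tuples examined and nfreed is the number pushed
     * out to the new batch.  The current test gives up only in the
     * all-or-nothing case (nfreed == 0 || nfreed == ninmemory); this
     * variant also gives up when the split is merely very lopsided,
     * i.e. more than SKEW_GIVEUP_FRACTION of the tuples were kept or
     * thrown out.
     */
    #define SKEW_GIVEUP_FRACTION 0.95

    if (nfreed <= (1.0 - SKEW_GIVEUP_FRACTION) * ninmemory ||
        nfreed >= SKEW_GIVEUP_FRACTION * ninmemory)
    {
        /* splitting isn't separating the tuples; stop doubling nbatch */
        hashtable->growEnabled = false;
    }

With a 0.95 threshold, the pathological case above would presumably trip
the detector almost immediately, since the 42 billion duplicates dominate
their batch from the very first split, rather than after ~2 million
batches.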