Re: Out of Memory errors are frustrating as heck!


On Tue, Apr 23, 2019 at 03:43:48PM -0500, Justin Pryzby wrote:
> On Tue, Apr 23, 2019 at 04:37:50PM -0400, Gunther wrote:
> > On 4/21/2019 23:09, Tomas Vondra wrote:
> > > What I think might work better is the attached v2 of the patch, with a
> > Thanks for this, and I am trying this now.
> > ...
> > Aaaaaand, it's a winner!
> >
> > Unique (cost=5551524.36..5554207.33 rows=34619 width=1197) (actual time=6150303.060..6895451.210 rows=435274 loops=1)
> >   -> Sort (cost=5551524.36..5551610.91 rows=34619 width=1197) (actual time=6150303.058..6801372.192 rows=113478386 loops=1)
> >        Sort Method: external merge Disk: 40726720kB
> >
> > For the first time this query has succeeded now. Memory was bounded. The
> > time of nearly two hours is crazy, but things sometimes take that long.
>
> It wrote 40GB tempfiles - perhaps you can increase work_mem now to improve the
> query time.


That's unlikely to reduce the amount of data written to temporary files,
it just means there will be fewer larger files - in total it's still
going to be ~40GB. And it's not guaranteed it'll improve performance,
because work_mem=4MB might fit into CPU caches and larger values almost
certainly won't. I don't think there's much to gain, really.
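For anyone wanting to experiment with Justin's suggestion anyway, work_mem can be raised for a single session without touching postgresql.conf (the value below is purely illustrative):

```sql
-- Session-local only; does not persist or affect other backends.
-- Each sort/hash node in a query may use up to this much memory.
SET work_mem = '256MB';

-- Re-run the problem query to compare timings and temp-file usage:
EXPLAIN (ANALYZE, BUFFERS) SELECT ...;  -- the original query goes here

-- Restore the configured default for the rest of the session.
RESET work_mem;
```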

> We didn't address it yet, but your issue was partially caused by a misestimate.
> It's almost certainly because these conditions are correlated, or maybe
> redundant.


Right. Chances are that with a better estimate the optimizer would pick a
merge join instead. I wonder if that would be significantly faster.

Merge Cond: (((documentinformationsubject.documentinternalid)::text =
(documentinformationsubject_1.documentinternalid)::text) AND
((documentinformationsubject.documentid)::text =
(documentinformationsubject_1.documentid)::text) AND
((documentinformationsubject.actinternalid)::text =
(documentinformationsubject_1.actinternalid)::text))

> If they're completely redundant and you can get the same result after
> dropping one or two of those conditions, then you should.
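As a sketch of that first suggestion, using the columns from the merge condition above (which condition is actually redundant depends on the data; verify that the results are identical before adopting this):

```sql
-- Original self-join matched on documentinternalid, documentid, AND
-- actinternalid. If documentid is functionally determined by
-- documentinternalid, the documentid condition can be dropped:
JOIN documentinformationsubject documentinformationsubject_1
  ON  documentinformationsubject.documentinternalid
        = documentinformationsubject_1.documentinternalid
  AND documentinformationsubject.actinternalid
        = documentinformationsubject_1.actinternalid
```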

> Alternately, if they're correlated but not redundant, you can use PG10
> "dependency" statistics (CREATE STATISTICS) on the correlated columns
> (and ANALYZE).
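Concretely, that PG10 suggestion would look roughly like this (the statistics name is made up; the columns are taken from the merge condition above):

```sql
-- Extended statistics tracking functional dependencies between the
-- correlated columns (PostgreSQL 10+):
CREATE STATISTICS docsubj_dep_stats (dependencies)
    ON documentinternalid, documentid, actinternalid
    FROM documentinformationsubject;

-- Extended statistics are only collected on the next ANALYZE:
ANALYZE documentinformationsubject;
```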


That's not going to help, because we don't use functional dependencies
in join estimation yet.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
