On 4/19/2019 17:01, Justin Pryzby wrote:
> Were you able to reproduce the issue in some minimized way ? Like after
> joining fewer tables or changing to join with fewer join conditions ?
>
> On Thu, Apr 18, 2019 at 05:21:28PM +0200, Tomas Vondra wrote:
>> It would be possible to do at least one of these two things:
Thanks, and sorry for my pushiness. Yes, I have pinpointed the
HashJoin, and I have created the two tables involved.
As for the data distribution of the join keys: they are all
essentially UUIDs, and essentially random.
I am sharing this data with you, although only someone who can
actually control the planner can use it to reproduce the problem.
I have tried but not succeeded; still, I am confident the problem
can be reproduced with this material.
Here is the part of the plan that generates this massive number
of calls:
-> Hash Right Join (cost=4255031.53..5530808.71 rows=34619 width=1197)
Hash Cond: (((q.documentinternalid)::text = (documentinformationsubject.documentinternalid)::text) AND ((r.targetinternalid)::text = (documentinformationsubject.actinternalid)::text))
-> Hash Right Join (cost=1341541.37..2612134.36 rows=13 width=341)
Hash Cond: (((documentinformationsubject_2.documentinternalid)::text = (q.documentinternalid)::text) AND ((documentinformationsubject_2.actinternalid)::text = (q.actinternalid)::text))
... let's call this tmp_q ...
-> Hash (cost=2908913.87..2908913.87 rows=34619 width=930)
-> Gather Merge (cost=2892141.40..2908913.87 rows=34619 width=930)
... let's call this tmp_r ...
This can be logically reduced to the following query
SELECT *
FROM tmp_q q
RIGHT OUTER JOIN tmp_r r
USING(documentInternalId, actInternalId);
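(As a sanity check on which side ends up hashed: in EXPLAIN output the
input sitting under the Hash node is the one being loaded into the hash
table, so the reduced query can be inspected directly. This is just the
standard EXPLAIN reading, nothing specific to my setup:)

```sql
EXPLAIN
SELECT *
FROM tmp_q q
RIGHT OUTER JOIN tmp_r r
USING (documentinternalid, actinternalid);
-- whichever relation appears beneath the "-> Hash" line is the hashed side
```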
with the following two tables
CREATE TABLE tmp_q (
documentinternalid character varying(255),
operationqualifiercode character varying(512),
operationqualifiername character varying(512),
actinternalid character varying(255),
approvalinternalid character varying(255),
approvalnumber character varying(555),
approvalnumbersystem character varying(555),
approvalstatecode character varying(512),
approvalstatecodesystem character varying(512),
approvaleffectivetimelow character varying(512),
approvaleffectivetimehigh character varying(512),
approvalstatuscode character varying(32),
licensecode character varying(512),
agencyid character varying(555),
agencyname text
);
CREATE TABLE tmp_r (
documentinternalid character varying(255),
is_current character(1),
documentid character varying(555),
documenttypecode character varying(512),
subjectroleinternalid character varying(255),
subjectentityinternalid character varying(255),
subjectentityid character varying(555),
subjectentityidroot character varying(555),
subjectentityname character varying,
subjectentitytel text,
subjectentityemail text,
otherentityinternalid character varying(255),
confidentialitycode character varying(512),
actinternalid character varying(255),
operationcode character varying(512),
operationname text,
productitemcode character varying(512),
productinternalid character varying(255)
);
you can download the data here (URLs just a tiny bit obfuscated):
The small table http:// gusw dot net/tmp_q.gz
The big table is split into parts of 20 MB each, http://
gusw dot net/tmp_r.gz.00, .01, .02, ..., .09; maybe you need only
the first part.
Download as many as you have patience to grab, and then import
the data like this:
\copy tmp_q from program 'zcat tmp_q.gz'
\copy tmp_r from program 'cat tmp_r.gz.* | zcat'
The only problem is that I can't verify that this actually
triggers the memory problem, because I can't force the planner to
use the right join; it always reverts to the left join, hashing
tmp_q:
-> Hash Left Join (cost=10.25..5601401.19 rows=5505039 width=12118)
Hash Cond: (((r.documentinternalid)::text = (q.documentinternalid)::text) AND ((r.actinternalid)::text = (q.actinternalid)::text))
-> Seq Scan on tmp_r r (cost=0.00..5560089.39 rows=5505039 width=6844)
-> Hash (cost=10.10..10.10 rows=10 width=6306)
-> Seq Scan on tmp_q q (cost=0.00..10.10 rows=10 width=6306)
which is of course much better; but when tmp_q and tmp_r are the
results of complex subplans whose sizes the planner can't
estimate, it gets the sides wrong, and the issue is triggered
because the big tmp_r is hashed, not tmp_q.
It would be so nice if there were a way to force a specific plan
for testing purposes. I tried planting false data in the
pg_class columns reltuples and relpages:
foo=# analyze tmp_q;
ANALYZE
foo=# analyze tmp_r;
ANALYZE
foo=# select relname, relpages, reltuples from pg_class where relname in ('tmp_q', 'tmp_r');
relname | relpages | reltuples
---------+----------+-------------
tmp_r | 5505039 | 1.13467e+08
tmp_q | 7 | 236
(2 rows)
foo=# update pg_class set (relpages, reltuples) = (5505039, 1.13467e+08) where relname = 'tmp_q';
UPDATE 1
foo=# update pg_class set (relpages, reltuples) = (7, 236) where relname = 'tmp_r';
UPDATE 1
but that didn't help. Somehow the planner outsmarts every such
trick, so I can't get it to follow my right outer join plan where
the big table is hashed. I am sure y'all know some way to force
it.
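(Editor's note: a sketch of the usual knobs for coercing the planner,
assuming stock PostgreSQL; the Leading()/HashJoin() hints mentioned in
the comment require the third-party pg_hint_plan extension, which is
not part of core:)

```sql
-- Disable the join strategies you don't want, scoped to one transaction,
-- so the planner falls back to a hash join; then inspect the plan.
BEGIN;
SET LOCAL enable_mergejoin = off;
SET LOCAL enable_nestloop  = off;
-- With pg_hint_plan installed, the join order (and thus which side is
-- hashed) can be pinned outright with a hint comment, e.g.:
--   /*+ Leading((q r)) HashJoin(q r) */
EXPLAIN
SELECT *
FROM tmp_q q
RIGHT OUTER JOIN tmp_r r
USING (documentinternalid, actinternalid);
ROLLBACK;
```

Whether this reproduces the exact inner/outer assignment from the
original plan will still depend on the statistics, but it rules out the
merge-join and nested-loop escapes the planner was taking.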
regards,
-Gunther