Hi
I am having a rare issue with extremely inefficient merge join. The query plan indicates that PG is doing some kind of nested loop, although an index is present.
PostgreSQL 9.6.17 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit
Schema of dir_current (some columns left out for brevity):
starfish=# \d sf.dir_current
Table "sf.dir_current"
Column | Type | Collation | Nullable | Default
--------------------+-------------+-----------+----------+-----------------------------------------------
id | bigint | | not null | nextval('sf.object_id_seq'::regclass)
volume_id | bigint | | not null |
parent_id | bigint | | |
blocks | sf.blkcnt_t | | |
rec_aggrs | jsonb | | not null |
Indexes:
"dir_current_pk" PRIMARY KEY, btree (id), tablespace "sf_current"
"dir_current_parentid_idx" btree (parent_id), tablespace "sf_current"
"dir_current_volume_id_out_of_sync_time_idx" btree (volume_id, out_of_sync_time) WHERE out_of_sync_time IS NOT NULL, tablespace "sf_current"
"dir_current_volume_id_path_unq_idx" UNIQUE, btree (volume_id, path text_pattern_ops), tablespace "sf_current"
"dir_current_volumeid_id_unq" UNIQUE CONSTRAINT, btree (volume_id, id), tablespace "sf_current"
Foreign-key constraints:
"dir_current_parentid_fk" FOREIGN KEY (parent_id) REFERENCES sf.dir_current(id) DEFERRABLE INITIALLY DEFERRED
Table "sf.dir_current"
Column | Type | Collation | Nullable | Default
--------------------+-------------+-----------+----------+-----------------------------------------------
id | bigint | | not null | nextval('sf.object_id_seq'::regclass)
volume_id | bigint | | not null |
parent_id | bigint | | |
blocks | sf.blkcnt_t | | |
rec_aggrs | jsonb | | not null |
Indexes:
"dir_current_pk" PRIMARY KEY, btree (id), tablespace "sf_current"
"dir_current_parentid_idx" btree (parent_id), tablespace "sf_current"
"dir_current_volume_id_out_of_sync_time_idx" btree (volume_id, out_of_sync_time) WHERE out_of_sync_time IS NOT NULL, tablespace "sf_current"
"dir_current_volume_id_path_unq_idx" UNIQUE, btree (volume_id, path text_pattern_ops), tablespace "sf_current"
"dir_current_volumeid_id_unq" UNIQUE CONSTRAINT, btree (volume_id, id), tablespace "sf_current"
Foreign-key constraints:
"dir_current_parentid_fk" FOREIGN KEY (parent_id) REFERENCES sf.dir_current(id) DEFERRABLE INITIALLY DEFERRED
dir_process is created as a temporary table:
CREATE TEMP TABLE dir_process AS (
SELECT sf.dir_current.id, volume_id, parent_id, depth, size, blocks, atime, ctime, mtime, sync_time, local_aggrs FROM sf.dir_current
WHERE ....
);
CREATE INDEX dir_process_indx ON dir_process(volume_id, id);
ANALYZE dir_process;
SELECT sf.dir_current.id, volume_id, parent_id, depth, size, blocks, atime, ctime, mtime, sync_time, local_aggrs FROM sf.dir_current
WHERE ....
);
CREATE INDEX dir_process_indx ON dir_process(volume_id, id);
ANALYZE dir_process;
and usually contains a few thousands rows.
Slow query:
SELECT dir.id, dir.volume_id, dir.parent_id, dir.rec_aggrs, dir.blocks FROM sf.dir_current AS dir
INNER JOIN dir_process ON dir.parent_id = dir_process.id AND dir.volume_id = dir_process.volume_id
WHERE dir.volume_id = ANY(volume_ids)
INNER JOIN dir_process ON dir.parent_id = dir_process.id AND dir.volume_id = dir_process.volume_id
WHERE dir.volume_id = ANY(volume_ids)
dir_current contains around 750M rows altogether, and there is ca. 1.75M rows with volume_id=5.
Sometimes Postgres will choose very inefficient plan, which involves looping many times over same rows, producing hundreds of millions or billions of rows:
LOG: duration: 1125530.496 ms plan:
Merge Join (cost=909.42..40484.01 rows=1 width=456) (actual rows=1 loops=1)
Merge Cond: (dir.volume_id = dir_process.volume_id)
Join Filter: (dir.parent_id = dir_process.id)
Rows Removed by Join Filter: 13583132483
-> Index Scan using dir_current_volumeid_id_unq on dir_current dir (cost=0.12..884052.46 rows=601329 width=456) (actual rows=2000756 loops=1)
Index Cond: (volume_id = ANY ('{5}'::bigint[]))
-> Sort (cost=909.31..912.70 rows=6789 width=16) (actual rows=13581131729 loops=1)
Sort Key: dir_process.volume_id
Sort Method: quicksort Memory: 511kB
-> Seq Scan on dir_process (cost=0.00..822.89 rows=6789 width=16) (actual rows=6789 loops=1)
Merge Cond: (dir.volume_id = dir_process.volume_id)
Join Filter: (dir.parent_id = dir_process.id)
Rows Removed by Join Filter: 13583132483
-> Index Scan using dir_current_volumeid_id_unq on dir_current dir (cost=0.12..884052.46 rows=601329 width=456) (actual rows=2000756 loops=1)
Index Cond: (volume_id = ANY ('{5}'::bigint[]))
-> Sort (cost=909.31..912.70 rows=6789 width=16) (actual rows=13581131729 loops=1)
Sort Key: dir_process.volume_id
Sort Method: quicksort Memory: 511kB
-> Seq Scan on dir_process (cost=0.00..822.89 rows=6789 width=16) (actual rows=6789 loops=1)
LOG: duration: 3923310.224 ms plan:
Merge Join (cost=0.17..4324.64 rows=1 width=456) (actual rows=529 loops=1)
Merge Cond: (dir_process.volume_id = dir.volume_id)
Join Filter: (dir.parent_id = dir_process.id)
Rows Removed by Join Filter: 831113021
-> Index Only Scan using dir_process_indx on dir_process (cost=0.05..245.00 rows=450 width=16) (actual rows=450 loops=1)
Heap Fetches: 450
-> Index Scan using dir_current_volumeid_id_unq on dir_current dir (cost=0.12..884052.46 rows=601329 width=456) (actual rows=831113101 loops=1)
Index Cond: (volume_id = ANY ('{5}'::bigint[]))
Merge Join (cost=0.17..4324.64 rows=1 width=456) (actual rows=529 loops=1)
Merge Cond: (dir_process.volume_id = dir.volume_id)
Join Filter: (dir.parent_id = dir_process.id)
Rows Removed by Join Filter: 831113021
-> Index Only Scan using dir_process_indx on dir_process (cost=0.05..245.00 rows=450 width=16) (actual rows=450 loops=1)
Heap Fetches: 450
-> Index Scan using dir_current_volumeid_id_unq on dir_current dir (cost=0.12..884052.46 rows=601329 width=456) (actual rows=831113101 loops=1)
Index Cond: (volume_id = ANY ('{5}'::bigint[]))
LOG: duration: 10140968.829 ms plan:
Merge Join (cost=0.17..8389.13 rows=1 width=456) (actual rows=819 loops=1)
Merge Cond: (dir_process.volume_id = dir.volume_id)
Join Filter: (dir.parent_id = dir_process.id)
Rows Removed by Join Filter: 2153506735
-> Index Only Scan using dir_process_indx on dir_process (cost=0.06..659.76 rows=1166 width=16) (actual rows=1166 loops=1)
Heap Fetches: 1166
-> Index Scan using dir_current_volumeid_id_unq on dir_current dir (cost=0.12..885276.20 rows=602172 width=456) (actual rows=2153506389 loops=1)
Index Cond: (volume_id = ANY ('{5}'::bigint[]))
Merge Join (cost=0.17..8389.13 rows=1 width=456) (actual rows=819 loops=1)
Merge Cond: (dir_process.volume_id = dir.volume_id)
Join Filter: (dir.parent_id = dir_process.id)
Rows Removed by Join Filter: 2153506735
-> Index Only Scan using dir_process_indx on dir_process (cost=0.06..659.76 rows=1166 width=16) (actual rows=1166 loops=1)
Heap Fetches: 1166
-> Index Scan using dir_current_volumeid_id_unq on dir_current dir (cost=0.12..885276.20 rows=602172 width=456) (actual rows=2153506389 loops=1)
Index Cond: (volume_id = ANY ('{5}'::bigint[]))
Note that
2153506389 /
1166 = 1 846 918. Similarly
831113101 /
450 = 1 846 918.
I wonder how I can help Postgres query planner to choose a faster plan?
--
Marcin Gozdalik