Hello colleagues – The problem description: We're moving from 9.6 to 11.5. There is a SQL code that never ends in 11.5 but works fine in 9.6. The main cause is the optimizer considers of using NL Anti join instead of Merge in 9.6. And the root cause - wrong estimation while self-joining. System environment: CentOS Linux 3.10.0-1062.4.1.el7.x86_64 x86_64 MemTotal: 16266644 kB Intel(R) Xeon(R) CPU E7-8867 v3 @ 2.50GHz HDD - unknown PostgreSQL: PostgreSQL 11.5 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-36), 64-bit shared_buffers = 1GB huge_pages = on temp_buffers = 1GB max_prepared_transactions = 128 max_connections = 500 work_mem = 256MB maintenance_work_mem = 1024MB autovacuum_work_mem = 512MB max_worker_processes = 100 max_parallel_workers_per_gather = 0 # changing this value to any others takes no effect for issue resolving max_parallel_workers = 8 checkpoint_timeout = 30min max_wal_size = 32GB min_wal_size = 8GB checkpoint_completion_target = 0.9 enable_nestloop = on # off value fixes the issue but this is wrong way random_page_cost = 4.0 effective_cache_size = 4GB default_statistics_target = 2000 Main script: -- preparation -- this table is reverted tree with tree keys position_uuid, parent_position_uuid create temporary table tmp_nsi_klp on commit drop as select k1.gid, k1.smnn_gid, k1.position_uuid, p.parent_position_uuid, k1.order_number, k1.date_start, k1.date_end, k1.is_active, coalesce(p.is_fake_series, false) as is_fake_series from nsi_klp k1 left join (select gid, unnest(parent_position_uuid) as parent_position_uuid, coalesce(array_length(parent_position_uuid, 1),0) > 1 as is_fake_series from nsi_klp where version_esklp = '2.0') p using (gid) where k1.version_esklp = '2.0' ; create unique index tmp_nsi_klp_ui on tmp_nsi_klp(gid, parent_position_uuid); analyze tmp_nsi_klp; -- working set (!!This SQL never ends in 11.5 now) create temporary table tmp_klp_replace on commit drop as select distinct on (klp_gid) * from ( select k2.gid as klp_gid, k2.smnn_gid as klp_smnn_gid, k2.position_uuid as klp_position_uuid, k2.order_number as klp_order_number, k2.is_active as klp_is_active, k1.gid as klp_child_gid, k1.smnn_gid as klp_child_smnn_gid, k1.position_uuid as klp_child_position_uuid, k1.order_number as klp_child_order_number, k1.is_active as klp_child_is_active from tmp_nsi_klp k1 join tmp_nsi_klp k2 on (k2.position_uuid = k1.parent_position_uuid) union all select k1.gid as klp_gid, k1.smnn_gid as klp_smnn_gid, k1.position_uuid as klp_position_uuid, k1.order_number as klp_order_number, k1.is_active as klp_is_active, null as klp_child_gid, null as klp_child_smnn_gid, null as klp_child_position_uuid, null as klp_child_order_number, null as klp_child_is_active from tmp_nsi_klp k1 left join tmp_nsi_klp k2 on (k1.position_uuid = k2.parent_position_uuid) left join (select position_uuid from tmp_nsi_klp where not is_fake_series group by position_uuid having count(1) > 1) klp_series on (klp_series.position_uuid = k1.position_uuid) where -- not exists(select 1 from tmp_nsi_klp k2 where k1.position_uuid = k2.parent_position_uuid) k2.gid is null -- none referenced and klp_series.position_uuid is null -- klp series with the same position_uuid ) a order by klp_gid, klp_order_number desc ; Characteristics of source table - tmp_nsi_klp: create table tmp_nsi_klp ( gid uuid NULL, -- not null by the fact smnn_gid uuid NULL, -- not null by the fact position_uuid uuid NULL, -- not null by the fact parent_position_uuid uuid NULL, order_number int8 NULL, date_start timestamp NULL, -- not null by the fact date_end timestamp NULL, is_active bool NULL, -- not null by the fact is_fake_series bool NULL -- not null by the fact ); Rows: 237279 Cols stats: https://docs.google.com/spreadsheets/d/1Ocbult13kZ64vK9nHt-_BV3EENK_ZSHFTAmRZLISUIE/edit?usp=sharing Execution plans for problematic query - working set "create temporary table tmp_klp_replace on commit drop as": On 11.5 (option nestloop enabled): https://explain.depesz.com/s/pIzd Exec time: never finished On 9.6 (PostgreSQL 9.6.2 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-11), 64-bit): https://explain.depesz.com/s/sO0G Exec time: ~1 sec On 11.5 (option nestloop disabled): https://explain.depesz.com/s/eYzk Exec time: ~1,5 sec Construction "not exists(select 1 from tmp_nsi_klp k2 where k1.position_uuid = k2.parent_position_uuid)" works perfectly but there are lots of similar constructions in a code made for checking inclusion of data. Thus no chances to change existing code to another using not exists construction. Are there any options to bring initial statement to life and keep the server option nestloop enable? Give me a clue, pls. Thanks in advance. Andrew. |