I have two identical databases that run the same query each morning. Starting this morning, something caused the first db to start using a different execution plan for the query, resulting in much worse performance. I've tried several things this morning, but I am currently stumped as to what would be causing the different execution plans.
The query and the EXPLAIN ANALYZE results on the two databases:
db1=> explain analyze
select
t1.bn,
t2.mu,
t1.nm,
t1.root,
t1.suffix,
t1.type
from
t1,
t2
where
t2.eff_dt = current_date
and t1.active = true
and t1.bn = t2.sn;
The slower plan used on db1:
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=145.12..38799.61 rows=7876 width=47) (actual time=6.494..352.166 rows=8437 loops=1)
-> Bitmap Heap Scan on t2 (cost=145.12..19464.74 rows=10898 width=22) (actual time=6.472..22.684 rows=12204 loops=1)
Recheck Cond: (eff_dt = ('now'::text)::date)
-> Bitmap Index Scan on t2_nu1 (cost=0.00..142.40 rows=10898 width=0) (actual time=4.013..4.013 rows=24482 loops=1)
Index Cond: (eff_dt = ('now'::text)::date)
-> Index Scan using t1_uc2 on t1 (cost=0.00..1.76 rows=1 width=32) (actual time=0.012..0.026 rows=1 loops=12204)
Index Cond: ((t1.bn)::text = (t2.sn)::text)
Filter: active
Total runtime: 353.629 ms
(9 rows)
Time: 354.795 ms
And the faster plan from db2:
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------
Merge Join (cost=21371.63..21720.78 rows=7270 width=47) (actual time=60.412..80.865 rows=8437 loops=1)
Merge Cond: ("outer"."?column6?" = "inner"."?column3?")
-> Sort (cost=8988.56..9100.55 rows=44794 width=32) (actual time=30.685..33.370 rows=8438 loops=1)
Sort Key: (t1.bn)::text
-> Seq Scan on t1 (cost=0.00..5528.00 rows=44794 width=32) (actual time=0.008..18.280 rows=8439 loops=1)
Filter: active
-> Sort (cost=12383.07..12409.32 rows=10500 width=22) (actual time=29.718..33.515 rows=12204 loops=1)
Sort Key: (t2.sn)::text
-> Index Scan using t2_nu1 on t2 (cost=0.00..11681.77 rows=10500 width=22) (actual time=0.052..13.295 rows=12204 loops=1)
Index Cond: (eff_dt = ('now'::text)::date)
Total runtime: 83.385 ms
(11 rows)
t2.eff_dt is defined as a date, t1.active is a boolean, all other fields are varchar. Table t1 has a unique index (uc2) on field bn and a second unique index (uc3) on fields (root, suffix). Table t2 has a unique index (uc1) on (sn, eff_dt), and a non-unique index (nu1) on eff_dt.
Table t1 has 12204 rows. Table t2 has 7.1M rows, 12204 of which have eff_dt = current_date.
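In case the statistics matter, here is roughly how the planner's view of the join columns can be compared on the two servers (standard `pg_stats` columns; table and column names as above):

```sql
-- Compare planner cost settings that influence nested-loop vs. merge-join choice
SELECT name, setting
FROM pg_settings
WHERE name IN ('random_page_cost', 'effective_cache_size', 'work_mem',
               'default_statistics_target', 'enable_mergejoin', 'enable_nestloop');

-- Compare the per-column statistics the planner has for the join/filter columns
SELECT tablename, attname, n_distinct, null_frac, correlation
FROM pg_stats
WHERE (tablename = 't1' AND attname = 'bn')
   OR (tablename = 't2' AND attname IN ('sn', 'eff_dt'));
```

If the row-count estimates or the cost settings differ between db1 and db2, that would go a long way toward explaining the different plans.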
Both databases have autovacuum turned on, and both have been vacuumed and analyzed in the last 24 hours.
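One experiment I can try on db1 is disabling the nested loop for a single transaction to see what the planner estimates the merge join would cost there, along the lines of:

```sql
BEGIN;
-- SET LOCAL only affects the current transaction, so this can't leak
-- into other sessions
SET LOCAL enable_nestloop = off;
EXPLAIN ANALYZE
SELECT t1.bn, t2.mu, t1.nm, t1.root, t1.suffix, t1.type
FROM t1, t2
WHERE t2.eff_dt = current_date
  AND t1.active = true
  AND t1.bn = t2.sn;
ROLLBACK;
```

Comparing that estimated merge-join cost on db1 against the nested-loop cost above should show how close the planner thinks the two plans are.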
Any ideas as to what could cause the first db to opt for the slower nested-loop plan rather than the merge join?
Thanks in advance.