Hi, I'm new to tuning PostgreSQL and I have a query that gets slower
after I run a vacuum analyze. I believe it uses a Hash Join before
the analyze and a Nested Loop IN Join after. It seems the Nested
Loop IN Join estimates the correct number of rows, but underestimates
the amount of time required. I am curious why the vacuum analyze
makes it slower and if that gives any clues as too which parameter I
should be tuning.
BTW, I know the query could be re-structured more cleanly to remove
the sub-selects, but that doesn't seem to impact the performance.
thanks,
Jeff
Welcome to psql 8.1.5, the PostgreSQL interactive terminal.
Type: \copyright for distribution terms
\h for help with SQL commands
\? for help with psql commands
\g or terminate with semicolon to execute query
\q to quit
plm_demo=# explain analyze SELECT count(*) AS count_all FROM symptoms
WHERE ( 1=1 and symptoms.id in (select symptom_id from
symptom_reports sr where 1=1 and sr.user_id in (select id from users
where disease_id=1))) ;
QUERY PLAN
------------------------------------------------------------------------
------------------------------------------------------------------------
-----------------------
Aggregate (cost=366.47..366.48 rows=1 width=0) (actual
time=125.093..125.095 rows=1 loops=1)
-> Hash Join (cost=362.41..366.38 rows=36 width=0) (actual
time=124.162..124.859 rows=106 loops=1)
Hash Cond: ("outer".id = "inner".symptom_id)
-> Seq Scan on symptoms (cost=0.00..3.07 rows=107
width=4) (actual time=0.032..0.295 rows=108 loops=1)
-> Hash (cost=362.25..362.25 rows=67 width=4) (actual
time=124.101..124.101 rows=106 loops=1)
-> HashAggregate (cost=361.58..362.25 rows=67
width=4) (actual time=123.628..123.854 rows=106 loops=1)
-> Hash IN Join (cost=35.26..361.41 rows=67
width=4) (actual time=9.767..96.372 rows=13074 loops=1)
Hash Cond: ("outer".user_id = "inner".id)
-> Seq Scan on symptom_reports sr
(cost=0.00..259.65 rows=13165 width=8) (actual time=0.029..33.359
rows=13074 loops=1)
-> Hash (cost=35.24..35.24 rows=11
width=4) (actual time=9.696..9.696 rows=1470 loops=1)
-> Bitmap Heap Scan on users
(cost=2.04..35.24 rows=11 width=4) (actual time=0.711..6.347
rows=1470 loops=1)
Recheck Cond: (disease_id = 1)
-> Bitmap Index Scan on
users_disease_id_index (cost=0.00..2.04 rows=11 width=0) (actual
time=0.644..0.644 rows=2378 loops=1)
Index Cond: (disease_id
= 1)
Total runtime: 134.045 ms
(15 rows)
plm_demo=# vacuum analyze;
VACUUM
plm_demo=# analyze;
ANALYZE
plm_demo=# explain analyze SELECT count(*) AS count_all FROM symptoms
WHERE ( 1=1 and symptoms.id in (select symptom_id from
symptom_reports sr where 1=1 and sr.user_id in (select id from users
where disease_id=1))) ;
QUERY
PLAN
------------------------------------------------------------------------
-----------------------------------------------------------------
Aggregate (cost=586.47..586.48 rows=1 width=0) (actual
time=3441.385..3441.386 rows=1 loops=1)
-> Nested Loop IN Join (cost=149.05..586.26 rows=85 width=0)
(actual time=54.517..3441.115 rows=106 loops=1)
Join Filter: ("outer".id = "inner".symptom_id)
-> Seq Scan on symptoms (cost=0.00..3.08 rows=108
width=4) (actual time=0.007..0.273 rows=108 loops=1)
-> Hash IN Join (cost=149.05..603.90 rows=13074 width=4)
(actual time=0.078..24.503 rows=3773 loops=108)
Hash Cond: ("outer".user_id = "inner".id)
-> Seq Scan on symptom_reports sr
(cost=0.00..258.74 rows=13074 width=8) (actual time=0.003..9.044
rows=3773 loops=108)
-> Hash (cost=145.38..145.38 rows=1470 width=4)
(actual time=7.608..7.608 rows=1470 loops=1)
-> Seq Scan on users (cost=0.00..145.38
rows=1470 width=4) (actual time=0.006..4.353 rows=1470 loops=1)
Filter: (disease_id = 1)
Total runtime: 3441.452 ms
(11 rows)