I am in process of migrating a database from Pg 7.4.3 to Pg 8.0.3. I
dumped and reloaded the data, and things look good, until I try to
work with it. When I try to make a query to a view based on a large
join (select * from 'view' limit 10), the 7.4.3 query works very
well, and the indexes are being used well. But, when I go to 8.0.3,
there are some very costly joins, and the query is not workable.
(I've put the 'EXPLAIN' results below for the same database, and the
same query.) Also, the memory settings have been made the same for
7.4 vs. 8.0.
I should point out that some of the indexes (and joins) are based on
the type 'citext'. (http://gborg.postgresql.org/project/citext/
projdisplay.php). I did compile and add this to my Pg 8
installation, but I am concerned that this might be the root
problem. There seems to be no update on this work since sometime in
2003, and it might not be appropriate to use in 8.0.3. But, it seems
that indexes are being used when I do searches on the individual
tables. So, I'm a bit stumped at the moment.
So, I'd like to ask advice on my query, and what might be the cause.
If 'citext' is the culprit, how can I migrate a very large database
which uses this datatype extensively.
Any help is greatly appreciated.
Thanks,
-albert
===
In Pg 8.0.3--
EXPLAIN select * from snp_quality_overview limit 10;
QUERY PLAN
------------------------------------------------------------------------
------------------------------------------------------------------------
-----------------
Limit (cost=83734.45..84167.07 rows=10 width=415)
-> Hash Join (cost=83734.45..10949200.55 rows=251154 width=415)
Hash Cond: ("outer".panel_id = "inner".panel_id)
-> Nested Loop (cost=83733.39..10924084.09 rows=251154
width=359)
-> Nested Loop (cost=83733.39..5685331.85
rows=253240 width=324)
-> Hash Join (cost=83733.39..3840930.37
rows=423746 width=291)
Hash Cond: ("outer".assay_id =
"inner".assay_id)
-> Nested Loop (cost=862.47..3680521.72
rows=602829 width=259)
-> Hash Join
(cost=862.47..258437.41 rows=602829 width=195)
Hash Cond:
("outer".genotype_submission_id = "inner".genotype_submission_id)
-> Seq Scan on genotype_set
gs (cost=0.00..218413.11 rows=6626211 width=117)
-> Hash
(cost=861.68..861.68 rows=316 width=94)
-> Nested Loop
(cost=450.77..861.68 rows=316 width=94)
Join Filter:
("inner".lsid = "outer".prot_lsid)
-> Merge Join
(cost=446.02..671.68 rows=358 width=83)
Merge Cond:
("outer".batch_file_id = "inner".batch_file_id)
-> Index
Scan using batch_file_id3 on genotype_submission gsb
(cost=0.00..210.81 rows=3472 width=75)
-> Sort
(cost=446.02..447.31 rows=517 width=20)
Sort
Key: batch_file.batch_file_id
->
Merge Join (cost=23.66..422.72 rows=517 width=20)
Merge Cond: ("outer".monthly_batch_id = "inner".monthly_batch_id)
Join Filter: ("outer".center_name = "inner".center_name)
-> Index Scan using monthly_batch_id on batch_file
(cost=0.00..298.59 rows=5024 width=24)
-> Sort (cost=23.66..24.02 rows=145 width=19)
Sort Key: monthly_batch.monthly_batch_id
-> Seq Scan on monthly_batch (cost=0.00..18.45 rows=145 width=19)
-> Materialize
(cost=4.74..4.97 rows=23 width=72)
-> Hash
Join (cost=1.10..4.72 rows=23 width=72)
Hash
Cond: ("outer".genotyping_platform_id = "inner".genotyping_platform_id)
->
Seq Scan on protocol p (cost=0.00..3.26 rows=26 width=65)
->
Hash (cost=1.08..1.08 rows=8 width=15)
-> Seq Scan on genotyping_platform gp (cost=0.00..1.08 rows=8
width=15)
-> Index Scan using lsid on assay
a (cost=0.00..5.66 rows=1 width=114)
Index Cond: (a.lsid =
"outer".assay_lsid)
-> Hash (cost=56880.53..56880.53
rows=2118953 width=48)
-> Seq Scan on assay2rs
(cost=0.00..56880.53 rows=2118953 width=48)
-> Index Scan using qcresult_genotype_set_id
on qcanalysis_result_plate qc_pl (cost=0.00..4.34 rows=1 width=33)
Index Cond: (qc_pl.genotype_set_id =
"outer".genotype_set_id)
-> Index Scan using
qcanalysis_result_panel_genotype_set_id_key on
qcanalysis_result_panel qc_pn (cost=0.00..20.61 rows=6 width=51)
Index Cond: (qc_pn.genotype_set_id =
"outer".genotype_set_id)
-> Hash (cost=1.05..1.05 rows=5 width=64)
-> Seq Scan on panel (cost=0.00..1.05 rows=5 width=64)
(42 rows)
====
In Pg 7.4.3--
hapmap_tracking=# explain select * from snp_quality_overview limit 10;
QUERY PLAN
------------------------------------------------------------------------
------------------------------------------------------------------------
------------------------------
Limit (cost=1.06..303.34 rows=10 width=413)
-> Hash Join (cost=1.06..10120528.33 rows=334810 width=413)
Hash Cond: ("outer".panel_id = "inner".panel_id)
-> Nested Loop (cost=0.00..10086209.24 rows=334810
width=357)
-> Nested Loop (cost=0.00..6685494.20 rows=296445
width=322)
-> Nested Loop (cost=0.00..5118344.12
rows=438783 width=289)
-> Nested Loop (cost=0.00..2719551.73
rows=625532 width=258)
-> Nested Loop
(cost=0.00..216951.56 rows=625532 width=194)
-> Nested Loop
(cost=0.00..59958.71 rows=312 width=94)
Join Filter:
("outer".genotyping_platform_id = "inner".genotyping_platform_id)
-> Nested Loop
(cost=0.00..59549.25 rows=347 width=87)
-> Nested Loop
(cost=0.00..58308.55 rows=347 width=83)
Join
Filter: (("inner".monthly_batch_id = "outer".monthly_batch_id) AND
("inner".center_name = "outer".center_name))
-> Seq
Scan on monthly_batch (cost=0.00..21.38 rows=138 width=19)
-> Merge
Join (cost=0.00..372.99 rows=3292 width=87)
Merge
Cond: ("outer".batch_file_id = "inner".batch_file_id)
->
Index Scan using batch_file_id3 on genotype_submission gsb
(cost=0.00..175.60 rows=3292 width=75)
->
Index Scan using batch_file_pkey on batch_file (cost=0.00..144.36
rows=4768 width=24)
-> Index Scan
using lsid5 on protocol p (cost=0.00..3.56 rows=1 width=64)
Index Cond:
(p.lsid = "outer".prot_lsid)
-> Seq Scan on
genotyping_platform gp (cost=0.00..1.08 rows=8 width=15)
-> Index Scan using
genotype_submission_id on genotype_set gs (cost=0.00..464.24
rows=3115 width=116)
Index Cond:
("outer".genotype_submission_id = gs.genotype_submission_id)
-> Index Scan using lsid on assay
a (cost=0.00..3.99 rows=1 width=114)
Index Cond: (a.lsid =
"outer".assay_lsid)
-> Index Scan using assay2rs_assay_iduk
on assay2rs (cost=0.00..3.82 rows=1 width=47)
Index Cond: (assay2rs.assay_id =
"outer".assay_id)
-> Index Scan using qcresult_genotype_set_id
on qcanalysis_result_plate qc_pl (cost=0.00..3.56 rows=1 width=33)
Index Cond: (qc_pl.genotype_set_id =
"outer".genotype_set_id)
-> Index Scan using
qcanalysis_result_panel_genotype_set_id_key on
qcanalysis_result_panel qc_pn (cost=0.00..11.41 rows=5 width=51)
Index Cond: (qc_pn.genotype_set_id =
"outer".genotype_set_id)
-> Hash (cost=1.05..1.05 rows=5 width=64)
-> Seq Scan on panel (cost=0.00..1.05 rows=5 width=64)
(33 rows)
---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend