I am trying to join three quite large tables, and the query is
unbearably slow(meaning I can't get results in more than a day of
processing).
I've tried the basic optimizations I understand, and nothing has
improved the execute speed.... any help with this would be greatly
appreciated
The three tables are quite large:
sequence_fragment = 4.5 million rows
sequence_external_info = 10million rows
sequence_alignment = 500 million rows
The query I am attempting to run is this:
select sf.library_id, fio.clip_type , count(distinct(sa.sequence_id))
from sequence_alignment sa, sequence_fragment sf,
fragment_external_info fio
where sf.seq_frag_id = fio.sequence_frag_id
and sf.sequence_id = sa.sequence_id
group by sf.library_id, fio.clip_type
NOTES:
~there are indexes on all of the fields being joined (but not on
library_id or clip_type ).
~Everything has been re-analyzed post index creation
~I've tried "set enable_seqscan=off" and set (join_table_order or
something) = 1
The explain plan is as follows:
QUERY
PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------
GroupAggregate (cost=1443436673.93..1480593403.29 rows=54
width=16)
-> Sort (cost=1443436673.93..1452725856.10 rows=3715672868
width=16)
Sort Key: sf.library_id,
fio.clip_type
-> Merge Join (cost=263624049.25..319410068.18
rows=3715672868
width=16)
Merge Cond: (sf.sequence_id =
sa.sequence_id)
-> Sort (cost=38102888.77..38128373.54 rows=10193906
width=16)
Sort Key:
sf.sequence_id
-> Hash Join (cost=5305576.14..36080036.76
rows=10193906
width=16)
Hash Cond: (fio.sequence_frag_id =
sf.seq_frag_id)
-> Index Scan using
frag_ext_info_seq_frag_id on fragment_external_info fio
(cost=0.00..30450510.27 rows=10193906 width=12)
-> Hash (cost=5223807.54..5223807.54
rows=4453728
width=12)
-> Index Scan using seq_frag_seqid_ind
on sequence_fragment sf (cost=0.00..5223807.54 rows=4453728
width=12)
-> Sort (cost=225521160.48..226688766.88 rows=467042560
width=4)
Sort Key:
sa.sequence_id
-> Seq Scan on sequence_alignment sa
(cost=100000000.00..110379294.60 rows=467042560
width=4)
15 record(s) selected [Fetch MetaData: 0/ms] [Fetch Data: 0/ms]
Thanks in advance!
John Major
---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faq