John Major skrev: > I am trying to join three quite large tables, and the query is > unbearably slow(meaning I can't get results in more than a day of > processing). > I've tried the basic optimizations I understand, and nothing has > improved the execute speed.... any help with this would be greatly > appreciated > > > The three tables are quite large: > sequence_fragment = 4.5 million rows > sequence_external_info = 10million rows > sequence_alignment = 500 million rows > > > The query I am attempting to run is this: > > select sf.library_id, fio.clip_type , count(distinct(sa.sequence_id)) > from sequence_alignment sa, sequence_fragment sf, > fragment_external_info fio > where sf.seq_frag_id = fio.sequence_frag_id > and sf.sequence_id = sa.sequence_id > group by sf.library_id, fio.clip_type > > > NOTES: > ~there are indexes on all of the fields being joined (but not on > library_id or clip_type ). ~Everything has been re-analyzed post index > creation What are the primary (and candidate) keys of the tables? Are any of the fields nullable? How many distinct values exist for sequence_alignment.sequence_id? > ~I've tried "set enable_seqscan=off" and set (join_table_order or > something) = 1 It would help if you turned the settings back to defaults before doing the ANALYZE - or provide the results of that case as well. > The explain plan is as follows: [cut] Without trying to understand the ANALYZE output, I would suggest two possible optimizations: - Using count(distinct(sf.sequence_id)) instead of count(distinct(sa.sequence_id)). - Replacing the join to sequence_alignment with "WHERE sf.sequence_id IN (SELECT sequence_id from sequence_alignment)". The first one probably won't help (nor hurt), but the second one might be able to get rid of the table scan, or at least the need do the full merge join (which returns an estimated 3 billion rows). Hope this helps, Nis ---------------------------(end of broadcast)--------------------------- TIP 6: explain analyze is your friend