On 3:37 pm 07/23/08 Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
> "Francisco Reyes" <lists@xxxxxxxxxxxxxxxx> writes:
> > SET ENABLE_SEQSCAN TO OFF;
> > SET ENABLE_BITMAPSCAN TO OFF;
> > Aggregate  (cost=25665216.10..25665216.11 rows=1 width=12)
> >   (actual time=3088.894..3088.896 rows=1 loops=1)
> >   ->  Nested Loop  (cost=0.00..25662307.70 rows=387785 width=12)
> >       (actual time=0.264..2624.680 rows=194734 loops=1)
> >     ->  Index Scan using join_ids_join_id on join_ids
> >         (cost=0.00..2867051.21 rows=5020 width=4)
> >         (actual time=0.237..1236.019 rows=4437 loops=1)
> >         Filter: (customer_id = ANY
> >         ('{1014,2485,4636,4635,1255,547,374,580}'::integer[]))
> >     ->  Index Scan using historical_join_id_date on historical
> >         (cost=0.00..4522.43 rows=1477 width=16)
> >         (actual time=0.010..0.153 rows=44 loops=4437)
> >         Index Cond: ((historical.join_id = join_ids.join_id)
> >         AND (historical.date > '2007-04-01'::date)
> >         AND (historical.date < '2008-05-01'::date))
> >         Filter: (trans.f5 > 0::numeric)
> > Total runtime: 3091.227 ms --> 3 seconds
>
> You might be more likely to get a sane plan if you had an index on
> join_ids.customer_id.

There is an index on join_ids:
"joinids_customerids_joinid" btree (customer_id, joinid) WITH (fillfactor=98)

Also, that plan is only 3 seconds; that is as good as it is going to get.
Or were you saying that the other plans would have been better?

> The first indexscan above is really a
> completely silly choice, and would never have been used if you
> weren't holding a gun to the planner's head.

I have much to learn about how to properly read an explain analyze, but as
silly as that plan may look, it outperforms the other plans by orders of
magnitude. 3 seconds vs 12 minutes is a very big difference. It was so fast
that I even compared the results (which happen to be a single row) to make
sure I was getting the correct value.

> The index isn't contributing any selectivity at all.

Which index scan? Explain analyze output is read bottom up, right?
If it is this one you are referring to:

  ->  Index Scan using historical_join_id_date on historical
      (cost=0.00..4522.43 rows=1477 width=16)
      (actual time=0.010..0.153 rows=44 loops=4437)
      Index Cond: ((historical.join_id = join_ids.join_id)
      AND (historical.date > '2007-04-01'::date)
      AND (historical.date < '2008-05-01'::date))
      Filter: (trans.f5 > 0::numeric)

I believe that is the reason performance is good with that plan. The number
of rows that need to be returned from historical is less than 1%.

> The other part of the problem is the factor-of-thirty overestimate of
> the number of rows that the inner indexscan will produce (which means
> also a factor-of-thirty overestimate of its cost). Perhaps higher
> statistics targets for these two relations would give you a better
> estimate there.

Is it possible to go over default_statistics_target = 1000?
(See the sketch below for what I plan to try per column.)

> since there's no additional join condition. What PG version are you
> running exactly?

8.3.3

I have only been at this job for 3 months and I can say that neither the
data nor the previous design I am trying to replace plays nice with
PostgreSQL. I can't get into specifics, but I can say that our "historical"
tables have about 60% of their data unused by most queries. I think that is
partly what throws off the planner so much. My first clue was when I saw the
planner trying to do sequential scans to retrieve less than 1% of rows. It
didn't make sense. I tried several schemes with partitioning and that was
even worse.
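On the statistics side, this is roughly what I am planning to try for the
per-column targets. The table and column names below are the mapped names
used in this thread, so treat them as placeholders for the real schema:

  -- Raise the statistics targets on the columns the planner is
  -- misestimating, then re-analyze so the new targets take effect.
  ALTER TABLE join_ids   ALTER COLUMN customer_id SET STATISTICS 1000;
  ALTER TABLE historical ALTER COLUMN join_id     SET STATISTICS 1000;
  ALTER TABLE historical ALTER COLUMN date        SET STATISTICS 1000;
  ANALYZE join_ids;
  ANALYZE historical;

That way only these columns get the larger targets, instead of bumping
default_statistics_target for the whole database.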
I am going to convert the table structure names to the mapping names I used in this thread. Perhaps that may be informative.