On Thu, 10 Jul 2008 11:50:01 +0200 Martijn van Oosterhout <kleptog@xxxxxxxxx> wrote: > On Thu, Jul 10, 2008 at 11:40:40AM +0200, Ivan Sergio Borgonovo > wrote: > > I've this: > > What's basically killing you is this condition: > > select i2.ItemID from catalog_items i2 > > inner join catalog_brands b2 on > > upper(i2.brands)=upper(b2.name) where i1.brands=i2.brands > > <********* and i2.dataPub>(now() - interval '8 month') and > > Is not indexable. Hence the seqscan, which makes everything slow. > In your "faster" version you test against a condition which *is* > indexable, hence it's faster. I changed to where upper(i1.brands)=upper(i2.brands) "Nested Loop (cost=0.00..1393962681.78 rows=74378 width=82)" " -> Seq Scan on catalog_brands b1 (cost=0.00..1.44 rows=44 width=18)" " -> Index Scan using catalog_items_brands_index on catalog_items i1 (cost=0.00..31680940.43 rows=1690 width=82)" " Index Cond: (upper((i1.brands)::text) = upper(("outer".name)::text))" " Filter: (subplan)" " SubPlan" " -> Limit (cost=9366.40..9366.41 rows=1 width=16)" " -> Sort (cost=9366.40..9366.41 rows=1 width=16)" " Sort Key: i2.datainserimento" " -> Nested Loop (cost=29.84..9366.39 rows=1 width=16)" " -> Bitmap Heap Scan on catalog_items i2 (cost=29.84..9364.61 rows=1 width=34)" " Recheck Cond: (upper(($0)::text) = upper((brands)::text))" " Filter: ((datapub > (now() - '8 mons'::interval)) AND (datainserimento > (now() - '6 mons'::interval)))" " -> Bitmap Index Scan on catalog_items_brands_index (cost=0.00..29.84 rows=3381 width=0)" " Index Cond: (upper(($0)::text) = upper((brands)::text))" " -> Seq Scan on catalog_brands b2 (cost=0.00..1.77 rows=1 width=18)" " Filter: (upper(($0)::text) = upper((name)::text))" but it still perform badly. Skipping one of the two join catalog_brands b1 on upper(i1.brands)=upper(b1.name) doesn't improve anything... even skipping some conditions, that I thought would actually make the query faster, restricting the rows to sort etc... and i2.dataPub>(now() - interval '8 month') and i2.datainserimento>(now() - interval '6 month') didn't improve the speed. And the sum of times it takes to execute the simpler statement for each brands even without waiting the end of the above statements is at least 1 order of magnitude faster than the more complicated query. catalog_brands is a quite small table so -> Seq Scan on catalog_brands b2 (cost=0.00..1.77 rows=1 width=18)" Filter: (upper(($0)::text) = upper((name)::text))" shouldn't be a problem and it seems that even the index is not playing such a big part since this that doesn't use the index select name, brands from catalog_items where brands='CAMBRIDGE UNIVERSITY PRESS' order by datainserimento desc limit 3 takes less than 1 sec. I'd say that having 44 groups and since the largest takes always less then 1 sec with the simpler query... there should be something else wrong with the above query that takes > 3 min. Infact: create temp table groupeditor as select i1.ItemID, i1.brands, i1.name, i1.dataPub, i1.datainserimento from catalog_items i1 inner join catalog_brands b1 on upper(i1.brands)=upper(b1.name); create index groupeditor_brands_idx on groupeditor (brands); create index groupeditor_ItemID_idx on groupeditor (ItemID); explain select i1.brands, i1.name, i1.dataPub, i1.datainserimento from groupeditor i1 where i1.ItemID in ( select i2.ItemID from groupeditor i2 where i1.brands=i2.brands and i2.dataPub>(now() - interval '8 month') and i2.datainserimento>(now() - interval '6 month') order by i2.datainserimento desc limit 3); "Seq Scan on groupeditor i1 (cost=0.00..197133363.99 rows=68583 width=1048)" " Filter: (subplan)" " SubPlan" " -> Limit (cost=1437.15..1437.16 rows=3 width=16)" " -> Sort (cost=1437.15..1437.34 rows=76 width=16)" " Sort Key: datainserimento" " -> Bitmap Heap Scan on groupeditor i2 (cost=7.40..1434.78 rows=76 width=16)" " Recheck Cond: (($0)::text = (brands)::text)" " Filter: ((datapub > (now() - '8 mons'::interval)) AND (datainserimento > (now() - '6 mons'::interval)))" " -> Bitmap Index Scan on groupeditor_brands_idx (cost=0.00..7.40 rows=686 width=0)" " Index Cond: (($0)::text = (brands)::text)" Creating the temp table takes up 3 sec, creating the indexes 3 sec... and the new query... still forever... Killing i2.dataPub>(now() - interval '8 month') and i2.datainserimento>(now() - interval '6 month') and moving it in the creation of the temp table made the above run in... 5 sec roughly... what took most was table and indexes creation... the query took 61ms. I could follow a more scientific measurement method (cache etc...) but still it looks pretty impressive... even more... dropping the indexes on the temp table even after restarting the server still make the whole process stay below 5 sec. As soon as groupeditor get larger (increasing the considered interval) the query get slower and slower... much more than O^2 and surely still not as fast as it would be to use several simpler statements. It still looks like a good idea to create a temp table so I'll have to sort over a smaller set... but still I'm puzzled. The subquery technique still perform awfully compared to the sums of times taken by simpler queries. Debian stable, pg 8.1 BTW what's going to happen to the indexes related to a temp table? -- Ivan Sergio Borgonovo http://www.webthatworks.it