Re: Query using SeqScan instead of IndexScan

Brendan Duddridge <brendan@xxxxxxxxxxxxxx> · Fri, 31 Mar 2006 18:31:47 -0700

Ah I see. Ok, well we have a very wide variety here...

category_id | count
-------------+-------
     1000521 | 31145
     1001211 | 22991
     1001490 | 22019
     1001628 | 12472
     1000046 | 10480
     1000087 | 10338
     1001223 | 10020
     1001560 |  9532
     1000954 |  8633
     1001314 |  8191
     1001482 |  8140
     1001556 |  7959
     1001481 |  7850
[snip...]
     1001133 |     1
     1000532 |     1
     1000691 |     1
     1000817 |     1
     1000783 |     1
     1000689 |     1

(1157 rows)

So what's the best kind of query to handle this kind of data to make  
it fast in all cases? I'd like get down to sub-second response times.

currently we have:

select cp.product_id
             from category_product cp, product_attribute_value pav
             where cp.category_id = 1001082 and cp.product_id =
             pav.product_id;

Thanks,

____________________________________________________________________
Brendan Duddridge | CTO | 403-277-5591 x24 |  brendan@xxxxxxxxxxxxxx

ClickSpace Interactive Inc.
Suite L100, 239 - 10th Ave. SE
Calgary, AB  T2G 0V9

http://www.clickspace.com

On Mar 31, 2006, at 6:23 PM, chris smith wrote:

On 4/1/06, Brendan Duddridge <brendan@xxxxxxxxxxxxxx> wrote:
Hi Jim,

I'm not quite sure what you mean by the correlation of category_id?

It means how many distinct values does it have (at least that's my
understanding of it ;) ).

select category_id, count(*) from category_product group by  
category_id;

will show you how many category_id's there are and how many products
are in each category.

Having a lot of products in one category (or having a small amount of
categories) can slow things down because the db can't use the index
effectively.. which might be what you're seeing (hence why it's fast
for some categories, slow for others).

On Mar 31, 2006, at 8:59 AM, Jim C. Nasby wrote:

What's the correlation of category_id? The current index scan cost
estimator places a heavy penalty on anything with a correlation much
below about 90%.

On Wed, Mar 29, 2006 at 08:12:28PM -0700, Brendan Duddridge wrote:
Hi,

I have a query that is using a sequential scan instead of an index
scan. I've turned off sequential scans and it is in fact faster  
with
the index scan.

Here's my before and after.

Before:

ssdev=# SET enable_seqscan TO DEFAULT;
ssdev=# explain analyze select cp.product_id
             from category_product cp, product_attribute_value pav
             where cp.category_id = 1001082 and cp.product_id =
             pav.product_id;

           QUERY PLAN
------------------------------------------------------------------- 
--
---
------------------------------------------------------------------- 
--
---
------------------------------
Hash Join  (cost=25.52..52140.59 rows=5139 width=4) (actual
time=4.521..2580.520 rows=19695 loops=1)
   Hash Cond: ("outer".product_id = "inner".product_id)
   ->  Seq Scan on product_attribute_value pav   
(cost=0.00..40127.12
rows=2387312 width=4) (actual time=0.039..1469.295 rows=2385846
loops=1)
   ->  Hash  (cost=23.10..23.10 rows=970 width=4) (actual
time=2.267..2.267 rows=1140 loops=1)
         ->  Index Scan using  
x_category_product__category_id_fk_idx
on category_product cp  (cost=0.00..23.10 rows=970 width=4) (actual
time=0.122..1.395 rows=1140 loops=1)
               Index Cond: (category_id = 1001082)
Total runtime: 2584.221 ms
(7 rows)

After:

ssdev=# SET enable_seqscan TO false;
ssdev=# explain analyze select cp.product_id
             from category_product cp, product_attribute_value pav
             where cp.category_id = 1001082 and cp.product_id =
             pav.product_id;

              QUERY PLAN
------------------------------------------------------------------- 
--
---
------------------------------------------------------------------- 
--
---
-------------------------------------
Nested Loop  (cost=0.00..157425.22 rows=5139 width=4) (actual
time=0.373..71.177 rows=19695 loops=1)
   ->  Index Scan using x_category_product__category_id_fk_idx on
category_product cp  (cost=0.00..23.10 rows=970 width=4) (actual
time=0.129..1.438 rows=1140 loops=1)
         Index Cond: (category_id = 1001082)
   ->  Index Scan using product_attribute_value__product_id_fk_idx
on product_attribute_value pav  (cost=0.00..161.51 rows=61 width=4)
(actual time=0.016..0.053 rows=17 loops=1140)
         Index Cond: ("outer".product_id = pav.product_id)
Total runtime: 74.747 ms
(6 rows)

There's quite a big difference in speed there. 2584.221 ms vs.  
74.747
ms.

Any ideas what I can do to improve this without turning sequential
scanning off?

Thanks,

___________________________________________________________________ 
_
Brendan Duddridge | CTO | 403-277-5591 x24 |   
brendan@xxxxxxxxxxxxxx

ClickSpace Interactive Inc.
Suite L100, 239 - 10th Ave. SE
Calgary, AB  T2G 0V9

http://www.clickspace.com

--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@xxxxxxxxxxxxx
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

---------------------------(end of
broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
       subscribe-nomail command to majordomo@xxxxxxxxxxxxxx so that
your
       message can get through to the mailing list cleanly

--
Postgresql & php tutorials
http://www.designmagick.com/

Attachment:
smime.p7s

Description: S/MIME cryptographic signature