Ah I see. Ok, well we have a very wide variety here... category_id | count -------------+------- 1000521 | 31145 1001211 | 22991 1001490 | 22019 1001628 | 12472 1000046 | 10480 1000087 | 10338 1001223 | 10020 1001560 | 9532 1000954 | 8633 1001314 | 8191 1001482 | 8140 1001556 | 7959 1001481 | 7850 [snip...] 1001133 | 1 1000532 | 1 1000691 | 1 1000817 | 1 1000783 | 1 1000689 | 1 (1157 rows)So what's the best kind of query to handle this kind of data to make it fast in all cases? I'd like get down to sub-second response times.
currently we have: select cp.product_id from category_product cp, product_attribute_value pav where cp.category_id = 1001082 and cp.product_id = pav.product_id; Thanks, ____________________________________________________________________ Brendan Duddridge | CTO | 403-277-5591 x24 | brendan@xxxxxxxxxxxxxx ClickSpace Interactive Inc. Suite L100, 239 - 10th Ave. SE Calgary, AB T2G 0V9 http://www.clickspace.com On Mar 31, 2006, at 6:23 PM, chris smith wrote:
On 4/1/06, Brendan Duddridge <brendan@xxxxxxxxxxxxxx> wrote:Hi Jim, I'm not quite sure what you mean by the correlation of category_id?It means how many distinct values does it have (at least that's my understanding of it ;) ).select category_id, count(*) from category_product group by category_id;will show you how many category_id's there are and how many products are in each category. Having a lot of products in one category (or having a small amount of categories) can slow things down because the db can't use the index effectively.. which might be what you're seeing (hence why it's fast for some categories, slow for others).On Mar 31, 2006, at 8:59 AM, Jim C. Nasby wrote:What's the correlation of category_id? The current index scan cost estimator places a heavy penalty on anything with a correlation much below about 90%. On Wed, Mar 29, 2006 at 08:12:28PM -0700, Brendan Duddridge wrote:Hi, I have a query that is using a sequential scan instead of an indexscan. I've turned off sequential scans and it is in fact faster withthe index scan. Here's my before and after. Before: ssdev=# SET enable_seqscan TO DEFAULT; ssdev=# explain analyze select cp.product_id from category_product cp, product_attribute_value pav where cp.category_id = 1001082 and cp.product_id = pav.product_id; QUERY PLAN------------------------------------------------------------------- ------------------------------------------------------------------------ ----- ------------------------------ Hash Join (cost=25.52..52140.59 rows=5139 width=4) (actual time=4.521..2580.520 rows=19695 loops=1) Hash Cond: ("outer".product_id = "inner".product_id)-> Seq Scan on product_attribute_value pav (cost=0.00..40127.12rows=2387312 width=4) (actual time=0.039..1469.295 rows=2385846 loops=1) -> Hash (cost=23.10..23.10 rows=970 width=4) (actual time=2.267..2.267 rows=1140 loops=1)-> Index Scan using x_category_product__category_id_fk_idxon category_product cp (cost=0.00..23.10 rows=970 width=4) (actual time=0.122..1.395 rows=1140 loops=1) Index Cond: (category_id = 1001082) Total runtime: 2584.221 ms (7 rows) After: ssdev=# SET enable_seqscan TO false; ssdev=# explain analyze select cp.product_id from category_product cp, product_attribute_value pav where cp.category_id = 1001082 and cp.product_id = pav.product_id; QUERY PLAN------------------------------------------------------------------- ------------------------------------------------------------------------ ----- ------------------------------------- Nested Loop (cost=0.00..157425.22 rows=5139 width=4) (actual time=0.373..71.177 rows=19695 loops=1) -> Index Scan using x_category_product__category_id_fk_idx on category_product cp (cost=0.00..23.10 rows=970 width=4) (actual time=0.129..1.438 rows=1140 loops=1) Index Cond: (category_id = 1001082) -> Index Scan using product_attribute_value__product_id_fk_idx on product_attribute_value pav (cost=0.00..161.51 rows=61 width=4) (actual time=0.016..0.053 rows=17 loops=1140) Index Cond: ("outer".product_id = pav.product_id) Total runtime: 74.747 ms (6 rows)There's quite a big difference in speed there. 2584.221 ms vs. 74.747ms. Any ideas what I can do to improve this without turning sequential scanning off? Thanks,___________________________________________________________________ _ Brendan Duddridge | CTO | 403-277-5591 x24 | brendan@xxxxxxxxxxxxxxClickSpace Interactive Inc. Suite L100, 239 - 10th Ave. SE Calgary, AB T2G 0V9 http://www.clickspace.com-- Jim C. Nasby, Sr. Engineering Consultant jnasby@xxxxxxxxxxxxx Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---------------------------(end of broadcast)--------------------------- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@xxxxxxxxxxxxxx so that your message can get through to the mailing list cleanly-- Postgresql & php tutorials http://www.designmagick.com/
Attachment:
smime.p7s
Description: S/MIME cryptographic signature