Index usage with sub select or inner joins

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hello,

I am doing some performances testing on Postgres & I discovered the
following behavior, when using 2 different ways of writing selects (but
doing the same aggregations at the end):
1. test case 1, using outer join:
create table test2 as 
select
soj_session_log_id, pv_timestamp, vi_pv_id,a.item_id,
coalesce(sum(case when (bid_date<pv_timestamp and bid_date>=pv_timestamp -
INTERVAL '3 day') then 1 else 0 end)) as recent_sales_3d1,
coalesce(sum(case when (bid_date<pv_timestamp and bid_date>=pv_timestamp -
INTERVAL '7 day') then 1 else 0 end)) as recent_sales_7d1,
coalesce(sum(case when (bid_date<pv_timestamp and bid_date>=pv_timestamp -
INTERVAL '14 day') then 1 else 0 end)) as recent_sales_14d1,
coalesce(sum(case when (bid_date<pv_timestamp and bid_date>=pv_timestamp -
INTERVAL '30 day') then 1 else 0 end)) as recent_sales_30d1,
coalesce(sum(case when (bid_date<pv_timestamp and bid_date>=pv_timestamp -
INTERVAL '60 day') then 1 else 0 end)) as recent_sales_60d1
from bm_us_views_main_1609 a
left outer join bm_us_bids b on (b.item_id=a.item_id and
b.bid_date<a.pv_timestamp and (b.bid_date>=a.pv_timestamp - INTERVAL '60
day'))
where a.item_type in (7,9) and qty>1
group by soj_session_log_id, pv_timestamp, vi_pv_id, a.item_id;;

This query doesn't use any index according to the explain plan:
"HashAggregate  (cost=672109.07..683054.81 rows=182429 width=49)"
"  ->  Merge Left Join  (cost=646489.83..668004.42 rows=182429 width=49)"
"        Merge Cond: (a.item_id = b.item_id)"
"        Join Filter: ((b.bid_date < a.pv_timestamp) AND (b.bid_date >=
(a.pv_timestamp - '60 days'::interval)))"
"        ->  Sort  (cost=331768.62..332224.69 rows=182429 width=41)"
"              Sort Key: a.item_id"
"              ->  Seq Scan on bm_us_views_main_1609 a
(cost=0.00..315827.08 rows=182429 width=41)"
"                    Filter: ((item_type = ANY ('{7,9}'::numeric[])) AND
(qty > 1))"
"        ->  Sort  (cost=314669.01..320949.52 rows=2512205 width=19)"
"              Sort Key: b.item_id"
"              ->  Seq Scan on bm_us_bids b  (cost=0.00..47615.05
rows=2512205 width=19)"

2. Test case 2, using sub queries:
create table test2 as 
select
soj_session_log_id, pv_timestamp, vi_pv_id,item_id,
coalesce((select count(*) from bm_us_bids b where b.item_id=a.item_id and
bid_date<pv_timestamp and bid_date>=pv_timestamp - INTERVAL '3 day' group by
item_id ),0) as recent_sales_3d,
coalesce((select count(*) from bm_us_bids b where b.item_id=a.item_id and
bid_date<pv_timestamp and bid_date>=pv_timestamp - INTERVAL '7 day' group by
item_id ),0) as recent_sales_7d,
coalesce((select count(*) from bm_us_bids b where b.item_id=a.item_id and
bid_date<pv_timestamp and bid_date>=pv_timestamp - INTERVAL '14 day' group
by item_id ),0) as recent_sales_14d,
coalesce((select count(*) from bm_us_bids b where b.item_id=a.item_id and
bid_date<pv_timestamp and bid_date>=pv_timestamp - INTERVAL '30 day' group
by item_id ),0) as recent_sales_30d,
coalesce((select count(*) from bm_us_bids b where b.item_id=a.item_id and
bid_date<pv_timestamp and bid_date>=pv_timestamp - INTERVAL '60 day' group
by item_id ),0) as recent_sales_60d
from bm_us_views_main_1609 a
where item_type in (7,9) and qty>1;

This query uses indexes  according to the explain plan:
"Seq Scan on bm_us_views_main_1609 a  (cost=0.00..8720230.77 rows=182429
width=41)"
"  Filter: ((item_type = ANY ('{7,9}'::numeric[])) AND (qty > 1))"
"  SubPlan"
"    ->  GroupAggregate  (cost=0.00..9.21 rows=1 width=11)"
"          ->  Index Scan using bm_us_bids_item_ix on bm_us_bids b
(cost=0.00..9.20 rows=1 width=11)"
"                Index Cond: ((item_id = $0) AND (bid_date < $1) AND
(bid_date >= ($1 - '60 days'::interval)))"
"    ->  GroupAggregate  (cost=0.00..9.21 rows=1 width=11)"
"          ->  Index Scan using bm_us_bids_item_ix on bm_us_bids b
(cost=0.00..9.20 rows=1 width=11)"
"                Index Cond: ((item_id = $0) AND (bid_date < $1) AND
(bid_date >= ($1 - '30 days'::interval)))"
"    ->  GroupAggregate  (cost=0.00..9.21 rows=1 width=11)"
"          ->  Index Scan using bm_us_bids_item_ix on bm_us_bids b
(cost=0.00..9.20 rows=1 width=11)"
"                Index Cond: ((item_id = $0) AND (bid_date < $1) AND
(bid_date >= ($1 - '14 days'::interval)))"
"    ->  GroupAggregate  (cost=0.00..9.21 rows=1 width=11)"
"          ->  Index Scan using bm_us_bids_item_ix on bm_us_bids b
(cost=0.00..9.20 rows=1 width=11)"
"                Index Cond: ((item_id = $0) AND (bid_date < $1) AND
(bid_date >= ($1 - '7 days'::interval)))"
"    ->  GroupAggregate  (cost=0.00..9.21 rows=1 width=11)"
"          ->  Index Scan using bm_us_bids_item_ix on bm_us_bids b
(cost=0.00..9.20 rows=1 width=11)"
"                Index Cond: ((item_id = $0) AND (bid_date < $1) AND
(bid_date >= ($1 - '3 days'::interval)))"

The index bm_us_bids_item_ix is on columns item_id, bidder_id, bid_date


QUESTION: Why the planner choose seq scan in the first case & indexes scan
in the second case? In a more general way, I observed that the planner has
difficulties to select index scans & does in almost all the cases seq scan,
when doing join queries. After investigations, it looks like when you join
table a with table b on a column x and y and you have an index on column x
only, the planner is not able to choose the index scan. You have to build
the index corresponding exactly to the join statement btw the 2 tables
 
For example,by creating an new index on item_id and bid_date, the planner
has been able to choose this last index in both cases. Would it be possible
that the planner can choose in any case the closest index for queries having
outer join

Last thing, I am running Postgres 8.3.4 on a Windows laptop having 3.5Gb
RAM, 161Gb disk and dual core 2.5Gz processor

Regards,
Julien Theulier

Attachment: postgresql.conf
Description: Binary data

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux