Re: : PostgreSQL Index behavior

Venkat Balaji <venkat.balaji@xxxxxxxx> · Wed, 12 Sep 2012 12:27:55 +0530

Thank you Jeff !
My comments are inline.

> explain  SELECT tv.short_code, tv.chn as pkg_subscription_chn,

>                tv.vert as pkg_vert, ubs.campaign_id as campaign,

> 'none'::varchar as referer,

>                CAST('CAMPAIGNWISE_SUBSCRIBER_BASE' AS VARCHAR) as vn,

> count(tv.msisdn) as n_count, '0'::numeric AS tot_revenue

>                FROM campaign_base ubs

>                JOIN tab_current_day_v2 tv

>                ON ubs.ubs_seq_id = tv.ubs_seq_id

>              AND tv.dt = CAST('2012-09-08' AS DATE)

>              GROUP BY tv.short_code, tv.vert, tv.chn, ubs.campaign_id, vn;

...

>

> The above plan shows "seq scan" on tab_current_day_v2 table, though there is

> an index on "ubs_seq_id" column which is an unique column.

>

> Can anyone please help us understand, why PostgreSQL optimizer is not

> prioritizing the unique column and hitting ubs_seq_id_idx Index here ?

The query where clause does not specify a constant value for

ubs_seq_id.  So it is likely that the only way to use that index would

be to reverse the order of the nested loop and seq scan the other

table.  Is there any reason to think that doing that would be faster?

I believe, I missed an important point here. Yes, since the constant value is not provided for ubs_seq_id, Index scan is not a prime preference. Makes sense. Further analysis is explained below.

> Later -

>

> We have created composite Index on "dt" (one distinct value) and

> "ubs_seq_id" (no duplicate values) and the index has been picked up.

Postgres seems to think that "dt" has no duplicate values, the

opposite of having one distinct value.

That is based on the estimates given in the explain plan, that teh seq

scan will return only one row after the filter on Filter: "(dt =

'2012-09-08'::date)".   This does seem to conflict with what you

report from pg_stats, but I'm not familiar with that view, and you

haven't told us what version of pgsql you are using.

We are using PostgreSQL-9.0.1. 

Yes, "dt" has one distinct value all the time is generated on daily basis.

"2012-09-08" is an non-existent value, so, Postgres seems to think there are no duplicates.

If i pass on the value which is existing in the table "2012-09-11", PostgreSQL optimizer is picking up "Seq Scan" (what ever Index is existing).

In our scenario, we cannot expect an Index scan to happen, because I believe, following are the reasons -

ubs_seq_id column in campaign_base table has 1.2 m rows -- all distinct

ubs_seq_id column in tab_current_day_v2 table has 1.9 m rows -- all distinct
dt has only 1 distinct value.

All being used with AND operator, extracted rows will be minimum 1.2 m. So, I believe, "seq scan" is the best choice PG is opting for.

I got the point. Thanks !

Regards,
Venkat

DISCLAIMER:

Please note that this message and
any attachments may contain confidential and proprietary material and
information and are intended only for the use of the intended recipient(s). If
you are not the intended recipient, you are hereby notified that any review,
use, disclosure, dissemination, distribution or copying of this message and any
attachments is strictly prohibited. If you have received this email in error,
please immediately notify the sender and delete this e-mail , whether
electronic or printed. Please also note that any views, opinions, conclusions
or commitments expressed in this message are those of the individual sender and
do not necessarily reflect the views of Ver sé Innovation Pvt Ltd.