Search Postgresql Archives

Re: Postgresql selecting strange index for simple query

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Tom Lane wrote:
Maxim Boguk <mboguk@xxxxxxxxxxxxx> writes:
So i have two theory (just waving hands ofcourse):
1)integer owerflow somewhere in cost calculation

Costs are floats, and in any case you're not showing costs anywhere near
the integer overflow limit...

2)floating rounding errors (because cost very close in wrong situations: 254918.19 vs 259709.09)

The planner is intentionally set up to consider costs within a percent
or so of each other as being effectively equal.  If the estimated costs
are that close then it doesn't surprise me if it sometimes picks the
"wrong" plan.  The real question is why are the estimates so close?
They should not be, since AFAICS you are talking about a situation
where we'd have to scan all of the multicol index versus only about
a fifth of the single-col one.

Ok i exploring more:

just one thing:

hh=# SHOW default_statistics_target ;
 default_statistics_target
---------------------------
 10
(1 row)
(btw increase statistic to 1000 do not fix situation).

I try simplify test case and:
Now use sequential user_id, and truncate last_change_time to date:

SELECT nextval('test_seq') as user_id,last_change_time::date ,rpad('a',500,'b') as f1 into test_table from resume;
ANALYZE test_table;
SELECT count(*) from test_table;
CREATE INDEX right_idx on test_table(last_change_time);
CREATE INDEX wrong_idx on test_table(user_id, last_change_time);
EXPLAIN ANALYZE SELECT 1 from test_table where last_change_time > '2009-01-10 00:00:00';

 Index Scan using wrong_idx on test_table  (cost=0.00..182623.51 rows=316522 width=0) (actual time=0.056..534.620 rows=382671 loops=1)
   Index Cond: (last_change_time > '2009-01-10'::date)

DROP INDEX wrong_idx;
EXPLAIN ANALYZE SELECT 1 from test_table where last_change_time > '2009-01-10 00:00:00';

 Index Scan using right_idx on test_table  (cost=0.00..221765.19 rows=316522 width=0) (actual time=0.023..346.213 rows=382671 loops=1)
   Index Cond: (last_change_time > '2009-01-10'::date)

Full index scan over wrong index cost reasonable lower then 1/17 of single column index  (182623 vs 221765)!

So just last_change_time still cannot be generated... but:

hh=# SELECT count(distinct last_change_time) from test_table;
 count
-------
  2133
(1 row)

And statistic values for last_change_time is:

hh=# SELECT * from pg_stats where tablename='test_table' and attname='last_change_time';
-[ RECORD 1 ]-----+---------------------------------------------------------------------------------------------------------------------------
schemaname        | public
tablename         | test_table
attname           | last_change_time
null_frac         | 0
avg_width         | 4
n_distinct        | 1211
most_common_vals  | {2004-02-27,2009-01-26,2008-03-11,2009-01-27,2004-01-15,2008-10-13,2009-01-19,2009-01-22,2009-01-21,2008-09-29}
most_common_freqs | {0.00833333,0.00766667,0.00733333,0.007,0.00666667,0.00533333,0.00533333,0.00533333,0.005,0.00466667}
histogram_bounds  | {2003-03-31,2005-08-26,2006-10-04,2007-04-29,2007-09-27,2008-01-24,2008-05-01,2008-07-29,2008-10-01,2008-11-27,2009-01-25}
correlation       | 0.261512

I think it is all what planner can use when choose plan... because user_id is unique sequential values.

regargs, Maxim Boguk







--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux