Setting Statistics on Functional Indexes

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hey everyone,

So recently we upgraded to 9.1 and have noticed a ton of our queries got much worse. It turns out that 9.1 is *way* more optimistic about our functional indexes, even when they're entirely the wrong path. So after going through the docs, I see that the normal way to increase stats is to alter columns directly on a table, or change the default_statistics_target itself.

But there doesn't seem to be any way to increase stats for a functional index unless you modify default_statistics_target. I did some testing, and for a particularly bad plan, we don't get a good result until the stats are at 5000 or higher. As you can imagine, this drastically increases our analyze time, and there's no way we're setting that system-wide.

I tested this by:

SET default_statistics_target = 5000;

ANALYZE my_table;

EXPLAIN SELECT [ugly query];

I only tested 1000, 2000, 3000, 4000, and 5000 before it switched plans. This is a 30M row table, and the "good" plan is 100x faster than the bad one. You can see this behavior yourself with this test case:

CREATE TABLE date_test (
  id SERIAL,
  col1 varchar,
  col2 numeric,
  action_date TIMESTAMP WITHOUT TIME ZONE
);

insert into date_test (col1, col2, action_date)
select 'S:' || ((random()*a.num)::int % 10000),
       (random()*a.num)::int % 15000,
       current_date - (random()*a.num)::int % 1000
  from generate_series(1,10000000) a(num);

create index idx_date_test_action_date_trunc
    on date_test (date_trunc('day', action_date));

create index idx_date_test_col1_col2
    on date_test (col1, col2);

explain analyze
select *
  from date_test
 where col1 IN ('S:96')
   and col2 = 657
   and date_trunc('day', action_date) >= '2012-10-24'
 order by id desc, action_date


This seems to cause the problem more consistently when using a value where col1 and col2 have no matches. In this particular example, I didn't get the good plan until using 1000 as the default stats target. It can't be a coincidence that there are 1000 distinct values in the table for that column, and we get a terrible plan until a statistic is recorded for each and every one in the functional index so it can exclude itself. This seems counter-intuitive to pg_stats with default stats at 500:

SELECT attname,n_distinct FROM pg_stats WHERE tablename='date_test';

   attname   | n_distinct
-------------+------------
 id          |         -1
 action_date |       1000
 col2        |      14999
 col1        |      10000

SELECT stadistinct FROM pg_statistic
 WHERE starelid='idx_date_test_col1_col2'::regclass

 stadistinct
-------------
        1000

Just on pure selectivity, it should prefer the index on col1 and col2. Anyway, we're getting all the devs to search out that particular functional index and eradicate it, but that will take a while to get through testing and deployment. The overriding problem seems to be two-fold:

1. Is there any way to specifically set stats on a functional index?
2. Why is the planner so ridiculously optimistic with functional indexes, even in the case of much higher selectivity as reported by pg_stats on the named columns?

Thanks!

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-444-8534
sthomas@xxxxxxxxxxxxxxxx

______________________________________________

See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email


--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux