Setting Statistics on Functional Indexes

Shaun Thomas <sthomas@xxxxxxxxxxxxxxxx> · Wed, 24 Oct 2012 11:55:15 -0500

Hey everyone,

So recently we upgraded to 9.1 and have noticed a ton of our queries got 
much worse. It turns out that 9.1 is *way* more optimistic about our 
functional indexes, even when they're entirely the wrong path. So after 
going through the docs, I see that the normal way to increase stats is 
to alter columns directly on a table, or change the 
default_statistics_target itself.

But there doesn't seem to be any way to increase stats for a functional 
index unless you modify default_statistics_target. I did some testing, 
and for a particularly bad plan, we don't get a good result until the 
stats are at 5000 or higher. As you can imagine, this drastically 
increases our analyze time, and there's no way we're setting that 
system-wide.

I tested this by:

SET default_statistics_target = 5000;

ANALYZE my_table;

EXPLAIN SELECT [ugly query];

I only tested 1000, 2000, 3000, 4000, and 5000 before it switched plans. 
This is a 30M row table, and the "good" plan is 100x faster than the bad 
one. You can see this behavior yourself with this test case:

CREATE TABLE date_test (
  id SERIAL,
  col1 varchar,
  col2 numeric,
  action_date TIMESTAMP WITHOUT TIME ZONE
);

insert into date_test (col1, col2, action_date)
select 'S:' || ((random()*a.num)::int % 10000),
       (random()*a.num)::int % 15000,
       current_date - (random()*a.num)::int % 1000
  from generate_series(1,10000000) a(num);

create index idx_date_test_action_date_trunc
    on date_test (date_trunc('day', action_date));

create index idx_date_test_col1_col2
    on date_test (col1, col2);

explain analyze
select *
  from date_test
 where col1 IN ('S:96')
   and col2 = 657
   and date_trunc('day', action_date) >= '2012-10-24'
 order by id desc, action_date

This seems to cause the problem more consistently when using a value 
where col1 and col2 have no matches. In this particular example, I 
didn't get the good plan until using 1000 as the default stats target. 
It can't be a coincidence that there are 1000 distinct values in the 
table for that column, and we get a terrible plan until a statistic is 
recorded for each and every one in the functional index so it can 
exclude itself. This seems counter-intuitive to pg_stats with default 
stats at 500:

SELECT attname,n_distinct FROM pg_stats WHERE tablename='date_test';

   attname   | n_distinct
-------------+------------
 id          |         -1
 action_date |       1000
 col2        |      14999
 col1        |      10000

SELECT stadistinct FROM pg_statistic
 WHERE starelid='idx_date_test_col1_col2'::regclass

 stadistinct
-------------
        1000

Just on pure selectivity, it should prefer the index on col1 and col2. 
Anyway, we're getting all the devs to search out that particular 
functional index and eradicate it, but that will take a while to get 
through testing and deployment. The overriding problem seems to be two-fold:

1. Is there any way to specifically set stats on a functional index?
2. Why is the planner so ridiculously optimistic with functional 
indexes, even in the case of much higher selectivity as reported by 
pg_stats on the named columns?

Thanks!

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-444-8534
sthomas@xxxxxxxxxxxxxxxx

______________________________________________

See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance