Hey everyone,
So recently we upgraded to 9.1 and have noticed a ton of our queries got
much worse. It turns out that 9.1 is *way* more optimistic about our
functional indexes, even when they're entirely the wrong path. So after
going through the docs, I see that the normal way to increase stats is
to alter columns directly on a table, or change the
default_statistics_target itself.
But there doesn't seem to be any way to increase stats for a functional
index unless you modify default_statistics_target. I did some testing,
and for a particularly bad plan, we don't get a good result until the
stats are at 5000 or higher. As you can imagine, this drastically
increases our analyze time, and there's no way we're setting that
system-wide.
I tested this by:
SET default_statistics_target = 5000;
ANALYZE my_table;
EXPLAIN SELECT [ugly query];
I only tested 1000, 2000, 3000, 4000, and 5000 before it switched plans.
This is a 30M row table, and the "good" plan is 100x faster than the bad
one. You can see this behavior yourself with this test case:
CREATE TABLE date_test (
id SERIAL,
col1 varchar,
col2 numeric,
action_date TIMESTAMP WITHOUT TIME ZONE
);
insert into date_test (col1, col2, action_date)
select 'S:' || ((random()*a.num)::int % 10000),
(random()*a.num)::int % 15000,
current_date - (random()*a.num)::int % 1000
from generate_series(1,10000000) a(num);
create index idx_date_test_action_date_trunc
on date_test (date_trunc('day', action_date));
create index idx_date_test_col1_col2
on date_test (col1, col2);
explain analyze
select *
from date_test
where col1 IN ('S:96')
and col2 = 657
and date_trunc('day', action_date) >= '2012-10-24'
order by id desc, action_date
This seems to cause the problem more consistently when using a value
where col1 and col2 have no matches. In this particular example, I
didn't get the good plan until using 1000 as the default stats target.
It can't be a coincidence that there are 1000 distinct values in the
table for that column, and we get a terrible plan until a statistic is
recorded for each and every one in the functional index so it can
exclude itself. This seems counter-intuitive to pg_stats with default
stats at 500:
SELECT attname,n_distinct FROM pg_stats WHERE tablename='date_test';
attname | n_distinct
-------------+------------
id | -1
action_date | 1000
col2 | 14999
col1 | 10000
SELECT stadistinct FROM pg_statistic
WHERE starelid='idx_date_test_col1_col2'::regclass
stadistinct
-------------
1000
Just on pure selectivity, it should prefer the index on col1 and col2.
Anyway, we're getting all the devs to search out that particular
functional index and eradicate it, but that will take a while to get
through testing and deployment. The overriding problem seems to be two-fold:
1. Is there any way to specifically set stats on a functional index?
2. Why is the planner so ridiculously optimistic with functional
indexes, even in the case of much higher selectivity as reported by
pg_stats on the named columns?
Thanks!
--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-444-8534
sthomas@xxxxxxxxxxxxxxxx
______________________________________________
See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email
--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance