I'm seeing a problem with the query planner not doing what's expected, and I
think it is because we are using composite fields. Here is a stripped down
example.
create type type1 as ( part1 varchar, part2 varchar);
create table table1 (field1 type1, field2 varchar);
create function get_part1(type1) returns varchar as $$ select ($1).part1 $$
language sql;
create index i1 on table1 (get_part1(field1));
create index i2 on table1 (field2);
If we do
select * from table1 where get_part1(field1) > 'val';
it correctly uses index i1, but if I do
select * from table1 where get_part1(field1) > 'val1' and field2 = 'val2';
It - essentially - gets the hits from both indexes and then "ands" them
together whatever the data distribution. In the case I am interested in the
data is actually a timestamp, and I am typically looking from things that have
happened in the last little while. Typically I might get about 100 hits out of
about 20,000,000 records. "field2" is column that could return up to 10% of
the data. If we split field1 into two normal fields and index the first part
naturally, the query planner correctly returns the recent ones and filters
them out according to field2.
Naively, it looks to me that issue is that pg_stats table is holding data
about the columns ( it seems even if the are not indexed ) and not storing
information about the values actually stored in the index. As a result the
query planner has no information about the data distribution so can't do it's job.
For many reason's we don't want to have to split up the data into separate
parts, unless we really really have to.
Steve
--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general