Composite fields and the query planner

Steve Rogerson <steve.pg@xxxxxxxxxxxxxxxxx> · Sun, 05 May 2013 14:05:26 +0100

I'm seeing a problem with the query planner not doing what's expected, and I 
think it is because we are using composite fields. Here is a stripped down 
example.

create type type1 as ( part1 varchar, part2 varchar);
create table table1 (field1 type1, field2 varchar);
create function get_part1(type1) returns varchar as $$ select ($1).part1 $$ 
language sql;
create index i1 on table1 (get_part1(field1));
create index i2 on table1 (field2);

If we do

select * from table1 where get_part1(field1) > 'val';

it correctly uses index i1, but if I do

select * from table1 where get_part1(field1) > 'val1' and field2 = 'val2';

It - essentially - gets the hits from both indexes and then "ands" them 
together whatever the data distribution. In the case I am  interested in the 
data is actually a timestamp, and I am typically looking from things that have 
happened in the last little while. Typically I might get about 100 hits out of 
about 20,000,000 records. "field2" is column that could return up to 10% of 
the data. If we split field1 into two normal fields   and index the first part 
 naturally, the query planner correctly returns the recent ones and filters 
them out according to field2.

Naively, it looks to me that issue is that pg_stats table is holding data 
about the columns ( it seems even if the are not indexed )  and not storing 
information about the values actually stored in the index. As a result the 
query planner has no information about the data distribution so can't do it's job.

For many reason's we don't want to have to split up the data into separate 
parts, unless we really really have to.

Steve

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general