Re: Possible to improve optimisation / index usage based on domain properties of a function

Sameer Kumar <sameer.kumar@xxxxxxxxxx> · Thu, 20 Feb 2014 14:21:44 +0800

Hi all,

Take the following scenario

I have a set of partitions inherited from a parent table, called streams.

One of the properties of these tables is a timestamp field, nothing fancy about it.

I also have a qualified index on this field.

I’ve noticed that if I perform the following query, the planner will correctly use the CHECK constraints to determine the partition, and then use the indexes available to retrieve the streams between the specified dates.

select count(*) from streams where stream_date >= ‘2013-01-08’ and stream_date < ‘2013-01-09’;

This is correct way of writing such queries.

If however, I was to provide the below query, it uses a sequential scan based plan.  The planner is unable to utilise any indexes because it can’t know what the function is going to return – thus unable to constrain the range at the time of planning the execution.

select count(*) from streams where date(stream_date) = ‘2013-01-08’;

This will not use index on stream_date. Nothing unusual in this. Nothing special with PostgreSQL. None of the RDBMS I have dealt with are smart enough to transform this query.

I’m wondering if we could build into postgres some level of metadata regarding the properties of a function, such that the optimiser could filter against the range of values that the function is expected to return.

In this case, it could deduce that the date function will only ever return a value for stream_date to within a certain maximum and minimum range.
Thus the planner could scan the index for all values of stream_date falling within +/- 24 hours of the right operand, and then check/re-check the results.

If you can't go for the smarter query, go for more optimum index by "_expression_ based index"

http://www.postgresql.org/docs/9.1/static/indexes-expressional.html

I suspect this would only be suitable for very basic functions, such as date(), date_trunc() - I suspect, for any function that reduces cardinality to any predictable degree.

Wrong, there are many expressions which won't use indexes but the moment you shift the calculation from LHS to RHS, indexes will appear in plan. This I have seen with at least 3 other RDBMS.

Thoughts?

I don't think the RDBMS optimizer should be overloaded with smartness which is expected from users writing it. If you do it, then there is no end to it.

Tim