Re: extract(year from date) doesn't use index but maybe could?

Tom Lane <tgl@xxxxxxxxxxxxx> · Sun, 19 Apr 2015 17:33:11 -0400

Tomas Vondra <tomas.vondra@xxxxxxxxxxxxxxx> writes:
> On 04/19/15 22:10, Jon Dufresne wrote:
>> My point is, why force the user to take these extra steps or add
>> overhead when the the two queries (or two indexes) are functionally
>> equivalent. Shouldn't this is an optimization handled by the
>> database so the user doesn't need to hand optimize these differences?

> Theoretically yes.

> But currently the "extract" function call is pretty much a black box for 
> the planner, just like any other function - it has no idea what happens 
> inside, what fields are extracted and so on. It certainly is unable to 
> infer the date range as you propose.

> It's possible that in the future someone will implement an optimization 
> like this, but I'm not aware of anyone working on that and I wouldn't 
> hold my breath.

Yeah.  In principle you could make the planner do this.  As Adam Williams
notes nearby, there's a problem with lack of exact consistency between
extract() semantics and straight timestamp comparisons; but you could
handle that by extracting indexable expressions that are considered lossy,
much as we do with anchored LIKE and regexp patterns.  The problem is that
this would add significant overhead to checking for indexable clauses.
With "x LIKE 'foo%'" you just need to make a direct check whether x is
an indexed column; this is exactly parallel to noting whether x is indexed
in "x >= 'foo'", and it doesn't require much additional machinery or
cycles to reject the common case that there's no match to x.  But if you
want to notice whether d is indexed in "extract(year from d) = 2015",
that requires digging down another level in the expression, so it's going
to add overhead that's not there now, even in cases that have nothing to
do with extract() let alone have any chance of benefiting.

We might still be willing to do it if there were a sufficiently wide range
of examples that could be handled by the same extra machinery, but this
doesn't look too promising from that angle: AFAICS only the "year" case
could yield a useful index restriction.

So the short answer is that whether it's worth saving users from
hand-optimizing such cases depends a lot on what it's going to cost in
added planning time for queries that don't get any benefit.  This example
doesn't look like a case that's going to win that cost/benefit tradeoff
comparison.

			regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance