extract(year from date) doesn't use index but maybe could?

Jon Dufresne <jon.dufresne@xxxxxxxxx> · Sun, 19 Apr 2015 10:16:29 -0700

Given the table:

CREATE TABLE dates (id SERIAL, d DATE NOT NULL, t TEXT NOT NULL)

With an *index* on field d. The following two queries are functionally
equivalent:

1. SELECT * FROM dates WHERE d >= '1900-01-01'
2. SELECT * FROM dates WHERE EXTRACT(year from d) >= 1900'

By functionally equivalent, they will return the same result set.

Query 2 does not use the index, adding a performance cost. It seems
there is an opportunity for optimization to handle these two queries
equivalently to take advantage of the index.

Some database abstraction layers have attempted to workaround this
limitation by rewriting EXTRACT(year ...) queries into a query more
like query 1. For example: Django's ORM does exctly this. Rather than
all abstraction layers trying to optimize this case, maybe it could be
pushed to the database layer.

I have written a test script that demonstrates that these functionally
equivalent queries have different performance characteristics. The
script and results are provide below:

RESULTS:

----
EXPLAIN SELECT * FROM dates WHERE d >= '1900-01-01'
                                 QUERY PLAN
----------------------------------------------------------------------------
 Bitmap Heap Scan on dates  (cost=9819.23..26390.15 rows=524233 width=40)
   Recheck Cond: (d >= '1900-01-01'::date)
   ->  Bitmap Index Scan on d_idx  (cost=0.00..9688.17 rows=524233 width=0)
         Index Cond: (d >= '1900-01-01'::date)
(4 rows)

EXPLAIN SELECT * FROM dates WHERE EXTRACT(year from d) >= 1900
                                           QUERY PLAN
-------------------------------------------------------------------------------------------------
 Seq Scan on dates  (cost=0.00..37540.25 rows=524233 width=40)
   Filter: (date_part('year'::text, (d)::timestamp without time zone)
>= 1900::double precision)
(2 rows)

Timing
select_without_extract: 284.233350s
select_with_extract: 323.106491s
----

SCRIPT:

----
#!/usr/bin/python3

import datetime
import subprocess
import random
import timeit
import sys

subprocess.check_call(['psql', 'postgres', '-c', 'DROP DATABASE IF
EXISTS datetest'], stdout=subprocess.DEVNULL)
subprocess.check_call(['psql', 'postgres', '-c', 'CREATE DATABASE
datetest'], stdout=subprocess.DEVNULL)
subprocess.check_call(['psql', 'datetest', '-c', 'CREATE TABLE dates
(id SERIAL, d DATE NOT NULL, t TEXT NOT NULL)'],
stdout=subprocess.DEVNULL)

def chunks(n, l):
    i = 0
    while i < len(l):
        yield l[i:i+n]
        i += n

d = datetime.date(1800, 1, 1)
today = datetime.date.today()
values = []
while d < today:
    values.extend('(\'%s\', \'%s\')' % (d, d) for i in range(20))
    d += datetime.timedelta(days=1)
random.shuffle(values)
for chunk in chunks(1000, values):
    s = ','.join(chunk)
    subprocess.check_call(['psql', 'datetest', '-c', 'INSERT INTO
dates (d, t) VALUES %s' % s], stdout=subprocess.DEVNULL)

subprocess.check_call(['psql', 'datetest', '-c', 'CREATE INDEX d_idx
ON dates (d)'], stdout=subprocess.DEVNULL)
print('EXPLAIN SELECT * FROM dates WHERE d >= \'1900-01-01\'')
sys.stdout.flush()
subprocess.check_call(['psql', 'datetest', '-c', 'EXPLAIN SELECT *
FROM dates WHERE d >= \'1900-01-01\''])
print('EXPLAIN SELECT * FROM dates WHERE EXTRACT(year from d) >= 1900')
sys.stdout.flush()
subprocess.check_call(['psql', 'datetest', '-c', 'EXPLAIN SELECT *
FROM dates WHERE EXTRACT(year from d) >= 1900'])

def select_without_extract():
    subprocess.check_call(['psql', 'datetest', '-c', 'SELECT * FROM
dates WHERE d >= \'1900-01-01\''], stdout=subprocess.DEVNULL)

def select_with_extract():
    subprocess.check_call(['psql', 'datetest', '-c', 'SELECT * FROM
dates WHERE EXTRACT(year from d) >= 1900'], stdout=subprocess.DEVNULL)

print('Timing')
sys.stdout.flush()

v = timeit.timeit('select_without_extract()', setup='from __main__
import select_without_extract', number=100)
print('select_without_extract: %fs' % v)
sys.stdout.flush()

v = timeit.timeit('select_with_extract()', setup='from __main__ import
select_with_extract', number=100)
print('select_with_extract: %fs' % v)
sys.stdout.flush()
---

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance