Given the table: CREATE TABLE dates (id SERIAL, d DATE NOT NULL, t TEXT NOT NULL) With an *index* on field d. The following two queries are functionally equivalent: 1. SELECT * FROM dates WHERE d >= '1900-01-01' 2. SELECT * FROM dates WHERE EXTRACT(year from d) >= 1900' By functionally equivalent, they will return the same result set. Query 2 does not use the index, adding a performance cost. It seems there is an opportunity for optimization to handle these two queries equivalently to take advantage of the index. Some database abstraction layers have attempted to workaround this limitation by rewriting EXTRACT(year ...) queries into a query more like query 1. For example: Django's ORM does exctly this. Rather than all abstraction layers trying to optimize this case, maybe it could be pushed to the database layer. I have written a test script that demonstrates that these functionally equivalent queries have different performance characteristics. The script and results are provide below: RESULTS: ---- EXPLAIN SELECT * FROM dates WHERE d >= '1900-01-01' QUERY PLAN ---------------------------------------------------------------------------- Bitmap Heap Scan on dates (cost=9819.23..26390.15 rows=524233 width=40) Recheck Cond: (d >= '1900-01-01'::date) -> Bitmap Index Scan on d_idx (cost=0.00..9688.17 rows=524233 width=0) Index Cond: (d >= '1900-01-01'::date) (4 rows) EXPLAIN SELECT * FROM dates WHERE EXTRACT(year from d) >= 1900 QUERY PLAN ------------------------------------------------------------------------------------------------- Seq Scan on dates (cost=0.00..37540.25 rows=524233 width=40) Filter: (date_part('year'::text, (d)::timestamp without time zone) >= 1900::double precision) (2 rows) Timing select_without_extract: 284.233350s select_with_extract: 323.106491s ---- SCRIPT: ---- #!/usr/bin/python3 import datetime import subprocess import random import timeit import sys subprocess.check_call(['psql', 'postgres', '-c', 'DROP DATABASE IF EXISTS datetest'], stdout=subprocess.DEVNULL) subprocess.check_call(['psql', 'postgres', '-c', 'CREATE DATABASE datetest'], stdout=subprocess.DEVNULL) subprocess.check_call(['psql', 'datetest', '-c', 'CREATE TABLE dates (id SERIAL, d DATE NOT NULL, t TEXT NOT NULL)'], stdout=subprocess.DEVNULL) def chunks(n, l): i = 0 while i < len(l): yield l[i:i+n] i += n d = datetime.date(1800, 1, 1) today = datetime.date.today() values = [] while d < today: values.extend('(\'%s\', \'%s\')' % (d, d) for i in range(20)) d += datetime.timedelta(days=1) random.shuffle(values) for chunk in chunks(1000, values): s = ','.join(chunk) subprocess.check_call(['psql', 'datetest', '-c', 'INSERT INTO dates (d, t) VALUES %s' % s], stdout=subprocess.DEVNULL) subprocess.check_call(['psql', 'datetest', '-c', 'CREATE INDEX d_idx ON dates (d)'], stdout=subprocess.DEVNULL) print('EXPLAIN SELECT * FROM dates WHERE d >= \'1900-01-01\'') sys.stdout.flush() subprocess.check_call(['psql', 'datetest', '-c', 'EXPLAIN SELECT * FROM dates WHERE d >= \'1900-01-01\'']) print('EXPLAIN SELECT * FROM dates WHERE EXTRACT(year from d) >= 1900') sys.stdout.flush() subprocess.check_call(['psql', 'datetest', '-c', 'EXPLAIN SELECT * FROM dates WHERE EXTRACT(year from d) >= 1900']) def select_without_extract(): subprocess.check_call(['psql', 'datetest', '-c', 'SELECT * FROM dates WHERE d >= \'1900-01-01\''], stdout=subprocess.DEVNULL) def select_with_extract(): subprocess.check_call(['psql', 'datetest', '-c', 'SELECT * FROM dates WHERE EXTRACT(year from d) >= 1900'], stdout=subprocess.DEVNULL) print('Timing') sys.stdout.flush() v = timeit.timeit('select_without_extract()', setup='from __main__ import select_without_extract', number=100) print('select_without_extract: %fs' % v) sys.stdout.flush() v = timeit.timeit('select_with_extract()', setup='from __main__ import select_with_extract', number=100) print('select_with_extract: %fs' % v) sys.stdout.flush() --- -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance