Sorry for double post, just ignore this post..
From: pgsql-general-owner@xxxxxxxxxxxxxx <pgsql-general-owner@xxxxxxxxxxxxxx> on behalf of Kim Rose Carlsen <krc@xxxxxxxx>
Sent: Thursday, October 27, 2016 6:34:58 PM To: pgsql-general@xxxxxxxxxxxxxx Subject: How to hint two columns IS NOT DISTINCT FROM each other in a join
Hi
I was wondering if there is a way to hint that two columns in two different tables IS NOT DISTINCT FROM each other. So that the optimizer may assume if table_a.key = 'test' THEN table_b.key = 'test' .
The equals operator already does this but it does not handle NULLS very well (or not at all). And it seems IS NOT DISTINCT FROM is not indexable and doesn't establish the same inference rules as equals.
Example:
CREATE TABLE a ( id INTEGER PRIMARY KEY, key VARCHAR, value VARCHAR
);
CREATE INDEX ON a (key);
INSERT INTO a VALUES (1, 'test', 'test'), (2, 'foo', 'bar'), (3, null, null), (4, 'baz', 'qoz');
CREATE VIEW view_a AS ( SELECT table_a.id, table_a.key, table_a.value, table_b.key as b_key FROM a AS table_a JOIN a AS table_b ON table_a.id = table_b.id );
CREATE VIEW view_a_eq AS ( SELECT table_a.id, table_a.key, table_a.value, table_b.key as b_key FROM a AS table_a JOIN a AS table_b ON table_a.id = table_b.id AND table_a.key = table_b.key ); CREATE VIEW view_a_distinct AS ( SELECT table_a.id, table_a.key, table_a.value, table_b.key as b_key FROM a AS table_a JOIN a AS table_b ON table_a.id = table_b.id AND table_a.key IS NOT DISTINCT FROM table_b.key ); QUERY PLAN
------------------------------------------------------------------------------------
Hash Join (cost=12.69..34.42 rows=4 width=100)
Hash Cond: (table_b.id = table_a.id)
-> Seq Scan on a table_b (cost=0.00..18.50 rows=850 width=36)
-> Hash (cost=12.64..12.64 rows=4 width=68)
-> Bitmap Heap Scan on a table_a (cost=4.18..12.64 rows=4 width=68)
Recheck Cond: ((key)::text = 'test'::text)
-> Bitmap Index Scan on a_key_idx (cost=0.00..4.18 rows=4 width=0)
Index Cond: ((key)::text = 'test'::text)
We only get index scan on table_a
EXPLAIN SELECT * FROM view_a_eq WHERE key
= 'test';
QUERY PLAN
------------------------------------------------------------------------------------
Nested Loop (cost=8.36..25.53 rows=1 width=100)
Join Filter: (table_a.id = table_b.id)
-> Bitmap Heap Scan on a table_a (cost=4.18..12.64 rows=4 width=68)
Recheck Cond: ((key)::text = 'test'::text)
-> Bitmap Index Scan on a_key_idx (cost=0.00..4.18 rows=4 width=0)
Index Cond: ((key)::text = 'test'::text)
-> Materialize (cost=4.18..12.66 rows=4 width=36)
-> Bitmap Heap Scan on a table_b (cost=4.18..12.64 rows=4 width=36)
Recheck Cond: ((key)::text = 'test'::text)
-> Bitmap Index Scan on a_key_idx (cost=0.00..4.18 rows=4 width=0)
Index Cond: ((key)::text = 'test'::text)
We get index scan on both tables and the where clause is pushed all the way down
EXPLAIN SELECT
* FROM view_a_distinct WHERE key = 'test';
QUERY PLAN
------------------------------------------------------------------------------------
Hash Join (cost=12.69..34.43 rows=1 width=100)
Hash Cond: (table_b.id = table_a.id)
Join Filter: (NOT ((table_a.key)::text IS DISTINCT FROM (table_b.key)::text))
-> Seq Scan on a table_b (cost=0.00..18.50 rows=850 width=36)
-> Hash (cost=12.64..12.64 rows=4 width=68)
-> Bitmap Heap Scan on a table_a (cost=4.18..12.64 rows=4 width=68)
Recheck Cond: ((key)::text = 'test'::text)
-> Bitmap Index Scan on a_key_idx (cost=0.00..4.18 rows=4 width=0)
Index Cond: ((key)::text = 'test'::text)
Same as the first example
In these examples it really doesn't matter which plan is used. But for larger view it might be a great hint for the optimizer to know that since we are joining on a primary key we could hint that all other columns for
the table is actually NOT DISTINCT from each other. This will result in the planner being able to push the WHERE condition down into the other joined tables.
It works well for the = operator, but it is easy to see the it fails as soon as we asks about nulls.
SELECT * FROM view_a_eq WHERE key IS NULL;
id | key | value | b_key
----+-----+-------+-------
(0 rows)
Maybe my examples are too simple, but I hope you can verify the same will hold for a much larger table with 2 different views on top of them.
|