Hi list. I have a question about the different plans produced by postgres for an outer join versus an inner join. (complete sql script attached) Take these two tables: CREATE TABLE album ( id SERIAL PRIMARY KEY, title TEXT NOT NULL ); CREATE INDEX album_title ON album (title); CREATE TABLE track ( id SERIAL PRIMARY KEY, album_id INT NOT NULL REFERENCES album(id), title TEXT NOT NULL ); CREATE INDEX track_album ON track(album_id); CREATE INDEX track_title ON track(title); where crucially `track` references `album(id)` with a `NOT NULL` reference. Now, if we query both an inner and an outer join on `track` and `album` (after having suitably filled-in data), we get very different plans where only the inner join exploits indices. That is: EXPLAIN ANALYZE SELECT t.id FROM track t INNER JOIN album a ON (t.album_id = a.id) ORDER BY a.title ASC LIMIT 10; Produces this query plan: QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=0.56..3.29 rows=10 width=36) (actual time=0.038..0.052 rows=10 loops=1) -> Nested Loop (cost=0.56..3606.93 rows=13200 width=36) (actual time=0.036..0.046 rows=10 loops=1) -> Index Scan using album_title on album a (cost=0.28..113.23 rows=1397 width=36) (actual time=0.015..0.016 rows=1 loops=1) -> Index Scan using track_album on track t (cost=0.29..1.84 rows=66 width=8) (actual time=0.012..0.018 rows=10 loops=1) Index Cond: (album_id = a.id) Planning Time: 0.473 ms Execution Time: 0.096 ms (7 rows) While this: EXPLAIN ANALYZE SELECT t.id FROM track t LEFT JOIN album a ON (t.album_id = a.id) ORDER BY a.title ASC LIMIT 10; Produces this query plan: QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------- Limit (cost=604.43..604.45 rows=10 width=36) (actual time=20.934..20.943 rows=10 loops=1) -> Sort (cost=604.43..637.43 rows=13200 width=36) (actual time=20.932..20.934 rows=10 loops=1) Sort Key: a.title Sort Method: top-N heapsort Memory: 27kB -> Hash Left Join (cost=42.43..319.18 rows=13200 width=36) (actual time=1.082..12.333 rows=10000 loops=1) Hash Cond: (t.album_id = a.id) -> Seq Scan on track t (cost=0.00..242.00 rows=13200 width=8) (actual time=0.031..3.919 rows=10000 loops=1) -> Hash (cost=24.97..24.97 rows=1397 width=36) (actual time=0.990..0.991 rows=1000 loops=1) Buckets: 2048 Batches: 1 Memory Usage: 101kB -> Seq Scan on album a (cost=0.00..24.97 rows=1397 width=36) (actual time=0.014..0.451 rows=1000 loops=1) Planning Time: 0.251 ms Execution Time: 20.999 ms (12 rows) My question then is, shouldn't the inner and outer join queries be semantically equivalent when the columns we are joining on are non-nullable foreign keys? Is there some corner case I'm not considering? Would it be a good addition to postgres if it could detect this and produce a plan that exploits the indices? (My root motivation for asking this question is this github issue: https://github.com/hasura/graphql-engine/issues/5949) Regards, Philip
Attachment:
outer-join-indexes.sql
Description: application/sql