Index use with left join

"Julian Scarfe" <julian@xxxxxxxxxxx> · Fri, 8 Apr 2005 10:05:14 +0100

In a 7.4 database I have a table n of which the relevant columns see to be:

                   Table "public.n"
   Column     |            Type             | Modifiers
---------------+-----------------------------+-----------
code          | text                        | not null
ref           | text                        | not null
part          | integer                     | not null
...
q_node        | point                       |
bbox          | box                         |

with indexes

Indexes:
   "n_pkey" primary key, btree (code, ref, part)
   "n_bbox" rtree (bbox)

Performance on simple spatial queries seem sensible.

1) Using the index

explain analyze

select  n.ref, n.code

       from n

       where bbox && box (point (-0.032, 0.873), point (0.017, 0.908))

                                                 QUERY PLAN

---------------------------------------------------------------------------------------------------------------

Index Scan using n_bbox on n  (cost=0.00..88.33 rows=22 width=20) (actual 
time=0.090..7.309 rows=396 loops=1)

  Index Cond: (bbox && '(0.017,0.908),(-0.032,0.873)'::box)

Total runtime: 7.713 ms

2) Filtering on a different criterion to force a sequential scan. (These 150 
rows are a subset of those from query 1)

explain analyze

select  n.ref, n.code

       from n

       where box (q_node, q_node)

               @ box (point (-0.032, 0.873), point (0.017, 0.908))

                                             QUERY PLAN

------------------------------------------------------------------------------------------------------

Seq Scan on n  (cost=0.00..9087.60 rows=5 width=20) (actual 
time=646.273..1135.774 rows=150 loops=1)

  Filter: (box(q_node, q_node) @ '(0.017,0.908),(-0.032,0.873)'::box)

Total runtime: 1136.919 ms

3) Combining the two, the strategy seems sensible

explain analyze

select  n.ref, n.code

       from n

       where bbox && box (point (-0.032, 0.873), point (0.017, 0.908))

       and box (q_node, q_node)

               @ box (point (-0.032, 0.873), point (0.017, 0.908))

                                                 QUERY PLAN

---------------------------------------------------------------------------------------------------------------

Index Scan using n_bbox on n  (cost=0.00..88.44 rows=1 width=20) (actual 
time=0.360..11.482 rows=150 loops=1)

  Index Cond: (bbox && '(0.017,0.908),(-0.032,0.873)'::box)

  Filter: (box(q_node, q_node) @ '(0.017,0.908),(-0.032,0.873)'::box)

Total runtime: 11.772 ms

So far so good.  Now I want to left join it with another table a (again, 
just the columns that appear relevant)

                   Table "public.a"
      Column        |         Type          | Modifiers
---------------------+-----------------------+-----------
ident               | character(4)          |
name                | character varying(30) |
...
Indexes:
   "a_ident" unique, btree (ident)

4) First with a filter for which there's an index, like query 1

explain analyze

select  n.ref, n.code, a.ident, a.name

       from n left outer join a on (a.ident = n.code)

       where bbox && box (point (-0.032, 0.873), point (0.017, 0.908))

                                                        QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------

Merge Left Join  (cost=1371.99..1449.68 rows=174 width=45) (actual 
time=182.082..211.555 rows=396 loops=1)

  Merge Cond: ("outer".code = "inner"."?column3?")

  ->  Sort  (cost=88.82..88.87 rows=22 width=20) (actual 
time=16.120..16.375 rows=396 loops=1)

        Sort Key: n.code

        ->  Index Scan using n_bbox on n  (cost=0.00..88.33 rows=22 
width=20) (actual time=0.213..11.893 rows=396 loops=1)

              Index Cond: (bbox && '(0.017,0.908),(-0.032,0.873)'::box)

  ->  Sort  (cost=1283.17..1308.44 rows=10105 width=25) (actual 
time=164.296..170.774 rows=10054 loops=1)

        Sort Key: (a.ident)::text

        ->  Seq Scan on a  (cost=0.00..611.05 rows=10105 width=25) (actual 
time=0.066..68.968 rows=10105 loops=1)

Total runtime: 214.777 ms

5) Now with a filter that forces a sequential scan

explain analyze

select  n.ref, n.code, a.ident, a.name

       from n left outer join a on (a.ident = n.code)

       where box (q_node, q_node)

               @ box (point (-0.032, 0.873), point (0.017, 0.908))

                                                   QUERY PLAN

-------------------------------------------------------------------------------------------------------------------

Merge Left Join  (cost=10370.84..10447.06 rows=40 width=45) (actual 
time=912.068..914.158 rows=150 loops=1)

  Merge Cond: ("outer".code = "inner"."?column3?")

  ->  Sort  (cost=9087.66..9087.68 rows=5 width=20) (actual 
time=728.757..728.847 rows=150 loops=1)

        Sort Key: n.code

        ->  Seq Scan on n  (cost=0.00..9087.60 rows=5 width=20) (actual 
time=314.466..726.479 rows=150 loops=1)

              Filter: (box(q_node, q_node) @ 
'(0.017,0.908),(-0.032,0.873)'::box)

  ->  Sort  (cost=1283.17..1308.44 rows=10105 width=25) (actual 
time=180.078..180.927 rows=1391 loops=1)

        Sort Key: (a.ident)::text

        ->  Seq Scan on a  (cost=0.00..611.05 rows=10105 width=25) (actual 
time=0.170..83.442 rows=10105 loops=1)

Total runtime: 917.066 ms

Again, so far, nothing obviously unusual.  Now I combine the filters in 4 & 
5 (as I did from 1 & 2 to get 3)

6) Now I combine the filters in 4 & 5 (as I did from 1 & 2 to get 3, which 
performed in a similar time to 1)

explain analyze

select  n.ref, n.code, a.ident, a.name

       from n left outer join a on (a.ident = n.code)

       where bbox && box (point (-0.032, 0.873), point (0.017, 0.908))

       and box (q_node, q_node)

               @ box (point (-0.032, 0.873), point (0.017, 0.908))

                                                    QUERY PLAN

---------------------------------------------------------------------------------------------------------------------

Nested Loop Left Join  (cost=0.00..851.06 rows=8 width=45) (actual 
time=11.662..7919.946 rows=150 loops=1)

  Join Filter: (("inner".ident)::text = "outer".code)

  ->  Index Scan using n_bbox on n  (cost=0.00..88.44 rows=1 width=20) 
(actual time=0.107..10.256 rows=150 loops=1)

        Index Cond: (bbox && '(0.017,0.908),(-0.032,0.873)'::box)

        Filter: (box(q_node, q_node) @ '(0.017,0.908),(-0.032,0.873)'::box)

  ->  Seq Scan on a  (cost=0.00..611.05 rows=10105 width=25) (actual 
time=0.006..18.044 rows=10105 loops=150)

Total runtime: 7920.684 ms

Whoa!  Instead of a performance similar to query 4, it chooses a different 
strategy, and takes 40 times as long. (Both tables just analyzed.)

By brute force:

set enable_nestloop to off;

explain analyze

select  n.ref, n.code, a.ident, a.name

       from n left outer join a on (a.ident = n.code)

       where bbox && box (point (-0.032, 0.873), point (0.017, 0.908))

       and box (q_node, q_node)

               @ box (point (-0.032, 0.873), point (0.017, 0.908))

                                                       QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------

Merge Left Join  (cost=1371.62..1447.50 rows=8 width=45) (actual 
time=177.273..179.341 rows=150 loops=1)

  Merge Cond: ("outer".code = "inner"."?column3?")

  ->  Sort  (cost=88.45..88.45 rows=1 width=20) (actual time=8.452..8.538 
rows=150 loops=1)

        Sort Key: n.code

        ->  Index Scan using n_bbox on n  (cost=0.00..88.44 rows=1 
width=20) (actual time=0.109..7.031 rows=150 loops=1)

              Index Cond: (bbox && '(0.017,0.908),(-0.032,0.873)'::box)

              Filter: (box(q_node, q_node) @ 
'(0.017,0.908),(-0.032,0.873)'::box)

  ->  Sort  (cost=1283.17..1308.44 rows=10105 width=25) (actual 
time=165.520..166.348 rows=1391 loops=1)

        Sort Key: (a.ident)::text

        ->  Seq Scan on a  (cost=0.00..611.05 rows=10105 width=25) (actual 
time=0.042..69.560 rows=10105 loops=1)

Total runtime: 182.275 ms

What's happening here, please? How am I misleading the planner? Is it 
because the index is rtree?

Yes, I should consider PostGIS for spatial stuff, but I've got what I've got 
:-).

TIA

Julian Scarfe 

---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings