Proximity query with GIST and row estimation

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi all,

Following the work on Mark Stosberg on this list (thanks Mark!), I
optimized our slow proximity queries by using cube, earthdistance
(shipped with contrib) and a gist index. The result is globally very
interesting apart for a specific query and we'd like to be able to fix
it too to be more consistent (it's currently faster with a basic
distance calculation based on acos, cos and so on but it's slow
anyway).

The problem is that we have sometimes very few places near a given
location (small city) and sometimes a lot of them (in Paris, Bruxelles
and so on - it's the case we have here). The gist index I created
doesn't estimate the number of rows in the area very well.

Table: lieu (100k rows) with wgslat and wgslon as numeric
Table: lieugelieu (200k rows, 1k with codegelieu = 'PKG')
Index: "idx_lieu_earth" gist (ll_to_earth(wgslat::double precision,
wgslon::double precision))

The simplified query is:
SELECT DISTINCT l.numlieu, l.nomlieu, ROUND
(earth_distance(ll_to_earth(48.85957600, 2.34860800),
ll_to_earth(l.wgslat, l.wgslon))) as dist
	FROM lieu l, lieugelieu lgl
	WHERE lgl.codegelieu = 'PKG' AND earth_box(ll_to_earth(48.85957600,
2.34860800), 1750) @ ll_to_earth(l.wgslat, l.wgslon) AND lgl.numlieu =
l.numlieu ORDER BY dist ASC LIMIT 2;
It's used to find the nearest car parks from a given location.

The plan is attached plan_earthdistance_nestedloop.txt. It uses a
nested loop because the row estimate is pretty bad: (cost=0.00..3.38
rows=106 width=0) (actual time=30.229..30.229 rows=5864 loops=1).

If I disable the nested loop, the plan is different and faster (see
plan_earthdistance_hash.txt attached).

Is there any way to improve this estimation? I tried to set the
statistics of wgslat and wgslon higher but it doesn't change anything
(I don't know if the operator is designed to use the statistics).

Any other idea to optimize this query is very welcome too.

--
Guillaume
db=# explain analyze SELECT DISTINCT l.numlieu, l.nomlieu, ROUND (earth_distance(ll_to_earth(48.85957600, 2.34860800), ll_to_earth(l.wgslat, l.wgslon))) as dist
db-# FROM lieu l, lieugelieu lgl
db-# WHERE lgl.codegelieu = 'PKG' AND earth_box(ll_to_earth(48.85957600, 2.34860800), 1750) @ ll_to_earth(l.wgslat, l.wgslon) AND lgl.numlieu = l.numlieu ORDER BY dist ASC LIMIT 2;
                                                                                                                    QUERY PLAN                                                                                                                     
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=626.84..626.85 rows=1 width=51) (actual time=449.287..449.298 rows=2 loops=1)
   ->  Unique  (cost=626.84..626.85 rows=1 width=51) (actual time=449.283..449.290 rows=2 loops=1)
         ->  Sort  (cost=626.84..626.84 rows=1 width=51) (actual time=449.278..449.279 rows=2 loops=1)
               Sort Key: round(sec_to_gc(cube_distance('(4192714.86111655, 171959.656483755, 4803394.52951123)'::cube, (ll_to_earth((l.wgslat)::double precision, (l.wgslon)::double precision))::cube))), l.numlieu, l.nomlieu
               ->  Nested Loop  (cost=3.38..626.83 rows=1 width=51) (actual time=258.877..448.651 rows=78 loops=1)
                     ->  Bitmap Heap Scan on lieu l  (cost=3.38..201.34 rows=106 width=51) (actual time=32.988..60.197 rows=5786 loops=1)
                           Recheck Cond: ('(4190964.86112204, 170209.656489245, 4801644.52951672),(4194464.86111106, 173709.656478266, 4805144.52950574)'::cube @ (ll_to_earth((wgslat)::double precision, (wgslon)::double precision))::cube)
                           ->  Bitmap Index Scan on idx_lieu_earth  (cost=0.00..3.38 rows=106 width=0) (actual time=30.229..30.229 rows=5864 loops=1)
                                 Index Cond: ('(4190964.86112204, 170209.656489245, 4801644.52951672),(4194464.86111106, 173709.656478266, 4805144.52950574)'::cube @ (ll_to_earth((wgslat)::double precision, (wgslon)::double precision))::cube)
                     ->  Index Scan using idx_lieugelieu_codegelieu_numlieu_principal on lieugelieu lgl  (cost=0.00..4.00 rows=1 width=4) (actual time=0.052..0.052 rows=0 loops=5786)
                           Index Cond: (((lgl.codegelieu)::text = 'PKG'::text) AND (lgl.numlieu = "outer".numlieu))
 Total runtime: 449.607 ms
(12 rows)
db=# explain analyze SELECT DISTINCT l.numlieu, l.nomlieu, ROUND (earth_distance(ll_to_earth(48.85957600, 2.34860800), ll_to_earth(l.wgslat, l.wgslon))) as dist
db-# FROM lieu l, lieugelieu lgl
db-# WHERE lgl.codegelieu = 'PKG' AND earth_box(ll_to_earth(48.85957600, 2.34860800), 1750) @ ll_to_earth(l.wgslat, l.wgslon) AND lgl.numlieu = l.numlieu ORDER BY dist ASC LIMIT 2;
                                                                                                                       QUERY PLAN                                                                                                                        
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=787.07..787.09 rows=1 width=51) (actual time=168.623..168.635 rows=2 loops=1)
   ->  Unique  (cost=787.07..787.09 rows=1 width=51) (actual time=168.619..168.627 rows=2 loops=1)
         ->  Sort  (cost=787.07..787.08 rows=1 width=51) (actual time=168.614..168.615 rows=2 loops=1)
               Sort Key: round(sec_to_gc(cube_distance('(4192714.86111655, 171959.656483755, 4803394.52951123)'::cube, (ll_to_earth((l.wgslat)::double precision, (l.wgslon)::double precision))::cube))), l.numlieu, l.nomlieu
               ->  Hash Join  (cost=206.13..787.06 rows=1 width=51) (actual time=102.005..167.992 rows=78 loops=1)
                     Hash Cond: ("outer".numlieu = "inner".numlieu)
                     ->  Bitmap Heap Scan on lieugelieu lgl  (cost=4.52..583.26 rows=435 width=4) (actual time=1.137..3.909 rows=1095 loops=1)
                           Recheck Cond: ((codegelieu)::text = 'PKG'::text)
                           ->  Bitmap Index Scan on idx_lieugelieu_codegelieu_numlieu_principal  (cost=0.00..4.52 rows=435 width=0) (actual time=1.030..1.030 rows=1095 loops=1)
                                 Index Cond: ((codegelieu)::text = 'PKG'::text)
                     ->  Hash  (cost=201.34..201.34 rows=106 width=51) (actual time=97.807..97.807 rows=5786 loops=1)
                           ->  Bitmap Heap Scan on lieu l  (cost=3.38..201.34 rows=106 width=51) (actual time=32.921..76.736 rows=5786 loops=1)
                                 Recheck Cond: ('(4190964.86112204, 170209.656489245, 4801644.52951672),(4194464.86111106, 173709.656478266, 4805144.52950574)'::cube @ (ll_to_earth((wgslat)::double precision, (wgslon)::double precision))::cube)
                                 ->  Bitmap Index Scan on idx_lieu_earth  (cost=0.00..3.38 rows=106 width=0) (actual time=30.144..30.144 rows=5864 loops=1)
                                       Index Cond: ('(4190964.86112204, 170209.656489245, 4801644.52951672),(4194464.86111106, 173709.656478266, 4805144.52950574)'::cube @ (ll_to_earth((wgslat)::double precision, (wgslon)::double precision))::cube)
 Total runtime: 169.140 ms
(16 rows)

[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux