here they are: (I replace the column and table names) also I post 2 more remarks, one on left join, and one on the test I did on postgres 15 postgis 3.3...
2023=# explain (analyze, buffers) select A.x, count(B.x) from A left join B on st_within(B.geom, A.geom) group by A.x;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
GroupAggregate (cost=212638398.98..212701792.16 rows=20 width=16) (actual time=86717.857..86757.820 rows=11 loops=1)
Group Key: A.x
Buffers: shared hit=4243867
-> Sort (cost=212638398.98..212659529.97 rows=8452398 width=16) (actual time=86717.851..86727.334 rows=421307 loops=1)
Sort Key: A.x
Sort Method: quicksort Memory: 37963kB
Buffers: shared hit=4243867
-> Nested Loop Left Join (cost=0.00..211521459.31 rows=8452398 width=16) (actual time=17.473..86642.332 rows=421307 loops=1)
Join Filter: st_within(B.geom, A.geom)
Rows Removed by Join Filter: 4229377
Buffers: shared hit=4243867
-> Seq Scan on A (cost=0.00..9.20 rows=20 width=17752) (actual time=0.009..0.043 rows=11 loops=1)
Buffers: shared hit=9
-> Materialize (cost=0.00..22309.83 rows=422789 width=40) (actual time=0.001..23.392 rows=422789 loops=11)
Buffers: shared hit=15968
-> Seq Scan on B (cost=0.00..20195.89 rows=422789 width=40) (actual time=0.006..57.651 rows=422789 loops=1)
Buffers: shared hit=15968
Planning Time: 0.693 ms
Execution Time: 86763.087 ms
(19 lignes)
2023=# explain (analyze, buffers) select A.x, count(B.x) from A, B where st_within(B.geom, A.geom) group by A.x;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
Finalize GroupAggregate (cost=6301606.00..6301608.60 rows=20 width=16) (actual time=11857.363..11863.212 rows=6 loops=1)
Group Key: A.x
Buffers: shared hit=2128836
-> Gather Merge (cost=6301606.00..6301608.30 rows=20 width=16) (actual time=11857.359..11863.207 rows=12 loops=1)
Workers Planned: 1
Workers Launched: 1
Buffers: shared hit=2128836
-> Sort (cost=6300605.99..6300606.04 rows=20 width=16) (actual time=11840.355..11840.356 rows=6 loops=2)
Sort Key: A.x
Sort Method: quicksort Memory: 25kB
Worker 0: Sort Method: quicksort Memory: 25kB
Buffers: shared hit=2128836
-> Partial HashAggregate (cost=6300605.36..6300605.56 rows=20 width=16) (actual time=11840.331..11840.332 rows=6 loops=2)
Group Key: A.x
Buffers: shared hit=2128825
-> Nested Loop (cost=0.13..6275745.36 rows=4971999 width=16) (actual time=0.505..11781.817 rows=210651 loops=2)
Buffers: shared hit=2128825
-> Parallel Seq Scan on B (cost=0.00..18454.99 rows=248699 width=40) (actual time=0.005..22.859 rows=211395 loops=2)
Buffers: shared hit=15968
-> Index Scan using A_geom_idx on A (cost=0.13..25.15 rows=1 width=17752) (actual time=0.054..0.055 rows=1 loops=422789)
Index Cond: (geom ~ B.geom)
Filter: st_within(B.geom, geom)
Rows Removed by Filter: 0
Buffers: shared hit=2112857
Planning Time: 0.252 ms
Execution Time: 11863.357 ms
(26 lignes)
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
GroupAggregate (cost=212638398.98..212701792.16 rows=20 width=16) (actual time=86717.857..86757.820 rows=11 loops=1)
Group Key: A.x
Buffers: shared hit=4243867
-> Sort (cost=212638398.98..212659529.97 rows=8452398 width=16) (actual time=86717.851..86727.334 rows=421307 loops=1)
Sort Key: A.x
Sort Method: quicksort Memory: 37963kB
Buffers: shared hit=4243867
-> Nested Loop Left Join (cost=0.00..211521459.31 rows=8452398 width=16) (actual time=17.473..86642.332 rows=421307 loops=1)
Join Filter: st_within(B.geom, A.geom)
Rows Removed by Join Filter: 4229377
Buffers: shared hit=4243867
-> Seq Scan on A (cost=0.00..9.20 rows=20 width=17752) (actual time=0.009..0.043 rows=11 loops=1)
Buffers: shared hit=9
-> Materialize (cost=0.00..22309.83 rows=422789 width=40) (actual time=0.001..23.392 rows=422789 loops=11)
Buffers: shared hit=15968
-> Seq Scan on B (cost=0.00..20195.89 rows=422789 width=40) (actual time=0.006..57.651 rows=422789 loops=1)
Buffers: shared hit=15968
Planning Time: 0.693 ms
Execution Time: 86763.087 ms
(19 lignes)
2023=# explain (analyze, buffers) select A.x, count(B.x) from A, B where st_within(B.geom, A.geom) group by A.x;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
Finalize GroupAggregate (cost=6301606.00..6301608.60 rows=20 width=16) (actual time=11857.363..11863.212 rows=6 loops=1)
Group Key: A.x
Buffers: shared hit=2128836
-> Gather Merge (cost=6301606.00..6301608.30 rows=20 width=16) (actual time=11857.359..11863.207 rows=12 loops=1)
Workers Planned: 1
Workers Launched: 1
Buffers: shared hit=2128836
-> Sort (cost=6300605.99..6300606.04 rows=20 width=16) (actual time=11840.355..11840.356 rows=6 loops=2)
Sort Key: A.x
Sort Method: quicksort Memory: 25kB
Worker 0: Sort Method: quicksort Memory: 25kB
Buffers: shared hit=2128836
-> Partial HashAggregate (cost=6300605.36..6300605.56 rows=20 width=16) (actual time=11840.331..11840.332 rows=6 loops=2)
Group Key: A.x
Buffers: shared hit=2128825
-> Nested Loop (cost=0.13..6275745.36 rows=4971999 width=16) (actual time=0.505..11781.817 rows=210651 loops=2)
Buffers: shared hit=2128825
-> Parallel Seq Scan on B (cost=0.00..18454.99 rows=248699 width=40) (actual time=0.005..22.859 rows=211395 loops=2)
Buffers: shared hit=15968
-> Index Scan using A_geom_idx on A (cost=0.13..25.15 rows=1 width=17752) (actual time=0.054..0.055 rows=1 loops=422789)
Index Cond: (geom ~ B.geom)
Filter: st_within(B.geom, geom)
Rows Removed by Filter: 0
Buffers: shared hit=2112857
Planning Time: 0.252 ms
Execution Time: 11863.357 ms
(26 lignes)
On Sat, Jan 7, 2023 at 9:40 PM Erik Wienhold <ewie@xxxxxxxxx> wrote:
> On 07/01/2023 20:46 CET Marc Millas <marc.millas@xxxxxxxxxx> wrote:
>
> Hi,
>
> postgres 12, postgis 3.0
>
> I have a small table A, 11 rows with a varchar column x and a geometry column y.
> gist index on the geometry column.
> the geometry do contains multipolygons (regions on a map)
> I have a second table B , same structure, around 420 000 rows.
> no index,
> the geometry do contains points.
> all geometries are on 4326 srid.
>
> If i ask to count points in each multipolygons:
>
> select A.x, count(B.x) from A, B where st_within(B.y, A.y) group by A.x;
> it takes 11 seconds (everything in shared buffers).
> If I do the very same thing as:
> select A.x, count(B.x) from A left join B on st_within(B.y, A.y) group by A.x;
> same result, but 85 seconds (every thing in shared buffers, again)
> if I redo asking with explain analyze, buffers, the plan is very different.
>
>
> if I do create a gist index on geometry column of the big table, both syntax takes 21 seconds.
>
> I get the feeling I am missing something.. (at least 2 things...)
> can someone shed some light ??
Please provide the executions plans for both queries with and without the index on B.y.
--
Erik