Search Postgresql Archives

Re: impact join syntax ?? and gist index ??

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



here they are: (I replace the column and table names) also I post 2 more remarks, one on left join, and one on the test I did on postgres 15 postgis 3.3...

2023=# explain (analyze, buffers) select A.x, count(B.x) from A left join B on st_within(B.geom, A.geom) group by A.x;
                                                                QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=212638398.98..212701792.16 rows=20 width=16) (actual time=86717.857..86757.820 rows=11 loops=1)
   Group Key: A.x
   Buffers: shared hit=4243867
   ->  Sort  (cost=212638398.98..212659529.97 rows=8452398 width=16) (actual time=86717.851..86727.334 rows=421307 loops=1)
         Sort Key: A.x
         Sort Method: quicksort  Memory: 37963kB
         Buffers: shared hit=4243867
         ->  Nested Loop Left Join  (cost=0.00..211521459.31 rows=8452398 width=16) (actual time=17.473..86642.332 rows=421307 loops=1)
               Join Filter: st_within(B.geom, A.geom)
               Rows Removed by Join Filter: 4229377
               Buffers: shared hit=4243867
               ->  Seq Scan on A  (cost=0.00..9.20 rows=20 width=17752) (actual time=0.009..0.043 rows=11 loops=1)
                     Buffers: shared hit=9
               ->  Materialize  (cost=0.00..22309.83 rows=422789 width=40) (actual time=0.001..23.392 rows=422789 loops=11)
                     Buffers: shared hit=15968
                     ->  Seq Scan on B  (cost=0.00..20195.89 rows=422789 width=40) (actual time=0.006..57.651 rows=422789 loops=1)
                           Buffers: shared hit=15968
 Planning Time: 0.693 ms
 Execution Time: 86763.087 ms
(19 lignes)


2023=# explain (analyze, buffers) select A.x, count(B.x) from A, B where st_within(B.geom, A.geom) group by A.x;
                                                                             QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize GroupAggregate  (cost=6301606.00..6301608.60 rows=20 width=16) (actual time=11857.363..11863.212 rows=6 loops=1)
   Group Key: A.x
   Buffers: shared hit=2128836
   ->  Gather Merge  (cost=6301606.00..6301608.30 rows=20 width=16) (actual time=11857.359..11863.207 rows=12 loops=1)
         Workers Planned: 1
         Workers Launched: 1
         Buffers: shared hit=2128836
         ->  Sort  (cost=6300605.99..6300606.04 rows=20 width=16) (actual time=11840.355..11840.356 rows=6 loops=2)
               Sort Key: A.x
               Sort Method: quicksort  Memory: 25kB
               Worker 0:  Sort Method: quicksort  Memory: 25kB
               Buffers: shared hit=2128836
               ->  Partial HashAggregate  (cost=6300605.36..6300605.56 rows=20 width=16) (actual time=11840.331..11840.332 rows=6 loops=2)
                     Group Key: A.x
                     Buffers: shared hit=2128825
                     ->  Nested Loop  (cost=0.13..6275745.36 rows=4971999 width=16) (actual time=0.505..11781.817 rows=210651 loops=2)
                           Buffers: shared hit=2128825
                           ->  Parallel Seq Scan on B  (cost=0.00..18454.99 rows=248699 width=40) (actual time=0.005..22.859 rows=211395 loops=2)
                                 Buffers: shared hit=15968
                           ->  Index Scan using A_geom_idx on A  (cost=0.13..25.15 rows=1 width=17752) (actual time=0.054..0.055 rows=1 loops=422789)
                                 Index Cond: (geom ~ B.geom)
                                 Filter: st_within(B.geom, geom)
                                 Rows Removed by Filter: 0
                                 Buffers: shared hit=2112857
 Planning Time: 0.252 ms
 Execution Time: 11863.357 ms
(26 lignes)




Marc MILLAS
Senior Architect
+33607850334



On Sat, Jan 7, 2023 at 9:40 PM Erik Wienhold <ewie@xxxxxxxxx> wrote:
> On 07/01/2023 20:46 CET Marc Millas <marc.millas@xxxxxxxxxx> wrote:
>
> Hi,
>
> postgres 12, postgis 3.0
>
> I have a small table A, 11 rows with a varchar column x and a geometry column y.
> gist index on the geometry column.
> the geometry do contains multipolygons (regions on a map)
> I have a second table B , same structure, around 420 000 rows.
> no index,
> the geometry do contains points.
> all geometries are on 4326 srid.
>
> If i ask to count points in each multipolygons:
>
> select A.x, count(B.x) from A, B where st_within(B.y, A.y) group by A.x;
> it takes 11 seconds (everything in shared buffers).
> If I do the very same thing as:
> select A.x, count(B.x) from A left join B on st_within(B.y, A.y) group by A.x;
> same result, but 85 seconds (every thing in shared buffers, again)
> if I redo asking with explain analyze, buffers, the plan is very different.
>
>
> if I do create a gist index on geometry column of the big table, both syntax takes 21 seconds.
>
> I get the feeling I am missing something.. (at least 2 things...)
> can someone shed some light ??

Please provide the executions plans for both queries with and without the index on B.y.

--
Erik

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]

  Powered by Linux