Re: Same query 10000x More Time

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On Thu, 6 Jan 2022 at 20:01, Avi Weinberg <AviW@xxxxxxxxx> wrote:

Thanks for the input

 

postgres_fdw seems to bring the entire table even if all I use in the join is just the id from the remote table.  I know it is possible to query for the missing ids and then perform the delete, but I wonder why all types of joins are so inefficient.

 


just for fun, please do not do this.
I tried out multiple options where we join a small local table to a huge remote table with multiple plan skip settings.


postgres@db:~/playground$ psql
psql (14beta1)
Type "help" for help.

postgres=# \c localdb
You are now connected to database "localdb" as user "postgres".
localdb=# \x
Expanded display is on.
localdb=# table pg_foreign_server;
-[ RECORD 1 ]-----------------------------------------------------------------------------------------------
oid        | 85462
srvname    | remote_server
srvowner   | 10
srvfdw     | 85458
srvtype    |
srvversion |
srvacl     |
srvoptions | {dbname=remotedb,use_remote_estimate=true,fdw_startup_cost=0,fdw_tuple_cost=0,fetch_size=10000}

localdb=# \x
Expanded display is off.
localdb=# \dt
        List of relations
 Schema | Name | Type  |  Owner
--------+------+-------+----------
 public | t    | table | postgres
(1 row)

localdb=# \det remote_schema.remote_table;
            List of foreign tables
    Schema     |    Table     |    Server
---------------+--------------+---------------
 remote_schema | remote_table | remote_server
(1 row)

localdb=# \c remotedb;
You are now connected to database "remotedb" as user "postgres".
remotedb=# \dt
            List of relations
 Schema |     Name     | Type  |  Owner
--------+--------------+-------+----------
 public | remote_table | table | postgres
(1 row)

remotedb=# select count(1) from remote_table;
 count
--------
 100000
(1 row)

remotedb=# \c localdb
You are now connected to database "localdb" as user "postgres".
localdb=# select count(1) from t;
 count
-------
    10
(1 row)

# all the set options are forcing the optmizer to skip that plan route
localdb=# explain (analyze, verbose) select * from t join remote_schema.remote_table r on (t.t_id = r.t_id);
                                                                  QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=1.23..2817.97 rows=100000 width=16) (actual time=5.814..63.310 rows=90000 loops=1)
   Output: t.t_id, t.t_col, r.rt_id, r.t_id
   Inner Unique: true
   Hash Cond: (r.t_id = t.t_id)
   ->  Foreign Scan on remote_schema.remote_table r  (cost=0.00..2443.00 rows=100000 width=8) (actual time=5.797..47.329 rows=100000 loops=1)
         Output: r.rt_id, r.t_id
         Remote SQL: SELECT rt_id, t_id FROM public.remote_table
   ->  Hash  (cost=1.10..1.10 rows=10 width=8) (actual time=0.009..0.010 rows=10 loops=1)
         Output: t.t_id, t.t_col
         Buckets: 1024  Batches: 1  Memory Usage: 9kB
         ->  Seq Scan on public.t  (cost=0.00..1.10 rows=10 width=8) (actual time=0.005..0.006 rows=10 loops=1)
               Output: t.t_id, t.t_col
 Planning Time: 4.464 ms
 Execution Time: 65.995 ms
(14 rows)

localdb=# set enable_seqscan TO 0;
SET
localdb=# explain (analyze, verbose) select * from t join remote_schema.remote_table r on (t.t_id = r.t_id);
                                                                  QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=12.41..2829.16 rows=100000 width=16) (actual time=5.380..61.028 rows=90000 loops=1)
   Output: t.t_id, t.t_col, r.rt_id, r.t_id
   Inner Unique: true
   Hash Cond: (r.t_id = t.t_id)
   ->  Foreign Scan on remote_schema.remote_table r  (cost=0.00..2443.00 rows=100000 width=8) (actual time=5.362..45.625 rows=100000 loops=1)
         Output: r.rt_id, r.t_id
         Remote SQL: SELECT rt_id, t_id FROM public.remote_table
   ->  Hash  (cost=12.29..12.29 rows=10 width=8) (actual time=0.011..0.011 rows=10 loops=1)
         Output: t.t_id, t.t_col
         Buckets: 1024  Batches: 1  Memory Usage: 9kB
         ->  Index Scan using t_pkey on public.t  (cost=0.14..12.29 rows=10 width=8) (actual time=0.005..0.008 rows=10 loops=1)
               Output: t.t_id, t.t_col
 Planning Time: 0.696 ms
 Execution Time: 63.666 ms
(14 rows)

localdb=# set enable_hashjoin TO 0;
SET
localdb=# explain (analyze, verbose) select * from t join remote_schema.remote_table r on (t.t_id = r.t_id);
                                                                  QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.15..4821.93 rows=100000 width=16) (actual time=5.199..75.817 rows=90000 loops=1)
   Output: t.t_id, t.t_col, r.rt_id, r.t_id
   Inner Unique: true
   ->  Foreign Scan on remote_schema.remote_table r  (cost=0.00..2443.00 rows=100000 width=8) (actual time=5.186..46.152 rows=100000 loops=1)
         Output: r.rt_id, r.t_id
         Remote SQL: SELECT rt_id, t_id FROM public.remote_table
   ->  Result Cache  (cost=0.15..0.16 rows=1 width=8) (actual time=0.000..0.000 rows=1 loops=100000)
         Output: t.t_id, t.t_col
         Cache Key: r.t_id
         Hits: 99990  Misses: 10  Evictions: 0  Overflows: 0  Memory Usage: 2kB
         ->  Index Scan using t_pkey on public.t  (cost=0.14..0.15 rows=1 width=8) (actual time=0.001..0.001 rows=1 loops=10)
               Output: t.t_id, t.t_col
               Index Cond: (t.t_id = r.t_id)
 Planning Time: 0.692 ms
 Execution Time: 78.512 ms
(15 rows)

localdb=# set enable_resultcache TO 0;
SET
localdb=# explain (analyze, verbose) select * from t join remote_schema.remote_table r on (t.t_id = r.t_id);
                                                                  QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=479.10..5847.98 rows=100000 width=16) (actual time=12.855..66.094 rows=90000 loops=1)
   Output: t.t_id, t.t_col, r.rt_id, r.t_id
   Inner Unique: true
   Merge Cond: (r.t_id = t.t_id)
   ->  Foreign Scan on remote_schema.remote_table r  (cost=0.29..4586.89 rows=100000 width=8) (actual time=6.235..55.329 rows=100000 loops=1)
         Output: r.rt_id, r.t_id
         Remote SQL: SELECT rt_id, t_id FROM public.remote_table ORDER BY t_id ASC NULLS LAST
   ->  Index Scan using t_pkey on public.t  (cost=0.14..12.29 rows=10 width=8) (actual time=0.006..0.024 rows=9 loops=1)
         Output: t.t_id, t.t_col
 Planning Time: 0.704 ms
 Execution Time: 68.724 ms
(11 rows)

localdb=# set enable_mergejoin TO 0;
SET
localdb=# explain (analyze, verbose) select * from t join remote_schema.remote_table r on (t.t_id = r.t_id);
                                                                 QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=113.94..8830.28 rows=100000 width=16) (actual time=11.576..89.465 rows=90000 loops=1)
   Output: t.t_id, t.t_col, r.rt_id, r.t_id
   ->  Index Scan using t_pkey on public.t  (cost=0.14..12.29 rows=10 width=8) (actual time=0.009..0.034 rows=10 loops=1)
         Output: t.t_id, t.t_col
   ->  Foreign Scan on remote_schema.remote_table r  (cost=113.80..781.80 rows=10000 width=8) (actual time=7.648..8.108 rows=9000 loops=10)
         Output: r.rt_id, r.t_id
         Remote SQL: SELECT rt_id, t_id FROM public.remote_table WHERE (($1::integer = t_id))
 Planning Time: 0.667 ms
 Execution Time: 92.131 ms
(9 rows)

from the logs for the last case: (it has open a new cursor everytime for each matching id)  and is still the slowest.

2022-01-06 22:10:48.665 IST [2318] LOG:  execute <unnamed>: DECLARE c1 CURSOR FOR
  SELECT rt_id, t_id FROM public.remote_table WHERE (($1::integer = t_id))
2022-01-06 22:10:48.665 IST [2318] DETAIL:  parameters: $1 = '1'
2022-01-06 22:10:48.665 IST [2318] LOG:  statement: FETCH 10000 FROM c1
2022-01-06 22:10:48.679 IST [2318] LOG:  statement: FETCH 10000 FROM c1
2022-01-06 22:10:48.679 IST [2318] LOG:  statement: CLOSE c1
2022-01-06 22:10:48.679 IST [2318] LOG:  execute <unnamed>: DECLARE c1 CURSOR FOR
  SELECT rt_id, t_id FROM public.remote_table WHERE (($1::integer = t_id))
2022-01-06 22:10:48.679 IST [2318] DETAIL:  parameters: $1 = '2'
2022-01-06 22:10:48.679 IST [2318] LOG:  statement: FETCH 10000 FROM c1
2022-01-06 22:10:48.686 IST [2318] LOG:  statement: FETCH 10000 FROM c1
2022-01-06 22:10:48.687 IST [2318] LOG:  statement: CLOSE c1
2022-01-06 22:10:48.687 IST [2318] LOG:  execute <unnamed>: DECLARE c1 CURSOR FOR
  SELECT rt_id, t_id FROM public.remote_table WHERE (($1::integer = t_id))
2022-01-06 22:10:48.687 IST [2318] DETAIL:  parameters: $1 = '3'
2022-01-06 22:10:48.687 IST [2318] LOG:  statement: FETCH 10000 FROM c1
2022-01-06 22:10:48.698 IST [2318] LOG:  statement: FETCH 10000 FROM c1
2022-01-06 22:10:48.698 IST [2318] LOG:  statement: CLOSE c1
2022-01-06 22:10:48.698 IST [2318] LOG:  execute <unnamed>: DECLARE c1 CURSOR FOR
  SELECT rt_id, t_id FROM public.remote_table WHERE (($1::integer = t_id))
2022-01-06 22:10:48.698 IST [2318] DETAIL:  parameters: $1 = '4'
2022-01-06 22:10:48.698 IST [2318] LOG:  statement: FETCH 10000 FROM c1
2022-01-06 22:10:48.705 IST [2318] LOG:  statement: FETCH 10000 FROM c1
2022-01-06 22:10:48.705 IST [2318] LOG:  statement: CLOSE c1


so i think, i just trust the optimizer, or rewrite my so as to gather the predicate first locally and then pass them to remote, or use materialized views to maintain
a stale copy of the remote table on my local db etc.


 

[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux