Thanks for the input. postgres_fdw seems to bring over the entire table even if all I use in the join is just the id from the remote table. I know it is possible to query for the missing ids first and then perform the delete (sketched below, after the plan), but I wonder why all types of joins are so inefficient.

DELETE FROM tbl_local lcl
WHERE NOT EXISTS (
    SELECT id FROM tbl_link lnk WHERE lnk.id = lcl.id
);

Delete on tbl_local lcl  (cost=114.59..122.14 rows=3 width=730) (actual time=62153.636..62153.639 rows=0 loops=1)
  ->  Hash Anti Join  (cost=114.59..122.14 rows=3 width=730) (actual time=62153.633..62153.636 rows=0 loops=1)
        Hash Cond: (lcl.id = lnk.id)
        ->  Seq Scan on tbl_local lcl  (cost=0.00..7.11 rows=111 width=14) (actual time=0.022..0.062 rows=111 loops=1)
        ->  Hash  (cost=113.24..113.24 rows=108 width=732) (actual time=55984.489..55984.490 rows=112 loops=1)
              Buckets: 1024 (originally 1024)  Batches: 32 (originally 1)  Memory Usage: 240024kB
              ->  Foreign Scan on tbl_link lnk  (cost=100.00..113.24 rows=108 width=732) (actual time=48505.926..51893.668 rows=112 loops=1)
Planning Time: 0.237 ms
Execution Time: 62184.253 ms
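For completeness, this is the two-step workaround I was referring to: fetch only the id column from the foreign table into a local temp table, then run the anti-join locally. A rough sketch against the same tables as above (the temp table name is just for illustration):

-- Pull only the ids across the wire; a bare single-column SELECT lets
-- postgres_fdw project just that column instead of the full 732-byte rows.
CREATE TEMP TABLE tmp_link_ids AS
    SELECT id FROM tbl_link;

-- Give the local planner stats on the temp table.
ANALYZE tmp_link_ids;

-- Anti-join entirely on the local side.
DELETE FROM tbl_local lcl
WHERE NOT EXISTS (
    SELECT 1 FROM tmp_link_ids t WHERE t.id = lcl.id
);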
From: Vijaykumar Jain [mailto:vijaykumarjain.github@xxxxxxxxx]

On Thu, Jan 6, 2022, 3:50 PM Avi Weinberg <AviW@xxxxxxxxx> wrote:
I understand that when the optimizer makes a decision, it uses stats to pick the least expensive plan to get the result.

I can reply, but I am pretty sure an analogy to a local setup with a big and a small table is not the same as a small local table joined against a big remote table. I would leave it to the experts here, unless you are open to reading the src for the postgres_fdw extension; there must be a reason beyond cost calculation as to why this happens. Else, if this is all just cost based, you can try tweaking the cost params and see if you can get a better plan. For example, if you force the parallel cost to 0 on the foreign server, it may use parallel workers and give some speedup, but given my experience, fighting the optimizer is mostly asking for trouble :)
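As a rough sketch of that kind of tuning, using postgres_fdw's documented server options (the server name tbl_link_srv and role fdw_user are placeholders, not from this thread):

-- Local side: ask the remote planner for row estimates, and make the
-- per-row transfer cost better reflect the real network cost.
ALTER SERVER tbl_link_srv OPTIONS (
    ADD use_remote_estimate 'true',
    ADD fdw_tuple_cost '0.2'
);

-- Remote side: bias its planner toward parallel plans for the role
-- the user mapping connects as.
ALTER ROLE fdw_user SET parallel_setup_cost = 0;
ALTER ROLE fdw_user SET parallel_tuple_cost = 0;

No guarantee any of this produces a better plan for your case; it only changes the cost inputs the optimizer works with.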