On Thu, Nov 26, 2020 at 12:37 AM Mats Julian Olsen <mats@xxxxxxxxxxxxxxxxx> wrote:
We have just set up postgres_fdw between two postgres databases, x and y, with the plan to periodically insert data from x into y.We've successfully set up the connection with a few options: `use_remote_estimate 'true'` and `fetch_size '50000'` (the latter we've played around with). We've run ANALYZE on the foreign server.SELECTs against the foreign table returns in milliseconds, however an INSERT of 100 rows takes 10 seconds. we have roughly 200 000 rows for the initial sync, which translates into ~6 hours.Is this the expected performance of postgre_fdw? Is there anything we've overlooked when setting this up? Very curious to hear experiences from the community when doing read/write and not just read from foreign sources.
Are your inserts run in individual transactions or grouped into one transaction? If the latter, commit time will be a factor.
What's the round-trip time (ping time) to the foreign server? Since postgres_fdw runs each individual insert as a separate statement, you're going to face insert times of (n * RTT) for inserts. Assuming negligible time for insert execution on the foreign server, your runtime is 21600 seconds for 200000 rows, i.e. 9.25 rows/second or 0.108 seconds/row. That would be consistent with a 90-100ms ping time to the foreign server.
You'll be pleased to know that there is currently work ongoing in pgsql-hackers to add the capability to batch INSERTs in postgres_fdw to improve performance on higher latency links. See https://www.postgresql.org/message-id/flat/20200628151002.7x5laxwpgvkyiu3q%40development . That could well reduce your RTTs immensely. Try the patch out if you can and report back please.
If you can get the client application to manage the foreign insert directly, then handle commit consistency using two-phase commit, you should be able to do the insert in half an hour or less instead (assuming ~10ms execution time per insert and 90ms RTT). If you use `COPY`, or if you can use JDBC to benefit from PgJDBC's support for the JDBC addBatch() and executeBatch() APIs, you should be able to get it down way lower than that. Assuming your RTT latency is 90ms and you spend 10ms executing each insert, your insert time might well go down to 0.010 * 200000 + 90*2 = 2180 seconds or about 36 minutes. If you can insert a row in 3ms with COPY, 13 minutes.
There's work ongoing on making libpq (which underlies postgres_fdw) capable of running multiple statements at the same time, i.e. "pipelining". That won't immediately benefit postgres_fdw because using it in postgres_fdw would require changes to the whole postgres executor as well. But if adopted, it'd allow postgres_fdw to achieve that sort of performance transparently.