Re: postgres_fdw insert extremely slow

Mats Julian Olsen <mats@xxxxxxxxxxxxxxxxx> · Wed, 25 Nov 2020 18:01:57 +0100

On 25.11.2020 17:57, Adrian Klaver wrote:
On 11/25/20 8:48 AM, Mats Julian Olsen wrote:
Apologies for the sloppiness!
Postgres version(s)?

x: Aurora PostgreSQL 11.7 on x86_64-pc-linux-gnu, compiled by gcc 
(GCC) 4.9.3, 64-bit (RDS)

y: PostgreSQL 12.2 (Ubuntu 12.2-2.pgdg19.10+1) on 
x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.2.1-9ubuntu2) 9.2.1 
20191008, 64-bit (GCP)

So they are separated by what network distance?

By quite a lot! One server is in eu-west-1 and the other in gcp 
us-central1-a. Note however, that SELECTs across the network are very 
fast, so it doesn't seem like this is a network issue.

Without the actual query and EXPLAIN ANALYZE on it this will be 
difficult to answer with any detail.

Also would be nice to have the table schema.

Here's the query, schema and the EXPLAIN ANALYZE

FYI, you will get more responses by including below information 
directly in the email. To that end:

On x

CREATE TABLE labels (
    id integer NOT NULL,
    address_id bytea NOT NULL,
    name text NOT NULL,
    author character varying(50) NOT NULL,
    type text NOT NULL,
    source text,
    updated_at timestamp with time zone DEFAULT now() NOT NULL,
    CONSTRAINT lowercase_name CHECK ((name = lower(name))),
    CONSTRAINT lowercase_type CHECK ((type = lower(type))),
    CONSTRAINT whitespace_name CHECK ((regexp_replace(btrim(name, ' 
'::text), '(\s+)'::text, ' '::text, 'g'::text) = name)),
    CONSTRAINT whitespace_type CHECK ((regexp_replace(btrim(type, ' 
'::text), '(\s+)'::text, ' '::text, 'g'::text) = type))
);
CREATE INDEX labels_label ON labels USING BTREE (name);
CREATE INDEX labels_label_type ON labels USING BTREE (type);
CREATE INDEX labels_address ON labels USING HASH (address_id);
CREATE INDEX labels_source ON labels USING BTREE (source);

On y

CREATE TABLE labels.labels (
    address     bytea PRIMARY KEY,
    labels      text[]
);

query: https://gist.github.com/mewwts/704663e19a21bc51864f3298f09a18c0

INSERT INTO foreign.labels (address, labels)
SELECT address_id, ARRAY_AGG(name) AS labels
FROM labels
GROUP BY 1
LIMIT 100;

plan: https://explain.depesz.com/s/RQFQ

Insert on labels  (cost=0.42..26.19 rows=100 width=53) (actual 
time=11541.205..11541.205 rows=0 loops=1)
   ->  Limit  (cost=0.42..25.19 rows=100 width=53) (actual 
time=0.068..3.549 rows=100 loops=1)
         ->  GroupAggregate  (cost=0.42..26502.02 rows=106996 
width=53) (actual time=0.066..3.449 rows=100 loops=1)
               Group Key: labels_1.address_id
               ->  Index Only Scan using 
labels_address_id_type_name_key on labels labels_1 
(cost=0.42..24068.85 rows=219145 width=31) (actual time=0.054..0.414 
rows=201 loops=1)
                     Heap Fetches: 0
 Planning Time: 0.102 ms
 Execution Time: 12797.143 ms

Best,
Thank you for inlining this!

--
Mats
CTO @ Dune Analytics
We're hiring: https://careers.duneanalytics.com