On 25.11.2020 17:57, Adrian Klaver wrote:
On 11/25/20 8:48 AM, Mats Julian Olsen wrote:
Apologies for the sloppiness!
Postgres version(s)?
x: Aurora PostgreSQL 11.7 on x86_64-pc-linux-gnu, compiled by gcc
(GCC) 4.9.3, 64-bit (RDS)
y: PostgreSQL 12.2 (Ubuntu 12.2-2.pgdg19.10+1) on
x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.2.1-9ubuntu2) 9.2.1
20191008, 64-bit (GCP)
So they are separated by what network distance?
By quite a lot! One server is in eu-west-1 and the other in gcp
us-central1-a. Note however, that SELECTs across the network are very
fast, so it doesn't seem like this is a network issue.
Without the actual query and EXPLAIN ANALYZE on it this will be
difficult to answer with any detail.
Also would be nice to have the table schema.
Here's the query, schema and the EXPLAIN ANALYZE
FYI, you will get more responses by including below information
directly in the email. To that end:
On x
CREATE TABLE labels (
id integer NOT NULL,
address_id bytea NOT NULL,
name text NOT NULL,
author character varying(50) NOT NULL,
type text NOT NULL,
source text,
updated_at timestamp with time zone DEFAULT now() NOT NULL,
CONSTRAINT lowercase_name CHECK ((name = lower(name))),
CONSTRAINT lowercase_type CHECK ((type = lower(type))),
CONSTRAINT whitespace_name CHECK ((regexp_replace(btrim(name, '
'::text), '(\s+)'::text, ' '::text, 'g'::text) = name)),
CONSTRAINT whitespace_type CHECK ((regexp_replace(btrim(type, '
'::text), '(\s+)'::text, ' '::text, 'g'::text) = type))
);
CREATE INDEX labels_label ON labels USING BTREE (name);
CREATE INDEX labels_label_type ON labels USING BTREE (type);
CREATE INDEX labels_address ON labels USING HASH (address_id);
CREATE INDEX labels_source ON labels USING BTREE (source);
On y
CREATE TABLE labels.labels (
address bytea PRIMARY KEY,
labels text[]
);
query: https://gist.github.com/mewwts/704663e19a21bc51864f3298f09a18c0
INSERT INTO foreign.labels (address, labels)
SELECT address_id, ARRAY_AGG(name) AS labels
FROM labels
GROUP BY 1
LIMIT 100;
plan: https://explain.depesz.com/s/RQFQ
Insert on labels (cost=0.42..26.19 rows=100 width=53) (actual
time=11541.205..11541.205 rows=0 loops=1)
-> Limit (cost=0.42..25.19 rows=100 width=53) (actual
time=0.068..3.549 rows=100 loops=1)
-> GroupAggregate (cost=0.42..26502.02 rows=106996
width=53) (actual time=0.066..3.449 rows=100 loops=1)
Group Key: labels_1.address_id
-> Index Only Scan using
labels_address_id_type_name_key on labels labels_1
(cost=0.42..24068.85 rows=219145 width=31) (actual time=0.054..0.414
rows=201 loops=1)
Heap Fetches: 0
Planning Time: 0.102 ms
Execution Time: 12797.143 ms
Best,
Thank you for inlining this!
--
Mats
CTO @ Dune Analytics
We're hiring: https://careers.duneanalytics.com