Hey guys -- tl;dr, but in case I missed something, I can follow up with more details.

I have a setup where I am trying to upgrade a cluster from pg10 to pg11 via logical replication, with minimum downtime. It is a ~500GB database, with one table holding 350GB of data (plus bloat) and 100GB of indexes. When I triggered logical replication, it took more than 2 days (and still around 10% remaining) to copy the data over; the publisher still shows the COPY ... TO STDOUT running. I am not sure of the internals, but given that this was a test run with no active connections and no DML, ~2 days does seem like I am missing something.

on both pg10 and pg11:
RAM 32GB, 8 CPUs, SSD
max_wal_size = 100GB
checkpoint_timeout = 30min
shared_buffers = 8GB

on pg10 (copy still running on the huge table):

select (now() - query_start)::interval, query from pg_stat_activity;

 2 days 19:03:12.799767 | COPY public.<table> TO STDOUT

this is how the disk looks on pg10 (publisher):

du -sh /var/lib/postgresql/1*/main/{pg_wal*,base*}
78G     /var/lib/postgresql/10/main/pg_wal
467G    /var/lib/postgresql/10/main/base

and on pg11 (subscriber):

du -sh /var/lib/postgresql/1*/main/{pg_wal*,base*}
65G     /var/lib/postgresql/11/main/pg_wal
417G    /var/lib/postgresql/11/main/base

So although this kind of upgrade involves very little downtime, it required around 2x the disk on the original server while the replication was running, and there was no way to correctly estimate the disk required for logical replication to finish before moving over to pg11 (see the PS at the end for the slot query I am watching now).

---------------------

Given the above problem, I read through

https://www.slideshare.net/noriyoshishinoda/pgconfasia-2017-logical-replication-internals-english
https://medium.com/avitotech/recovery-use-cases-for-logical-replication-in-postgresql-10-a1e6bab03072

with the goal of doing the initial sync via pg_dump instead, and set up a small test cluster to check whether I can pg_dump and pg_restore and then start replication from the restart_lsn of the primary/publisher.

*** demo lab (all pg11 for now)

pg1 (primary/publisher on port 3000)
pg2 (hot_standby replica on port 3001)
pg3 (subscriber on port 3002)

*** on pg1 (create some tables and a trigger, to verify the trigger does not fire on the subscriber)

example=# CREATE TABLE public.company (
example(#     id integer NOT NULL,
example(#     name text NOT NULL,
example(#     age integer NOT NULL,
example(#     address character(50),
example(#     salary real
example(# );
CREATE TABLE
example=# ALTER TABLE ONLY public.company
example-#     ADD CONSTRAINT company_pkey PRIMARY KEY (id);
ALTER TABLE
example=# CREATE TABLE public.audit (
example(#     emp_id integer NOT NULL,
example(#     entry_date text NOT NULL
example(# );
CREATE TABLE
example=# ALTER TABLE ONLY public.audit
example-#     ADD CONSTRAINT audit_pkey PRIMARY KEY (emp_id);
ALTER TABLE
example=# CREATE FUNCTION public.auditlogfunc() RETURNS trigger
example-#     LANGUAGE plpgsql
example-#     AS $$
example$# BEGIN
example$#     INSERT INTO AUDIT(EMP_ID, ENTRY_DATE) VALUES (new.ID, current_timestamp);
example$#     RETURN NEW;
example$# END;
example$# $$;
CREATE FUNCTION
example=# CREATE TRIGGER example_trigger AFTER INSERT ON public.company FOR EACH ROW EXECUTE PROCEDURE public.auditlogfunc();

*** set up pg2 as a hot_standby replica (sketch below), and dump the schema of pg1's example database onto pg3.
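Something like this is enough for the pg2 standby -- a minimal pg_basebackup sketch, not necessarily the exact command I ran (the -D path is just my test layout; the physical slot pg2 it creates is the one visible in pg_replication_slots further down):

pg_basebackup -h localhost -p 3000 -U postgres \
    -D /var/lib/postgresql/11/pg2 \
    -C -S pg2 -R -X stream

(-C/-S creates the physical replication slot on pg1, -R writes a recovery.conf so pg2 comes up as a hot standby, and -X stream pulls the WAL over the same connection.)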
*** on pg1 (insert some dummy data)

insert into company
select x, x::text, x, 'address-' || x::text, x
from generate_series(1, 1500) x;

*** on pg1

create publication pg1 for all tables;

*** on pg2 (verify the replica is up and running and the data is replicated -- fine)

example=# select count(1) from company;
 count
-------
  1500
(1 row)

*** on pg3

* take a dump of pg1 and restore it on pg3:

pg_dump -p 3000 -U postgres -Fc --serializable-deferrable --no-subscriptions --no-publications -d example | pg_restore -p 3002 -U postgres -C -d example

* then create a subscription to pg1:

create subscription pg3 connection 'dbname=example port=3000 user=postgres' publication pg1 with (enabled = false, copy_data = false);

*** on pg1

* get the last restart_lsn value from pg_replication_slots:

select * from pg_replication_slots;
 slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn
-----------+--------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------
 pg2       |        | physical  |        |          | f         | t      |      22724 |      |              | 13/A8133A68 |

*** then on pg3

select * from pg_stat_subscription;
 subid | subname | pid | relid | received_lsn | last_msg_send_time | last_msg_receipt_time | latest_end_lsn | latest_end_time
-------+---------+-----+-------+--------------+--------------------+-----------------------+----------------+-----------------
 17104 | pg3     |     |       |              |                    |                       |                |
(1 row)

* manually advance the lsn of the remote origin (the origin name is pg_<subid>):

select pg_replication_origin_advance('pg_17104', '13/A8133A68');

select * from pg_replication_origin_status;
 local_id | external_id | remote_lsn  | local_lsn
----------+-------------+-------------+-----------
        1 | pg_17104    | 13/A8133A68 | 0/0

* enable the subscription:

alter subscription pg3 enable;

select * from pg_stat_subscription;
 subid | subname |  pid  | relid | received_lsn |      last_msg_send_time       |     last_msg_receipt_time     | latest_end_lsn |        latest_end_time
-------+---------+-------+-------+--------------+-------------------------------+-------------------------------+----------------+-------------------------------
 17104 | pg3     | 31346 |       | 13/A8137700  | 2019-05-10 14:55:08.257756+00 | 2019-05-10 14:55:08.257806+00 | 13/A8137700    | 2019-05-10 14:55:08.257756+00
(1 row)

example=# select count(1) from company;
 count
-------
  1500
(1 row)

So I see I am able to carry on with logical replication from the publisher via the dump.

Is this the right way? I have not tinkered with LSNs before; I have generally just used

alter subscription pg3 refresh publication with (copy_data = true);

but for some reason that takes forever, hence the pg_dump and pg_restore.

questions:

* when a COPY is triggered on the publisher during the initial sync, is it similar to pg_dump with -F plain, and hence the network i/o is huge?
* if true, is there an option to compress the initial sync, like the pg_dump -Fc format, and apply it on the subscriber like pg_restore, to make the network i/o faster?
* finally, am I totally doing things wrong :) and should this not be done at all?

Regards,
Vijay
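PS: on the "no way to estimate the disk" part above -- this is a sketch of the query I have started watching on the publisher to see how much WAL each replication slot is pinning (pg10+ function names; the retained_wal alias is mine):

select slot_name, slot_type, active,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) as retained_wal
from pg_replication_slots;

since WAL cannot be recycled past any slot's restart_lsn, watching this while the initial sync runs gives at least a lower bound on how big pg_wal can grow.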