Re: Missing rows after migrating from postgres 11 to 12 with logical replication

Adrian Klaver <adrian.klaver@xxxxxxxxxxx> · Sun, 20 Dec 2020 09:58:09 -0800

On 12/20/20 8:33 AM, Lars Vonk wrote:
Hi,

Just wondering if someone knows how this could have happened. Did we
miss something when setting up the logical replication? Are there
any scenarios in which this could happen (like a database restart or
anything else)?
Or should I report this as a bug (although I can't imagine it is)?
We would really like to know how we can prevent this from happening
next time.

We still have the old primary, and a snapshot of the current primary 
around the time we flipped from the old to the new. So we could do some
digging into the cause, but we don't know what to look for...

Questions I have:

1) Was there activity on the 12 instance while it was being replicated
to that could account for the missing (deleted?) rows?

2) Are the logs still available for inspection to see if there were any
errors thrown?

3) Are there FK relationships involved?

4) How did you determine the rows were missing?

Any help or tips are appreciated.

Thanks in advance,

Lars

On Fri, Dec 18, 2020 at 4:42 PM Lars Vonk <lars.vonk@xxxxxxxxx> wrote:

    Hi,

    We migrated from postgres 11 to 12 using logical replication (over
    local network). Today we noticed that one table is missing 1252 rows
    after the replication finished and we flipped to the new primary (we
    still have the old master database so we can recover).

    We see that these rows were inserted in the table after starting the
    initial copy of the table. Most of the missing rows seem to come from
    new inserts happening **during the initial copy** (1230) and the rest
    (22) from inserts **during the period the replication ran** (7 days).

    After further investigation, unfortunately, more tables have missing
    rows; all of them are from after the initial table copy phase. We took a
    per-table approach for the replication, starting with creating an
    empty publication and adding tables via

    ALTER PUBLICATION pg12_migration ADD TABLE FOO;

    After that we refreshed the publication on the "new postgres 12
    primary" using

    ALTER SUBSCRIPTION pg12_migration REFRESH PUBLICATION;

    We only added new tables after the initial copy of the previous
    one was done (the internal state was replicating).

    We never stopped the subscriptions during all this and we started
    with a fresh schema.

    We did some sanity checks before we switched to the new master, like
    comparing max(id) to see if the replica was up to date (including
    this table) and counts on some smaller tables, and that all checked
    out okay. We never thought of rows missing somewhere in between...
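
As an illustration of why a max(id) comparison can pass while rows are missing in between, here is a minimal Python sketch. It assumes the id column is an integer and that the ids of one table have been exported from both servers into plain lists. The function name missing_id_ranges and the sample data are hypothetical, not taken from the actual migration.

```python
from typing import Iterable, List, Tuple


def missing_id_ranges(
    source_ids: Iterable[int], replica_ids: Iterable[int]
) -> List[Tuple[int, int]]:
    """Return inclusive (first, last) runs of ids that exist on the
    source but not on the replica."""
    missing = sorted(set(source_ids) - set(replica_ids))
    ranges: List[Tuple[int, int]] = []
    for row_id in missing:
        if ranges and row_id == ranges[-1][1] + 1:
            # Extend the current run of consecutive missing ids.
            ranges[-1] = (ranges[-1][0], row_id)
        else:
            # Start a new run.
            ranges.append((row_id, row_id))
    return ranges


# Hypothetical data: ids 4, 5 and 8 never arrived on the new primary,
# yet the highest id is identical on both sides.
old_primary = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
new_primary = [1, 2, 3, 6, 7, 9, 10]

print(max(old_primary) == max(new_primary))         # True
print(missing_id_ranges(old_primary, new_primary))  # [(4, 5), (8, 8)]
```

Grouping the missing ids into runs also makes it easier to line the gaps up with timestamps, for example to see whether they fall inside the initial copy window or later.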

    So how can this happen?

    Lars

--
Adrian Klaver
adrian.klaver@xxxxxxxxxxx