What was being run when the above ERROR was triggered?
The initial copy of a table. Other than that, we ran select pg_size_pretty(pg_relation_size('table_name')) to check the current size of the table being copied and get a feel for the progress.
And whenever we added a new table to the publication, we ran ALTER SUBSCRIPTION migration REFRESH PUBLICATION; to add the new table to the subscription. But not around that timestamp: it was about 50 minutes before the first occurrence of that ERROR (no ERRORs after prior ALTER SUBSCRIPTIONs).
But after the initial copies ended there are more ERRORs about different missing WAL segments. Each missing WAL segment is logged as an ERROR a couple of times and then no more. After a couple of hours no more errors are logged.
Lars
On Mon, Dec 21, 2020 at 10:23 PM Adrian Klaver <adrian.klaver@xxxxxxxxxxx> wrote:
On 12/21/20 12:26 PM, Lars Vonk wrote:
> Hi Adrian,
>
> Thanks for taking the time to reply!
>
> 2) Are the logs still available for inspection to see if there were
> any errors thrown?
>
>
> Yes, and we dug into those. And we also found some indications that
> something went wrong.
>
> 4) How did you determine the rows were missing?
>
>
> We were alerted by a bug later that day and found that some rows were
> missing in the new primary. We did a compare based on primary key and
> found that several tables were missing rows. Before the switch we
> unfortunately only checked max(id) and did some counts on tables and
> those all checked out. We didn't do a count on all tables...
>
> So to come back at the logs:
>
> We dug a little deeper and we did find ERROR logs around the time we
> ran the initial copies. During a period of several hours that day we see
> a couple of messages like:
>
> ERROR: requested WAL segment 00000001000001F10000001D has already
> been removed
What was being run when the above ERROR was triggered?
>
> Regards,
> Lars
>
> On Sun, Dec 20, 2020 at 6:58 PM Adrian Klaver <adrian.klaver@xxxxxxxxxxx
--
Adrian Klaver
adrian.klaver@xxxxxxxxxxx