Re: pg_basebackup + incremental base backups

Christopher Pereira <kripper@xxxxxxxxxxxx> · Sun, 24 May 2020 14:36:01 -0400



> We've contemplated adding support for something like this to pgbackrest,
> since all the pieces are there, but there hasn't been a lot of demand
> for it and it kind of goes against the idea of having a proper backup
> solution, really..  It'd also create quite a bit of load on the primary
> to checksum all the files to do the comparison against what's on the
> replica that you're trying to update, so not something you'd probably
> want to do a lot more than necessary.

    
    Ok, we want to use pgbackrest to rebuild a standby that has
      fallen behind (where pg_rewind won't work). After reading
      the docs, we believe we should use this setup:

    
    a) Primary host: primary cluster

    
    b) Repository host: needed for rebuilding the standby (with
      PITR as a bonus).
    c) Standby host: standby cluster

    
    Some questions:
    1) The standby will use streaming replication and will stay in
      sync until someday something funny happens and both the standby
      and the repository get out of sync with the primary.

      Now, to rebuild the standby, we will first have to create a new
      backup, transferring the data from primary -> repository,
      right?

      Wouldn't this also have a load impact on the primary cluster?

    
    2) Section 17.3 of the user guide explains how to create a
      "pg-standby host" that replicates the data from the repository
      host.

      Section 17.4 explains how to set up streaming replication to
      replicate the data from the primary host.

      Do 17.3 and 17.4 work together, so that the data is first
      restored from the repository and then streamed from the primary?
    3) Before being able to rebuild the standby cluster, would we
      first need to update the backup on the repository (backup from
      primary -> repository) in order for streaming replication to
      work (from primary -> standby)?
    4) Once the backup on the repository is ready, what are the
      chances that streaming replication from primary to standby won't
      work because they got out of sync again?
    5) Could we just work with 2 hosts (primary and standby) instead
      of 3?

      FAQ section 8 says the repository shouldn't be on the same host as
      the standby, and having it on the primary doesn't make much sense
      because if the primary host is down we won't have access to the
      backup.
    Ideally, we would keep the repository on the standby host and
      take good care of the configuration. What exactly do we need to
      take care of for this setup to be safe?
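
    To make question 2 concrete, this is roughly how we understand
      sections 17.3 and 17.4 would combine on the standby host. It is
      only a sketch from our reading of the user guide: the stanza name
      "demo", the host names "repository" and "primary", the data
      directory path and the "replicator" user are all placeholders,
      not our real values.

```ini
# pgbackrest.conf on the standby host (all values are placeholders)
[demo]
pg1-path=/var/lib/postgresql/12/demo
# Passed into the recovery settings on restore, so the standby can
# connect to the primary for streaming replication
recovery-option=primary_conninfo=host=primary port=5432 user=replicator

[global]
# The standby fetches backups and archived WAL from the repository host
repo1-host=repository
```

    If we read the guide correctly, we would then run something like
      "pgbackrest --stanza=demo --delta --type=standby restore" on the
      standby, which would replay WAL from the repository first and
      switch to streaming from the primary afterwards.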

    
    I'm afraid I don't understand the pgbackrest design very well,
      or how to use it efficiently to rebuild a standby cluster that
      has fallen out of sync.