On Mon, Jun 22, 2020 at 8:02 AM Paul Förster <paul.foerster@xxxxxxxxx> wrote:
Hi Stephen,
> On 22. Jun, 2020, at 07:36, Stephen Frost <sfrost@xxxxxxxxxxx> wrote:
> That's not the only case that I, at least, have heard of- folks aren't
> really very happy when their backups fail when they could have just as
> well completed, even if they're overlapping. Sure, it's better if
> backups are scheduled such that they don't overlap, but that can be hard
> to guarantee.
I see.
Yeah, especially when your backups are a number of TB which makes them take Some Time (TM) to complete...
> The thing about this is though that the new API avoids *other* issues,
> like what happens if the system crashes during a backup (which is an
> entirely common thing that happens, considering how long many backups
> take...) and it does so in a relatively reasonable way while also
> allowing concurrent backups, which is perhaps a relatively modest
> benefit but isn't the main point of the different API.
That makes me curious about another thing. The output of pg_stop_backup() is to be stored. Otherwise the backup is useless. So far, so good. But what if the server crashes in the middle of the backup and pg_stop_backup() hence is never reached? In this case, it obviously does not create any output.
Whenever the connection that ran pg_start_backup() disconnects without calling pg_stop_backup(), the "state" of being "in backup mode" is "rolled back" in the database. This is similar to how a transaction you started with BEGIN gets rolled back if you just disconnect without issuing COMMIT.
Your backup will of course be invalid in this case, but the database itself will be fine. (The inability to ensure this is exactly why the old "exclusive mode" for backups is deprecated -- the non-exclusive mode is safe in this respect.) So it is of course very important to check that the pg_stop_backup() step completed successfully, and to fail the entire backup if it did not.
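To make the "check pg_stop_backup() and fail the whole backup otherwise" point concrete, here is a minimal sketch in Python. It is only an illustration under assumptions: `conn` is any DB-API connection using the `%s` parameter style (psycopg2 would be one option), `copy_data_dir` is a hypothetical callable that copies the data directory, and the function signatures are those of the releases current when this thread was written (9.6 through 14; later releases renamed the functions to pg_backup_start()/pg_backup_stop()).

```python
import os


def run_nonexclusive_backup(conn, label, copy_data_dir, dest_dir):
    """Take a non-exclusive base backup over a single connection.

    conn          -- open DB-API connection (assumed; e.g. from psycopg2). It
                     must stay open from pg_start_backup() to pg_stop_backup().
    copy_data_dir -- hypothetical callable that copies the data directory
                     into dest_dir.
    Returns the stop LSN reported by pg_stop_backup().
    """
    cur = conn.cursor()
    # Third argument false selects non-exclusive mode.
    cur.execute("SELECT pg_start_backup(%s, false, false)", (label,))
    copy_data_dir(dest_dir)
    # If the server crashed or the session was lost, this call raises and we
    # never write backup_label, so the copy is never marked as usable.
    cur.execute("SELECT lsn, labelfile, spcmapfile FROM pg_stop_backup(false)")
    row = cur.fetchone()
    if row is None:
        raise RuntimeError("pg_stop_backup() returned no row; backup is invalid")
    lsn, labelfile, spcmapfile = row
    # The output of pg_stop_backup() has to be stored with the backup.
    with open(os.path.join(dest_dir, "backup_label"), "w") as f:
        f.write(labelfile)
    if spcmapfile:
        with open(os.path.join(dest_dir, "tablespace_map"), "w") as f:
            f.write(spcmapfile)
    return lsn
```

Any exception raised here should make the backup job exit non-zero, and the partially copied directory should be discarded rather than kept as a backup.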
Ok, you usually start the server, the database does a crash recovery and opens. Then, some time later, you do the usual backup and all is well. This is like 99.999% of all cases.
But what if you need to restore to the latest transaction, and the crash occurred while the database was running in backup mode? How does that work if no pg_stop_backup() output exists? Did I miss something here?
It does not work off *that* base backup. But if you start from the *prior* base backup (one that did complete with a successful pg_stop_backup()) then you can still use the archived WAL to recover to any point in time.
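As a rough illustration of "start from the prior base backup", the sketch below picks the newest backup that actually completed. The layout is an assumption, not something PostgreSQL prescribes: one directory per backup under a hypothetical `backup_root`, named so that they sort chronologically (e.g. by timestamp), with `backup_label` written only after pg_stop_backup() succeeded.

```python
import os


def latest_completed_backup(backup_root):
    """Return the newest backup directory that contains backup_label, or None.

    Assumes directory names under backup_root sort chronologically and that
    backup_label is only present when pg_stop_backup() completed.
    """
    completed = []
    for name in sorted(os.listdir(backup_root)):
        path = os.path.join(backup_root, name)
        if os.path.isfile(os.path.join(path, "backup_label")):
            completed.append(path)
    # Interrupted backups have no backup_label and are skipped.
    return completed[-1] if completed else None
```

Recovery then restores that directory and replays the archived WAL from there, which is why the WAL archive has to reach back at least to the start of that backup.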