On Wed, Jun 10, 2020 at 08:32:22AM -0400, Stephen Frost wrote:
! > What repo?? I seem to have missed that at first glance.
!
! Yes, pgbackrest has a repo, like most other tools (though they call them
! different things... pg_basebackup has one though it's not really
! formal).
!
! > Are You indeed suggesting that one should have their data within
! > the database, where it is worked with, and then use Your tool
! > to copy it to some "repo" disk playground whatever area, and then
! > use their regular backup system to COPY IT AGAIN into their
! > backup/archiving system? Are You kiddin'?
!
! No, I'm not kidding and yes, that's what I'm suggesting.

Well, do You think I could seriously sell that to anybody?

! You need a
! consistent backup of your database that includes all the needed WAL to
! perform a restore.

Okay. So let's behave like professional people and figure out how that
can be achieved.

First, we can drop that WAL requirement, because with WAL archiving it
is already guaranteed that an unbroken chain of WAL is present in the
backup (except when we have a bug like the one that led to this
discussion). So this is **not part of the scope**. I'll get to the
other item, the "consistent backup", further below.

! This is only one option though, there are others- you can also use
! pgbackrest to push your backups to s3 (or any s3-compatible data storage
! system, which includes some backup systems), and we'll be adding
! support
!
! I concur that this is becoming a madhouse, and is pushing past the limit
! for what I'm willing to deal with when trying to assist someone.

Well, then that might be a misconception. I'm traditionally a
consultant, and so I am used to *evaluating* solutions. I don't need
assistance for that, I only need precise technical information.

So let's get serious: it is NOT technically feasible to amplify storage
in such a way that the entire backup data gets copied from the live
database to some "repo" place first, and then again from that repo
place to the regular file-based backup/archiving storage. And it does
not make a difference WHERE that other place is, at Jeff's or wherever.
It just does not belong in the loop.

So, what could the alternative look like? I for my part consider doing
this: with a proper transactional filesystem we can take recursive
filesystem snapshots. That means, given a suitable filesystem layout,
we can snapshot the data tree, AND the pg_wal filesystem, AND the
respective tablespaces - and this is atomic. So, what we then do in the
before-hook is:

 * we call pg_start_backup(whatever, false, false),
 * we take the filesystem snapshot, atomically,
 * we call pg_stop_backup(whatever).

Then we return with exit(0) (if all of that succeeded), and the
file-based backup software can start to collect the files from the
filesystem snapshot, and release it afterwards. This STILL needs
threaded programming (as I said, there is no way to avoid that with
this "new API"), but in this case it is effectively reduced to grabbing
the return code of some program that has been started with "&". (A
minimal sketch of such a hook follows below.)

So far, so good. There is still one thing to be done, namely the
requirement to collect the data reported by pg_stop_backup() and add it
to the backup, at a point in time when that backup is ALREADY CLOSED
(that's another piece of gross bogus in this "new API") - but with my
intended method (and some ugliness) this could now also be solved.

But then, let's think another step forward: for what purpose do we
actually need to call pg_start_backup() and pg_stop_backup() at all?
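Before getting to that question, here is the promised sketch of such a
before-hook. To be clear, this is only a sketch under assumptions of my
own, not anything established in this thread: the dataset name
tank/pgdata, the label destination /var/backups/backup_label, and psql
and zfs being in the PATH are all made up, and error checking of the
snapshot step is simplified. The idea is that psql's \! shell escape
lets the snapshot run from inside the one session that the
non-exclusive pg_start_backup()/pg_stop_backup() pair requires to stay
open, and that the labelfile column returned by pg_stop_backup(false)
gets captured into a file the backup software can collect together with
the snapshot (a complete hook would also save the spcmapfile column
when tablespaces are in play):

    #!/usr/bin/env python3
    # Sketch of a before-hook: non-exclusive base backup around a ZFS snapshot.
    # Assumed names: dataset tank/pgdata, label file /var/backups/backup_label,
    # psql and zfs available in PATH.
    import subprocess
    import sys

    SNAPSHOT  = "tank/pgdata@filebackup"
    LABEL_OUT = "/var/backups/backup_label"

    # One psql script = one session, kept open between pg_start_backup()
    # and pg_stop_backup(); the \! escape runs the snapshot from inside it.
    # A real hook would also verify that the zfs command itself succeeded.
    sql = f"""
    \\set ON_ERROR_STOP on
    SELECT pg_start_backup('filebackup', false, false);
    \\! zfs snapshot -r {SNAPSHOT}
    \\pset tuples_only on
    \\pset format unaligned
    \\o {LABEL_OUT}
    SELECT labelfile FROM pg_stop_backup(false);
    \\o
    """

    result = subprocess.run(["psql", "-X", "-d", "postgres"],
                            input=sql, text=True, capture_output=True)
    if result.returncode != 0:
        sys.stderr.write(result.stderr)
        sys.exit(1)          # the backup software sees a failed before-hook
    sys.exit(0)              # snapshot taken, label file written: go ahead

The captured file then simply has to be put in place as backup_label if
the snapshot is ever restored.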
Back to the question whether we need these calls at all: I couldn't
find exhaustive information about that, only some partial facts.

What we know for certain is this: if we leave these calls out and just
take a filesystem snapshot and make a backup from that, then we have
exactly the same thing as if we had a power failure at that moment. So
this is restorable, and the server will certainly start. The remaining
question is: can we then give it our subsequently archived WAL and make
it roll forward before going ready? I don't know that yet, but maybe,
if we just throw these WAL into pg_wal, the thing might be stupid
enough to swallow them. If that does not work, then there might still
be ways to trick it into believing it's a standby server and make it
roll forward these WAL.

So, not only are there ways to avoid the useless storage amplification,
there might even be means to get rid of that whole misconceived "API".

Things that remain to be figured out:

1. What does pg_start_backup() actually do, and why would that be
   necessary? I could not find exhaustive information, but this can
   probably be figured out from the source. Currently I know this much:

   - It writes a backup_label file. That is just a few lines of ASCII
     and should not be difficult to produce.
   - It does a checkpoint. This can probably be done with the
     respective command (if it is needed at all).
   - It temporarily switches to full_page_writes. This cannot be done
     interactively.

2. Why might it be necessary to have full_page_writes on? I'm
   currently investigating that one, but I tend to assume that this is
   just another piece of enforced Boeing-security, and not really
   needed with a transactional filesystem.

I now hope very much that Magnus Hagander will name some of the
impending "failure scenarios", because I am getting increasingly tired
of pondering probable ones, and of searching the old list entries for
them, without finding anything substantial.

cheerio,
PMc
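PS: About the "trick it into believing it's a standby server" part: the
stock mechanism for replaying archived WAL on a restored data directory
is archive recovery, i.e. a restore_command plus (on version 12 and
later) a recovery.signal file in the data directory; on older versions
the same settings go into recovery.conf. Whether that is actually safe
on a crash-consistent snapshot without a backup_label and without the
full_page_writes window is exactly the open question above. The
following is only a sketch of the plumbing, with made-up paths:

    #!/usr/bin/env python3
    # Sketch: put a snapshot-restored data directory into archive recovery
    # (PostgreSQL 12+).  The paths and the WAL archive layout are assumptions.
    import pathlib

    PGDATA      = pathlib.Path("/var/db/postgres/data")   # restored snapshot
    RESTORE_CMD = "cp /backup/wal_archive/%f %p"          # assumed archive

    # recovery.signal makes the server enter archive recovery at startup
    # instead of plain crash recovery, so it keeps fetching WAL through
    # restore_command until the archive is exhausted.
    (PGDATA / "recovery.signal").touch()

    # Appending to postgresql.auto.conf is crude but works for a sketch;
    # normally restore_command would go into postgresql.conf.
    with open(PGDATA / "postgresql.auto.conf", "a") as conf:
        conf.write(f"\nrestore_command = '{RESTORE_CMD}'\n")

    print("start the server now; it replays the archived WAL before going ready")

A recovery_target setting could be added in the same place if
point-in-time recovery were wanted rather than replaying to the end of
the archive.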