On Tue, Jun 09, 2020 at 12:34:38PM -0700, Adrian Klaver wrote:
! The backup solution is?

https://www.bareos.com/

! Fine rant below. Go forth and work your wonders.

I don't need to, anymore. I did that for about 20 years - the people I
used to work for as a consultant (major banks and insurance shops)
would usually run Informix or Oracle. Postgres is just my own private
fancy.

On Tue, Jun 09, 2020 at 03:42:48PM -0400, Stephen Frost wrote:
! * Peter (pmc@xxxxxxxxxxxxxxxxxxxxxxx) wrote:
! > This professional backup solution also offers support for postgres.
! > Sadly, it only covers postgres up to Rel.9, and that piece of
! > software wasn't touched in the last 6 or 7 years.
!
! Then it certainly doesn't work with the changes in v12, and probably
! has other issues, as you allude to.

Just having a look at their webpage, something seems to have been
updated recently: they now state that they have a new postgres
adapter:

https://www.bareos.com/en/company_news/postgres-plugin-en1.html

Enjoy reading, and tell us what You think.

! > Actually, I am getting very tired of reading that something which
! > can easily be done within 20 lines of shell scripting, would need
! > special
!
! This is just simply false- you can't do it properly in 20 lines of
! shell scripting.

Well, Your own docs show how to do it with a one-liner. So please
don't blame me for improving that to 20 lines.
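To make that concrete: the one-liner is the archive_command example
straight from the docs, and below it is the kind of thing I mean by
20 lines. The paths, the user, and the tar call are placeholders of
my choosing - tar only stands in for whatever file-level backup
software actually picks up the tree:

  # postgresql.conf - WAL archiving, the documented one-liner:
  archive_command = 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'

  #!/bin/sh
  # basebackup.sh - old-style (exclusive) base backup wrapped around
  # a plain file-level copy of the data tree.
  set -e
  PGDATA=/var/db/postgres/data12
  TARGET=/backup/base-$(date +%Y%m%d).tar

  psql -U postgres -qc "SELECT pg_start_backup('basebackup', true);"
  # make sure backup mode ends even if the copy fails:
  trap 'psql -U postgres -qc "SELECT pg_stop_backup();"' EXIT

  tar -cf "$TARGET" -C "$PGDATA" .   # pg_wal could be excluded here,
                                     # the archived logs cover it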
! Sure, you can write something that has probably next to no
! error checking,

Before judging that, one should first specify what precisely the
demand is. In a basic approach, the demand may be to get the logs out
on tape in a failsafe automated fashion without any miss, to get the
data tree out periodically, and to have a guarantee that these files
are untampered, exactly as on disk. And that can very well be done
properly with an incremental filesystem backup software plus some 20
lines of shell script.

Now, talking about doing an automated restore, or having some
menu-driven solution, or - the worst of all - having a solution that
can be operated by morons: that is an entirely different matter.

In my understanding, backup is done via pg_dump. The archive logs are
for emergencies (data corruption, disaster) only. And emergencies
would usually be handled by professional people who know what they
have to do. You may consider different demands, and that is also
fine, but it doesn't need to concern me.

! uses the deprecated API that'll cause your systems to
! fail to start if you ever happen to have a reboot during a backup

It is highly unlikely that this never happened to me during 15 years.
So what does that mean? If I throw in a pg_start_backup('bogus') and
then restart the cluster, it will not work anymore? Let's see...

Clean stop/start - no issue whatsoever.
(LOG: online backup mode canceled)

kill -9 the whole flock - no issue whatsoever.
(LOG: database system was interrupted)

I won't pull the plug now, but that has certainly happened lots of
times in the past, and it also yielded no issue whatsoever - simply
because there *never* was *any* issue whatsoever with Postgres (until
I got the idea to install the relatively fresh Rel.12 - but that's
understandable). So maybe this problem exists only on Windows?

And yes, I read that whole horrible discussion, and I could tear my
hair out, really, concerning the "deprecated API". I suppose You mean
the statement in the docs that the "exclusive low-level backup" is
somehow deprecated. This is a very bad thing.

Because: normally you can run the base backup as a strictly ordinary
file-level backup in "full" mode, just as any backup software can do
it. You simply execute the pg_start_backup() and pg_stop_backup()
commands in the before- and after-hooks - and any backup software
will offer such hooks.

But now, with the newly recommended "non-exclusive low-level backup",
the task is different: now your before-hook needs to do two things at
the same time:

1. keep a socket open in order to hold the connection to postgres
   (because postgres will terminate the backup when the socket is
   closed), and

2. invoke exit(0) (because the actual backup will not start until the
   before-hook has properly delivered a successful exit code).

And that is not only difficult, it is impossible.

So, what has to be done instead: you need to write a separate network
daemon whose only purpose is to hold that connection to postgres
open. That daemon needs to handle the communication with the backup
software on one side and with postgres on the other side. And it then
needs all the features that a fail-safe network daemon must have (and
that is a LOT!), plus it needs to handle all kinds of possible
failures (like network interruptions) in that triangle during the
backup, and properly notify both sides of whatever might be going on
(and that is NOT trivial). So yes, this is really a LOT of work.
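Just to show the shape of the problem, here is the toy version of
that daemon: a psql session held open across the hooks via a FIFO.
All names here are of my own invention, and nearly all of the failure
handling I just described is missing - which is exactly the point:

  #!/bin/sh
  # before-hook (sketch): start a non-exclusive backup and leave the
  # session running. Opening the FIFO read/write keeps psql from
  # seeing EOF between the hooks.
  FIFO=/var/run/pgbackup.fifo
  mkfifo "$FIFO"
  psql -U postgres <> "$FIFO" &
  echo $! > /var/run/pgbackup.pid
  echo "SELECT pg_start_backup('base', true, false);" > "$FIFO"
  sleep 5    # no handshake - we do not actually know it succeeded
  exit 0     # deliver success while psql keeps the socket open

  #!/bin/sh
  # after-hook (sketch): end the backup and tear the session down.
  # pg_stop_backup(false) returns the backup_label contents, which a
  # real solution must capture and store with the backup - ignored
  # here.
  FIFO=/var/run/pgbackup.fifo
  echo "SELECT pg_stop_backup(false);" > "$FIFO"
  sleep 5
  kill "$(cat /var/run/pgbackup.pid)"
  rm -f "$FIFO" /var/run/pgbackup.pid

Everything that would make this fail-safe - timeouts, retries,
noticing that psql or the backup software died halfway through - is
precisely the work described above.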
But the point is: this all is not really necessary, because currently
the stuff works fine in the old way. So, well, do away with the old
method - but you cannot remove it within Rel.12 - and then I will
stay with 12 for as long as possible (and I don't think I will be the
only one).

! has no way to provide verification that the backup was at all
! successful

It doesn't need to. That's the main point of using file-level
standard backup - if that is tested and works, then it works for the
data tree and the logs just the same. And any monitoring is also just
the same.

I see no point in creating artificial complications, which then
create a necessity for individual tools to handle them, which then
create a new requirement for testing and validating all these
individual tools - as this is strictly against the original idea as
Brian Kernighan explained it: use simple and versatile tools, and
combine them to achieve the individual task.

! > The only really interesting thing there is the pg_probackup. These
! > folks seem to have found a way to do row-level incremental backups.
!
! pg_probackup doesn't do row-level incremental backups, unless I've
! missed some pretty serious change in its development, but it does
! provide page-level,

Ah, well; anyway, that seems to be something significantly smaller
than the usual 1 gig table file at once.

! with, as I recall, an extension that didn't get
! good reception when it was posted and discussed on these mailing
! lists by other PG hackers. I don't know if those concerns about it
! have been addressed or not, you might ask the pg_probackup folks if
! you're considering it as a solution.

Okay, thanks. That's interesting. I was just thinking whether one
could cannibalize that respective code and make it into a filter for
my own purposes. And yes, the license would allow that. And I was
thinking that it will be quite an effort to get some kind of logical
verification that this scheme does actually work properly.

I don't consider it a solution; I consider it a piece of
functionality that, if it works properly, does actually increase the
possibilities.

! PG generally isn't something that can be backed up using the simple
! file based backup solutions, as you might appreciate from just
! considering the number of tools written to specifically deal with
! the complexity of backing up an online PG cluster.

Yes, one could assume that. But then, I would prefer well-founded
technical reasons for what exactly would not work that way, and why
it would not work that way. And there does not seem to be much of
that. In such a case I tend to trust my own understanding, as with
the full_page_writes matter.

(In 2008 I heard about ZFS, and I concluded that if ZFS is indeed
copy-on-write, and if the description of the full_page_writes option
is correct, then one could safely switch it off and free a lot of
backup space - a factor of 10 at that time, with some Rel.8. And so I
started to use ZFS. Nobody would confirm that at the time, but
nowadays everybody does it.)

This was actually my job as a consultant: to de-mystify technology,
and make it understandable as an arrangement of well-explainable
pieces of functionality with well-deducible consequences. But this is
no longer respected today; now people are expected to *NOT*
understand the technology they handle, and instead believe in
marketing and that it all is very complicated and unintelligible.

cheerio,
PMc