Hi All,
Thank you all; I sincerely appreciate your feedback.
I have done a fair amount of testing on the solution proposed by you all (not removing backup_label), and it seems to have completely addressed the issue.
This was actually introduced some time back, and I am not completely certain how it crept into our codebase. I think that at least part of the explanation lies in the fact that we are experiencing a fair amount of growth in the database size and use on some of our installations. This could be the reason why extensive testing did not show the issue back then and why we are seeing it now.
Would it make sense to log a warning in the case of a missing backup_label file, or would it be difficult to identify that situation in the code? I would be happy to dig in and develop a patch.
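Independently of any server-side warning, a restore script could guard against this itself. The following is only a minimal sketch in Python, assuming a base backup that has just been unpacked into a data directory and not yet started; the function name backup_label_present and the logger name are hypothetical and not part of PostgreSQL:

```python
import logging
from pathlib import Path

# Hypothetical logger name, used for illustration only.
log = logging.getLogger("restore_check")


def backup_label_present(data_dir):
    """Return True if data_dir contains a backup_label file.

    Logs a warning and returns False when the file is missing.
    """
    label = Path(data_dir) / "backup_label"
    if label.is_file():
        return True
    log.warning(
        "backup_label is missing in %s; starting the server from this "
        "copy may leave the data directory corrupted",
        data_dir,
    )
    return False
```

A restore script would call this before starting the server and stop if it returns False. The check only makes sense on a freshly restored base backup, before the server is started for the first time.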
With regard to the package version: we *are* working with a few "stock" scenarios, one of which is a fairly old RHEL installation. We also have CentOS versions that are much more up to date.
Best regards, and thank you all again,
Fredrik
On 20 October 2016 at 22:38:26 +02:00, Andres Freund <andres@xxxxxxxxxxx> wrote:
On 2016-10-20 22:37:15 +0900, Michael Paquier wrote:
> On Thu, Oct 20, 2016 at 10:21 PM, <fredrik@xxxxxxxxxxxxx> wrote:
> > - remove a file called backup_label, but I am not certain that this file is
> > in fact there (any more).
>
> It is never a good idea when you are trying to restore from a backup,
> backup_label contains critical information when restoring from a
> backup, so you may finish with a corrupted data folder.

And this actually seems like a likely source of these errors. Removing
a backup label unfortunately causes hard to diagnose errors, because
everything appears to be ok as long as there's no checkpoints while
taking the base backups (or when the control file was copied early
enough). But as soon as a second checkpoint happens before the control
file is copied...

Fredrik, how did you end up removing the label?

Greetings,

Andres Freund