On Mon, Jun 15, 2020 at 11:44:33AM +0200, Laurenz Albe wrote:
! On Sat, 2020-06-13 at 19:48 +0200, Peter wrote:
! > ! > 4. If, by misconfiguration and/or operator error, the backup system
! > ! >    happens to start a second backup in parallel to the first,
! > ! >    then do I correctly assume, both backups will be rendered
! > ! >    inconsistent while this may not be visible to the operator; and
! > ! >    the earlier backup would be flagged as apparently successful while
! > ! >    carrying the wrong (later) label?
! > !
! > ! If you are using my scripts and start a second backup while the first
! > ! one is still running, the first backup will be interrupted.
! >
! > This is not what I am asking. It appears correct to me, that, on
! > the database, the first backup will be interrupted. But on the
! > tape side, this might go unnoticed, and on completion it will
! > successfully receive the termination code from the *SECOND*
! > backup - which means that on tape we will have a seemingly
! > successful backup, which
! > 1. is corrupted, and
! > 2. carries a wrong label.
!
! That will only happen if the backup that uses my scripts does the
! wrong thing.

Yes. Occasionally software does the wrong thing, it's called "bugs".

! An example:
!
! - Backup #1 calls "pgpre.sh"
! - Backup #1 starts copying files
! - Backup #2 calls "pgpre.sh".
!   This will cancel the first backup.
! - Backup #1 completes copying files.
! - Backup #1 calls "pgpost.sh".
!   It will receive an error.
!   So it has to invalidate the backup.
! - Backup #2 completes copying files.
! - Backup #2 calls "pgpost.sh".
!   It gets a "backup_label" file and completes the backup.

That's not true.

Now let me see how to compile a bash... and here we go:

! An example:
!
! - Backup #1 calls "pgpre.sh"

> $ ./pgpre.sh
> backup starting location: 1/C8000058
> $

We now have:

> 24129 10  SJ    0:00.00 /usr/local/bin/bash ./pgpre.sh
> 24130 10  SJ    0:00.00 /usr/local/bin/bash ./pgpre.sh
> 24131 10  SJ    0:00.01 psql -Atq
> 24158 10  SCJ   0:00.00 sleep 5

And:

> postgres=# \d
>          List of relations
>  Schema |  Name  | Type  |  Owner
> --------+--------+-------+----------
>  public | backup | table | postgres
> (1 row)
>
> postgres=# select * from backup;
>  id |  state  |  pid  | backup_label | tablespace_map
> ----+---------+-------+--------------+----------------
>   1 | running | 24132 |              |
> (1 row)

! - Backup #1 starts copying files

Let's suppose it does now.

! - Backup #2 calls "pgpre.sh".

> $ ./pgpre.sh
> backup starting location: 1/C9000024
> $ FATAL:  terminating connection due to administrator command
> server closed the connection unexpectedly
>         This probably means the server terminated abnormally
>         before or while processing the request.
> connection to server was lost
> Backup failed
> ./pgpre.sh: line 93: ${PSQL[1]}: ambiguous redirect
>
> $ echo $?
> 0

!   This will cancel the first backup.

Yes, it seems it did:

> 25279 10  SJ    0:00.00 /usr/local/bin/bash ./pgpre.sh
> 25280 10  IWJ   0:00.00 /usr/local/bin/bash ./pgpre.sh
> 25281 10  SJ    0:00.01 psql -Atq
> 25402 10  SCJ   0:00.00 sleep 5

> postgres=# \d
>          List of relations
>  Schema |  Name  | Type  |  Owner
> --------+--------+-------+----------
>  public | backup | table | postgres
> (1 row)
>
> postgres=# select * from backup;
>  id |  state  |  pid  | backup_label | tablespace_map
> ----+---------+-------+--------------+----------------
>   1 | running | 25282 |              |
> (1 row)

! - Backup #1 completes copying files.
! - Backup #1 calls "pgpost.sh".

> $ ./pgpost.sh
> START WAL LOCATION: 1/C9000024 (file 0000000100000001000000C9)
> CHECKPOINT LOCATION: 1/C9000058
> BACKUP METHOD: streamed
> BACKUP FROM: master
> START TIME: 2020-06-15 14:09:41 CEST
> LABEL: 2020-06-15 14:09:40
> START TIMELINE: 1
>
> $ echo $?
> 0

!   It will receive an error.
!   So it has to invalidate the backup.

Where is the error?

What we now have is this:
No processes anymore.

>  id |  state   |  pid  |                          backup_label                          | tablespace_map
> ----+----------+-------+----------------------------------------------------------------+----------------
>   1 | complete | 25282 | START WAL LOCATION: 1/C9000024 (file 0000000100000001000000C9)+|
>     |          |       | CHECKPOINT LOCATION: 1/C9000058                               +|
>     |          |       | BACKUP METHOD: streamed                                       +|
>     |          |       | BACKUP FROM: master                                           +|
>     |          |       | START TIME: 2020-06-15 14:09:41 CEST                          +|
>     |          |       | LABEL: 2020-06-15 14:09:40                                    +|
>     |          |       | START TIMELINE: 1                                             +|
>     |          |       |                                                                |
> (1 row)

! - Backup #2 completes copying files.
! - Backup #2 calls "pgpost.sh".
!   It gets a "backup_label" file and completes the backup.

Wishful thinking. BOTH backups are now inconsistent, and the first got
the label from the second, and appears to be intact.

Exactly as I said before.

I don't need to try such things out. I can do logical verification in
my mind, by looking at the code. And on the same foundation I am
saying that this whole new API is a misconception.

cheerio,
PMc