Greetings, * Lucas ( root@xxxxxxx) wrote: On 26/10/2021, at 6:13 AM, Stephen Frost <sfrost@xxxxxxxxxxx> wrote:
* Mladen Gogala (gogala.mladen@xxxxxxxxx) wrote:
On 10/23/21 23:12, Lucas wrote:
This has proven to work very well for me. I had to restore a few backups already and it always worked. The bad part is that I need to stop the database before performing the Snapshot, for data integrity, so that means that I have a hot-standby server only for these snapshots. Lucas
Actually, you don't need to stop the database. You need to execute pg_start_backup() before taking a snapshot and then pg_stop_backup() when the snapshot is done. You will need to recover the database when you finish the restore but you will not lose any data. I know that pg_begin_backup() and pg_stop_backup() are deprecated but since PostgreSQL doesn't have any API for storage or file system snapshots, that's the only thing that can help you use storage snapshots as backups. To my knowledge,the only database that does have API for storage snapshots is DB2. The API is called "Advanced Copy Services" or ACS. It's documented here:
https://www.ibm.com/docs/en/db2/11.1?topic=recovery-db2-advanced-copy-services-acs
For Postgres, the old begin/stop backup functions should be sufficient.
No, it's not- you must also be sure to archive any WAL that's generated between the pg_start_backup and pg_stop_backup and then to be sure and add into the snapshot the appropriate signal files or recovery.conf, depending on PG version, to indicate that you're restoring from a backup and make sure that the WAL is made available via restore_command.
Just doing stat/stop backup is *not* enough and you run the risk of having an invalid backup or corruption when you restore.
If the entire system is on a single volume then you could possibly just take a snapshot of it (without any start/stop backup stuff) but it's very risky to do that and then try to do PITR with it because we don't know where consistency is reached in such a case (we *must* play all the way through to the end of the WAL which existed at the time of the snapshot in order to reach consistency).
In the end though, really, it's much, much, much better to use a proper backup and archiving tool that's written specifically for PG than to try and roll your own, using snapshots or not.
When I create a snapshot, the script gets the latest WAL file applied from [1] and adds that information to the Snapshot Tags in AWS. I then use that information in the future when restoring the snapshot. The script will read the tag and it will download 50 WAL Files before that and all the WAL files after that required. The WAL files are being backed up to S3.
I had to restore the database to a PITR state many times, and it always worked very well.
I also create slaves using the snapshot method. So, I don’t mind having to stop/start the Database for the snapshot process, as it’s proven to work fine for the last 5 years.
I have to say that the process used here isn't terribly clear to me (you cleanly shut down the database ... and also copy the WAL files?), so I don't really want to comment on if it's actually correct or not because I can't say one way or the other if it is or isn't. I do want to again stress that I don't recommend writing your own tools for doing backup/restore/PITR and I would caution people against people trying to use this approach you've suggested. Also, being able to tell when such a process *doesn't* work is non-trivial (look at how long it took us to discover the issues around fsync..), so saying that it seems to have worked for a long time for you isn't really enough to make me feel comfortable with it. Thanks, Stephen
I think it is worth mentioning that we’re using PG 9.2, so I don’t benefit from features like logical replication to help with backups and such. (Yes, I know it’s old and that I should upgrade ASAP. We’re working on it)
The snapshots are done this way: 1. Grab the latest applied WAL File for further references, stores that in a variable in Bash 2. Stop the Postgres process 3. Check it is stopped 4. Start the Block level EBS Snapshot process 5. Applied tags to the snapshot, such as the WAL file from Step #1 6. Wait for snapshot to complete, querying the AWS API for that 7. Start PG 8. Check it is started 9. Check it is replicating from master
The entire process takes around 10 minutes to complete.
I copy the WAL files to S3 in case I ever need to restore an old snapshot (we keep snapshots for 30 days), which won’t have the WAL files in the volume itself. Also for safety reasons, after reading this article many many times.
We’re working on a migration to PG 14 so, please, feel free to suggest another best way to achieve this, but it would be better in a new thread I reckon.
Lucas |