2016-11-29 15:21 GMT+13:00 David Steele <david@xxxxxxxxxxxxx>:
7. Postgres started to recover the wal_files and then streaming replication worked:On 11/24/16 8:05 PM, Patrick B wrote:
> hmm.. I really don't get it.
>
>
>
> If I get messages like:
>
> *cp: cannot stat '/walfiles/0000000200001AF8000000A5': No such file or
> director*y
>
> In my head, it's saying that it was unable to recover that file and,
> because of that, there is missing data.
> Even if the server is able to connect to tbe master via streaming
> replication, there might be missing data. There might be data that is
> into master but not into slave.
>
> Am I wrong? If so, I don't understand why are the wal_files for, then.
This is normal as far as it goes. Postgres keeps checking for the WAL
file that it needs to become consistent and cp keeps throwing errors
when it can't find the file (yet).
This simply demonstrates how unsuitable cp is as a recovery command. I
recommend you look at doing backup/archive with a professional tool such
as pgBackRest (http://www.pgbackrest.org) or Barman
(http://www.pgbarman.org/).
Thanks for all the replies guys.
Related to the "cp" - That's fine. I'm just doing this in a test environment. On my prod servers I use wal-e(aws s3) and bash script(locally) to do this work.
I've done some tests, that only gave me more questions. I'll put everything I done here and hopefully you guys will be able to help me to understand it.
1. Set up a pg-test-01 machine on Ubuntu 16 Server
2. installed Postgres 9.2 + contribs + libs + etc
3. Configured postgres to be a streaming replication slave (postgresql.conf and recovery.conf)
4. Configured my masterdb01 server to ship wal_files to the pg-test-01 server
5. Made a pg_basebackup on my master DB server (DB is 1TB big)
6. Recovered the basebackup on the pg-test-01 machine
LOG: streaming replication successfully connected to primary
8. Stopped postgres on the pg-test-01 server for 10 minutes
9. Started postgres on the pg-test-01 to see if the wal_files would be recovered
LOG: restored log file "0000000200001B2F0000004F" from archiveLOG: restored log file "0000000200001B2F00000050" from archiveLOG: restored log file "0000000200001B2F00000051" from archiveLOG: restored log file "0000000200001B2F00000052" from archive
10. Removed all the wal_files from pg-test-01 (No wal_files are being shipped to this server anymore)
11. Stopped postgres for 2 hours
12. Started postgres as streaming replication
cp: cannot stat 'archive/0000000200002B36000000BB': No such file or directory
cp: cannot stat 'archive/0000000200002B36000000BC': No such file or directory
cp: cannot stat 'archive/0000000200002B36000000BD': No such file or directory
LOG: streaming replication successfully connected to primary
Postgres was stopped for 2h. How can postgres connect using streaming replication, if no wal_files were in there???
Hos is that even possible?? I don't understand!
Thanks
Patrick