On Mon, May 20, 2019 at 08:20:33PM +0300, Mariel Cherkassky wrote:
Hey Greg,
Basically my backup was made after the first pg_resetxlog so I was wrong.
Bummer.
However, the customer had a secondary machine that hadn't been synced for a
month. I have all the WALs since the moment the secondary went out of
sync. Once I started it, I hoped that it would start recovering the WALs and
fill the gap. However, I got an error on the secondary:
2019-05-20 10:11:28 PDT 19021 LOG: entering standby mode
2019-05-20 10:11:28 PDT 19021 LOG: invalid primary checkpoint record
2019-05-20 10:11:28 PDT 19021 LOG: invalid secondary checkpoint link in control file
2019-05-20 10:11:28 PDT 19021 PANIC: could not locate a valid checkpoint record
2019-05-20 10:11:28 PDT 19018 LOG: startup process (PID 19021) was terminated by signal 6: Aborted
2019-05-20 10:11:28 PDT 19018 LOG: aborting startup due to startup process failure
2019-05-20 10:11:28 PDT 19018 LOG: database system is shut down.
I checked my secondary archive dir and pg_xlog dir, and
it seems that the restore command doesn't work. My restore_command:
restore_command = 'rsync -avzhe ssh postgres@x.x.x.x:/var/lib/pgsql/archive/%f /var/lib/pgsql/archive/%f ; gunzip < /var/lib/pgsql/archive/%f > %p'
archive_cleanup_command = '/usr/pgsql-9.6/bin/pg_archivecleanup /var/lib/pgsql/archive %r'
Well, when you say it does not work, why do you think so? Does it print
some error, or what? Does it even get executed? It does not seem to be
the case, judging by the log (there's no restore_command message).
How was the "secondary machine" created? You said you have all the WAL
since then - how do you know that?
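
As an aside on the restore_command itself: its two steps are separated by
';', so the shell reports only the exit status of the last step, while
PostgreSQL expects restore_command to return zero only when the segment was
actually restored. The sketch below is only an illustration, not your setup:
fetch and unpack are hypothetical stand-ins for rsync and gunzip, and it
assumes a stale compressed copy already sits in the local archive directory,
so that the gunzip step succeeds even though the rsync step failed.

```shell
#!/bin/sh
# Hypothetical stand-ins for the two steps of the quoted restore_command.
fetch()  { return 1; }   # plays rsync: simulate a failed transfer
unpack() { return 0; }   # plays gunzip: simulate success on a stale local copy

# Form used in the quoted config: steps separated by ';'.
# The subshell's exit status is that of the last step only.
( fetch seg ; unpack seg )
status_semicolon=$?

# Alternative form: steps chained with '&&'.
# The second step is skipped and the failure of the first is reported.
( fetch seg && unpack seg )
status_and=$?

echo "';' form exit status:  $status_semicolon"   # prints 0
echo "'&&' form exit status: $status_and"         # prints 1
```

If that turns out to matter here, replacing ';' with '&&' in restore_command
would at least make a failed rsync visible to the server instead of letting
it be masked by the gunzip step.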
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services