[Please keep the list copied, and put your reply in-line instead
of at the top.]
Cliff de Carteret wrote:
> On 22 January 2013 16:07, Kevin Grittner <kgrittn@xxxxxxxx> wrote:
>
>> Cliff de Carteret wrote:
>>
>>> The current setup has been working successfully for several years
>>> until the recent database crash
>>
>> What file does the server log say it is trying to archive? What
>> error are you getting? Does that filename already exist on the
>> archive (or some intermediate location used by the archive command
>> or script)?
> The server log is (repeated constantly):
>
> LOG: archive command failed with exit code 1
> DETAIL: The failed archive command was: test ! -f
> /opt/postgres/remote_pgsql/wal_archive/00000001000000A800000078 && cp
> pg_xlog/00000001000000A800000078
> /opt/postgres/remote_pgsql/wal_archive/00000001000000A800000078
> WARNING: transaction log file "00000001000000A800000078" could not be
> archived: too many failures
>
> The file 00000001000000A800000078 exists in the remote archive's
> wal_archive directory. I read a post saying to copy the file over to the
> archive and then delete the .ready file to get postgres to move on to the
> next file, but this ended with the log saying a log file was missing.
> There are more recent files in this directory but they end at the point
> where I reverted all of the changes I made last night when time was running
> out and the database had to be put back to a known state.
I would have deleted (or renamed) the copy in the archive
directory. Archiving should have then resumed and cleaned up the
pg_xlog directory.
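
To illustrate why the command in the log keeps failing with exit code 1,
here is a minimal sketch that runs the same "test ! -f ... && cp ..."
pattern against throwaway directories. The scratch paths, the dummy
segment contents, and the archive() helper are assumptions made for the
demonstration; nothing here touches a real pg_xlog or wal_archive.

```shell
#!/bin/sh
# Hypothetical scratch directories, not the real pg_xlog or wal_archive.
work=$(mktemp -d)
mkdir "$work/pg_xlog" "$work/wal_archive"
seg=00000001000000A800000078
echo "dummy segment" > "$work/pg_xlog/$seg"

# Same shape as the archive command shown in the log above.
archive() {
    test ! -f "$work/wal_archive/$1" && cp "$work/pg_xlog/$1" "$work/wal_archive/$1"
}

archive "$seg"; first=$?    # target absent: the copy succeeds
archive "$seg"; second=$?   # target present: test fails, cp is skipped

# Renaming the archived copy lets the same command succeed again.
mv "$work/wal_archive/$seg" "$work/wal_archive/$seg.old"
archive "$seg"; third=$?

echo "first=$first second=$second third=$third"
rm -rf "$work"
```

This prints "first=0 second=1 third=0": once the file exists in the
archive, every retry fails until the existing copy is moved out of the
way.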
I have now deleted the copy in the remote wal_archive folder, and
archiving is working again: the logs are being sent from the local to
the remote folder. However, the remote database does not start up, and
the following is in the log:
LOG: database system was shut down in recovery at 2013-01-22 10:54:48 GMT
LOG: entering standby mode
LOG: restored log file "00000001000000AB00000051" from archive
LOG: invalid resource manager ID in primary checkpoint record
PANIC: could not locate a valid checkpoint record
LOG: startup process (PID 22350) was terminated by signal 6: Aborted
LOG: aborting startup due to startup process failure
00000001000000AB00000051 is in my remote database's pg_xlog folder.
Thanks for your help already!
-Kevin