John Krasnay <john@xxxxxxxxxx> writes:
> We decided to do a point-in-time recovery, but that failed too, since
> the archived WAL file 00000001000002BD00000072 was zero-length. Looking
> at the logs, the archive command for this file failed at about 6:29am,
> but the server continued on until later in the evening when we noticed
> there was a disk space problem.

> Now our problem is that we appear to have lost a whole day's worth of
> data, since we can't do a PITR past the failed archive log.

> The documentation says that if the archive command fails, the server
> retries until it's successful, but that appears not to have happened.

The archiver will retry, *if the archive command returns non-zero exit
status*. It sounds to me like you're using an archive command script
that dutifully logs a failure but is careless about returning the
proper exit status.

> Does anyone have any idea how we might recover from this?

I'm afraid you're probably screwed as far as replaying any data beyond
the lost WAL segment goes. Even if you forced the system to try to
replay it, you'd have corrupted database state because of the omission
of the changes that were in the lost segment.

If you still have the original $PGDATA tree (ie you didn't blow it away
while trying the PITR idea) then you might be able to get a closer
approximation to current time by doing resetxlog and starting up ---
though the consistency of the DB would still be questionable, so a dump
and reload would be advisable.

			regards, tom lane
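
To illustrate the exit-status point above, here is a minimal sketch of an archive helper that reports every failure through its return code. It is not the script from the original report, which was never posted; the names archive_segment and main, the ".tmp" suffix, and the choice of exit codes 1 and 2 are all assumptions made for this example. The only behaviour taken from the thread is that a non-zero exit status is what makes the archiver retry.

```python
import os
import shutil
import sys


def archive_segment(src_path: str, archive_dir: str) -> int:
    """Copy one WAL segment into archive_dir. Return 0 on success, 1 on failure."""
    final_path = os.path.join(archive_dir, os.path.basename(src_path))
    tmp_path = final_path + ".tmp"  # hypothetical suffix for the in-progress copy
    try:
        # Never overwrite a segment that is already in the archive.
        if os.path.exists(final_path):
            raise FileExistsError(f"{final_path} is already archived")
        with open(src_path, "rb") as src, open(tmp_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
            dst.flush()
            os.fsync(dst.fileno())  # a full disk shows up here as OSError
        # Guard against a short or empty copy before publishing the file.
        if os.path.getsize(tmp_path) != os.path.getsize(src_path):
            raise OSError("archived copy has the wrong size")
        os.rename(tmp_path, final_path)
        return 0
    except OSError as exc:
        # Log the problem AND report it through the return value.
        print(f"archive failed for {src_path}: {exc}", file=sys.stderr)
        try:
            os.remove(tmp_path)
        except OSError:
            pass
        return 1


def main(argv: list) -> int:
    """Entry point: argv is [program, source_path, archive_dir]."""
    if len(argv) != 3:
        print("usage: archive_wal.py SOURCE_PATH ARCHIVE_DIR", file=sys.stderr)
        return 2  # wrong usage must not look like success
    return archive_segment(argv[1], argv[2])

# When saved as a script, end the file with: sys.exit(main(sys.argv))
```

Saved under a hypothetical path and ended with the sys.exit line from the final comment, this could be configured as archive_command = 'python3 /path/to/archive_wal.py %p /mnt/archive', where %p is the placeholder PostgreSQL replaces with the path of the segment to archive. Copying to a temporary name and renaming only after the size check means a failed copy never leaves a zero-length file under the segment's real name.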