On Tue, Oct 25, 2011 at 11:40 AM, Manoj K P <manoj@xxxxxxxxxx> wrote:
Server log

Oct 1 00:06:59 server_host_name postgres[1453]: [5-1] 2011-10-01 00:06:59.831 EDT 1453 4e869041.5ad postgres [local] postgres LOG: duration: 418583.238 ms statement: select pg_start_backup('fortnightly');
Oct 2 03:03:18 server_host_name postgres[1453]: [6-1] 2011-10-02 03:03:18.243 EDT 1453 4e869041.5ad postgres [local] postgres LOG: duration: 8034.385 ms statement: select pg_stop_backup();

Between the start and stop calls, server_host_name is receiving all types of DML and DDL and generating new WAL files.
The base backup is taken between the start and stop calls.

Client Log Details

Oct 25 05:16:18 client_server_name postgres[28858]: [2-1] 2011-10-25 05:16:18.202 BST 28858 LOG: could not open file "pg_xlog/00002710000047B10000008C" (log file 18353, segment 140): No such file or directory
Oct 25 05:16:18 client_server_name postgres[28858]: [3-1] 2011-10-25 05:16:18.203 BST 28858 LOG: invalid checkpoint record
Oct 25 05:16:18 client_server_name postgres[28858]: [4-1] 2011-10-25 05:16:18.203 BST 28858 FATAL: could not locate required checkpoint record
Oct 25 05:16:18 client_server_name postgres[28858]: [4-2] 2011-10-25 05:16:18.203 BST 28858 HINT: If you are not restoring from a backup, try removing the file "/mnt/new_cluster/backup_label".
Oct 25 05:16:18 client_server_name postgres[28857]: [1-1] 2011-10-25 05:16:18.205 BST 28857 LOG: startup process (PID 28858) exited with exit code 1
Oct 25 05:16:18 client_server_name postgres[28857]: [2-1] 2011-10-25 05:16:18.205 BST 28857 LOG: aborting startup due to startup process failure
Oct 25 05:20:53 client_server_name postgres[29030]: [2-1] 2011-10-25 05:20:53.630 BST 29030 LOG: could not open file "pg_xlog/00002710000047B100000068" (log file 18353, segment 104): No such file or directory
Oct 25 05:20:53 client_server_name postgres[29030]: [3-1] 2011-10-25 05:20:53.630 BST 29030 FATAL: could not find redo location referenced by checkpoint record
Oct 25 05:20:53 client_server_name postgres[29030]: [3-2] 2011-10-25 05:20:53.630 BST 29030 HINT: If you are not restoring from a backup, try removing the file "/mnt/new_cluster/backup_label".
Oct 25 05:20:53 client_server_name postgres[29029]: [1-1] 2011-10-25 05:20:53.633 BST 29029 LOG: startup process (PID 29030) exited with exit code 1
Oct 25 05:20:53 client_server_name postgres[29029]: [2-1] 2011-10-25 05:20:53.633 BST 29029 LOG: aborting startup due to startup process failure

I manually copied the following files to the pg_xlog folder:
00002710000047B10000008C
00002710000047B100000068

Afterwards I can start postgres and access the database, but I get the same error.
Going by the logs, do you see the missing XLOG files in the archive destination? In situations like this, the missing files are usually still in the WAL archive location. You need to copy them to the pg_xlog directory and start the instance.
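A rough sketch of that check, in case it helps: it pulls the missing segment names out of the "could not open file" lines in the log, decodes each name into timeline, log file and segment (which should match the numbers PostgreSQL prints), and copies any segment it finds in the archive into pg_xlog. The function names and the two directory arguments are my own placeholders, not anything from your setup, so adjust them to your archive destination and data directory.

```python
import re
import shutil
from pathlib import Path

# Matches the segment name in log lines such as:
#   LOG: could not open file "pg_xlog/00002710000047B10000008C" ...
MISSING_SEGMENT = re.compile(r'could not open file "pg_xlog/([0-9A-F]{24})"')


def missing_segments(log_text):
    """Return the WAL segment names the log complains about, in order, no duplicates."""
    seen = []
    for name in MISSING_SEGMENT.findall(log_text):
        if name not in seen:
            seen.append(name)
    return seen


def decode_segment(name):
    """Split a 24-hex-digit WAL file name into (timeline, log file, segment)."""
    return int(name[0:8], 16), int(name[8:16], 16), int(name[16:24], 16)


def restore_from_archive(names, archive_dir, pg_xlog_dir):
    """Copy each named segment from the archive into pg_xlog.

    archive_dir and pg_xlog_dir are hypothetical paths supplied by the caller.
    Returns (copied, not_found); files already present in pg_xlog are skipped.
    """
    copied, not_found = [], []
    for name in names:
        src = Path(archive_dir) / name
        dst = Path(pg_xlog_dir) / name
        if dst.exists():
            continue
        if src.is_file():
            shutil.copy2(src, dst)
            copied.append(name)
        else:
            not_found.append(name)
    return copied, not_found
```

For the first file in your log, decode_segment("00002710000047B10000008C") gives log file 18353 and segment 140, the same values shown in the error message. Anything that ends up in the not_found list is not in the archive either, which would point at the archiving side rather than at the restore.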
As Merlin said, you need to dig further to find out why it is crashing, by increasing the log debug level. A higher debug level can take up a good amount of space in the log location, so make sure you have enough free space for the logs to capture exactly what is happening, particularly at the time of the backup. I am not sure whether it is safe to attach a ***backtrace*** to the instance for more information.
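As a small illustration of the disk-space precaution, here is a sketch that checks the free space on the log volume before suggesting a debug setting. log_min_messages with the levels debug1 to debug5 is the standard postgresql.conf parameter; the function name, the log directory argument and the free-space threshold are assumptions of mine, so pick values that fit your system.

```python
import shutil

# Debug levels accepted by log_min_messages, from least to most verbose.
DEBUG_LEVELS = ("debug1", "debug2", "debug3", "debug4", "debug5")


def debug_logging_line(log_dir, min_free_bytes, level="debug2"):
    """Return a postgresql.conf line for debug logging if log_dir has room.

    log_dir and min_free_bytes are hypothetical inputs chosen by the caller.
    Raises ValueError for an unknown level and RuntimeError when the volume
    holding log_dir has less than min_free_bytes free.
    """
    if level not in DEBUG_LEVELS:
        raise ValueError(f"unknown debug level: {level}")
    free = shutil.disk_usage(log_dir).free
    if free < min_free_bytes:
        raise RuntimeError(
            f"only {free} bytes free under {log_dir}, need {min_free_bytes}"
        )
    return f"log_min_messages = {level}"
```

The returned line goes into postgresql.conf, after which the configuration needs to be reloaded. Remember to set the level back once the backup window has been captured, since debug output keeps growing for as long as it is enabled.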