On Tue, Oct 25, 2011 at 2:41 AM, Raghavendra <raghavendra.rao@xxxxxxxxxxxxxxxx> wrote: > On Tue, Oct 25, 2011 at 11:40 AM, Manoj K P <manoj@xxxxxxxxxx> wrote: >> >> Server log Oct 1 00:06:59 server_host_name postgres[1453]: [5-1] >> 2011-10-01 00:06:59.831 EDT 1453 4e869041.5ad postgres [local] postgres LOG: >> duration: 418583.238 ms statement: select pg_start_backup('fortnightly'); >> Oct 2 03:03:18 server_host_name postgres[1453]: [6-1] 2011-10-02 >> 03:03:18.243 EDT 1453 4e869041.5ad postgres [local] postgres LOG: duration: >> 8034.385 ms statement: select pg_stop_backup(); In between stop and start >> process server_host_name is receiving all type of DML & DDL and generating >> new WAL file Taking base backup in between start and stop process Client Log >> Details Oct 25 05:16:18 client_server_name postgres[28858]: [2-1] 2011-10-25 >> 05:16:18.202 BST 28858 LOG: could not open file >> "pg_xlog/00002710000047B10000008C" (log file 18353, segment 140): No such >> file or directory Oct 25 05:16:18 client_server_name postgres[28858]: [3-1] >> 2011-10-25 05:16:18.203 BST 28858 LOG: invalid checkpoint record Oct 25 >> 05:16:18 client_server_name postgres[28858]: [4-1] 2011-10-25 05:16:18.203 >> BST 28858 FATAL: could not locate required checkpoint record Oct 25 05:16:18 >> client_server_name postgres[28858]: [4-2] 2011-10-25 05:16:18.203 BST 28858 >> HINT: If you are not restoring from a backup, try removing the file >> "/mnt/new_cluster/backup_label". Oct 25 05:16:18 client_server_name >> postgres[28857]: [1-1] 2011-10-25 05:16:18.205 BST 28857 LOG: startup >> process (PID 28858) exited with exit code 1 Oct 25 05:16:18 >> client_server_name postgres[28857]: [2-1] 2011-10-25 05:16:18.205 BST 28857 >> LOG: aborting startup due to startup process failure Oct 25 05:20:53 >> client_server_name postgres[29030]: [2-1] 2011-10-25 05:20:53.630 BST 29030 >> LOG: could not open file "pg_xlog/00002710000047B100000068" (log file 18353, >> segment 104): No such file or directory Oct 25 05:20:53 client_server_name >> postgres[29030]: [3-1] 2011-10-25 05:20:53.630 BST 29030 FATAL: could not >> find redo location referenced by checkpoint record Oct 25 05:20:53 >> client_server_name postgres[29030]: [3-2] 2011-10-25 05:20:53.630 BST 29030 >> HINT: If you are not restoring from a backup, try removing the file >> "/mnt/new_cluster/backup_label". Oct 25 05:20:53 client_server_name >> postgres[29029]: [1-1] 2011-10-25 05:20:53.633 BST 29029 LOG: startup >> process (PID 29030) exited with exit code 1 Oct 25 05:20:53 >> client_server_name postgres[29029]: [2-1] 2011-10-25 05:20:53.633 BST 29029 >> LOG: aborting startup due to startup process failure manually copy following >> file to pg_xlog folder 00002710000047B10000008C 00002710000047B100000068 >> After words i can start postgres and accessing the database , but same >> error > > As per the logs, do you see missing XLOG files in Archive Destination ? becz > these kind of situations mostly missing files will be in WAL-Archive > location. You need to copy to pg_xlog directory and start the instance. > > As Merlin Said, you need to dig more to know why its crashing by increasing > the LOG-DEBUG level's. Increasing DEBUG level may occupy good amount of > space in log-location, so make sure you have good space for logs to get what > exactly happening at the time of backup in particular. Am not sure whether > its safe to attach ***backtrace*** to instance for information. yeah. also, what's the setting of archive_command (or is it even set)? taking a 'hot' filesystem backup without having an archive_command and not doing any other intervention to guarantee the necessary WAL segments are present will not give you a complete backup. my money is on you having an invalid backup procedure. the only way to take a filesystem snapshot without dealing with WAL files is to bring the database down. merlin -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general