I am very new to postgresql and have a continous archiving server to recover the database in case of a hardware failure or pg crash. Following http://www.postgresql.org/docs/9.1/static/continuous-archiving.html, I copy the WAL archives to another machine and apply them regularly. Recently due to pg server crash at my main server I tried to restore the database by pushing a trigger (an empty file, expecting smart failover) and the logs are given below.
prodrestore_error.log
======================
WAL file not present yet. Checking for trigger file...
trigger file found: smart failover
Trigger file: /app/recovery-prod/trigger/pgsql.trigger.5432
Waiting for WAL file: 0000000100000001000000CC
WAL file path: /backup/prod/db_backup/archive/0000000100000001000000CC
Restoring to: pg_xlog/RECOVERYXLOG
Sleep interval: 60 seconds
Max wait interval: 0 forever
Command for restore: cp "/backup/prod/db_backup/archive/0000000100000001000000CC" "pg_xlog/RECOVERYXLOG"
Keep archive history: 000000000000000000000000 and later
trigger file found: smart failover
running restore: OK
Trigger file: /app/recovery-prod/trigger/pgsql.trigger.5432
Waiting for WAL file: 00000002.history
WAL file path: /backup/prod/db_backup/archive/00000002.history
Restoring to: pg_xlog/RECOVERYHISTORY
Sleep interval: 60 seconds
Max wait interval: 0 forever
Command for restore: cp "/backup/prod/db_backup/archive/00000002.history" "pg_xlog/RECOVERYHISTORY"
Keep archive history: 000000000000000000000000 and later
running restore: cp: cannot stat `/backup/prod/db_backup/archive/00000002.history': No such file or directory
cp: cannot stat `/backup/prod/db_backup/archive/00000002.history': No such file or directory
cp: cannot stat `/backup/prod/db_backup/archive/00000002.history': No such file or directory
cp: cannot stat `/backup/prod/db_backup/archive/00000002.history': No such file or directory
not restored
history file not found
Trigger file: /app/recovery-prod/trigger/pgsql.trigger.5432
Waiting for WAL file: 00000001.history
WAL file path: /backup/prod/db_backup/archive/00000001.history
Restoring to: pg_xlog/RECOVERYHISTORY
Sleep interval: 60 seconds
Max wait interval: 0 forever
Command for restore: cp "/backup/prod/db_backup/archive/00000001.history" "pg_xlog/RECOVERYHISTORY"
Keep archive history: 000000000000000000000000 and later
running restore: cp: cannot stat `/backup/prod/db_backup/archive/00000001.history': No such file or directory
cp: cannot stat `/backup/prod/db_backup/archive/00000001.history': No such file or directory
cp: cannot stat `/backup/prod/db_backup/archive/00000001.history': No such file or directory
cp: cannot stat `/backup/prod/db_backup/archive/00000001.history': No such file or directory
not restored
history file not found
postgresql-9.1-main.log
========================
2015-08-03 08:08:44 IST::@:[5542]:LOG: restored log file "0000000100000001000000CC" from archive
2015-08-03 08:11:44 IST::@:[5542]:LOG: could not open file "pg_xlog/0000000100000001000000CD" (log file 1, segment 205): No such file or directory
2015-08-03 08:11:44 IST::@:[5542]:LOG: redo done at 1/CC001E10
2015-08-03 08:11:44 IST::@:[5542]:LOG: last completed transaction was at log time 2015-08-03 07:59:58.121908+05:30
2015-08-03 08:11:44 IST::@:[5542]:LOG: restored log file "0000000100000001000000CC" from archive
2015-08-03 08:17:44 IST::@:[5542]:LOG: selected new timeline ID: 2
2015-08-03 08:23:45 IST::@:[5542]:LOG: archive recovery complete
2015-08-03 08:23:51 IST::@:[6088]:LOG: autovacuum launcher started
2015-08-03 08:23:51 IST::@:[5541]:LOG: database system is ready to accept connections
The whole process of looking for two history files 00000002.history and 00000001.history, 4 times and finally start the database took approximately 12 minutes. prodrestore_error.log
======================
WAL file not present yet. Checking for trigger file...
trigger file found: smart failover
Trigger file: /app/recovery-prod/trigger/pgsql.trigger.5432
Waiting for WAL file: 0000000100000001000000CC
WAL file path: /backup/prod/db_backup/archive/0000000100000001000000CC
Restoring to: pg_xlog/RECOVERYXLOG
Sleep interval: 60 seconds
Max wait interval: 0 forever
Command for restore: cp "/backup/prod/db_backup/archive/0000000100000001000000CC" "pg_xlog/RECOVERYXLOG"
Keep archive history: 000000000000000000000000 and later
trigger file found: smart failover
running restore: OK
Trigger file: /app/recovery-prod/trigger/pgsql.trigger.5432
Waiting for WAL file: 00000002.history
WAL file path: /backup/prod/db_backup/archive/00000002.history
Restoring to: pg_xlog/RECOVERYHISTORY
Sleep interval: 60 seconds
Max wait interval: 0 forever
Command for restore: cp "/backup/prod/db_backup/archive/00000002.history" "pg_xlog/RECOVERYHISTORY"
Keep archive history: 000000000000000000000000 and later
running restore: cp: cannot stat `/backup/prod/db_backup/archive/00000002.history': No such file or directory
cp: cannot stat `/backup/prod/db_backup/archive/00000002.history': No such file or directory
cp: cannot stat `/backup/prod/db_backup/archive/00000002.history': No such file or directory
cp: cannot stat `/backup/prod/db_backup/archive/00000002.history': No such file or directory
not restored
history file not found
Trigger file: /app/recovery-prod/trigger/pgsql.trigger.5432
Waiting for WAL file: 00000001.history
WAL file path: /backup/prod/db_backup/archive/00000001.history
Restoring to: pg_xlog/RECOVERYHISTORY
Sleep interval: 60 seconds
Max wait interval: 0 forever
Command for restore: cp "/backup/prod/db_backup/archive/00000001.history" "pg_xlog/RECOVERYHISTORY"
Keep archive history: 000000000000000000000000 and later
running restore: cp: cannot stat `/backup/prod/db_backup/archive/00000001.history': No such file or directory
cp: cannot stat `/backup/prod/db_backup/archive/00000001.history': No such file or directory
cp: cannot stat `/backup/prod/db_backup/archive/00000001.history': No such file or directory
cp: cannot stat `/backup/prod/db_backup/archive/00000001.history': No such file or directory
not restored
history file not found
postgresql-9.1-main.log
========================
2015-08-03 08:08:44 IST::@:[5542]:LOG: restored log file "0000000100000001000000CC" from archive
2015-08-03 08:11:44 IST::@:[5542]:LOG: could not open file "pg_xlog/0000000100000001000000CD" (log file 1, segment 205): No such file or directory
2015-08-03 08:11:44 IST::@:[5542]:LOG: redo done at 1/CC001E10
2015-08-03 08:11:44 IST::@:[5542]:LOG: last completed transaction was at log time 2015-08-03 07:59:58.121908+05:30
2015-08-03 08:11:44 IST::@:[5542]:LOG: restored log file "0000000100000001000000CC" from archive
2015-08-03 08:17:44 IST::@:[5542]:LOG: selected new timeline ID: 2
2015-08-03 08:23:45 IST::@:[5542]:LOG: archive recovery complete
2015-08-03 08:23:51 IST::@:[6088]:LOG: autovacuum launcher started
2015-08-03 08:23:51 IST::@:[5541]:LOG: database system is ready to accept connections
"WAL file not present yet. Checking for trigger file...
trigger file found: fast failover
Trigger file: /app/recovery-prod/trigger/pgsql.trigger.5432
Waiting for WAL file: 0000000100000001000000CF
WAL file path: /backup/prod/db_backup/archive/0000000100000001000000CF
Restoring to: pg_xlog/RECOVERYXLOG
Sleep interval: 60 seconds
Max wait interval: 0 forever
Command for restore: cp "/backup/prod/db_backup/archive/0000000100000001000000CF" "pg_xlog/RECOVERYXLOG"
Keep archive history: 000000000000000000000000 and later
trigger file found: smart failover
running restore: OK
and the restore took same time to check the history files.
In this case, I needed almost 14-15 minutes as it looked for 00000003.history aprt from 00000002.history and 00000001.history, even though I restarted the whole process by replacing the base backup afresh.
Am I doing something wrong here or is this was a fast failover expected to work?
3. Each iteration, I did to see the behaviour by stopping the current postgres server, deleting the previous base backup and restoring afresh added search for one more non existing history file like 00000004.history and 00000005.history, further slowing the process.
As I restore from the base backup each time, then push the trigger file after applying all WAL archives after the base backup, I would like to either ignore looking for history files altogether or control how many history files are tried. Also like to speed up starting the database in read write mode, so that I can point my application to this database and continue the business. What can I do for that ?
Thanks for all the help.
Regards,
James