Thanks for the email. It helped: after going through the email and the doc, I realized that the "backup" file had the wrong information, or rather that I had the wrong backup files. That would explain the kind of errors I have seen.

However, I do have one question. I am setting this up as part of the HA process, and the standby is a "hot" standby. If the primary fails, how do I tell the secondary to come out of recovery mode, move recovery.conf to recovery.done, and start the db? I mean, what error code should I return? If I return a non-zero error code, I get the following result [from serverlog]:

====
00000001000000000000001B pg_xlog/RECOVERYXLOG
LOG: restored log file "00000001000000000000001B" from archive
00000001000000000000001C pg_xlog/RECOVERYXLOG
[Main: Triggering Recovery!!!] <---- My script detected that it needs to trigger recovery...
LOG: could not open file "pg_xlog/00000001000000000000001C" (log file 0, segment 28): No such file or directory
LOG: redo done at 0/1B000070
00000001000000000000001B pg_xlog/RECOVERYXLOG
Main: Triggering Recovery!!! <--- My script is called again and the script says trigger recovery
PANIC: could not open file "pg_xlog/00000001000000000000001B" (log file 0, segment 27): No such file or directory
LOG: startup process (PID 32167) was terminated by signal 6
LOG: aborting startup due to startup process failure
====

This is what my script is doing:

    if ( triggerRecovery() )
    {
        print "Main: Triggering Recovery!!! \n";
        return 1;
    }

So the question is: on detecting that the primary is down and that recovery should be triggered, what error code should I return? Or do I have to move recovery.conf to recovery.done myself and restart the db?

Regards
Dhaval

On 3/20/07, Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
"Dhaval Shah" <dhaval.shah.m@xxxxxxxxx> writes: > What am I doing wrong? Lying to the server. If you don't have the requested file, return failure, don't invent something. There are a number of cases where the recovery process asks for files that are quite likely not to exist. > If I indicate that I do not have the concerned file by returning error > code 1, I get the following error in the log: This may indicate that you have an incomplete backup :-(. It's hard to tell from this much info though. What is in pg_control (use pg_controldata to dump) and what is in the backup_label file (that's plain text)? What WAL segment files do you actually have? regards, tom lane
-- Dhaval Shah