Search Postgresql Archives

how to check that recovery is complete

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi, 

I have a workflow where I recover from PITR backup and run a query on it. The program that runs query 
checks that it can connect to database in a loop, until it can, and then runs the query. 
This has worked fine far. Recently I upgraded to 11 and I see that I can connect to DB while recovery is 
not complete yet. See this:

< 2020-11-05 03:34:36.114 CST  >LOG:  starting archive recovery
< 2020-11-05 03:34:36.590 CST  >LOG:  restored log file "00000001000002EF000000F9" from archive
< 2020-11-05 03:34:36.641 CST  >LOG:  redo starts at 2EF/F9000028
...
< 2020-11-05 03:34:46.392 CST  >LOG:  restored log file "00000001000002F000000008" from archive
< 2020-11-05 03:34:46.658 CST 127.0.0.1 >FATAL:  the database system is starting up
< 2020-11-05 03:34:47.028 CST  >LOG:  restored log file "00000001000002F000000009" from archive

You can see above fail to connect , but sometime into recover I see;

< 2020-11-05 04:07:51.987 CST  >LOG:  restored log file "00000001000002F200000029" from archive
< 2020-11-05 04:08:23.195 CST 127.0.0.1 >ERROR:  canceling statement due to conflict with recovery
< 2020-11-05 04:08:23.195 CST 127.0.0.1 >DETAIL:  User query might have needed to see row versions that must be removed.
< 2020-11-05 04:08:23.195 CST 127.0.0.1 >STATEMENT:  select count(*) from file
< 2020-11-05 04:08:23.195 CST  >FATAL:  terminating connection due to administrator command
< 2020-11-05 04:08:23.195 CST  >STATEMENT:  select count(*) from file
< 2020-11-05 04:08:23.195 CST  >FATAL:  terminating connection due to administrator command
< 2020-11-05 04:08:23.195 CST  >STATEMENT:  select count(*) from file
< 2020-11-05 04:08:23.232 CST  >LOG:  background worker "parallel worker" (PID 13577) exited with exit code 1
< 2020-11-05 04:08:23.244 CST  >LOG:  background worker "parallel worker" (PID 13578) exited with exit code 1
< 2020-11-05 04:08:23.244 CST 127.0.0.1 >FATAL:  terminating connection due to conflict with recovery
< 2020-11-05 04:08:23.244 CST 127.0.0.1 >DETAIL:  User query might have needed to see row versions that must be removed.
< 2020-11-05 04:08:23.244 CST 127.0.0.1 >HINT:  In a moment you should be able to reconnect to the database and repeat your command.
< 2020-11-05 04:08:25.354 CST  >LOG:  restored log file "00000001000002F20000002A" from archive
< 2020-11-05 04:08:55.555 CST 127.0.0.1 >ERROR:  canceling statement due to conflict with recovery
< 2020-11-05 04:08:55.555 CST 127.0.0.1 >DETAIL:  User query might have needed to see row versions that must be removed.
< 2020-11-05 04:08:55.555 CST 127.0.0.1 >STATEMENT:  select count(*) from file
< 2020-11-05 04:08:55.556 CST  >FATAL:  terminating connection due to administrator command
< 2020-11-05 04:08:55.556 CST  >STATEMENT:  select count(*) from file
< 2020-11-05 04:08:55.561 CST  >FATAL:  terminating connection due to administrator command
< 2020-11-05 04:08:55.561 CST  >STATEMENT:  select count(*) from file
< 2020-11-05 04:08:55.640 CST  >LOG:  background worker "parallel worker" (PID 13683) exited with exit code 1
< 2020-11-05 04:08:55.653 CST  >LOG:  background worker "parallel worker" (PID 13684) exited with exit code 1
< 2020-11-05 04:08:55.653 CST 127.0.0.1 >FATAL:  terminating connection due to conflict with recovery
< 2020-11-05 04:08:55.653 CST 127.0.0.1 >DETAIL:  User query might have needed to see row versions that must be removed.
< 2020-11-05 04:08:55.653 CST 127.0.0.1 >HINT:  In a moment you should be able to reconnect to the database and repeat your command.
< 2020-11-05 04:09:00.307 CST  >LOG:  restored log file "00000001000002F20000002B" from archive

As you can see a query "select count(*) from file" failed due to table not being restored yet. BUT connection was allowed 
before DB was ready.... Only few hours after ;

< 2020-11-05 09:31:30.319 CST  >LOG:  archive recovery complete
< 2020-11-05 09:34:51.729 CST  >LOG:  database system is ready to accept connections

After which the query runs fine without errors. This is bad because I see that select count(*) takes progressively 
longer to execute and the count(*) is not what I am interested in. I run the "real" query after that. As a result I add
hours to program execution time. 

Is there a more robust method to it? Ideally I do not want to be able to connect to db until :

< 2020-11-05 09:31:30.319 CST  >LOG:  archive recovery complete
< 2020-11-05 09:34:51.729 CST  >LOG:  database system is ready to accept connections


And I believe this was the behavior before upgrade. If connection can't be disabled, how can I detect 
condition "database system is ready to accept connections"

I believe "pg_isready" utility would succeed once it can connect. And as can see I could 
connect way before DB is really ready. 

Thanks!
Dmitry






[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]

  Powered by Linux