Hi,
this should probably go to pgsql-hackers, but https://www.postgresql.org/list/ says 'You must try elsewhere first!', and this list seemed the next best place...
I wanted to point you to this GitHub issue: https://github.com/wal-g/wal-g/issues/1126
Basically, Postgres only knows three kinds of return codes from a restore_command:
  0: No problem, next WAL file...
  1 - 125: End of timeline? OK, let's stop recovery and go online.
  >= 126: Ouch, big problem. Better not proceed, but error out with a FAIL instead.
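For illustration, the three ranges above can be sketched in Python. This only mirrors the description in this mail and is not PostgreSQL's actual code; the function name and the returned labels are made up for the example.

```python
def classify_restore_exit(code: int) -> str:
    """Map a restore_command exit code to the outcome described above.

    Hypothetical helper: the labels are invented for this sketch.
    """
    if not 0 <= code <= 255:
        raise ValueError(f"not a valid exit code: {code}")
    if code == 0:
        return "next_wal"        # file restored, ask for the next WAL file
    if code <= 125:
        return "end_recovery"    # treated as "no more WAL", go online
    return "fatal"               # 126 and above: abort with a FAIL


for code in (0, 1, 78, 125, 126, 127, 130):
    print(code, classify_restore_exit(code))
```

Note that in this scheme an exit code of 78 lands in the same bucket as 1, which is the problem described in the rest of this mail.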
Looking at https://tldp.org/LDP/abs/html/exitcodes.html, exit codes beyond 125 are all OS-related, like 'Permission problem or command is not an executable' or 'Control-C is fatal error signal 2'.
I would assume that exit code 78 would be a better choice for restore_command errors that are not OS-related, but that should still end in 'Ouch, big problem. Better not proceed, but error out with a FAIL instead'.
I think I will work on a fix for wal-g so that it distinguishes better between exit codes, but since all I can currently do is exit with a code >= 126, I wanted to bring this to the Postgres community too. Furthermore, this goes beyond wal-g; it applies to basically everything that runs as a restore_command... Would you consider adding another exit code to the list, so that restore_commands don't need to exit with error codes that were meant to signal OS-level issues?
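To make the current workaround concrete, here is a rough sketch of a wrapper that could sit between Postgres and a restore tool. It is only an assumption of how such a wrapper might look: the set of "hard error" codes (78, as suggested above), the choice of 126 as the code passed on, and all function names are hypothetical and not part of wal-g or Postgres.

```python
import subprocess

# Assumption: the wrapped tool exits with 78 for hard, non-OS errors.
FATAL_TOOL_CODES = {78}
# Lowest exit code that makes recovery fail, per the ranges described above.
FATAL_FOR_POSTGRES = 126


def map_exit_code(tool_code: int) -> int:
    """Translate the tool's exit code into one Postgres reacts to as intended."""
    if tool_code == 0:
        return 0
    if tool_code < 0:
        # subprocess reports "killed by signal N" as -N; shells use 128 + N.
        return 128 - tool_code
    if tool_code in FATAL_TOOL_CODES:
        return FATAL_FOR_POSTGRES
    return tool_code  # 1-125 stays "no more WAL", >= 126 stays fatal


def run_restore(argv: list[str]) -> int:
    """Run the real restore tool (argv) and return the translated exit code."""
    if not argv:
        return FATAL_FOR_POSTGRES  # nothing to run: treat as a hard error
    return map_exit_code(subprocess.run(argv).returncode)
```

A small script would call this as `sys.exit(run_restore(sys.argv[1:]))`. The drawback is the one raised above: the hard error has to be reported with a code that was meant for OS-level problems.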
I wanted to end with this quote from the second link I pointed to:
Which to me is not just for 127, but for all exit codes beyond 125...
Thanks.