Quoting Tom Lane <tgl@xxxxxxxxxxxxx>:

> Mischa Sandberg <mischa_sandberg@xxxxxxxxx> writes:
> > Quoting Tom Lane <tgl@xxxxxxxxxxxxx>:
> >> I'd bet that the pg_ctl status part is failing. I get exit status 1
> >> from it if there's no server running.
>
> > Yes, that was part of the problem with the original startup script;
> > postmaster hadn't even gotten as far as writing postmaster.pid,
> > I guess. But pg_ctl status returning 1 could also mean that the
> > server had come up, hit a critical problem and exited. Hence my problem;
> > this has to detect server failure, reliably, as well.
>
> You could sleep for a second or so *before* you start looking for the
> pidfile.

The systems are under erratic load, due to concurrent CPU and disk I/O spikes around start-up time. A sleep of 1-2 secs is not enough to be a guarantee :-(

I'm probably not explaining the issues well. I'm caught between two constraints that aren't really pg's problem, and wide clusters with automated admin, variable hardware and spikes of db restarts are no doubt an oddball edge case. There are workarounds; I was hoping for something clean and obvious (to all but me). Switching back to tailing the log files and moving on.

Thanks everyone.

--
Engineers think that equations approximate reality.
Physicists think that reality approximates the equations.
Mathematicians never make the connection.
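For illustration, the log-tailing approach can be sketched in Python as a poll against a deadline instead of a fixed sleep, so a slow start under load is not mistaken for a failure. This is a minimal sketch under stated assumptions, not the poster's actual script: it assumes the server writes its log to a plain file (for example via `pg_ctl -l logfile`), and it treats the standard "database system is ready to accept connections" line as success. The function name `wait_for_startup` is hypothetical, and treating any `FATAL:`/`PANIC:` line as a server failure is a crude assumption. It skips the client-side "the database system is starting up" message, but other client FATALs (e.g. authentication failures) would still need adding to the ignore list.

```python
import os
import time

# Assumed markers; adjust for your PostgreSQL version and log_line_prefix.
READY_MARKER = "database system is ready to accept connections"
FAILURE_MARKERS = ("FATAL:", "PANIC:")
# Client connection attempts during startup log this FATAL; it is not a server failure.
IGNORED_MARKERS = ("the database system is starting up",)


def wait_for_startup(log_path, start_offset=0, timeout=60.0, poll_interval=0.5):
    """Tail log_path from byte start_offset until the server reports ready or failed.

    Returns "ready", "failed", or "timeout" (hypothetical helper, not part of pg_ctl).
    """
    deadline = time.monotonic() + timeout
    offset = start_offset
    pending = b""  # bytes of a trailing, not-yet-complete line
    while True:
        if os.path.exists(log_path):
            # Binary mode so seeking to an arbitrary byte offset is well defined.
            with open(log_path, "rb") as f:
                f.seek(offset)
                chunk = f.read()
                offset = f.tell()
            pending += chunk
            # Only inspect complete lines; keep any partial last line for the next poll.
            *lines, pending = pending.split(b"\n")
            for raw in lines:
                line = raw.decode("utf-8", errors="replace")
                if READY_MARKER in line:
                    return "ready"
                if any(marker in line for marker in IGNORED_MARKERS):
                    continue
                if any(marker in line for marker in FAILURE_MARKERS):
                    return "failed"
        if time.monotonic() >= deadline:
            return "timeout"
        time.sleep(poll_interval)
```

In a start-up script you would record the log file's size before launching the server and pass it as `start_offset`, so entries from earlier runs are ignored. The sketch does not handle log rotation or truncation during startup.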