Re: postgresql systemd service fails to start only on boot but not manually

Doron Behar <doron.behar@xxxxxxxxx> · Sun, 23 Sep 2018 13:55:05 +0300

On Sat, Sep 22, 2018 at 04:58:18PM +0200, Christoph Moench-Tegeder wrote:
> ## Doron Behar (doron.behar@xxxxxxxxx):
> 
> > My server fails to start PostgreSQL only on boot, if I restart it
> > manually afterwards it doesn't have any problem starting. Here is the
> > log extracted from the journal:
> > 
> > ```
> > 2018-09-21 20:46:40.028 CEST [306] LOG:  listening on IPv4 address "127.0.0.1", port 5432
> > 2018-09-21 20:46:40.036 CEST [306] LOG:  listening on Unix socket "/run/postgresql/.s.PGSQL.5432"
> > 2018-09-21 20:46:40.233 CEST [337] LOG:  database system was shut down at 2018-09-21 20:46:21 CEST
> > 2018-09-21 20:48:10.441 CEST [352] WARNING:  worker took too long to start; canceled
> > 2018-09-21 20:49:10.469 CEST [352] WARNING:  worker took too long to start; canceled
> 
> This would indicate that your machine is overloaded during start -
> perhaps there's just too much being started at the same time?
> ObRant: that's what happens if people take "system startup duration"
> as a benchmark and optimize for that - sure, running one clumsy shell
> script after another isn't effective usage of today's systems,
> but starting eight dozens programs all at once may have other
> side effects. Really, with the hardware taking small ages to find
> it's own arse before even loading the boot loader, those few seconds
> weren't worth optimizing - and if people reboot their computers so
> often that startup time takes a measurable toll on their productive
> day, perhaps they should rather spend their time thinking about their
> usage pattern than "optimizing" the startup process.
> 
> So, now that I've got that off my chest... your machine propably tries to
> do too much at the same time when booting: the worker processes take
> longer than 90 seconds to start. Slow CPU or storage maybe?
> 
> > 2018-09-21 20:49:10.478 CEST [306] LOG:  database system is ready to accept connections
> > 2018-09-21 20:49:10.486 CEST [306] LOG:  received fast shutdown request
> 
> And in the mean time, systemd has lost it's patience, declares the
> start as failed and terminates the process group. (The default systemd
> timeout is 90 seconds, at least in some releases of systemd, so
> this fits quite nicely).
> 
> You could try to work around this by increasing TimeoutStartSec
> in postgresql's systemd unit (or even globally), which perhaps
> only hides the problem until the next service suddenly doesn't
> start anymore.
> You could move postgresql to the end of the boot order by
> adding "After=..." to the Unit section of the systemd service
> file, the value behind "After=" being all the other services in
> the same target, which should reduce parallelism and improve
> PostgreSQL's startup behaviour.
> A more advanced variant of that would be to create a new
> systemd target, make that start "After" multiuser.target
> or even graphical.target (depending on your setup), make sure
> it "Requires" the current default systemd target and make
> postgresql the only additional service in that target.
> (This would be the cleanest solution, but you should get some
> grasp of systemd and how your specific distribution uses it
> before meddling with the default targets; I don't know every
> distribution/version variant of systemd integration, so I
> can't give that specific instructions here).
> Or you figure out what the heck your machine is running
> during startup any why it is that slow, and try to fix that.
> 
> Regards,
> Christoph

Thanks for your very detailed answer, that helped me a lot. I've
increased `TimeoutSec=` to infinity in the systemd service since it was
set initially to 120 seconds which apparently wasn't enough for my poor
VPS with 2G RAM and 1 CPU core. That worked great, I still feel like I
have slow startups but at least PostgreSQL doesn't totally fail to start
on boot.

I'll try to debug the slow startups on my own, thanks again for
everything!

Doron.