Re: Re: openssh 7.6 and 7.7 on Oracle Linux 7 (compiled from source) doesn't start correctly with systemd

Colin Watson <cjwatson@xxxxxxxxxx> · Fri, 24 Aug 2018 18:19:09 +0100

On Fri, Aug 24, 2018 at 02:04:13PM +0200, Jochen Bern wrote:
> On 08/23/2018 07:49 PM, Peter Stuge wrote:
> > How could systemd determine whether startup of a foreground daemon
> > completed successfully or failed?
> > Other than explicit notification (like an AF_UNIX message) systemd
> > could only use time; it could wait for the daemon to exit(EXIT_FAILURE)
> > after exec() - but how long is long enough? Every answer is incorrect.
> 
> If we can agree that neither systemd nor "legacy" methods(*) of getting
> feedback from daemon processes will cease to exist just because the
> other side wishes them to hard enough, then complementing either side
> (but preferably systemd) with a (general, configurable, contrib/ subdir
> based) wrapper to translate as needed would seem a pragmatic solution.
> </€.02>
> 
> (*) PID file, lookup in the process table, check for a LISTEN, pattern
> match in a logfile, running a dedicated *client* executable / Nagios
> plugin / ${DAEMON}ctl tool for a test, throwing the daemon a
> SIGAREYOUWELL/shmem/semaphore/... request, you name it

I doubt that anyone using OpenSSH with systemd would want to use a
polling-based (and thus inefficient) hack like that when they could just
apply the tiny patch to slot in an sd_notify call between listen and
accept.  (And I definitely see the logic behind notifying the service
manager at that point; I've dealt with complex services built on top of
OpenSSH that needed to arrange the boot sequence so that they started
only once sshd was actually ready to accept connections, and without
this kind of approach they had to settle for arbitrary delays and race
conditions.)
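
To make the "between listen and accept" placement concrete, here is a
rough sketch.  It is not the actual patch, which is a C call to
libsystemd's sd_notify; this is a Python stand-in that speaks the same
documented datagram protocol (READY=1 sent to the socket named in
$NOTIFY_SOCKET).  The names notify_ready and listen_then_notify are made
up for illustration.

```python
import os
import socket


def notify_ready(state: bytes = b"READY=1") -> bool:
    """Send a state datagram to the service manager, if there is one.

    Returns False without doing any work when NOTIFY_SOCKET is unset,
    i.e. when we were not started by a notify-aware service manager.
    """
    path = os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False
    addr = os.fsencode(path)
    if addr.startswith(b"@"):
        # A leading "@" denotes a Linux abstract-namespace socket.
        addr = b"\0" + addr[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(state, addr)
    return True


def listen_then_notify(host: str = "127.0.0.1", port: int = 0) -> socket.socket:
    """Hypothetical daemon start-up: bind, listen, and only then notify."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind((host, port))
    listener.listen()
    # From here on the kernel queues incoming connections, so anything
    # ordered after this service can connect without racing us.
    notify_ready()
    return listener  # the caller goes on to accept() on this socket
```

With a Type=notify unit, the service manager treats the service as
started only once that datagram arrives, so dependent services need
neither polling nor sleeps.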

systemd has its structural problems, but this is one thing it gets
right.  To my mind, the reasons for avoiding linking against libsystemd
(behind a configure-time switch) are essentially political.  If you're
running on a systemd-based system then it's paged in anyway, so the
runtime cost is negligible; if you're not, then sd_notify is already
careful to do nothing and to do so cheaply; and in general I think it
makes more sense to use common code to notify the service manager than
to duplicate it.  (I still have a soft spot for the hacky "SIGSTOP yourself
and have init send you SIGCONT when it notices" approach to this problem
that we took in upstart, but I can understand why systemd preferred to
do something else.)
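
For anyone who hasn't seen the SIGSTOP trick, the handshake looks
roughly like the following.  This is a sketch of the idea only, not
upstart's code; daemon_signal_ready and supervisor_wait_ready are
invented names, and Python is used purely for brevity.

```python
import os
import signal


def daemon_signal_ready() -> None:
    """Daemon side: stop ourselves once initialisation is complete.

    Execution resumes here when the supervisor sends SIGCONT.
    """
    os.kill(os.getpid(), signal.SIGSTOP)


def supervisor_wait_ready(pid: int) -> bool:
    """Supervisor side: block until the child stops, then resume it.

    Returns False if the child exited or was killed before it stopped
    itself, which is how a start-up failure shows up.
    """
    _, status = os.waitpid(pid, os.WUNTRACED)
    if not os.WIFSTOPPED(status) or os.WSTOPSIG(status) != signal.SIGSTOP:
        return False
    os.kill(pid, signal.SIGCONT)
    return True
```

The supervisor has to be the daemon's parent for waitpid to report the
stop, and the daemon has no way to say anything richer than "ready".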

Obviously it's better to get patches upstream wherever possible.  But
honestly, speaking as a downstream who maintains a patch that calls
sd_notify in the right place, I'd rather have to maintain that patch
indefinitely than have a worse hack upstream that I'd then have to undo
or otherwise work around.

-- 
Colin Watson                                       [cjwatson@xxxxxxxxxx]
_______________________________________________
openssh-unix-dev mailing list
openssh-unix-dev@xxxxxxxxxxx
https://lists.mindrot.org/mailman/listinfo/openssh-unix-dev