On Thu, 2023-04-13 at 13:52 +0000, Simon Mullis wrote:
> Hi All
>
> I have a fairly complex (at least to me) setup of a master target spawning multiple services and groups of instance
> services that are chained in a specific order. I use systemd to manage all of the sockets that allow data to flow
> between these different stages.
> I use a master Target (foo.target) defined to manage the services state, so I can easily stop and restart everything.
> The first service (bar.service) is oneshot script that starts multiple groups of instance services (the number of
> spawned services depends on CPU cores and queue sizes among other things). I have ExecStart and ExecStop scripts in
> the unit file.
> For example: bar.service - This is the oneshot that spawns "n" baz@.service and "n" qux@.service. There are a lot of
> dependencies and so far systemd has done everything I need.
>
> What do I want?
> If there is any failure or issue with any of the child processes spawned from any of the instance units then I would
> like the whole fragile house of cards to be torn down and restarted. i.e. the whole foo.target system state to be
> restarted, not just the individual instance service (and subsequent process) itself.

Note: I'm writing this last, so it might be better to read my other comments below first.

If you're configuring foo.target with BindsTo=bar@%i.service _and_ BindsTo=bar@%N.service, that might explain
bar@foo.service. The man page says BindsTo= behaves similarly to Requires=, which in turn is similar to Wants=, for
which it says:

    Units listed in this option will be started if the configuring unit is.

Now, when you systemctl start foo.target, %i is empty (foo.target is not an instance of a template), so
BindsTo=bar@%i.service resolves to just BindsTo=bar@.service, which doesn't start anything because bar@.service is a
template (I wonder why systemd doesn't error in this case). BindsTo=bar@%N.service, however, resolves to
BindsTo=bar@foo.service (we're in foo.target), and that one systemd can start.
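To make that expansion concrete, here is a rough shell approximation of how %n, %N and %i resolve for a given unit
name. It is only an illustration written for this mail, not systemd's actual code: escaping (%I vs %i), %p and all
the other specifiers are ignored, and the function name expand_specifiers is made up.

```shell
#!/usr/bin/env bash
# Illustrative approximation (NOT systemd's implementation) of three specifiers:
#   %n = full unit name, e.g. foo.target
#   %N = unit name without the type suffix, e.g. foo
#   %i = instance name (text between the first "@" and the suffix),
#        empty for units that are not instances of a template
expand_specifiers() {
    local unit=$1 text=$2
    local name_no_suffix=${unit%.*}   # strip ".target", ".service", ...
    local instance=""
    if [[ $name_no_suffix == *@* ]]; then
        instance=${name_no_suffix#*@} # everything after the first "@"
    fi
    # Backslash makes "%" a literal character in the pattern.
    text=${text//\%n/$unit}
    text=${text//\%N/$name_no_suffix}
    text=${text//\%i/$instance}
    printf '%s\n' "$text"
}

# Usage:
#   expand_specifiers foo.target 'bar@%i.service'    -> bar@.service (a template)
#   expand_specifiers foo.target 'bar@%N.service'    -> bar@foo.service
#   expand_specifiers bar@1.service 'bar@%i.service' -> bar@1.service
```

So the only way to get a concrete instance out of foo.target via %N is the one named after the target itself, which
matches the stray bar@foo.service.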

IMO, a proper setup would be:

1) PartOf=foo.target in the bar@ and qux@ templates;
2) foo.service:
   2.1) running during the "configuration" phase (at start-up?);
   2.2) generating a foo.target override with BindsTo= on all the bar@ and qux@ instances that are wanted for this node;
   2.3) invoking daemon-reload;
   2.4) (re)starting foo.target;
3) OnFailure=foo.service in foo.target, for automatic restarts when a bar@ or qux@ instance fails.

This should work so that when you restart foo.target (voluntarily or not), the bar@s and qux@s get stopped because
they are PartOf=foo.target, and started back up because they are BoundBy=foo.target (a synthetic setting).

I'm not sure, but you might want Before=foo.target in bar@ and qux@ to avoid a race between them being stopped and
foo.service trying to restart foo.target.

> What are my observations?
> This all works well except for the instance units. When I include the instance units into the "BindsTo" with the
> target, I get additional processes and services launched that I do not expect.
>
> I have simplified the whole thing to two services, a target and a very simple script. This demonstrates exactly the
> same thing that I see in the much more complex version.
>
> The "master service". This is a oneshot that spawns the instance units.
> foo.service
> [Unit]
> Description=Foo service
> BindsTo=foo.target
> [Service]
> Type=oneshot
> ExecStart=some-path-somewhere/foo-start.sh
> [Install]
> WantedBy=foo.target

The above WantedBy= is useless, because foo.target directly Requires=foo.service. Though, I bet, you haven't
systemctl enabled foo.service.

> The instance unit that does the actual work. In this case we have a placeholder to show the problem. In my real
> example I have a long chain of services like this that uses systemd managed sockets to pass data along.
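Going back to steps 2.2 to 2.4 of my suggestion above, here is a rough sketch of what foo.service could run.
Everything in it is an assumption for illustration: the function names, the drop-in path
/etc/systemd/system/foo.target.d/instances.conf, and passing the instance counts as plain arguments (your real
script computes them from CPU cores, queue sizes and so on).

```shell
#!/usr/bin/env bash
# Step 2.2 (hypothetical): print a drop-in for foo.target that binds it to
# every wanted bar@ and qux@ instance. Counts are passed in as arguments.
generate_foo_target_dropin() {
    local bar_count=$1 qux_count=$2 i
    printf '[Unit]\n'
    for ((i = 1; i <= bar_count; i++)); do
        printf 'BindsTo=bar@%d.service\n' "$i"
    done
    for ((i = 1; i <= qux_count; i++)); do
        printf 'BindsTo=qux@%d.service\n' "$i"
    done
}

# Steps 2.3 and 2.4 (hypothetical): install the drop-in, reload, restart.
# Defined here but not called; foo.service's ExecStart would call it.
apply_foo_target_dropin() {
    local dir=/etc/systemd/system/foo.target.d   # assumed location
    mkdir -p "$dir"
    generate_foo_target_dropin "$@" > "$dir/instances.conf"
    systemctl daemon-reload
    # --no-block only enqueues the restart job instead of waiting for it,
    # as a precaution since this service is itself tied to foo.target.
    systemctl --no-block restart foo.target
}
```

E.g. generate_foo_target_dropin 2 1 prints a [Unit] section with BindsTo=bar@1.service, BindsTo=bar@2.service and
BindsTo=qux@1.service, which you can inspect before letting apply_foo_target_dropin write it out.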
> bar@.service
> [Unit]
> Description=Test service Bar instance %i
> BindsTo=foo.target
> [Service]
> Type=simple
> ExecStart=sh -c 'while true; do echo Bar %i is alive; sleep 3; done'
> [Install]
> WantedBy=foo.target

I think the above WantedBy= doesn't match your intentions either: the target wants (actually, requires) foo.service
(the launcher), and it is that service that decides which bar@ and qux@ instances are wanted.

>
> The target that allows me to stop and restart everything easily:
> [Unit]
> Description=Test Services
> Requires=foo.service
> [Install]
> WantedBy=multi-user.target
>
>
> And finally the script called in foo.service:
> foo-start.sh
> #!/usr/bin/bash
> num=4
> eval systemctl start bar@{1..${num}}.service
>
> In order to tightly couple the processes and services I use BindsTo. But I am getting inconsistent behavior when
> trying to apply this to the instance units from the target.

Please pardon my pedantry, but it is systemd's support for service units that tightly couples OS processes to the
state of those service units. Did you mean to say something about tightly coupling instances of bar@-template
services and qux@-template services?

> Scenario A:
> WITHOUT BindsTo for the instance units in the target:
> - Everything stops and starts with the target.
> - I get the correct number of processes.
> - If I kill one of the PIDs below, systemd only restarts that process - which of course is what most use-cases would
> require.
> # ps -ef | grep [Bb]ar
> root 17878 1 0 15:25 ? 00:00:00 sh -c while true; do echo Bar 1 is alive; sleep 3; done
> root 17880 1 0 15:25 ? 00:00:00 sh -c while true; do echo Bar 2 is alive; sleep 3; done
> root 17882 1 0 15:25 ? 00:00:00 sh -c while true; do echo Bar 3 is alive; sleep 3; done
> root 17887 1 0 15:25 ? 00:00:00 sh -c while true; do echo Bar 4 is alive; sleep 3; done
>
> # systemctl list-units bar@\*.service
> UNIT LOAD ACTIVE SUB DESCRIPTION
> bar@1.service loaded active running Test service Bar instance 1
> bar@2.service loaded active running Test service Bar instance 2
> bar@3.service loaded active running Test service Bar instance 3
> bar@4.service loaded active running Test service Bar instance 4
>
> Scenario B:
> WITH BindsTo in the unit instance file (BindsTo=bar@%i.service or BindsTo=bar@%N.service):

This sounds like having

    [Unit]
    Description=Test service Bar instance %i
    BindsTo=foo.target
    ###↓↓↓↓↓THIS↓↓↓↓
    BindsTo=bar@%i.service
    [Service]
    Type=simple
    ExecStart=sh -c 'while true; do echo Bar %i is alive; sleep 3; done'
    [Install]
    WantedBy=foo.target

Or is this BindsTo= actually in qux@.service?

> - Everything stops and start with the target.
> - i get EXTRA PROCESSES.
> - If I kill one of the PIDs below, everything restarts properly (i.e. the whole target) and I get the behavior I am
> looking for.
> root 29250 1 0 16:08 ? 00:00:00 sh -c while true; do echo Bar foo is alive; sleep 3; done #<<<< What's this guy doing here?
> root 29256 1 0 16:08 ? 00:00:00 sh -c while true; do echo Bar 1 is alive; sleep 3; done
> root 29258 1 0 16:08 ? 00:00:00 sh -c while true; do echo Bar 2 is alive; sleep 3; done
> root 29260 1 0 16:08 ? 00:00:00 sh -c while true; do echo Bar 3 is alive; sleep 3; done
> root 29262 1 0 16:08 ? 00:00:00 sh -c while true; do echo Bar 4 is alive; sleep 3; done
>
> So, however systemd is expanding the variables %i or %N, it's including an additional service.
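A side note on foo-start.sh quoted earlier: the eval there is needed only because bash performs brace expansion
before parameter expansion, so bar@{1..${num}}.service would otherwise stay literal. A sketch that builds the list
without eval (the function name instance_units is made up for this mail):

```shell
#!/usr/bin/env bash
# Hypothetical eval-free replacement for bar@{1..${num}}.service:
# print "<template>@1.service" .. "<template>@<count>.service", one per line.
instance_units() {
    local template=$1 count=$2 i
    for ((i = 1; i <= count; i++)); do
        printf '%s@%d.service\n' "$template" "$i"
    done
}

# Usage (inside foo-start.sh):
#   mapfile -t units < <(instance_units bar "$num")
#   systemctl start "${units[@]}"
```

The array keeps each unit name as a separate argument, so nothing gets re-parsed by the shell the way eval does.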
>
> # systemctl list-units bar@\*.service
> UNIT LOAD ACTIVE SUB DESCRIPTION
> bar@1.service loaded active running Test service Bar instance 1
> bar@2.service loaded active running Test service Bar instance 2
> bar@3.service loaded active running Test service Bar instance 3
> bar@4.service loaded active running Test service Bar instance 4
> bar@foo.service loaded active running Test service Bar instance foo #<<<< Here he is again!
>
> Does anyone have any suggestions? Is there a more elegant way to connect the processes to the whole target for
> restart purposes?

If I were you, I'd probably remove the useless WantedBy= lines first, and then try visualizing the dependencies
with one of the tools from here: https://bbs.archlinux.org/viewtopic.php?id=163286

Also, running systemctl show bar@foo.service while that unit is up might give some insight into what caused it to
start.

> Maybe this is a bug in my version of systemd but more likely I'm doing something wrong.
>
> Version info:
> # systemctl --version
> systemd 247 (247.3-7+deb11u1)
> +PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +ZSTD +SECCOMP
> +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=unified
>
> Thank you for reading this far and thank you also in advance for any suggestions.
>
> Cheers!
>
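P.S. on the systemctl show suggestion: its output is long, so here is a small hypothetical filter (the name
reverse_deps is made up) that keeps only the non-empty reverse-dependency properties, i.e. the units that want,
require, bind to or trigger the unit being inspected. It assumes the plain Key=value lines that systemctl show
prints.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "Key=value" lines from stdin and print only the
# non-empty WantedBy=, RequiredBy=, BoundBy= and TriggeredBy= properties.
reverse_deps() {
    local key value
    # IFS='=' splits at the first "="; the rest of the line goes to $value.
    while IFS='=' read -r key value; do
        case $key in
            WantedBy | RequiredBy | BoundBy | TriggeredBy)
                if [[ -n $value ]]; then
                    printf '%s=%s\n' "$key" "$value"
                fi
                ;;
        esac
    done
}

# Usage:
#   systemctl show bar@foo.service | reverse_deps
```

If the stray instance shows BoundBy=foo.target, that would point back at the %N expansion in the target.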