Re: unable to attach pid to service delegated directory in unified mode after restart

Felip Moll <felip@xxxxxxxxxxx> · Mon, 14 Mar 2022 23:12:58 +0100

Hi folks. I have continued investigating the best way to solve my problem.
As suggested, I am now calling the StartTransientUnit method over D-Bus (using libdbus) to start a new scope.
My impressions are below.
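
For reference, here is a minimal sketch of the request I am talking about, written in Python around busctl instead of libdbus to keep it short. The helper names (start_scope_argv, start_scope) are made up for illustration, and setting Delegate is an assumption about what a delegated scope would want; only the bus name, object path, interface, method name and signature come from systemd itself.

```python
import subprocess

SYSTEMD_DEST = "org.freedesktop.systemd1"
SYSTEMD_PATH = "/org/freedesktop/systemd1"
MANAGER_IFACE = "org.freedesktop.systemd1.Manager"


def start_scope_argv(unit, pids, delegate=True, mode="fail"):
    """Build a busctl command line for Manager.StartTransientUnit.

    Hypothetical helper: it only assembles arguments, it does not run them.
    """
    # A scope must be given at least one PID; "au" is an array of uint32.
    props = ["PIDs", "au", str(len(pids))] + [str(p) for p in pids]
    nprops = 1
    if delegate:
        # Assumption: the caller wants the scope's cgroup subtree delegated.
        props += ["Delegate", "b", "true"]
        nprops += 1
    return [
        "busctl", "call", SYSTEMD_DEST, SYSTEMD_PATH, MANAGER_IFACE,
        "StartTransientUnit", "ssa(sv)a(sa(sv))",
        unit, mode,
        str(nprops), *props,  # a(sv): properties of the new unit
        "0",                  # a(sa(sv)): no auxiliary units
    ]


def start_scope(unit, pids):
    """Run the call; needs a running systemd and sufficient privileges."""
    result = subprocess.run(start_scope_argv(unit, pids), check=True,
                            capture_output=True, text=True)
    return result.stdout  # contains the object path of the queued job
```

Note that the reply only carries the path of the queued job, so a successful return does not yet mean the scope is up.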

> Firing an async D-Bus packet to systemd should be hardly measurable.
>
> But note that you can also run your main service as a service, and
> then allocate a *single* scope unit for *all* your payloads.

The main issue is that the scope needs a pid attached to it. I thought a scope could live without any process inside, but that is not the case.
So every time a user step/job finishes, my main process must notice that the scope is gone and create it again for the next incoming job.
There is also a race condition when one job is finishing while another is starting up: at that point the scope can be destroyed without the main process realizing it.

I also tried leaving the responsibility for setting up the scope to the forked process itself, which is much easier to code and cleaner given how the software is designed.
The forked process just makes the D-Bus call, and once the scope is ready the process is moved into the corresponding cgroup (via the PIDs= property).

Problem number one: if other processes are already in the scope, the D-Bus call fails because I am using the same name every time, e.g. slurmstepd.scope.
So I first need to check whether the scope exists and, if so, put the new slurmstepd process inside it. But the race condition is still there: if all steps end during this phase, systemd will clean up the scope.
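
The check-then-create logic and its race can be sketched as a small retry loop. This is only an illustration: attach and create are hypothetical callables wrapping the actual D-Bus requests (create would be StartTransientUnit; attach could be AttachProcessesToUnit, assuming a systemd recent enough to offer it), and they are assumed to raise OSError on failure.

```python
def place_in_scope(pid, unit, attach, create, retries=3):
    """Put pid into unit, creating the scope if it has disappeared.

    attach(unit, pid) and create(unit, pid) are hypothetical wrappers
    around the D-Bus calls; both are assumed to raise OSError on failure.
    """
    for _ in range(retries):
        try:
            attach(unit, pid)
            return "attached"
        except OSError:
            pass  # scope does not exist, or was cleaned up just now
        try:
            create(unit, pid)
            return "created"
        except OSError:
            pass  # someone else created it in the meantime: try attaching
    raise RuntimeError(f"could not place pid {pid} in {unit}")
```

The loop narrows the window but does not close it: the scope can still vanish between a successful check and the moment the process is actually moved.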

Problem number two: there is a significant delay between requesting the scope and the moment it is ready with the pid attached to it. The only way I got it to work was to put a 'sleep' after the D-Bus call so that my process waits for the asynchronous request to take effect. This is really inelegant.
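
A less arbitrary wait than a fixed 'sleep' would be to poll the process's own cgroup membership until it shows up in the scope, with a timeout. The sketch below assumes the unified hierarchy (the "0::" line in /proc/<pid>/cgroup); the function names are made up for illustration. Waiting for the manager's JobRemoved signal for the job returned by StartTransientUnit would probably be the cleaner approach, but it needs more D-Bus plumbing than fits here.

```python
import time


def cgroup_of(pid):
    """Return the cgroup v2 path of pid as listed in /proc/<pid>/cgroup."""
    with open(f"/proc/{pid}/cgroup") as f:
        for line in f:
            # Unified-hierarchy entries look like "0::/system.slice/foo.scope".
            hier, _, path = line.rstrip("\n").split(":", 2)
            if hier == "0":
                return path
    return ""


def wait_for_scope(pid, unit, timeout=5.0, interval=0.01, read=cgroup_of):
    """Poll until pid's cgroup path contains unit; False on timeout.

    read is injectable so the loop can be exercised without systemd.
    """
    deadline = time.monotonic() + timeout
    while True:
        if unit in read(pid).split("/"):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```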

> That way you can restart your main service unit independently of the
> scope unit, but you only have to issue a single request once for
> allocating the scope, and not for each of your payloads.

Yes, that part is solved: I can restart slurmd now. But the other part does not hold, as I just explained.
I need to issue a new request every time the scope is cleaned up by systemd.

> But that too means you have to issue a bus call. If you really don't
> like talking to systemd this is not going to work of course, but quite
> frankly, that's a problem you are making yourself, and I am not
> particularly sympathetic to it.

Talking to systemd is not a problem, but the delay in creating a scope, plus the scope being removed all the time, is unacceptable.

My only idea now is to start a scope from the main process with a "sleep infinity" pid inside it, and so relieve everything else from ever having to create a scope or talk to D-Bus.
If instead I could just ask systemd to delegate a part of the tree to my processes, everything would be solved.
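
To make the placeholder idea concrete, this is roughly what I have in mind, sketched with systemd-run rather than a direct D-Bus call. The helper names and the Delegate=yes property are assumptions for illustration, and "sleep infinity" relies on a sleep that accepts that argument (GNU coreutils does).

```python
import subprocess


def keeper_argv(unit="slurmstepd.scope"):
    """Command line that starts a delegated scope holding only a sleeper."""
    return [
        "systemd-run", "--scope", f"--unit={unit}",
        "--property=Delegate=yes",  # assumption: we manage the subtree ourselves
        "sleep", "infinity",        # placeholder pid that keeps the scope alive
    ]


def start_keeper(unit="slurmstepd.scope"):
    """Launch the placeholder; the scope lives as long as this process does."""
    return subprocess.Popen(keeper_argv(unit))
```

The main process would do this once at startup, and the steps would then only need to be moved into the existing scope.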

Do you have any other suggestions?