Re: Requested transaction contradicts existing jobs: start is destructive

Kumar Kartikeya Dwivedi <memxor@xxxxxxxxx> · Fri, 1 May 2020 00:31:51 +0530

On Thu, 30 Apr 2020 at 23:43, Uoti Urpala <uoti.urpala@xxxxxxxxxxx> wrote:
>
> On Thu, 2020-04-30 at 22:18 +0530, Kumar Kartikeya Dwivedi wrote:
> > waiting for the stop request initiated previously to finish). Even if
> > you use fail as the job mode, the error you get back is "Transaction
> > is destructive" or so, not this.
>
> IIRC the patch he mentioned in his mail was one that changed the error
> message from "Transaction is destructive" to this.
>

I see, it seems I overlooked that part. Thanks for correcting me.

Then this is indeed the stage where it tries to apply the transaction,
rather than a conflict between jobs within the same transaction, so
what I said above probably does not apply at all.

I tried to reproduce the problem with a local test case. It only
manifests when the JOB_FAIL job mode is used; otherwise the stop job
on the unit is cancelled and replaced with the start job.

Something like this:

# anchor.service
[Unit]
Requires=that.slice
After=that.slice
[Service]
ExecStart=/bin/true

# other.service
[Service]
ExecStart=/bin/sleep infinity
# delay the stop a bit
ExecStop=/bin/sleep 10

Now, after
$ systemctl start other.service --no-block
$ systemctl stop that.slice --no-block

$ systemctl start anchor.service --job-mode fail
fails, while
$ systemctl start anchor.service --job-mode replace
succeeds.

The other possibility is that the installed stop job is irreversible
(you can emulate that by using --job-mode replace-irreversibly when
stopping that.slice). As far as I can tell, these are the only two
possibilities. This code hasn't changed since v219 either.
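
To make the two cases concrete, here is a rough Python model of the
check as I understand it. It is only a sketch, not systemd code: the
names InstalledJob, conflicts and apply_job are made up for
illustration, and only start/stop are treated as conflicting types.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InstalledJob:
    """A job already queued on a unit (hypothetical model type)."""
    job_type: str              # "start" or "stop"
    irreversible: bool = False


def conflicts(a: str, b: str) -> bool:
    # Simplifying assumption: only start and stop conflict with each other.
    return {a, b} == {"start", "stop"}


def apply_job(installed: Optional[InstalledJob], new_type: str, mode: str) -> str:
    """Decide what happens when a new job meets an installed one.

    Returns "destructive" when the new job is refused, "replaced" when
    the installed job is cancelled in favour of the new one, and
    "installed" when there is nothing in the way.
    """
    if installed is not None and conflicts(installed.job_type, new_type):
        # Refused if the caller asked for job mode fail, or if the
        # installed job was marked irreversible.
        if mode == "fail" or installed.irreversible:
            return "destructive"
        return "replaced"
    return "installed"


pending_stop = InstalledJob("stop")
print(apply_job(pending_stop, "start", "fail"))
print(apply_job(pending_stop, "start", "replace"))
print(apply_job(InstalledJob("stop", irreversible=True), "start", "replace"))
```

The three calls correspond to the fail case, the replace case, and
the replace-irreversibly case from the test above.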

My educated guess is that the session slice is taking a while to clean
up: processes in the session are not responding to SIGTERM, so the
slice sits through the default stop timeout (it is ordered after its
dependent units, so it waits for them to stop before stopping itself).
In the meantime a new login is initiated. The stop job on the user
slice is still pending at that point, and looking at the code in
logind, manager_start_scope uses the job mode fail. That means the
start job for the user slice cannot cancel the pending stop job, hence
the transaction cannot be applied.
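
The timing of that guess can be sketched as a toy timeline in Python.
This is an illustration under assumptions, not a measurement: the
function names are hypothetical, the units in the slice are assumed to
stop in parallel, a unit that ignores SIGTERM is assumed to be gone
once the stop timeout expires, and 90 seconds is used as the timeout.

```python
def slice_stop_finishes_at(unit_stop_durations, stop_timeout=90.0):
    """Time at which the slice's stop job can complete.

    Each dependent unit is gone after its own stop duration, or after
    stop_timeout if it never reacts to SIGTERM. The slice is ordered
    after all of them, so it waits for the slowest one.
    """
    return max((min(d, stop_timeout) for d in unit_stop_durations), default=0.0)


def login_succeeds(login_time, unit_stop_durations, stop_timeout=90.0):
    """A start request with job mode fail only goes through once the
    pending stop job on the slice is no longer queued."""
    return login_time >= slice_stop_finishes_at(unit_stop_durations, stop_timeout)


# One unit stops after 2s, another never reacts to SIGTERM.
durations = [2.0, float("inf")]
print(slice_stop_finishes_at(durations))
print(login_succeeds(30.0, durations))
print(login_succeeds(95.0, durations))
```

A login that arrives while the stop job is still queued is refused,
while one that arrives after the timeout has expired goes through.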

--
Kartikeya