On Thu, 7 May 2020 at 19:31, Mark Bannister <mbannister@xxxxxxxxxxxxxx> wrote:
>
> You're missing something, although I don't know what, because we're
> definitely seeing this on 219-67 from
> http://vault.centos.org/7.7.1908/os/Source/SPackages/ and it
> definitely does not contain the change that you've linked to. The
> only custom patch we've applied is
> https://github.com/systemd/systemd/commit/cf99f8eacf1c864b19a6a02edea78c43f3185cb7.
> Something else is clearly going on here.
>

Well, maybe you're mixing two issues into one. The original report was
about session scopes failing due to a conflict of job types, which you
should not be seeing on 219-67 (I have no idea how you would without
the session scope having a Requires= dependency on the slice, which
wouldn't be the case on that version). It is certainly possible to get
this message in other cases, though, where two jobs conflict and one
of them cannot be deleted due to the relationship constraints.

I cannot tell why without more context, but it would be wise to first
try to reproduce the problem with the latest systemd version to see
if it is already fixed. These errors are less common now, as a lot of
effort has been spent on finding and fixing these corner cases over
the years, especially after v219.

> Thanks for the suggestions on how to reproduce. I managed to
> reproduce it once (first time lucky) with:
>
> # service=$(systemd-run --uid=$RUNUSER /tmp/myscript 2>&1 | awk
> '{sub("\\.$", ""); print $NF}'); systemctl --no-block stop $service &
> systemctl --no-block stop user-$RUNUID.slice; sleep 1; systemctl
> list-jobs
>
> ... where /tmp/myscript is:
>
> #!/bin/sh
> trap '' INT TERM HUP
> sleep 1d
>
> However, after the first success, repeated attempts failed to reproduce it.
>

Sigh... What you posted above is very different from what I asked you
to follow (where the conflicting job was queued by logind). I can't
really tell what you did there.
How did you trigger the conflicting job, and what was the job mode
used? If you tried logging in to see whether that triggers the
problem: this unit runs under the system instance (it very likely
does, judging from your systemd-run invocation), so it never creates
any dependency on the user slice at all. It therefore does *not* keep
the slice waiting, and the race stays just as narrow unless you
specify --slice=user-UID.slice. Moreover, on 219-67 it would never
create a conflict (given that systemctl show lists user-UID.slice
under Wants= and not Requires=).

Also, were you able to reproduce with the steps I listed?

> While most of the time this is triggered by SSH session, I have
> discovered that we have one script that is using systemd-run and which
> failed to launch a command with this 'transaction is destructive'
> error. Could it be that when this problem is triggered, a systemd-run
> could fail?

You should see which service is said to have conflicting job types
(given that you applied the verbose logging patch). However, if you
can't reproduce the errors with the latest systemd, it would be best
to ask your distribution maintainers to integrate the fixes back into
the systemd version they maintain.

> Alas systemd is so big and so baked into everything these days that we
> daren't do more than add a handful of custom patches to the version
> shipped with the release. So if there is a small patch we can add
> that fixes it in both 219-67 and 219-73, that would be immensely
> helpful.

I think the PR I mentioned before should help with the logind issue,
but I can't tell if it applies cleanly to 219. It also really only
helps on 219-69+: as far as I am concerned, the logind message at
least cannot appear on 219-67, where the scope Wants= the slice
(convince me by showing evidence to the contrary). If you're seeing
this in other cases, those changes won't help you.
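
To check which kind of dependency a unit has on the slice, something
along these lines might do. It is only a rough sketch: the dep_kind
helper and the unit names in the usage comment are made up for
illustration, and it assumes systemctl show prints one KEY=VALUE line
per property with the unit names separated by spaces.

```shell
#!/bin/sh
# dep_kind SLICE: hypothetical helper, not part of systemd.
# Reads `systemctl show -p Wants -p Requires UNIT` output on stdin and
# prints "Requires", "Wants" or "none", depending on where SLICE is listed.
dep_kind() {
    awk -v slice="$1" '
        BEGIN { kind = "none" }
        {
            eq = index($0, "=")              # split the line at the first "="
            if (eq == 0) next
            key = substr($0, 1, eq - 1)
            n = split(substr($0, eq + 1), units, " ")
            for (i = 1; i <= n; i++) {
                if (units[i] != slice) continue
                # Requires= is the stronger dependency, so it wins over Wants=
                if (key == "Requires") kind = "Requires"
                else if (key == "Wants" && kind == "none") kind = "Wants"
            }
        }
        END { print kind }
    '
}

# Usage on a live system (placeholder unit names):
#   systemctl show -p Wants -p Requires session-1.scope | dep_kind user-1000.slice
```

If this prints "Wants" for the session scope, the logind conflict
described above should not be what you are hitting.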
--
Kartikeya