Requested transaction contradicts existing jobs: start is destructive

Mark Bannister <mbannister@xxxxxxxxxxxxxx> · Mon, 4 May 2020 11:31:46 +0100

On Thu Apr 30 19:01:51 UTC 2020, Kumar Kartikeya Dwivedi
<memxor at gmail.com> wrote:
> My educated guess is that the session slice is taking time to clean up
> because of the default stop timeout due to processes in the session
> not responding to SIGTERM (it is waiting for dependent units to stop
> before stopping itself by virtue of being ordered after them), and in
> the meantime a new login is initiated. Now, the stop job on the user
> slice is still waiting, and looking at the code in logind,
> manager_start_scope uses the job mode fail, which means the start job
> for the user slice won't be able to cancel the sitting stop job, hence
> the transaction cannot be applied.

When you say the session slice is taking time to clean up, please excuse
my ignorance, as I'm not really up to speed on how session slices are
managed by systemd, but why would it matter if a slice takes time to
clean up?  Is there a limit on how many SSH session slices a user can
have or something?  I don't see how that would cause this particular
error.

This is happening a few times a day.  Looking at the logs, we have SSH
sessions opening and closing successfully all day long for various
different accounts.  Occasionally we see the problematic error message.
I've correlated it with other messages and discovered that, at the same
time that we get this in /var/log/messages (the original reported
error):

2020-05-03T16:09:38.735265-04:00 jupiter systemd[1]: Requested transaction contradicts existing jobs: Transaction for session-240481.scope/start is destructive (user-17132.slice has 'stop' job queued, but 'start' is included in transaction).
2020-05-03T16:09:38.735430-04:00 jupiter systemd-logind[1588]: Failed to start session scope session-240481.scope: Transaction for session-240481.scope/start is destructive (user-17132.slice has 'stop' job queued, but 'start' is included in transaction).
2020-05-03T16:09:38.735642-04:00 jupiter systemd[1]: Removed slice User Slice of jupiter-user.

... we also see this message in /var/log/secure:

2020-05-03T16:09:38.737122-04:00 jupiter sshd[11031]: pam_systemd(sshd:session): Failed to create session: Resource deadlock avoided
2020-05-03T16:09:38.737380-04:00 jupiter sshd[11031]: pam_unix(sshd:session): session opened for user jupiter-user by (uid=0)
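
For anyone who wants to repeat the correlation on their own logs, here
is a rough Python sketch of the approach.  It assumes each log entry
sits on a single line and starts with an ISO 8601 timestamp, as in the
excerpts above.  The function names (parse_line, correlate) and the
one-second window are my own illustrative choices, not anything
provided by systemd.

```python
from datetime import datetime, timedelta


def parse_line(line):
    """Split a log line into (timestamp, rest); return None if it does not parse."""
    stamp, _, rest = line.partition(" ")
    try:
        return datetime.fromisoformat(stamp), rest.strip()
    except ValueError:
        return None


def correlate(messages_lines, secure_lines, window=timedelta(seconds=1)):
    """Pair 'is destructive' entries with pam_systemd failures logged within `window`."""
    # Collect the pam_systemd failures first (hypothetical match string
    # taken from the /var/log/secure excerpt).
    failures = [
        parsed
        for parsed in map(parse_line, secure_lines)
        if parsed and "Failed to create session" in parsed[1]
    ]
    pairs = []
    for parsed in map(parse_line, messages_lines):
        if not parsed or "is destructive" not in parsed[1]:
            continue
        when, text = parsed
        for failed_when, failed_text in failures:
            if abs(failed_when - when) <= window:
                pairs.append((when, text, failed_text))
    return pairs
```

To use it, read /var/log/messages and /var/log/secure into two lists of
lines and pass them to correlate(); each returned tuple is one
"destructive transaction" entry together with a session failure logged
within the window.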

Does this 'Resource deadlock avoided' message from pam_systemd help
identify the root cause, or is that just a side-effect?

Thanks
Mark