My experiences with KillUserProcesses=yes on F24

Jason L Tibbitts III <tibbs@xxxxxxxxxxx> · Wed, 31 Aug 2016 09:25:22 -0500

After reading (some of) the "discussion" about systemd-logind's
KillUserProcesses setting, I decided that I'd like to try enabling it
and see how it works and how I can make it useful in my environment.
Sorry, it's long, but I thought folks might want to know.

Disclaimer:
I quite like systemd, and I will try to avoid strong language like
"bizarre", "surprising", "WTF" and, except in one case, "bug".  This is
all based on systemd 229 in F24; I know 230 changes one thing, and maybe
it or future versions will change enough that the issues in this
document go away.

Background:
I run a university department network: a bunch of desktops and a pile of
other machines accessed by a few hundred users for various purposes.
Around 200 machines total.  Where appropriate, I'd like to make sure
that user sessions get shut down properly even when sessions hang or get
confused at startup, but I'd also like to preserve the possibility of
users running background jobs on the desktops (which are often quite
powerful and useful for computation) even after they log out.

So:
I enabled KillUserProcesses on a desktop and rebooted.  Some testing
shows that it works as documented and kills processes at logout.  But as
with any change, I ran into some problems.

Problem 1:
Need to give users a way to run persistent background jobs without
requiring them to learn systemd-run.

Solution 1:
Provide wrappers in /usr/local/bin for nohup, screen and a couple of
local utilities which use nice and nohup internally.  Those wrappers
call systemd-run --user --scope.
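A minimal sketch of one such wrapper, written in Python purely for
illustration (the actual wrappers aren't shown here and may well be shell
scripts).  It assumes it is installed as /usr/local/bin/nohup, that
/usr/local/bin comes before /usr/bin in PATH, and that the real binary is
/usr/bin/nohup; REAL_PROGRAM, scope_argv and exec_in_scope are
hypothetical names.

```python
#!/usr/bin/python3
"""Hypothetical /usr/local/bin/nohup wrapper (illustrative sketch only).

Re-executes the real nohup under 'systemd-run --user --scope', so the job
is placed in its own transient scope under the user manager instead of
staying in the login session's scope.
"""
import os
import sys

# Assumption: location of the real binary.  An absolute path keeps the
# wrapper from finding itself again through PATH and looping.
REAL_PROGRAM = "/usr/bin/nohup"


def scope_argv(real_program, args):
    """Build the argv that runs real_program inside a new user scope."""
    # "--" ends systemd-run's own option parsing, so options meant for the
    # wrapped program are passed through untouched.
    return ["systemd-run", "--user", "--scope", "--", real_program, *args]


def exec_in_scope(real_program, args):
    """Replace this process with systemd-run; does not return on success."""
    argv = scope_argv(real_program, args)
    os.execvp(argv[0], argv)  # execvp searches PATH for systemd-run


if __name__ == "__main__":
    if len(sys.argv) > 1:
        exec_in_scope(REAL_PROGRAM, sys.argv[1:])
    else:
        # nohup needs a command to run; complain instead of creating a scope.
        print("usage: nohup COMMAND [ARG]...", file=sys.stderr)
```

Because systemd-run --scope runs the command in the foreground, "nohup
make -j8 &" keeps working as users expect.  The same script could serve
screen or the local utilities by changing REAL_PROGRAM.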

Problem 2:
Even under systemd-run --user --scope, things don't persist unless
"linger" is enabled for the user.

Solution 2:
Add loginctl enable-linger to the wrappers.
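The linger step might look like this Python sketch.  It assumes logind
records linger as an empty file named after the user under
/var/lib/systemd/linger (the directory the delete-lingers unit further
down removes), so loginctl is only invoked when that file is missing;
current_user, linger_enabled and ensure_linger are hypothetical helpers.

```python
import os
import pwd
import subprocess

# Assumption: logind marks a user as lingering with an empty file of that
# user's name in this directory.
LINGER_DIR = "/var/lib/systemd/linger"


def current_user():
    """Login name for the calling UID."""
    return pwd.getpwuid(os.getuid()).pw_name


def linger_enabled(user, linger_dir=LINGER_DIR):
    """True if the linger marker file for user exists."""
    return os.path.exists(os.path.join(linger_dir, user))


def ensure_linger(user, linger_dir=LINGER_DIR):
    """Turn linger on for user unless it is already on.

    Returns True if loginctl was invoked.  Raises CalledProcessError if
    loginctl refuses, e.g. on systemd 229 without a polkit rule allowing it.
    """
    if linger_enabled(user, linger_dir):
        return False
    subprocess.run(["loginctl", "enable-linger", user], check=True)
    return True
```

A wrapper would call ensure_linger(current_user()) just before handing
off to systemd-run --user --scope.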

Problem 3:
loginctl enable-linger requires root privs (in F24's systemd 229; seems
that 230 allows self-lingering by default).

Solution 3:
Add this in /etc/polkit-1/rules.d/50-user-linger.rules:

polkit.addRule(function(action, subject) {
    if (action.id == "org.freedesktop.login1.set-user-linger") {
        return polkit.Result.YES;
    }
});

Now users can run loginctl enable-linger.  For any user.  Oh, well, I
can live with that for now.  And things work!  But....

Problem 4:
"linger" does more than just allow sessions to persist.

A) It changes the way the "user manager" for that user runs.  Without
   linger, a user manager is started when the user logs in, and it goes
   away when their last process exits (which, given KillUserProcesses, is
   whenever they log out).  When linger is enabled for a user (in many
   circumstances, at least), a user manager is started immediately (if
   one isn't already running because the user is logged in), and that
   manager never exits.  This has interesting consequences.

   I've found that if the user's user manager dies (for any reason you
   might choose) and linger is enabled for them, a new one won't be
   started at login.  They have to disable linger, log out, and log back
   in.  Or reboot the machine.

B) It enables "user units" (probably the wrong terminology) which let
   users start up things which run periodically, or at boot, and which
   run under their UIDs.  In order to make this work, the user manager
   will run immediately at system boot time and look in
   ~/.config/systemd for units.

These seem like good ideas, except that:

i) I don't necessarily want users to be able to start units at boot
   time, but that's OK because....

ii) I have home dirs on kerberized NFS.  So this automounts the home
    directories of every lingering user at boot time (which is a problem
    for me) and that directory can't be read anyway, even by the proper
    UID, if the user doesn't have a kerberos ticket.  Plus the network
    isn't necessarily up to the point where user homedirs can be
    accessed at the time when systemd starts the user managers.
    Fortunately in the default case, it doesn't hold open a reference to
    the homedir so the automounter will remove it.  It can still cause a
    lot of mounts, which can take some time and put more load on my
    already loaded file servers.

    The bottom line there is that user sessions started at boot aren't
    going to work in some environments, period.
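To see in advance which home directories the boot-time user managers will
touch, an admin could list the lingering users along with their home
directories.  A Python sketch, assuming one empty marker file per
lingering user under /var/lib/systemd/linger; lingering_users and home_of
are hypothetical names.

```python
import os
import pwd

# Assumption: one empty marker file per lingering user lives here.
LINGER_DIR = "/var/lib/systemd/linger"


def lingering_users(linger_dir=LINGER_DIR):
    """Sorted names of users with linger enabled; empty if the dir is absent."""
    try:
        return sorted(os.listdir(linger_dir))
    except FileNotFoundError:
        return []


def home_of(user):
    """Home directory from the passwd database, or None if the user is unknown."""
    try:
        return pwd.getpwnam(user).pw_dir
    except KeyError:
        return None


if __name__ == "__main__":
    # Only the passwd database is queried; the home directories themselves
    # are not accessed, so running this does not trigger the automounter.
    for user in lingering_users():
        print(user, home_of(user) or "(unknown user)")
```

Each directory printed is one the automounter may be asked to mount when
those user managers start at boot.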

Solution 4A:
I don't have one....  If I'm the one killing their processes (for
whatever reason I might have), I have to make sure to disable lingering
for them at the same time.  But if they kill their processes (kill -9 -1
to just terminate everything, if you want to test) then they're
screwed.  They have to disable linger, log out and log back in so that a
user manager is created for them.  Then they can enable linger again.
This just has to be a bug, so I've filed:
https://bugzilla.redhat.com/show_bug.cgi?id=1371721
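Until that is fixed, a login-time check could at least detect the stuck
state and tell the user what to do.  A Python sketch, assuming the user
manager runs as the system unit user@UID.service and that linger is
recorded as a file named after the user under /var/lib/systemd/linger;
user_manager_unit, user_manager_active and needs_relogin are hypothetical
helpers.  It only detects the problem; recovery is still the
disable-linger, log out, log back in sequence.

```python
import os
import subprocess

# Assumption: logind's per-user linger marker files live here.
LINGER_DIR = "/var/lib/systemd/linger"


def user_manager_unit(uid):
    """Name of the system unit that runs the per-user manager for uid."""
    return f"user@{uid}.service"


def user_manager_active(uid):
    """True if systemd reports the user's manager unit as active."""
    # 'systemctl is-active --quiet' prints nothing and exits 0 only if active.
    result = subprocess.run(
        ["systemctl", "is-active", "--quiet", user_manager_unit(uid)]
    )
    return result.returncode == 0


def needs_relogin(user, uid, linger_dir=LINGER_DIR):
    """Linger is on, but no user manager is running for this user."""
    lingering = os.path.exists(os.path.join(linger_dir, user))
    # Short-circuit: without linger, a manager is started at the next login
    # anyway, so systemctl is only asked when linger is on.
    return lingering and not user_manager_active(uid)
```

A login script could print a hint whenever needs_relogin(user,
os.getuid()) returns True.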

Solution 4B:
Add the following to /etc/systemd/system/delete-lingers.service and
enable it with "systemctl enable delete-lingers.service":

[Unit]
Description=Delete lingering users
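Before=network.target
# logind starts user managers for lingering users when it starts up, so
# the linger marker files have to be gone before then
Before=systemd-logind.service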

[Service]
Type=oneshot
ExecStart=/usr/bin/rm -rf /var/lib/systemd/linger

[Install]
WantedBy=multi-user.target

Now at each boot, no users are set to linger and no user managers will
be started until login.  The wrapper scripts will re-enable that as
necessary so this doesn't hurt anything.

Conclusion: I can make this work, mostly, unless someone's user manager
happens to die.  But really, this is all an unpleasant hack,
necessitated by a mismatch between systemd's design and what I'm trying
to accomplish.  And I don't think that what I'm trying to accomplish is
particularly unreasonable.

What I wish for is for some "property", let's call it "persist", to be
"attached" to a scope in some way (presumably a flag to systemd-run)
which does nothing other than to indicate that the scope will continue
to run after the user's session has terminated.  This wouldn't be a
persistent user setting.  Nothing would start at boot.  The user manager
would start up if necessary at login (even if it had previously been
killed) and persist until all user processes in any scope have exited.

I am perfectly happy with wrappers around programs which indicate that
something is to persist after logout so my users don't have to learn
systemd-run.  I don't think systemd itself needs to know or care which
processes should persist.  I don't care if those wrappers are in the
base system.  If things like nohup or screen are patched to do this
automatically, I'm happy with that but it makes no difference to me.

Hope this is interesting to someone and adds useful content to the
discussion.

 - J<
--
devel mailing list
devel@xxxxxxxxxxxxxxxxxxxxxxx
https://lists.fedoraproject.org/admin/lists/devel@xxxxxxxxxxxxxxxxxxxxxxx