idle home NFS gets unmounted although user is still logged in

Hi,

we have a problem with a user session being killed by systemd although,
as far as we understand, it shouldn't be.

We have a jupyterhub running on a SLES 15 SP4 server. When a user logs
in (with PAM) and starts a server (which is just a jupyterlab process),
jupyterhub spawns this server with sudo so that it runs under the user's id.
This sudo call creates a login session that we can see with loginctl.

The server keeps running as long as the browser tab is open. But when
we close the browser, the server gets killed about 25-35 minutes later.
The obvious cause (jupyterhub culling idle servers after a while) is not
responsible: that feature is turned off, and the journal shows that
jupyterhub is the last to learn that the user's server has died.

We attached strace with microsecond timestamps to all of the user's
processes after he started his server (a sudospawner, the jupyterlab
process, a (sd-pam) process and a "/usr/lib/systemd/systemd --user") and to
the initial systemd process (pid 1). Comparing the timestamps, we could
see that the first SIGTERM/SIGHUP signals do indeed come from the root systemd
process (pid 1).

In the journal there was no entry that tells us why it happens. All of a sudden we get:
May 15 14:03:33 bioserver3 systemd[1]: Stopping Session c30 of User biouser...
May 15 14:03:33 bioserver3 systemd[1]: session-c30.scope: Deactivated successfully.
May 15 14:03:33 bioserver3 systemd[1]: Stopped Session c30 of User biouser.

and then the cleanup process starts for the user slice etc.

When we changed the config to work without sudospawner (which is not a good
solution, just a test), no separate login session is created for the
user; only the jupyterlab process gets started. In that case it keeps running
and does not get killed. So the problem only happens if the user process
runs in its own login session, visible with loginctl.

We then turned on debug logging for systemd and found this in the journal:

May 16 12:25:38 bioserver3 systemd[1]: home-b.automount: Got direct umount request on /home/b
May 16 12:25:38 bioserver3 systemd[1]: home-b.mount: Trying to enqueue job home-b.mount/stop/replace
May 16 12:25:38 bioserver3 systemd[1]: session-c77.scope: Installed new job session-c77.scope/stop as 220264
May 16 12:25:38 bioserver3 systemd[1]: home-b.mount: Installed new job home-b.mount/stop as 220263
May 16 12:25:38 bioserver3 systemd[1]: home-b.mount: Enqueued job home-b.mount/stop as 220263
May 16 12:25:38 bioserver3 systemd[1]: Sent message type=signal sender=n/a destination=n/a path=/org/freedesktop/systemd1/unit/home_2db_2emount interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=4500 reply_cookie=0 signature=sa{sv}as error-name=n/a error-message=n/a
May 16 12:25:38 bioserver3 systemd[1]: Sent message type=signal sender=n/a destination=n/a path=/org/freedesktop/systemd1/unit/home_2db_2emount interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=4501 reply_cookie=0 signature=sa{sv}as error-name=n/a error-message=n/a
May 16 12:25:38 bioserver3 systemd[1]: Sent message type=signal sender=n/a destination=n/a path=/org/freedesktop/systemd1/unit/session_2dc77_2escope interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=4502 reply_cookie=0 signature=sa{sv}as error-name=n/a error-message=n/a
May 16 12:25:38 bioserver3 systemd[1]: Sent message type=signal sender=n/a destination=n/a path=/org/freedesktop/systemd1/unit/session_2dc77_2escope interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=4503 reply_cookie=0 signature=sa{sv}as error-name=n/a error-message=n/a
May 16 12:25:38 bioserver3 systemd[1]: Sent message type=signal sender=n/a destination=n/a path=/org/freedesktop/systemd1 interface=org.freedesktop.systemd1.Manager member=JobNew cookie=4504 reply_cookie=0 signature=uos error-name=n/a error-message=n/a
May 16 12:25:38 bioserver3 systemd[1]: Sent message type=signal sender=n/a destination=n/a path=/org/freedesktop/systemd1 interface=org.freedesktop.systemd1.Manager member=JobNew cookie=4505 reply_cookie=0 signature=uos error-name=n/a error-message=n/a
May 16 12:25:38 bioserver3 systemd[1]: systemd-logind.service: Got notification message from PID 9246 (WATCHDOG=1)
May 16 12:25:38 bioserver3 systemd[1]: Sent message type=signal sender=n/a destination=n/a path=/org/freedesktop/systemd1/unit/session_2dc77_2escope interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=4506 reply_cookie=0 signature=sa{sv}as error-name=n/a error-message=n/a
May 16 12:25:38 bioserver3 systemd[1]: Sent message type=signal sender=n/a destination=n/a path=/org/freedesktop/systemd1/unit/session_2dc77_2escope interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=4507 reply_cookie=0 signature=sa{sv}as error-name=n/a error-message=n/a
May 16 12:25:38 bioserver3 systemd[1]: session-c77.scope changed abandoned -> stop-sigterm
May 16 12:25:38 bioserver3 systemd[1]: Stopping Session c77 of User biouser...
May 16 12:25:38 bioserver3 systemd[1]: home-b.mount: stopping held back, waiting for: session-c77.scope
...
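
To find such events in a long debug journal, the relevant lines can be
filtered automatically. The following is only a minimal sketch: the function
name stop_jobs_after_umount is hypothetical, and the two regular expressions
are modelled solely on the message formats quoted above, so they may need
adjusting for other systemd versions.

```python
import re

# Patterns modelled on the debug lines quoted above (an assumption, not a
# documented systemd log format).
UMOUNT_RE = re.compile(
    r"systemd\[1\]: (?P<unit>\S+\.automount): "
    r"Got direct umount request on (?P<path>\S+)"
)
JOB_RE = re.compile(
    r"systemd\[1\]: (?P<unit>\S+): "
    r"Installed new job \S+/stop as (?P<id>\d+)"
)


def stop_jobs_after_umount(lines):
    """Return (path, jobs) where path is the mount point of the first
    automount umount request seen and jobs lists (unit, job_id) for every
    stop job installed after that request."""
    path = None
    jobs = []
    for line in lines:
        umount = UMOUNT_RE.search(line)
        if umount and path is None:
            path = umount.group("path")
            continue
        job = JOB_RE.search(line)
        if job and path is not None:
            jobs.append((job.group("unit"), int(job.group("id"))))
    return path, jobs
```

Fed with the output of "journalctl -b" (one line per list element), a session
scope showing up in the returned job list right after the umount request
would point to the same sequence as in the excerpt above.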

The home of biouser is mounted like this in /etc/fstab:
<server>:/b /home/b  nfs4    defaults,proto=tcp6,rw,soft,bg,nfsvers=4,lock,noauto,x-systemd.automount,x-systemd.idle-timeout=20min,x-systemd.mount-timeout=30s 0 0
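
To compare this entry with the one on the second server, the option string
can be split into key/value pairs. This is just a small sketch; the helper
name parse_fstab_options is made up for illustration and it only handles a
single well-formed fstab line.

```python
def parse_fstab_options(line):
    """Split one fstab line and return its mount options as a dict.
    Options without a value (e.g. "noauto") map to True."""
    fields = line.split()
    if len(fields) < 4:
        raise ValueError("expected at least 4 fstab fields")
    options = {}
    for item in fields[3].split(","):
        key, _, value = item.partition("=")
        options[key] = value if value else True
    return options


entry = ("<server>:/b /home/b  nfs4    defaults,proto=tcp6,rw,soft,bg,"
         "nfsvers=4,lock,noauto,x-systemd.automount,"
         "x-systemd.idle-timeout=20min,x-systemd.mount-timeout=30s 0 0")
opts = parse_fstab_options(entry)
print(opts["x-systemd.idle-timeout"])
```

Comparing the resulting dicts from both servers would show whether the
automount and idle-timeout options really are identical.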

So it seems that the idle-timeout of the mount kills the user
session. That shouldn't happen while the user session is still running;
the umount should fail with "/home/b busy" and not kill the processes
that are working on /home/b, I guess?

And this happens only if there is a login session for the user and only
if the browser window for jhub is closed. Otherwise /home/b is not considered
idle. It also happens on only one server and not on a second jhub server,
although both are installed and configured identically (OS and jhub config).

Is it a bug that systemd considers /home/b idle although a user with
home /home/b/biouser still has a login session running? Or can it be
configured somewhere?

Thanks for any ideas or help!
cu,
Frank


--
Dipl.-Inform. Frank Steiner   Web:  http://www.bio.ifi.lmu.de/~steiner/
Lehrstuhl f. Bioinformatik    Mail: http://www.bio.ifi.lmu.de/~steiner/m/
LMU, Amalienstr. 17           Phone: +49 89 2180-4049
80333 Muenchen, Germany       Fax:   +49 89 2180-99-4049
* Rekursion kann man erst verstehen, wenn man Rekursion verstanden hat. *


