Automatically loading svcrdma causes reboot hang on systemd

Fresh minimal Arch system, on kernel 4.1.3 (-1 Arch).  nfs-utils 1.3.2
(-6 Arch).  Mellanox ConnectX MT25418 card, using latest firmware.
systemd 223 (-1 Arch).

If I boot the system, log in, and manually "modprobe svcrdma" and
"echo rdma 20049 > /proc/fs/nfsd/portlist", I can "systemctl reboot"
or "systemctl shutdown" just fine, regardless of whether I do so as
quickly as possible (kernel uptime 30-40sec) or wait several minutes.
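
For anyone who wants to repeat the manual test, the two commands above can be
wrapped in a small shell function. This is only a sketch: the function name
and the MODPROBE and PORTLIST override variables are my own additions for dry
runs and are not part of nfs-utils.

```shell
#!/bin/sh
# Sketch of the manual steps described above.
# MODPROBE and PORTLIST are hypothetical override hooks for dry runs;
# by default the real modprobe and the real nfsd portlist file are used.
enable_nfs_rdma() {
    modprobe_cmd=${MODPROBE:-modprobe}
    portlist=${PORTLIST:-/proc/fs/nfsd/portlist}

    # Load the server-side NFS/RDMA transport module.
    "$modprobe_cmd" svcrdma || return 1

    # Ask nfsd to listen for RDMA connections on port 20049.
    echo "rdma 20049" > "$portlist"
}
```

Run as root, calling "enable_nfs_rdma" with no overrides set does the same
thing as typing the two commands by hand.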

If svcrdma is loaded and portlist is set through systemd with
"Before=remote-fs-pre.target" and "After=nfs-server.target", I can
reboot or shut down just fine if I do so right away.  But if I wait
until the kernel has been up for about 60 seconds or more (I give it
2 min in testing to be sure), any reboot or shutdown hangs after
everything is unmounted, services are stopped, and it's actually time
to "hit" the power switch.  (Using sysrq-trigger does force it to
reboot.)

If I modify the systemd service so that it is ordered after
"network.target rdma.service auth-rpcgss-module.service
nfs-blkmap.service nfs-config.service nfs-idmapd.service
nfs-mountd.service nfs-server.service nfs-server.target
nfs-utils.service rpc-gssd.service rpc-statd-notify.service
rpc-statd.service rpc-svcgssd.service", I can wait as long as I want,
and reboot/shutdown works fine.
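
One way to try that ordering without editing the unit file itself is a systemd
drop-in. The sketch below writes one. The function name, the file name
"ordering.conf", and whatever unit name you pass in are placeholders of mine,
not names taken from the AUR package. The list uses the spelling
nfs-idmapd.service, which is the unit name nfs-utils ships. Repeated After=
lines accumulate, so the list is split across several lines for readability.

```shell
#!/bin/sh
# Sketch: write a drop-in that orders a unit after the units listed above.
# $1 is the unit to modify (placeholder, e.g. a hypothetical nfs-rdma.service).
# $2 is the systemd config directory, overridable so the function can be
# tried out in a scratch directory.
write_ordering_dropin() {
    unit=$1
    base=${2:-/etc/systemd/system}
    dir="$base/$unit.d"

    mkdir -p "$dir" || return 1

    # Quoted heredoc delimiter: the text is written out literally.
    cat > "$dir/ordering.conf" <<'EOF'
[Unit]
After=network.target rdma.service auth-rpcgss-module.service
After=nfs-blkmap.service nfs-config.service nfs-idmapd.service
After=nfs-mountd.service nfs-server.service nfs-server.target
After=nfs-utils.service rpc-gssd.service rpc-statd-notify.service
After=rpc-statd.service rpc-svcgssd.service
EOF
}
```

After writing the drop-in into /etc/systemd/system, "systemctl daemon-reload"
is needed before the new ordering takes effect.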

This is all without any NFS exports defined - so no NFS clients
actually connected.

As crazy as this sounds (to me anyway), this suggests that
svcrdma/portlist will cause a reboot/shutdown lockup if it is set up
before or while the RDMA or NFS kernel modules are being loaded, and
works fine if it waits until they are all done.

I would expect modprobe or setting portlist to fail if it wasn't
ready to be loaded, rather than come up, work, and later mysteriously
hang a reboot/shutdown.

I put about 20 hours into diagnosing this, and the results are 100%
repeatable, even if the above conclusion sounds weird.

I haven't tested whether xprtrdma has the same effect.

If you also run Arch, you can see/download the new AUR4 package that
loads the kernel module and sets portlist at:
https://aur4.archlinux.org/packages/nfs-utils-rdma-server/  and the
client version at:
https://aur4.archlinux.org/packages/nfs-utils-rdma-client/

If you "View Changes" on that page, you can see the previous version
which causes the lockup.