Re: Automatically loading svcrdma causes reboot hang on systemd

On Aug 3, 2015, at 10:23 PM, james harvey <jamespharvey20@xxxxxxxxx> wrote:

> Fresh minimal Arch system, on kernel 4.1.3 (-1 Arch).  nfs-utils 1.3.2
> (-6 Arch).  Mellanox ConnectX MT25418 card, using latest firmware.
> systemd 223 (-1 Arch).
> 
> If I boot the system, log in, and manually "modprobe svcrdma" and
> "echo rdma 20049 > /proc/fs/nfsd/portlist", I can "systemctl reboot"
> or "systemctl shutdown" just fine, regardless of whether I do so as
> quickly as possible (kernel uptime 30-40sec) or wait several minutes.
> 
> If svcrdma is loaded and portlist is set through systemd with
> "Before=remote-fs-pre.target" and "After=nfs-server.target", I can
> reboot or shutdown just fine.  But, if I wait until the kernel has
> been up for about 60 seconds or more (I give it 2 min in testing to be
> sure), any reboot or shutdown hangs after everything is unmounted,
> services are stopped, and it's actually time to "hit" the power
> switch.  (Using sysrq-trigger does force it to reboot.)
> 
> If I modify the systemd service, so it is after "network.target
> rdma.service auth-rpcgss-module.service nfs-blkmap.service
> nfs-config.service nfs-idmapd.service nfs-mountd.service
> nfs-server.service nfs-server.target nfs-utils.service
> rpc-gssd.service rpc-statd-notify.service rpc-statd.service
> rpc-svcgssd.service", I can wait as long as I want, and
> reboot/shutdown works fine.
> 
> This is all without any NFS exports defined - so no NFS clients
> actually connected.
> 
> As crazy as this sounds (to me anyway), this shows that
> svcrdma/portlist will cause a reboot/shutdown lockup if it is loaded
> before or during when the RDMA or NFS kernel modules are being
> loaded... And works fine if it waits until they are all done.
> 
> I would expect modprobe or setting portlist to fail if it wasn't
> ready to be loaded, rather than come up, work, and later mysteriously
> hang a reboot/shutdown.
> 
> I put about 20 hours into diagnosing this, and the results are 100%
> repeatable, even if the above conclusion sounds weird.
> 
> I haven't tested whether xprtrdma has the same effect.
> 
> If you also run arch, you can see/download the new AUR4 package that
> loads the kernel module and sets portlist at:
> https://aur4.archlinux.org/packages/nfs-utils-rdma-server/  and the
> client version at:
> https://aur4.archlinux.org/packages/nfs-utils-rdma-client/
> 
> If you "View Changes" on that page, you can see the previous version
> which causes the lockup.
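
For readers without access to the AUR package, the working ordering
described above might look roughly like the unit below. This is only a
sketch under assumptions: the unit name, the Description, the modprobe
path, and the [Install] target are hypothetical and not taken from the
actual package. Only the two commands and the "After=" ordering come
from the report.

```ini
# nfs-rdma-server.service (hypothetical name; not the AUR package's actual unit)
[Unit]
Description=Load svcrdma and enable the NFS/RDMA listener
# Order after the RDMA and NFS server units; the report lists further
# units (rpc-statd.service, nfs-mountd.service, ...) that would go here too.
After=network.target rdma.service nfs-server.service nfs-server.target

[Service]
Type=oneshot
RemainAfterExit=yes
# The same two steps the report runs by hand after login.
ExecStart=/usr/bin/modprobe svcrdma
ExecStart=/bin/sh -c 'echo "rdma 20049" > /proc/fs/nfsd/portlist'

[Install]
# Assumed install target; the report does not say which one is used.
WantedBy=multi-user.target
```

The failing variant in the report differs only in its ordering lines
("Before=remote-fs-pre.target" and "After=nfs-server.target"), which
lets the module load while the RDMA and NFS units may still be starting.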

I'm not an Arch user, so I don't entirely understand what is
going on (whether this is an issue unique to Arch, a problem
with the NFS systemd scripts, or a problem with resource
management in the kernel).

However, let's start with this:

http://marc.info/?l=linux-nfs&m=143647487532461&w=2

Can you apply this patch to your kernel, and see if it helps?


--
Chuck Lever


