Re: NFS RDMA client hang (fwd)

On Jan 9, 2014, at 1:26 PM, Håkan Johansson <f96hajo@xxxxxxxxxxx> wrote:

> 
> Two machines, both running Debian squeeze (the client with a normal NFS root filesystem).  Mounting an NFS filesystem via IP over IB works and gives some 650 MB/s.  When I try to follow the instructions at
> 
> https://www.kernel.org/doc/Documentation/filesystems/nfs/nfs-rdma.txt
> 
> the client fails at the very last stage with a kernel panic.

> 192.168.10.10 is the server, 192.168.10.11 is the client
> 
> mount -vvvv -o vers=3,nolock,tcp,proto=rdma,port=20049 192.168.10.10:/scratch.local1 /mnt
> 
> gives
> 
> mount: fstab path: "/etc/fstab"
> mount: mtab path:  "/etc/mtab"
> mount: lock path:  "/etc/mtab~"
> mount: temp path:  "/etc/mtab.tmp"
> mount: UID:        0
> mount: eUID:       0
> mount: no type was given - I'll assume nfs because of the colon
> mount: spec:  "192.168.10.10:/scratch.local1"
> mount: node:  "/mnt"
> mount: types: "nfs"
> mount: opts:  "vers=3,nolock,tcp,proto=rdma,port=20049"
> mount: external mount: argv[0] = "/sbin/mount.nfs"
> mount: external mount: argv[1] = "192.168.10.10:/scratch.local1"
> mount: external mount: argv[2] = "/mnt"
> mount: external mount: argv[3] = "-v"
> mount: external mount: argv[4] = "-o"
> mount: external mount: argv[5] = "rw,vers=3,nolock,tcp,proto=rdma,port=20049"
> mount.nfs: timeout set for Thu Jan  9 18:04:46 2014
> mount.nfs: trying text-based options 'vers=3,nolock,tcp,proto=rdma,port=20049,addr=192.168.10.10'
> 
> and then the client is stuck.  On the console there is a kernel panic.
> 
> It is a bit long, so it is abbreviated here.  (The hang seems easily reproducible, so if it would really help you, I could probably take a photo of it.)
> 
> ----
> 
> This appeared with a 3.12 kernel:
> 
> rpcrdma: connection to 192.168.10.10:20049 closed (-111)
> rpcrdma: connection to 192.168.10.10:20049 closed (-111)
> rpcrdma: connection to 192.168.10.10:20049 on mlx4_0, memreg 5 slots 32 ird 16
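
For reference, the number in parentheses in the "closed (-111)" lines is a negated Linux errno value. On x86 Linux, 111 is ECONNREFUSED, which usually means nothing accepted the connection on that address and port at that moment. A minimal sketch of a lookup helper follows; the function name errno_name is hypothetical, and the table only lists a few network-related values from Linux's asm-generic errno.h.

```shell
#!/bin/sh
# errno_name: hypothetical helper, not part of any NFS or RDMA tool.
# Maps a (possibly negated) errno value to its symbolic name.
# Values are from Linux asm-generic/errno.h (as used on x86); only a few are listed.
errno_name() {
    code=${1#-}          # strip a leading minus sign, as printed in kernel logs
    case "$code" in
        101) echo ENETUNREACH ;;
        104) echo ECONNRESET ;;
        110) echo ETIMEDOUT ;;
        111) echo ECONNREFUSED ;;
        113) echo EHOSTUNREACH ;;
        *)   echo "unknown ($code)" ;;
    esac
}

errno_name -111
```

The first two log lines therefore read as refused connection attempts before the third line reports an established connection.
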
> 
> The output below, from a 3.10 kernel, is very similar.  Both kernels are from Debian.  I also tested with a 3.2 kernel.
> 
> general protection fault: 0000 [#1] SMP
> 
> call trace:
> <IRQ>
> tasklet_action+0x73/0xc2
> __do_softirq
> irq_exit
> do_IRQ
> common_interrupt
> <EOI>
> clockevents_program_event
> arch_local_irq_enable
> cpuidle_enter_state
> cpuidle_idle_call
> arch_cpu_idle
> cpu_startup_entry
> start_kernel
> repair_cpu_string
> x86_64_start_kernel
> 
> RIP    rpcrdma_run_tasklet  [xprtrdma]
> 
> ----
> 
> The client is a Sandy Bridge E3-1245.
> 
> I bring the infiniband up like this:
> 
> modprobe ib_mthca
> modprobe ib_ipoib
> modprobe mlx4_ib
> modprobe ib_mad
> modprobe ib_umad
> modprobe ib_urdma
> modprobe rdma_cm
> modprobe rdma_ucm
> ibstat
> /etc/init.d/opensm restart
> /sbin/ifconfig ib0 inet 192.168.10.11 up
> IPTABLES=/sbin/iptables
> $IPTABLES -t filter -A OUTPUT -o ib0 -j ACCEPT
> $IPTABLES -t filter -A INPUT -i ib0 -j ACCEPT
> /etc/init.d/opensm restart
> 
> modprobe xprtrdma
> 
> and similar on the server.
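
One way to confirm that the modules from a sequence like the one above actually loaded is to compare the expected names against /proc/modules. The sketch below assumes a Linux system; the helper name missing_modules is hypothetical, and the list of modules checked is just a subset of those named in the report.

```shell
#!/bin/sh
# missing_modules: hypothetical helper. Prints each named module that is not
# listed in the given modules file (same format as /proc/modules).
# Note: drivers built into the kernel never appear in /proc/modules.
missing_modules() {
    list=$1; shift
    for mod in "$@"; do
        # each line of /proc/modules starts with the module name and a space
        grep -q "^$mod " "$list" 2>/dev/null || echo "$mod"
    done
}

# Module names taken from the modprobe sequence in the report
if [ -r /proc/modules ]; then
    missing_modules /proc/modules mlx4_ib ib_ipoib rdma_cm xprtrdma
fi
```

No output means every listed module is present as a loadable module; any name printed would be worth checking in dmesg.
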
> 
> Both have
> 
> InfiniBand: Mellanox Technologies MT25418 [ConnectX VPI PCIe 2.0 2.5GT/s - IB DDR / 10GigE] (rev a0) adapters.
> 
> http://www.ebay.com/itm/HP-452372-001-Infiniband-PCI-E-4X-DDR-Dual-Port-Storage-Host-Channel-Adapter-HCA-/360657396651?pt=UK_Computing_ComputerComponents_InterfaceCards&hash=item53f8db23ab
> 
> Any suggestions?
> 
> ---
> 
> I first sent this mail to nfs-rdma-devel@xxxxxxxxxxxxxxxxxxxxx (as
> suggested in)
> 
> https://www.kernel.org/doc/Documentation/filesystems/nfs/nfs-rdma.txt
> 
> but got:
> 
> sog-mx-1.v43.ch3.sourceforge.com gave this error: unknown user
> 
> Does the mailing list no longer exist?

That documentation is probably out of date.

I don’t see anything immediately wrong with your configuration, but NFS/RDMA has suffered from some bit rot over the past few years.  The current upstream kernels are known to have data corruption bugs and panics.

I recommend staying with NFS on IPoIB for the time being.
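
As a rough sketch of what that looks like with the addresses from the report: the same export mounted over TCP on the IPoIB interface, with the RDMA-specific options (proto=rdma, port=20049) dropped. The helper name build_mount_cmd is hypothetical; it only prints the command so it can be reviewed before being run as root.

```shell
#!/bin/sh
# Values copied from the original report; adjust for your own setup.
SERVER=192.168.10.10
EXPORT=/scratch.local1
MOUNTPOINT=/mnt
# proto=tcp replaces proto=rdma, and port=20049 (the NFS/RDMA port) is omitted.
OPTS=vers=3,nolock,proto=tcp

# build_mount_cmd: hypothetical helper that prints the mount command
# instead of running it (a dry run).
build_mount_cmd() {
    printf 'mount -t nfs -o %s %s:%s %s\n' "$1" "$2" "$3" "$4"
}

build_mount_cmd "$OPTS" "$SERVER" "$EXPORT" "$MOUNTPOINT"
```

Because 192.168.10.10 is the server's ib0 address, the traffic still travels over the InfiniBand fabric, only via the IP stack rather than RDMA.
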

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com


