NFS RDMA client hang (fwd)

Two machines, both running Debian squeeze (the client with a normal NFS root filesystem). Mounting an NFS filesystem via IP over IB works, giving some 650 MB/s. When I try to follow the instructions at

https://www.kernel.org/doc/Documentation/filesystems/nfs/nfs-rdma.txt

the client fails at the very last stage with a kernel panic.

192.168.10.10 is the server, 192.168.10.11 is the client. Running

mount -vvvv -o vers=3,nolock,tcp,proto=rdma,port=20049 192.168.10.10:/scratch.local1 /mnt

gives

mount: fstab path: "/etc/fstab"
mount: mtab path:  "/etc/mtab"
mount: lock path:  "/etc/mtab~"
mount: temp path:  "/etc/mtab.tmp"
mount: UID:        0
mount: eUID:       0
mount: no type was given - I'll assume nfs because of the colon
mount: spec:  "192.168.10.10:/scratch.local1"
mount: node:  "/mnt"
mount: types: "nfs"
mount: opts:  "vers=3,nolock,tcp,proto=rdma,port=20049"
mount: external mount: argv[0] = "/sbin/mount.nfs"
mount: external mount: argv[1] = "192.168.10.10:/scratch.local1"
mount: external mount: argv[2] = "/mnt"
mount: external mount: argv[3] = "-v"
mount: external mount: argv[4] = "-o"
mount: external mount: argv[5] = "rw,vers=3,nolock,tcp,proto=rdma,port=20049"
mount.nfs: timeout set for Thu Jan  9 18:04:46 2014
mount.nfs: trying text-based options 'vers=3,nolock,tcp,proto=rdma,port=20049,addr=192.168.10.10'

and then the client is stuck. On the console there is a kernel panic.

It is a bit long, so it is abbreviated here. (The hang seems easily reproducible, so if it really might help you, I could probably take a photo of the full output.)

----

This appeared with a 3.12 kernel:

rpcdma: connection to 192.168.10.10:20049 closed (-111)
rpcdma: connection to 192.168.10.10:20049 closed (-111)
rpcdma: connection to 192.168.10.10:20049 on mlx4_0, memreg 5 slots 32 ird 16
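The number in "closed (-111)" is a negated errno. On Linux, 111 is ECONNREFUSED, so the first two connection attempts were refused before the third one went through. A minimal sketch for pulling the code out of such a line and naming it is below. The lookup table is an illustration covering only a few values, and the variable names are made up for this example.

```shell
# Sample line copied from the console output above.
line='rpcdma: connection to 192.168.10.10:20049 closed (-111)'

# Pull the number out of "closed (-N)".
code=$(printf '%s\n' "$line" | sed -n 's/.*closed (-\([0-9][0-9]*\)).*/\1/p')

# Small lookup for a few Linux errno values (asm-generic numbering).
# This is deliberately incomplete; extend it as needed.
case "$code" in
    110) name=ETIMEDOUT ;;
    111) name=ECONNREFUSED ;;
    113) name=EHOSTUNREACH ;;
    *)   name=unknown ;;
esac

echo "errno $code = $name"
```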

The rest is very similar to the output below, which is from a 3.10 kernel. Both kernels are from Debian. I also tested with a 3.2 kernel.

general protection fault: 0000 [#1] SMP

call trace:
<IRQ>
tasklet_action+0x73/0xc2
__do_softirq
irq_exit
do_IRQ
common_interrupt
<EOI>
clockevents_program_event
arch_local_irq_enable
cpuidle_enter_state
cpuidle_idle_call
arch_cpu_idle
cpu_startup_entry
start_kernel
repair_cpu_string
x86_64_start_kernel

RIP    rpcrdma_run_tasklet  [xprtrdma]

----

The client is a Sandy Bridge E3-1245.

I bring the infiniband up like this:

modprobe ib_mthca
modprobe ib_ipoib
modprobe mlx4_ib
modprobe ib_mad
modprobe ib_umad
modprobe ib_urdma
modprobe rdma_cm
modprobe rdma_ucm
ibstat
/etc/init.d/opensm restart
/sbin/ifconfig ib0 inet 192.168.10.11 up
IPTABLES=/sbin/iptables
$IPTABLES -t filter -A OUTPUT -o ib0 -j ACCEPT
$IPTABLES -t filter -A INPUT -i ib0 -j ACCEPT
/etc/init.d/opensm restart

modprobe xprtrdma

and similar on the server.
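For reference, the server-side part of nfs-rdma.txt has the server load svcrdma and write "rdma 20049" to /proc/fs/nfsd/portlist. A small check along the following lines could confirm on the server that the RDMA listener is registered. This is only a sketch: the helper name has_rdma_listener is hypothetical, and it assumes that reading portlist returns one "transport port" pair per line.

```shell
# has_rdma_listener [FILE]
# Succeeds if FILE lists an rdma transport on port 20049.
# FILE defaults to /proc/fs/nfsd/portlist; the one-pair-per-line
# format ("rdma 20049") is an assumption of this sketch.
has_rdma_listener() {
    portlist=${1:-/proc/fs/nfsd/portlist}
    grep -Eq '^rdma[[:space:]]+20049$' "$portlist" 2>/dev/null
}

if has_rdma_listener; then
    echo "nfsd is listening for RDMA on port 20049"
else
    echo "no rdma 20049 entry found in portlist"
fi
```

Passing a file name as the first argument makes it possible to run the check against a saved copy of portlist instead of the live file.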

Both machines have this adapter:

InfiniBand: Mellanox Technologies MT25418 [ConnectX VPI PCIe 2.0 2.5GT/s - IB DDR / 10GigE] (rev a0)

http://www.ebay.com/itm/HP-452372-001-Infiniband-PCI-E-4X-DDR-Dual-Port-Storage-Host-Channel-Adapter-HCA-/360657396651?pt=UK_Computing_ComputerComponents_InterfaceCards&hash=item53f8db23ab

Any suggestions?

---

I first sent this mail to nfs-rdma-devel@xxxxxxxxxxxxxxxxxxxxx, as
suggested in

https://www.kernel.org/doc/Documentation/filesystems/nfs/nfs-rdma.txt

but got:

sog-mx-1.v43.ch3.sourceforge.com gave this error: unknown user

Does that mailing list no longer exist?

---

Thanks,
Håkan Johansson


