Re: Linux kernel v4.15-rc4 and rdma_rxe

On Thu, 2017-12-21 at 10:23 +0200, Moni Shoua wrote:
> On Wed, Dec 20, 2017 at 3:10 AM, Bart Van Assche <Bart.VanAssche@xxxxxxx> wrote:
> > Can anyone who is reading this list tell me whether or not the rdma_rxe
> > driver undergoes regular testing? It was a few months ago that I tried to
> > run the SRP protocol over that driver. When I tried again today I ran into
> > the call trace shown below. I can share the details of the test I ran in
> > case anyone would be interested.
> 
> Please share the test details and we'll look into that.

Thanks Moni!

The call trace in my previous e-mail was caused by a bug in the SRP initiator
driver. I will post the patches that fix that bug after the holidays. But even
with that bug fixed I noticed a striking difference in behavior between the
mlx4_ib and rxe drivers: ib_srpt channels get closed properly when using the
mlx4_ib driver but not when using the rxe driver. The test I ran is as follows:
* Clone, build and install the kernel from branch block-scsi-for-next of
  repository https://github.com/bvanassche/linux. Make sure that the SRP
  initiator and target drivers are enabled in the kernel config. I plan to post
  all patches that are in that repository and that are not yet upstream after
  the holidays.
* Clone https://github.com/bvanassche/srp-test.
* Edit /etc/multipath.conf as indicated in the README.md document in the
  srp-test repository.
* Start multipathd.
* If I run the following command on a system with a ConnectX-3 adapter:
    srp-test/run_tests -d -r 10 -t 02-mq
  then the test finishes after about 11 seconds.
  But if I run the following command on a system without any RDMA adapters:
    srp-test/run_tests -c -d -r 10 -t 02-mq
  then the following output appears:

Unloaded the ib_srpt kernel module
Unloaded the rdma_rxe kernel module
SoftRoCE network interfaces: rxe0
Zero-initializing /dev/ram0 ... done
Zero-initializing /dev/ram1 ... done
Zero-initializing /dev/sdb ... done
Configured SRP target driver
Running test /home/bart/software/infiniband/srp-test/tests/02-mq ...
Test file I/O on top of multipath concurrently with logout and login (0 min; mq)
Using /dev/disk/by-id/dm-uuid-mpath-3600140572616d6469736b31000000000 -> ../../dm-2
Unmounting /root/mnt1 from /dev/mapper/mpathb
SRP LUN /sys/class/scsi_device/5:0:0:0 / sdc: removing /dev/dm-2: done
SRP LUN /sys/class/scsi_device/5:0:0:1 / sde: removing /dev/dm-1: done
SRP LUN /sys/class/scsi_device/5:0:0:2 / sdd: removing /dev/dm-0: done
Unloaded the ib_srp kernel module
Test /home/bart/software/infiniband/srp-test/tests/02-mq succeeded
1 tests succeeded and 0 tests failed

[ test script hangs ]

While the test script hangs, the following appears in the system log (please
note that the ib_srpt:srpt_zerolength_write_done: ib_srpt wc->status message is
missing):

ib_srpt:srpt_close_ch: ib_srpt 192.168.122.76-32: queued zerolength write
[ ... ]
ib_srpt srpt_disconnect_ch_sync(192.168.122.76-18 state 3): still waiting ...
[ ... ]
INFO: task rmdir:3215 blocked for more than 120 seconds.
      Not tainted 4.15.0-rc4-dbg+ #2
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
rmdir           D13912  3215   3208 0x00000000
Call Trace:
 __schedule+0x2ad/0xb90
 schedule+0x31/0x90
 schedule_timeout+0x1fb/0x590
 wait_for_completion_timeout+0x11a/0x180
 srpt_close_session+0xba/0x180 [ib_srpt]
 target_shutdown_sessions+0xc8/0xd0 [target_core_mod]
 core_tpg_del_initiator_node_acl+0x7c/0x130 [target_core_mod]
 target_fabric_nacl_base_release+0x20/0x30 [target_core_mod]
 config_item_release+0x5a/0xc0 [configfs]
 config_item_put+0x21/0x24 [configfs]
 configfs_rmdir+0x1ef/0x2f0 [configfs]
 vfs_rmdir+0x6e/0x150
 do_rmdir+0x168/0x1c0
 SyS_rmdir+0x11/0x20
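
To check a longer log for the same symptom (a queued zero-length write that
never completes), something like the following may help. This is a minimal
sketch and not part of srp-test. The function name channels_missing_completion
is made up for illustration. Only the "queued zerolength write" line format is
taken from the log above; the exact format of the completion message
("<channel> wc->status <n>") is an assumption and may need adjusting.

```python
import re

# Format of the "queued" message as it appears in the log excerpt above.
QUEUED = re.compile(r"srpt_close_ch: ib_srpt (\S+): queued zerolength write")
# ASSUMPTION: the completion message carries the same channel name followed
# by "wc->status". Only its prefix is quoted in the text above.
DONE = re.compile(r"srpt_zerolength_write_done: ib_srpt (\S+) wc->status")


def channels_missing_completion(log_lines):
    """Return channels with a queued zero-length write but no completion."""
    pending = []
    for line in log_lines:
        m = QUEUED.search(line)
        if m:
            # Remember each channel once, in the order it was first seen.
            if m.group(1) not in pending:
                pending.append(m.group(1))
            continue
        m = DONE.search(line)
        if m and m.group(1) in pending:
            # The completion arrived, so this channel is no longer suspect.
            pending.remove(m.group(1))
    return pending
```

Passing the lines of the system log to channels_missing_completion() returns
the channel names for which no completion was logged. With the rxe driver in
the run above, that list would be expected to contain 192.168.122.76-32.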

Thanks,

Bart.



