On Thu, 2017-12-21 at 10:23 +0200, Moni Shoua wrote:
> On Wed, Dec 20, 2017 at 3:10 AM, Bart Van Assche <Bart.VanAssche@xxxxxxx> wrote:
> > Can anyone who is reading this list tell me whether or not the rdma_rxe
> > driver undergoes regular testing? It was a few months ago that I tried to
> > run the SRP protocol over that driver. When I tried again today I ran into
> > the call trace shown below. I can share the details of the test I ran in
> > case anyone would be interested.
>
> Please share the test details and we'll look into that.

Thanks Moni!

The call trace in my previous e-mail was caused by a bug in the SRP initiator
driver. I will post the patches that fix that bug after the holidays. But even
after having fixed that bug I noticed a remarkable behavior difference between
the mlx4_ib and rxe drivers. ib_srpt channels get closed properly when using
the mlx4 driver but not when using the rxe driver. The test I ran is as
follows:
* Clone, build and install the kernel from branch block-scsi-for-next of
  repository https://github.com/bvanassche/linux. Make sure that the SRP
  initiator and target drivers are enabled in the kernel config. I plan to
  post all patches that are in that repository and that are not yet upstream
  after the holidays.
* Clone https://github.com/bvanassche/srp-test.
* Edit /etc/multipath.conf as indicated in the README.md document in the
  srp-test repository.
* Start multipathd.
* If I run the following command on a system with a ConnectX-3 adapter:

    srp-test/run_tests -d -r 10 -t 02-mq

  then the test finishes after about 11 seconds. But if I run the following
  command on a system without any RDMA adapters:

    srp-test/run_tests -c -d -r 10 -t 02-mq

  then the following output appears:

Unloaded the ib_srpt kernel module
Unloaded the rdma_rxe kernel module
SoftRoCE network interfaces: rxe0
Zero-initializing /dev/ram0 ... done
Zero-initializing /dev/ram1 ... done
Zero-initializing /dev/sdb ... done
Configured SRP target driver
Running test /home/bart/software/infiniband/srp-test/tests/02-mq ...
Test file I/O on top of multipath concurrently with logout and login (0 min; mq)
Using /dev/disk/by-id/dm-uuid-mpath-3600140572616d6469736b31000000000 -> ../../dm-2
Unmounting /root/mnt1 from /dev/mapper/mpathb
SRP LUN /sys/class/scsi_device/5:0:0:0 / sdc: removing /dev/dm-2: done
SRP LUN /sys/class/scsi_device/5:0:0:1 / sde: removing /dev/dm-1: done
SRP LUN /sys/class/scsi_device/5:0:0:2 / sdd: removing /dev/dm-0: done
Unloaded the ib_srp kernel module
Test /home/bart/software/infiniband/srp-test/tests/02-mq succeeded
1 tests succeeded and 0 tests failed
[ test script hangs ]

While the test script hangs the following appears in the system log (please
note that the ib_srpt:srpt_zerolength_write_done: ib_srpt wc->status message
is missing):

ib_srpt:srpt_close_ch: ib_srpt 192.168.122.76-32: queued zerolength write
[ ... ]
ib_srpt srpt_disconnect_ch_sync(192.168.122.76-18 state 3): still waiting ...
[ ... ]
INFO: task rmdir:3215 blocked for more than 120 seconds.
      Not tainted 4.15.0-rc4-dbg+ #2
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
rmdir           D13912  3215   3208 0x00000000
Call Trace:
 __schedule+0x2ad/0xb90
 schedule+0x31/0x90
 schedule_timeout+0x1fb/0x590
 wait_for_completion_timeout+0x11a/0x180
 srpt_close_session+0xba/0x180 [ib_srpt]
 target_shutdown_sessions+0xc8/0xd0 [target_core_mod]
 core_tpg_del_initiator_node_acl+0x7c/0x130 [target_core_mod]
 target_fabric_nacl_base_release+0x20/0x30 [target_core_mod]
 config_item_release+0x5a/0xc0 [configfs]
 config_item_put+0x21/0x24 [configfs]
 configfs_rmdir+0x1ef/0x2f0 [configfs]
 vfs_rmdir+0x6e/0x150
 do_rmdir+0x168/0x1c0
 SyS_rmdir+0x11/0x20

Thanks,

Bart.