On Fri, 2017-11-03 at 11:50 +0800, Ming Lei wrote: > On Fri, Nov 03, 2017 at 02:42:50AM +0000, Bart Van Assche wrote: > > On Fri, 2017-11-03 at 10:12 +0800, Ming Lei wrote: > > > [root@ibclient srp-test]# ./run_tests > > > modprobe: FATAL: Module target_core_mod is in use. > > > > LIO must be unloaded before srp-test software is started. > > Yeah, I can make this test kick off after running > 'modprobe -fr ib_isert' first, but looks all tests failed without > any hang, could you check if that is the expected result? > > https://pastebin.com/VXe66Jpg I see plenty of "... >/sys/.../add_target failed: Connection timed out" messages. The srp-test scripts use the loopback functionality of IB ports so one or more IB ports must be active. Have you already checked the IB ports state, e.g. by running ibv_devinfo? > Also could you provide some output from debugfs when this hang happens? > such as: > > dispatch/busy/.. under hctxN This is the output that appears on my test setup if the two patches from the patch series "block: remove unnecessary RESTART" are applied: # ~bart/software/infiniband/srp-test/run_tests -f xfs -d -e deadline -r 60 Unloaded the ib_srpt kernel module Zero-initializing /dev/ram0 ... done Zero-initializing /dev/ram1 ... done Zero-initializing /dev/disk/by-id/scsi-33333333000004650 ... done brw-rw---- 1 root disk 1, 0 Nov 3 08:10 /dev/ram0 /dev/ram0 brw-rw---- 1 root disk 1, 1 Nov 3 08:10 /dev/ram1 /dev/ram1 lrwxrwxrwx 1 root root 9 Nov 3 08:11 /dev/disk/by-id/scsi-33333333000004650 -> ../../sdi /dev/sdi Configured SRP target driver Running test /home/bart/software/infiniband/srp-test/tests/01 ... id_ext=0x0002c90300a02d50,ioc_guid=0x0002c90300a02d50,dgid=fe800000000000000002c90300a02d51,pkey=7fff,service_id=0x0002c90300a02d50, >/sys/class/infiniband_srp/ srp-mlx4_0-2/add_target failed: Invalid argument id_ext=0x0002c90300a02d50,ioc_guid=0x0002c90300a02d50,dgid=fe800000000000000002c90300a02d52,pkey=7fff,service_id=0x0002c90300a02d50, >/sys/class/infiniband_srp/ srp-mlx4_0-2/add_target failed: Invalid argument count_luns(): 3 <> 3 Error: unloading kernel module ib_srp failed Test /home/bart/software/infiniband/srp-test/tests/01 failed # cat /sys/module/scsi_mod/parameters/use_blk_mq Y # find /sys/kernel/debug/block -name busy | xargs cat # dmesg -c >/dev/null; echo w > /proc/sysrq-trigger; dmesg [ 709.321750] sysrq: SysRq : Show Blocked State [ 709.327906] task PC stack pid father [ 709.335910] kworker/4:1 D 0 54 2 0x80000000 [ 709.342887] Workqueue: srp_remove srp_remove_work [ib_srp] [ 709.349806] Call Trace: [ 709.353365] __schedule+0x2fa/0xbb0 [ 709.357865] schedule+0x36/0x90 [ 709.361985] async_synchronize_cookie_domain+0x88/0x130 [ 709.368422] ? finish_wait+0x90/0x90 [ 709.373033] async_synchronize_full_domain+0x18/0x20 [ 709.379184] sd_remove+0x4d/0xc0 [sd_mod] [ 709.384282] device_release_driver_internal+0x160/0x210 [ 709.390752] device_release_driver+0x12/0x20 [ 709.396130] bus_remove_device+0x100/0x180 [ 709.401332] device_del+0x1d8/0x340 [ 709.405823] __scsi_remove_device+0xfc/0x130 [ 709.411221] scsi_forget_host+0x25/0x70 [ 709.416108] scsi_remove_host+0x79/0x120 [ 709.421114] srp_remove_work+0x90/0x1d0 [ib_srp] [ 709.426867] process_one_work+0x20a/0x660 [ 709.431971] worker_thread+0x3d/0x3b0 [ 709.436655] kthread+0x13a/0x150 [ 709.440872] ? process_one_work+0x660/0x660 [ 709.446133] ? kthread_create_on_node+0x40/0x40 [ 709.451825] ret_from_fork+0x27/0x40 [ 709.456602] kworker/u66:6 D 0 341 2 0x80000000 [ 709.463371] Workqueue: events_unbound async_run_entry_fn [ 709.469919] Call Trace: [ 709.473325] __schedule+0x2fa/0xbb0 [ 709.477850] ? trace_hardirqs_on_caller+0xfb/0x190 [ 709.483847] schedule+0x36/0x90 [ 709.487982] schedule_timeout+0x22c/0x570 [ 709.493115] ? call_timer_fn+0x330/0x330 [ 709.498133] ? wait_for_completion_io_timeout+0xf7/0x180 [ 709.504732] io_schedule_timeout+0x1e/0x50 [ 709.509938] ? io_schedule_timeout+0x1e/0x50 [ 709.515413] wait_for_completion_io_timeout+0x11f/0x180 [ 709.521920] ? wake_up_q+0x80/0x80 [ 709.526376] blk_execute_rq+0x86/0xc0 [ 709.531135] scsi_execute+0xdb/0x1f0 [ 709.536336] sd_revalidate_disk+0xed/0x1c70 [sd_mod] [ 709.543281] sd_probe_async+0xc3/0x1d0 [sd_mod] [ 709.549524] async_run_entry_fn+0x38/0x160 [ 709.555503] process_one_work+0x20a/0x660 [ 709.561299] worker_thread+0x3d/0x3b0 [ 709.566688] kthread+0x13a/0x150 [ 709.571613] ? process_one_work+0x660/0x660 [ 709.577561] ? kthread_create_on_node+0x40/0x40 [ 709.583910] ret_from_fork+0x27/0x40 [ 709.589174] kworker/5:2 D 0 386 2 0x80000000 [ 709.596616] Workqueue: kaluad alua_rtpg_work [scsi_dh_alua] [ 709.604130] Call Trace: [ 709.608096] __schedule+0x2fa/0xbb0 [ 709.613265] ? trace_hardirqs_on_caller+0xfb/0x190 [ 709.619879] schedule+0x36/0x90 [ 709.624684] schedule_timeout+0x22c/0x570 [ 709.630437] ? call_timer_fn+0x330/0x330 [ 709.636059] ? wait_for_completion_io_timeout+0xf7/0x180 [ 709.643270] io_schedule_timeout+0x1e/0x50 [ 709.649083] ? io_schedule_timeout+0x1e/0x50 [ 709.655159] wait_for_completion_io_timeout+0x11f/0x180 [ 709.662284] ? wake_up_q+0x80/0x80 [ 709.667334] blk_execute_rq+0x86/0xc0 [ 709.672705] scsi_execute+0xdb/0x1f0 [ 709.677938] alua_rtpg_work+0x2b3/0xe8a [scsi_dh_alua] [ 709.684983] process_one_work+0x20a/0x660 [ 709.690759] worker_thread+0x3d/0x3b0 [ 709.696107] kthread+0x13a/0x150 [ 709.700986] ? process_one_work+0x660/0x660 [ 709.706920] ? kthread_create_on_node+0x40/0x40 [ 709.713316] ret_from_fork+0x27/0x40 [ 709.718809] systemd-udevd D 0 1905 463 0x80000106 [ 709.726285] Call Trace: [ 709.730309] __schedule+0x2fa/0xbb0 [ 709.735462] schedule+0x36/0x90 [ 709.740270] io_schedule+0x16/0x40 [ 709.745369] __lock_page+0x10a/0x140 [ 709.750691] ? page_cache_tree_insert+0x100/0x100 [ 709.757280] truncate_inode_pages_range+0x4ff/0x800 [ 709.764044] ? buffer_exit_cpu_dead+0x70/0x70 [ 709.770285] ? find_next_bit+0xb/0x10 [ 709.775691] ? cpumask_next+0x1a/0x20 [ 709.781089] ? smp_call_function_many+0x1e3/0x230 [ 709.787598] ? on_each_cpu_mask+0x52/0xa0 [ 709.793353] ? mark_buffer_async_write+0x20/0x20 [ 709.799741] ? buffer_exit_cpu_dead+0x70/0x70 [ 709.805888] ? on_each_cpu_cond+0x99/0xe0 [ 709.811636] truncate_inode_pages+0x15/0x20 [ 709.817536] kill_bdev+0x35/0x40 [ 709.822400] __blkdev_put+0x6d/0x1f0 [ 709.827625] blkdev_put+0x4e/0x130 [ 709.832673] blkdev_close+0x25/0x30 [ 709.837779] __fput+0xed/0x1f0 [ 709.842430] ____fput+0xe/0x10 [ 709.847030] task_work_run+0x8b/0xc0 [ 709.852251] do_exit+0x38d/0xc70 [ 709.857046] ? get_signal+0x2f6/0x8c0 [ 709.862347] do_group_exit+0x50/0xd0 [ 709.867515] get_signal+0x2ad/0x8c0 [ 709.872601] do_signal+0x28/0x680 [ 709.877452] ? blkdev_read_iter+0x35/0x40 [ 709.883108] ? __vfs_read+0xcc/0x130 [ 709.888237] ? vfs_read+0xa0/0x150 [ 709.893200] exit_to_usermode_loop+0x5a/0xa0 [ 709.899109] do_syscall_64+0x12e/0x170 [ 709.904475] entry_SYSCALL64_slow_path+0x25/0x25 [ 709.910797] RIP: 0033:0x7f5dc4e32d11 [ 709.915917] RSP: 002b:00007ffeb0b10588 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 [ 709.925589] RAX: fffffffffffffffc RBX: 000056230da84490 RCX: 00007f5dc4e32d11 [ 709.934785] RDX: 0000000000000040 RSI: 000056230da844b8 RDI: 0000000000000007 [ 709.943987] RBP: 000056230dabc3c0 R08: 0000000000000000 R09: 000056230da84490 [ 709.953202] R10: 000000000000006f R11: 0000000000000246 R12: 000000008d530000 [ 709.962419] R13: 000000008d530040 R14: 000056230dabc410 R15: 0000000000000040