Re: [PATCH V2 0/2] block: remove unnecessary RESTART

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Nov 03, 2017 at 03:23:14PM +0000, Bart Van Assche wrote:
> On Fri, 2017-11-03 at 11:50 +0800, Ming Lei wrote:
> > On Fri, Nov 03, 2017 at 02:42:50AM +0000, Bart Van Assche wrote:
> > > On Fri, 2017-11-03 at 10:12 +0800, Ming Lei wrote:
> > > > [root@ibclient srp-test]# ./run_tests
> > > > modprobe: FATAL: Module target_core_mod is in use.
> > > 
> > > LIO must be unloaded before srp-test software is started.
> > 
> > Yeah, I can make this test kick off after running
> > 'modprobe -fr ib_isert' first, but looks all tests failed without
> > any hang, could you check if that is the expected result?
> > 
> > 	https://pastebin.com/VXe66Jpg
> 
> I see plenty of "... >/sys/.../add_target failed: Connection timed out"
> messages. The srp-test scripts use the loopback functionality of IB ports
> so one or more IB ports must be active. Have you already checked the IB
> ports state, e.g. by running ibv_devinfo?

[root@ibclient ~]# ibv_devinfo
hca_id:	mlx5_1
	transport:			InfiniBand (0)
	fw_ver:				12.20.1010
	node_guid:			7cfe:9003:0072:6ed3
	sys_image_guid:			7cfe:9003:0072:6ed2
	vendor_id:			0x02c9
	vendor_part_id:			4115
	hw_ver:				0x0
	board_id:			MT_2190110032
	phys_port_cnt:			1
		port:	1
			state:			PORT_ACTIVE (4)
			max_mtu:		4096 (5)
			active_mtu:		4096 (5)
			sm_lid:			1
			port_lid:		1
			port_lmc:		0x00
			link_layer:		InfiniBand

hca_id:	mlx5_0
	transport:			InfiniBand (0)
	fw_ver:				12.20.1010
	node_guid:			7cfe:9003:0072:6ed2
	sys_image_guid:			7cfe:9003:0072:6ed2
	vendor_id:			0x02c9
	vendor_part_id:			4115
	hw_ver:				0x0
	board_id:			MT_2190110032
	phys_port_cnt:			1
		port:	1
			state:			PORT_ACTIVE (4)
			max_mtu:		4096 (5)
			active_mtu:		4096 (5)
			sm_lid:			4
			port_lid:		4
			port_lmc:		0x00
			link_layer:		InfiniBand

Forget to mention, there is failure when running 'make' under srp-test
because shellcheck package is missed in RHEL7. Can that be the issue
of test failure? If yes, could you provide a special version of srp-test
which doesn't depend on shellcheck?

> 
> > Also could you provide some output from debugfs when this hang happens?
> > such as:
> > 
> > 	dispatch/busy/.. under hctxN
> 
> This is the output that appears on my test setup if the two patches from
> the patch series "block: remove unnecessary RESTART" are applied:
> 
> # ~bart/software/infiniband/srp-test/run_tests -f xfs -d -e deadline -r 60                             
> Unloaded the ib_srpt kernel module
> Zero-initializing /dev/ram0 ... done
> Zero-initializing /dev/ram1 ... done
> Zero-initializing /dev/disk/by-id/scsi-33333333000004650 ... done
> brw-rw---- 1 root disk 1, 0 Nov  3 08:10 /dev/ram0
> /dev/ram0
> brw-rw---- 1 root disk 1, 1 Nov  3 08:10 /dev/ram1
> /dev/ram1
> lrwxrwxrwx 1 root root 9 Nov  3 08:11 /dev/disk/by-id/scsi-33333333000004650 -> ../../sdi
> /dev/sdi
> Configured SRP target driver
> Running test /home/bart/software/infiniband/srp-test/tests/01 ...
> id_ext=0x0002c90300a02d50,ioc_guid=0x0002c90300a02d50,dgid=fe800000000000000002c90300a02d51,pkey=7fff,service_id=0x0002c90300a02d50, >/sys/class/infiniband_srp/
> srp-mlx4_0-2/add_target failed: Invalid argument
> id_ext=0x0002c90300a02d50,ioc_guid=0x0002c90300a02d50,dgid=fe800000000000000002c90300a02d52,pkey=7fff,service_id=0x0002c90300a02d50, >/sys/class/infiniband_srp/
> srp-mlx4_0-2/add_target failed: Invalid argument
> count_luns(): 3 <> 3
> Error: unloading kernel module ib_srp failed
> Test /home/bart/software/infiniband/srp-test/tests/01 failed
> # cat /sys/module/scsi_mod/parameters/use_blk_mq  
> Y
> # find /sys/kernel/debug/block -name busy | xargs cat

Can you check 'dispatch' file, hctx state, tags and sched_tags?
And see where is the IO not dispatched?

> # dmesg -c >/dev/null; echo w > /proc/sysrq-trigger; dmesg
> [  709.321750] sysrq: SysRq : Show Blocked State
> [  709.327906]   task                        PC stack   pid father
...
> [  709.456602] kworker/u66:6   D    0   341      2 0x80000000
> [  709.463371] Workqueue: events_unbound async_run_entry_fn
> [  709.469919] Call Trace:
> [  709.473325]  __schedule+0x2fa/0xbb0
> [  709.477850]  ? trace_hardirqs_on_caller+0xfb/0x190
> [  709.483847]  schedule+0x36/0x90
> [  709.487982]  schedule_timeout+0x22c/0x570
> [  709.493115]  ? call_timer_fn+0x330/0x330
> [  709.498133]  ? wait_for_completion_io_timeout+0xf7/0x180
> [  709.504732]  io_schedule_timeout+0x1e/0x50
> [  709.509938]  ? io_schedule_timeout+0x1e/0x50
> [  709.515413]  wait_for_completion_io_timeout+0x11f/0x180
> [  709.521920]  ? wake_up_q+0x80/0x80
> [  709.526376]  blk_execute_rq+0x86/0xc0
> [  709.531135]  scsi_execute+0xdb/0x1f0
> [  709.536336]  sd_revalidate_disk+0xed/0x1c70 [sd_mod]
> [  709.543281]  sd_probe_async+0xc3/0x1d0 [sd_mod]
> [  709.549524]  async_run_entry_fn+0x38/0x160
> [  709.555503]  process_one_work+0x20a/0x660
> [  709.561299]  worker_thread+0x3d/0x3b0
> [  709.566688]  kthread+0x13a/0x150
> [  709.571613]  ? process_one_work+0x660/0x660
> [  709.577561]  ? kthread_create_on_node+0x40/0x40
> [  709.583910]  ret_from_fork+0x27/0x40

Looks hang in scsi_execute().

> [  709.589174] kworker/5:2     D    0   386      2 0x80000000
> [  709.596616] Workqueue: kaluad alua_rtpg_work [scsi_dh_alua]
> [  709.604130] Call Trace:
> [  709.608096]  __schedule+0x2fa/0xbb0
> [  709.613265]  ? trace_hardirqs_on_caller+0xfb/0x190
> [  709.619879]  schedule+0x36/0x90
> [  709.624684]  schedule_timeout+0x22c/0x570
> [  709.630437]  ? call_timer_fn+0x330/0x330
> [  709.636059]  ? wait_for_completion_io_timeout+0xf7/0x180
> [  709.643270]  io_schedule_timeout+0x1e/0x50
> [  709.649083]  ? io_schedule_timeout+0x1e/0x50
> [  709.655159]  wait_for_completion_io_timeout+0x11f/0x180
> [  709.662284]  ? wake_up_q+0x80/0x80
> [  709.667334]  blk_execute_rq+0x86/0xc0
> [  709.672705]  scsi_execute+0xdb/0x1f0
> [  709.677938]  alua_rtpg_work+0x2b3/0xe8a [scsi_dh_alua]
> [  709.684983]  process_one_work+0x20a/0x660
> [  709.690759]  worker_thread+0x3d/0x3b0
> [  709.696107]  kthread+0x13a/0x150
> [  709.700986]  ? process_one_work+0x660/0x660
> [  709.706920]  ? kthread_create_on_node+0x40/0x40
> [  709.713316]  ret_from_fork+0x27/0x40

Same with above.

-- 
Ming



[Index of Archives]     [Linux RAID]     [Linux SCSI]     [Linux ATA RAID]     [IDE]     [Linux Wireless]     [Linux Kernel]     [ATH6KL]     [Linux Bluetooth]     [Linux Netdev]     [Kernel Newbies]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Device Mapper]

  Powered by Linux