Re: [bug report] blktests srp/002 hang

On 10/17/23 15:06, Bart Van Assche wrote:
> On 10/17/23 12:55, Bob Pearson wrote:
>> Well.... the extra tracing did *not* show srp running out of iu's.
>> So I converted cq handling to IB_POLL_SOFTIRQ from IB_POLL_DIRECT.
>> This required adding a spinlock around list_add(&iu->list, ...) in
>> srp_send_done(). The test now runs with all the completions handled
>> correctly. But, it still hangs. So a red herring.
> 
> iu->list manipulations are protected by ch->lock. See also the
> lockdep_assert_held(&ch->lock) statements in the code that does
> manipulate this list and that does not grab ch->lock directly.
> 
> Thanks,
> 
> Bart.

One more clue. When the test hangs, after 120 seconds there is a set
of hung task messages in the logs like:

[  408.844422] ib_srp:srp_parse_in: ib_srp: [fe80::b62e:99ff:fef9:fa2e] -> [fe80::b62e:99ff:fef9:fa2e]:0/11010381%0
[  408.844439] ib_srp:srp_parse_in: ib_srp: [fe80::b62e:99ff:fef9:fa2e]:5555 -> [fe80::b62e:99ff:fef9:fa2e]:5555/11010381%0
[  408.844474] ib_srp:srp_parse_in: ib_srp: [fe80::21bb:9ba3:7562:5fb8%2] -> [fe80::21bb:9ba3:7562:5fb8]:0/11010381%2
[  408.844491] ib_srp:srp_parse_in: ib_srp: [fe80::21bb:9ba3:7562:5fb8%2]:5555 -> [fe80::21bb:9ba3:7562:5fb8]:5555/11010381%2
[  408.844502] scsi host13: ib_srp: Already connected to target port with id_ext=b62e99fffef9fa2e;ioc_guid=b62e99fffef9fa2e;dest=fe80:0000:0000:0000:21bb:9ba3:7562:5fb8
[  605.106839] INFO: task kworker/1:0:25 blocked for more than 120 seconds.
[  605.106857]       Tainted: G    B      OE      6.6.0-rc3+ #10
[  605.106866] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  605.106872] task:kworker/1:0     state:D stack:0     pid:25    ppid:2      flags:0x00004000
[  605.106887] Workqueue: dio/dm-5 iomap_dio_complete_work
[  605.106904] Call Trace:
[  605.106909]  <TASK>
[  605.106917]  ? __schedule+0x996/0x2c80
[  605.106929]  __schedule+0x9f6/0x2c80
[  605.106945]  ? lock_release+0xc1/0x6f0
[  605.106955]  ? rcu_is_watching+0x23/0x50
[  605.106970]  ? io_schedule_timeout+0xc0/0xc0
[  605.106981]  ? lock_contended+0x740/0x740
[  605.106989]  ? do_raw_spin_lock+0x1c0/0x1c0
[  605.106999]  ? lock_contended+0x740/0x740
[  605.107011]  ? _raw_spin_unlock_irq+0x27/0x60
[  605.107023]  ? trace_hardirqs_on+0x22/0x100
[  605.107037]  ? _raw_spin_unlock_irq+0x27/0x60
[  605.107052]  schedule+0x96/0x150
[  605.107063]  bit_wait+0x1c/0xa0
[  605.107074]  __wait_on_bit+0x42/0x110
[  605.107084]  ? bit_wait_io+0xa0/0xa0
[  605.107099]  __inode_wait_for_writeback+0x11b/0x190
[  605.107112]  ? inode_prepare_wbs_switch+0x160/0x160
[  605.107127]  ? swake_up_one+0xb0/0xb0
[  605.107147]  writeback_single_inode+0xb8/0x250
[  605.107159]  sync_inode_metadata+0xa2/0xe0
[  605.107168]  ? write_inode_now+0x160/0x160
[  605.107186]  ? file_write_and_wait_range+0x54/0xe0
[  605.107199]  generic_buffers_fsync_noflush+0x135/0x160
[  605.107213]  ext4_sync_file+0x3b3/0x620
[  605.107227]  vfs_fsync_range+0x69/0x110
[  605.107237]  ? ext4_getfsmap+0x520/0x520
[  605.107249]  iomap_dio_complete+0x35c/0x3a0
[  605.107259]  ? __schedule+0x9fe/0x2c80
[  605.107272]  ? aio_fsync_work+0x190/0x190
[  605.107284]  iomap_dio_complete_work+0x36/0x50
[  605.107297]  process_one_work+0x46c/0x950


All of the blocked threads show this same stack trace, and each is
waiting for an I/O to complete from SCSI. No threads are active in rxe,
srp(t) or scsi. All activity appears to have stopped.
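
For anyone who wants to check the "all the same" observation against
their own log, here is a minimal Python sketch that groups hung task
reports by their reliable (non-"?") stack frames. The function names
are hypothetical, and it assumes the dmesg line format shown above; it
has not been run against the full log from this test.

```python
import re
from collections import Counter

# Matches e.g. "INFO: task kworker/1:0:25 blocked for more than 120 seconds."
HUNG_RE = re.compile(r"INFO: task (\S+):(\d+) blocked for more than (\d+) seconds")
# Matches e.g. "[  605.106929]  __schedule+0x9f6/0x2c80", with an optional
# leading "?" that marks an unreliable frame.
FRAME_RE = re.compile(
    r"^\[\s*\d+\.\d+\]\s+(\??)\s*([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def summarize_hung_tasks(log_text):
    """Map (task name, pid) to the tuple of reliable frames in its trace."""
    traces = {}
    current = None
    for raw in log_text.splitlines():
        line = raw.strip()
        hung = HUNG_RE.search(line)
        if hung:
            current = (hung.group(1), int(hung.group(2)))
            traces[current] = []
            continue
        frame = FRAME_RE.match(line)
        # Skip frames seen before any hung task header and frames marked "?".
        if frame and current is not None and not frame.group(1):
            traces[current].append(frame.group(2))
    return {task: tuple(frames) for task, frames in traces.items()}


def count_distinct_stacks(log_text):
    """Count how many hung tasks share each distinct reliable stack."""
    return Counter(summarize_hung_tasks(log_text).values())
```

If count_distinct_stacks() returns a single entry for a saved log, every
reported task is blocked on the same call path.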

Bob


