On Mon, Jun 11, 2018 at 4:38 AM, Roman Pen <roman.penyaev@xxxxxxxxxxxxxxxx> wrote:
> It is not allowed to reinit q->tag_set_list list entry while RCU grace
> period has not completed yet, otherwise the following soft lockup in
> blk_mq_sched_restart() happens:
>
> [ 1064.252652] watchdog: BUG: soft lockup - CPU#12 stuck for 23s! [fio:9270]
> [ 1064.254445] task: ffff99b912e8b900 task.stack: ffffa6d54c758000
> [ 1064.254613] RIP: 0010:blk_mq_sched_restart+0x96/0x150
> [ 1064.256510] Call Trace:
> [ 1064.256664] <IRQ>
> [ 1064.256824] blk_mq_free_request+0xea/0x100
> [ 1064.256987] msg_io_conf+0x59/0xd0 [ibnbd_client]
> [ 1064.257175] complete_rdma_req+0xf2/0x230 [ibtrs_client]
> [ 1064.257340] ? ibtrs_post_recv_empty+0x4d/0x70 [ibtrs_core]
> [ 1064.257502] ibtrs_clt_rdma_done+0xd1/0x1e0 [ibtrs_client]
> [ 1064.257669] ib_create_qp+0x321/0x380 [ib_core]
> [ 1064.257841] ib_process_cq_direct+0xbd/0x120 [ib_core]
> [ 1064.258007] irq_poll_softirq+0xb7/0xe0
> [ 1064.258165] __do_softirq+0x106/0x2a2
> [ 1064.258328] irq_exit+0x92/0xa0
> [ 1064.258509] do_IRQ+0x4a/0xd0
> [ 1064.258660] common_interrupt+0x7a/0x7a
> [ 1064.258818] </IRQ>
>
> Meanwhile another context frees other queue but with the same set of
> shared tags:
>
> [ 1288.201183] INFO: task bash:5910 blocked for more than 180 seconds.
> [ 1288.201833] bash D 0 5910 5820 0x00000000
> [ 1288.202016] Call Trace:
> [ 1288.202315] schedule+0x32/0x80
> [ 1288.202462] schedule_timeout+0x1e5/0x380
> [ 1288.203838] wait_for_completion+0xb0/0x120
> [ 1288.204137] __wait_rcu_gp+0x125/0x160
> [ 1288.204287] synchronize_sched+0x6e/0x80
> [ 1288.204770] blk_mq_free_queue+0x74/0xe0
> [ 1288.204922] blk_cleanup_queue+0xc7/0x110
> [ 1288.205073] ibnbd_clt_unmap_device+0x1bc/0x280 [ibnbd_client]
> [ 1288.205389] ibnbd_clt_unmap_dev_store+0x169/0x1f0 [ibnbd_client]
> [ 1288.205548] kernfs_fop_write+0x109/0x180
> [ 1288.206328] vfs_write+0xb3/0x1a0
> [ 1288.206476] SyS_write+0x52/0xc0
> [ 1288.206624] do_syscall_64+0x68/0x1d0
> [ 1288.206774] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
>
> What happened is the following:
>
> 1. There are several MQ queues with shared tags.
> 2. One queue is about to be freed and now task is in
>    blk_mq_del_queue_tag_set().
> 3. Other CPU is in blk_mq_sched_restart() and loops over all queues in
>    tag list in order to find hctx to restart.
>
> Because linked list entry was modified in blk_mq_del_queue_tag_set()
> without proper waiting for a grace period, blk_mq_sched_restart()
> never ends, spinning in list_for_each_entry_rcu_rr(), thus soft lockup.
>
> Fix is simple: reinit list entry after an RCU grace period elapsed.
>
> Signed-off-by: Roman Pen <roman.penyaev@xxxxxxxxxxxxxxxx>
> Cc: Jens Axboe <axboe@xxxxxxxxx>
> Cc: Bart Van Assche <bart.vanassche@xxxxxxx>
> Cc: Christoph Hellwig <hch@xxxxxx>
> Cc: Sagi Grimberg <sagi@xxxxxxxxxxx>
> Cc: Ming Lei <ming.lei@xxxxxxxxxx>
> Cc: linux-block@xxxxxxxxxxxxxxx
> ---
>  block/blk-mq.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 0dc9e341c2a7..2a40d60950f4 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -2422,7 +2422,6 @@ static void blk_mq_del_queue_tag_set(struct request_queue *q)
>
>  	mutex_lock(&set->tag_list_lock);
>  	list_del_rcu(&q->tag_set_list);
> -	INIT_LIST_HEAD(&q->tag_set_list);
>  	if (list_is_singular(&set->tag_list)) {
>  		/* just transitioned to unshared */
>  		set->flags &= ~BLK_MQ_F_TAG_SHARED;
> @@ -2430,8 +2429,8 @@ static void blk_mq_del_queue_tag_set(struct request_queue *q)
>  		blk_mq_update_tag_set_depth(set, false);
>  	}
>  	mutex_unlock(&set->tag_list_lock);
> -
>  	synchronize_rcu();
> +	INIT_LIST_HEAD(&q->tag_set_list);
>  }
>
>  static void blk_mq_add_queue_tag_set(struct blk_mq_tag_set *set,
> --
> 2.13.1
>

Good catch:

Reviewed-by: Ming Lei <ming.lei@xxxxxxxxxx>

Thanks,
Ming Lei
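
For anyone following the thread, the failure mode described in the commit message can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions: it uses a hand-rolled struct list_head, has no real RCU and no concurrency, and the helper names (list_del_keep_next, reader_reaches, walk_terminates) are hypothetical, not kernel APIs. It relies on the fact that list_del_rcu() leaves the removed entry's ->next pointer intact, and it shows what a reader that is still standing on the removed entry sees depending on whether the entry has already been reinitialised.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal stand-in for the kernel's struct list_head (userspace model only). */
struct list_head {
	struct list_head *next, *prev;
};

/* Same effect as INIT_LIST_HEAD(): the entry points at itself. */
static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static void list_add_tail(struct list_head *n, struct list_head *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/*
 * Hypothetical helper modelling the part of list_del_rcu() that matters
 * here: the entry is unlinked from its neighbours, but entry->next is left
 * intact so a reader standing on the entry can still walk forward.
 */
static void list_del_keep_next(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
}

/*
 * Model a reader that currently stands on 'pos' and follows ->next until it
 * reaches 'stop'. Returns false if it is still walking after max_steps.
 */
static bool reader_reaches(const struct list_head *pos,
			   const struct list_head *stop, int max_steps)
{
	for (int i = 0; i < max_steps; i++) {
		pos = pos->next;
		if (pos == stop)
			return true;
	}
	return false;
}

/*
 * reinit_early == true models the old ordering (reinit before the grace
 * period has elapsed); false models the ordering after the patch, where the
 * reader finishes before the entry is touched again.
 */
bool walk_terminates(bool reinit_early)
{
	struct list_head head, a, b, c;

	init_list_head(&head);
	list_add_tail(&a, &head);
	list_add_tail(&b, &head);
	list_add_tail(&c, &head);

	/* Before deletion, a reader on b gets back to head. */
	assert(reader_reaches(&b, &head, 16));

	list_del_keep_next(&b);
	if (reinit_early)
		init_list_head(&b);	/* b.next == &b: reader can never leave b */

	return reader_reaches(&b, &head, 16);
}
```

In this model walk_terminates(false) returns true, because the reader steps from b to c and on to head, while walk_terminates(true) returns false, because b points at itself and the reader exhausts its step budget. The real loop in blk_mq_sched_restart() has no such budget, which matches the soft lockup reported above.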