Bob Liu <bob.liu@xxxxxxxxxx> wrote on Tue, Apr 9, 2019 at 11:11 AM:
>
> This patch was proposed by Roman Pen[3] years ago.
> Recently we hit a bug which is likely caused by the same reason, so rebased his
> fix to v5.1 and resent it.
> Below is almost copied from that patch[3].
>
> ------
> A long time ago there was a similar fix proposed by Akinobu Mita[1],
> but it seems that at that time everyone decided to fix this subtle race in
> percpu-refcount and Tejun Heo[2] made an attempt (as far as I can see that
> patchset was not applied).
>
> The following is a description of a hang in blk_mq_freeze_queue_wait() -
> same fix but a bug from another angle.
>
> The hang happens on an attempt to freeze a queue while another task does
> queue unfreeze.
>
> The root cause is an incorrect sequence of percpu_ref_reinit() and
> percpu_ref_kill() and as a result those two can be swapped:
>
>  CPU#0                         CPU#1
>  ----------------              -----------------
>  percpu_ref_kill()
>
>                                percpu_ref_kill()          << atomic reference does
>  percpu_ref_reinit()                                      << not guarantee the order
>
>                                blk_mq_freeze_queue_wait() << HANG HERE
>
>                                percpu_ref_reinit()
>
> Firstly this wrong sequence raises two kernel warnings:
>
>   1st. WARNING at lib/percpu-refcount.c:309
>        percpu_ref_kill_and_confirm called more than once
>
>   2nd. WARNING at lib/percpu-refcount.c:331
>
> But the most unpleasant effect is a hang of blk_mq_freeze_queue_wait(),
> which waits for q_usage_counter to reach zero, which never happens
> because the percpu-ref was reinited (instead of being killed) and stays in
> PERCPU state forever.
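
For anyone following along, the interleaving quoted above can be replayed
deterministically in a small single-threaded userspace model. This is only a
sketch and not kernel code: struct toy_ref, struct toy_queue and every toy_*
function are hypothetical names made up for illustration, and freeze_depth is
my assumption about what the "atomic reference" in the diagram refers to. The
model splits freeze and unfreeze into "update the counter" and "touch the
ref", which is the window the race needs.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for q_usage_counter; NOT the kernel's percpu_ref. */
struct toy_ref {
	bool dead;        /* true between kill and reinit */
	int  double_kill; /* kill seen while already dead (cf. 1st warning) */
	int  live_reinit; /* reinit seen while not dead (cf. 2nd warning) */
};

/* Hypothetical queue: a freeze counter plus the ref it is meant to guard. */
struct toy_queue {
	int freeze_depth; /* assumed counterpart of the "atomic reference" */
	struct toy_ref ref;
};

static void toy_kill(struct toy_ref *r)
{
	if (r->dead)
		r->double_kill++;
	r->dead = true;
}

static void toy_reinit(struct toy_ref *r)
{
	if (!r->dead)
		r->live_reinit++;
	r->dead = false;
}

/* Step 1 of freeze: bump the depth; true means "caller must kill". */
static bool toy_freeze_begin(struct toy_queue *q)
{
	return ++q->freeze_depth == 1;
}

/* Step 1 of unfreeze: drop the depth; true means "caller must reinit". */
static bool toy_unfreeze_begin(struct toy_queue *q)
{
	return --q->freeze_depth == 0;
}

/* In this model a freeze waiter can only finish while the ref is dead. */
static bool toy_wait_can_finish(const struct toy_queue *q)
{
	return q->ref.dead;
}

/* Replays the quoted interleaving; returns true if CPU#1's wait hangs. */
static bool toy_replay_race(struct toy_queue *q)
{
	bool cpu0_reinit, hang;

	assert(q->freeze_depth == 0 && !q->ref.dead); /* start unfrozen */

	/* CPU#0: freeze, depth 0 -> 1, kill */
	if (toy_freeze_begin(q))
		toy_kill(&q->ref);
	/* CPU#0: unfreeze, counter only: depth 1 -> 0, reinit is pending */
	cpu0_reinit = toy_unfreeze_begin(q);
	/* CPU#1: freeze, depth 0 -> 1, kills a ref that is still dead */
	if (toy_freeze_begin(q))
		toy_kill(&q->ref);
	/* CPU#0: the delayed reinit lands after CPU#1's kill */
	if (cpu0_reinit)
		toy_reinit(&q->ref);
	/* CPU#1: waits for the ref to drain, but it is live again */
	hang = !toy_wait_can_finish(q);
	/* CPU#1: unfreeze (if the wait ever returned) reinits a live ref */
	if (toy_unfreeze_begin(q))
		toy_reinit(&q->ref);
	return hang;
}
```

Calling toy_replay_race() on a zero-initialised struct toy_queue reports the
hang and leaves one double kill and one reinit of a live ref recorded, which
lines up with the two warnings and the stuck waiter in the description. The
counter and the ref operation being separate steps is what lets the two
operations swap in this model.
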
>
> The simplified sequence above can be reproduced on shared tags, when
> queue A is going to die while another queue B is in init state and
> is trying to freeze queue A, which shares the same tag set:
>
>  CPU#0                           CPU#1
>  ------------------------------- ------------------------------------
>  q1 = blk_mq_init_queue(shared_tags)
>
>                                  q2 = blk_mq_init_queue(shared_tags):
>                                    blk_mq_add_queue_tag_set(shared_tags):
>                                      blk_mq_update_tag_set_depth(shared_tags):
>                                        blk_mq_freeze_queue(q1)
>  blk_cleanup_queue(q1)                 ...
>    blk_mq_freeze_queue(q1)   <<<->>>   blk_mq_unfreeze_queue(q1)
>
> [1] Message id: 1443287365-4244-7-git-send-email-akinobu.mita@xxxxxxxxx
> [2] Message id: 1443563240-29306-6-git-send-email-tj@xxxxxxxxxx
> [3] https://patchwork.kernel.org/patch/9268199/
>
> Signed-off-by: Roman Pen <roman.penyaev@xxxxxxxxxxxxxxxx>
> Signed-off-by: Bob Liu <bob.liu@xxxxxxxxxx>
> Cc: Akinobu Mita <akinobu.mita@xxxxxxxxx>
> Cc: Tejun Heo <tj@xxxxxxxxxx>
> Cc: Jens Axboe <axboe@xxxxxxxxx>
> Cc: Christoph Hellwig <hch@xxxxxx>
> Cc: linux-block@xxxxxxxxxxxxxxx
> Cc: linux-kernel@xxxxxxxxxxxxxxx
>

Replaced Roman's email address.

We at 1 & 1 IONOS (formerly ProfitBricks) have been carrying this patch
for some years, and it has been running in production for some years too.
It would be good to see it upstream :)

Thanks,
Jack Wang
Linux Kernel Developer @ 1 & 1 IONOS