On 4/9/19 5:29 PM, Jinpu Wang wrote:
> Bob Liu <bob.liu@xxxxxxxxxx> wrote on Tue, Apr 9, 2019 at 11:11 AM:
>>
>> This patch was proposed by Roman Pen[3] years ago.
>> Recently we hit a bug which is likely caused by the same reason, so I
>> rebased his fix onto v5.1 and am resending it.
>> The text below is almost entirely copied from that patch[3].
>>
>> ------
>> A long time ago there was a similar fix proposed by Akinobu Mita[1],
>> but it seems that at that time everyone decided to fix this subtle race
>> in percpu-refcount, and Tejun Heo[2] made an attempt (as far as I can
>> see, that patchset was not applied).
>>
>> The following is a description of a hang in blk_mq_freeze_queue_wait() -
>> same fix but a bug from another angle.
>>
>> The hang happens on an attempt to freeze a queue while another task
>> unfreezes it.
>>
>> The root cause is an incorrect sequence of percpu_ref_reinit() and
>> percpu_ref_kill(), and as a result those two can be swapped:
>>
>>  CPU#0                         CPU#1
>>  ----------------              -----------------
>>  percpu_ref_kill()
>>
>>                                percpu_ref_kill()     << atomic reference does
>>  percpu_ref_reinit()                                 << not guarantee the order
>>
>>                                blk_mq_freeze_queue_wait() << HANG HERE
>>
>>                                percpu_ref_reinit()
>>
>> Firstly, this wrong sequence raises two kernel warnings:
>>
>>   1st. WARNING at lib/percpu-refcount.c:309
>>        percpu_ref_kill_and_confirm called more than once
>>
>>   2nd. WARNING at lib/percpu-refcount.c:331
>>
>> But the most unpleasant effect is a hang in blk_mq_freeze_queue_wait(),
>> which waits for q_usage_counter to reach zero. That never happens,
>> because the percpu-ref was reinited (instead of being killed) and stays
>> in PERCPU state forever.
>>
>> The simplified sequence above can be reproduced on shared tags, when
>> queue A is going to die while another queue B is in init state and
>> is trying to freeze queue A, which shares the same tag set:
>>
>>  CPU#0                             CPU#1
>>  -------------------------------   ------------------------------------
>>  q1 = blk_mq_init_queue(shared_tags)
>>
>>                                    q2 = blk_mq_init_queue(shared_tags):
>>                                      blk_mq_add_queue_tag_set(shared_tags):
>>                                        blk_mq_update_tag_set_depth(shared_tags):
>>                                          blk_mq_freeze_queue(q1)
>>  blk_cleanup_queue(q1)                    ...
>>    blk_mq_freeze_queue(q1)   <<<->>>      blk_mq_unfreeze_queue(q1)
>>
>> [1] Message id: 1443287365-4244-7-git-send-email-akinobu.mita@xxxxxxxxx
>> [2] Message id: 1443563240-29306-6-git-send-email-tj@xxxxxxxxxx
>> [3] https://patchwork.kernel.org/patch/9268199/
>>
>> Signed-off-by: Roman Pen <roman.penyaev@xxxxxxxxxxxxxxxx>
>> Signed-off-by: Bob Liu <bob.liu@xxxxxxxxxx>
>> Cc: Akinobu Mita <akinobu.mita@xxxxxxxxx>
>> Cc: Tejun Heo <tj@xxxxxxxxxx>
>> Cc: Jens Axboe <axboe@xxxxxxxxx>
>> Cc: Christoph Hellwig <hch@xxxxxx>
>> Cc: linux-block@xxxxxxxxxxxxxxx
>> Cc: linux-kernel@xxxxxxxxxxxxxxx
>>
>
> Replaced Roman's email address.
>
> We at 1 & 1 IONOS (formerly ProfitBricks) have been carrying this patch
> for some years,
> it has been running in production for some years too,

Nice to hear that!

> would be good to see it in upstream :)

Yes.
Could anyone review it?
Thanks!

>
> Thanks,
>
> Jack Wang
> Linux Kernel Developer @ 1 & 1 IONOS
>
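To make the ordering problem quoted above concrete, here is a minimal
userspace sketch in C. It is a toy model, not kernel code: struct toy_ref,
struct toy_queue and every toy_* function are hypothetical names invented
for illustration, and the toy ref only tracks "killed" vs. "live". The first
part replays the swapped kill/reinit sequence from the CPU#0/CPU#1 diagram.
The second part shows one possible way to rule the swap out, by changing the
freeze depth and the ref state under a single lock. That locking scheme is
an assumption made for this sketch, since the diff itself is not quoted in
this mail.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a percpu_ref: only tracks killed vs. live. */
struct toy_ref {
	bool killed;
	int warnings;	/* misuse that the real code would WARN about */
};

static void toy_ref_kill(struct toy_ref *r)
{
	if (r->killed)
		r->warnings++;	/* kill called more than once */
	r->killed = true;
}

static void toy_ref_reinit(struct toy_ref *r)
{
	if (!r->killed)
		r->warnings++;	/* reinit of a ref that was not killed */
	r->killed = false;
}

/* A freeze waiter only makes progress once the ref has been killed. */
static bool toy_wait_would_hang(const struct toy_ref *r)
{
	return !r->killed;
}

/* Replays the swapped order from the description on a single thread. */
static struct toy_ref toy_replay_race(void)
{
	struct toy_ref r = { .killed = false, .warnings = 0 };

	toy_ref_kill(&r);	/* CPU#0: freeze */
	toy_ref_kill(&r);	/* CPU#1: freeze, ahead of CPU#0's reinit */
	toy_ref_reinit(&r);	/* CPU#0: unfreeze */
	return r;		/* CPU#1 would now wait on a live ref */
}

/* Assumed mitigation: depth and ref state change under one lock. */
struct toy_queue {
	pthread_mutex_t freeze_lock;
	int freeze_depth;
	struct toy_ref ref;
};

static void toy_queue_init(struct toy_queue *q)
{
	pthread_mutex_init(&q->freeze_lock, NULL);
	q->freeze_depth = 0;
	q->ref.killed = false;
	q->ref.warnings = 0;
}

static void toy_freeze(struct toy_queue *q)
{
	pthread_mutex_lock(&q->freeze_lock);
	if (++q->freeze_depth == 1)
		toy_ref_kill(&q->ref);	/* only the first freezer kills */
	pthread_mutex_unlock(&q->freeze_lock);
}

static void toy_unfreeze(struct toy_queue *q)
{
	pthread_mutex_lock(&q->freeze_lock);
	assert(q->freeze_depth > 0);
	if (--q->freeze_depth == 0)
		toy_ref_reinit(&q->ref);	/* only the last unfreezer reinits */
	pthread_mutex_unlock(&q->freeze_lock);
}
```

In this model, toy_replay_race() returns a ref that has recorded one warning
and is live again, so toy_wait_would_hang() reports a hang for it. Running
freeze, unfreeze, freeze through toy_freeze()/toy_unfreeze() instead records
no warning and leaves the ref killed for the waiter, because the depth
update and the ref transition can no longer be separated.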