Re: [RESEND PATCH v4] blk-mq: fix hang caused by freeze/unfreeze sequence

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 5/21/19 5:25 AM, Bob Liu wrote:
The following is a description of a hang in blk_mq_freeze_queue_wait().
The hang happens on attempt to freeze a queue while another task does
queue unfreeze.

The root cause is an incorrect sequence of percpu_ref_resurrect() and
percpu_ref_kill() and as a result those two can be swapped:

  CPU#0                         CPU#1
  ----------------              -----------------
  q1 = blk_mq_init_queue(shared_tags)

                                 q2 = blk_mq_init_queue(shared_tags):
                                   blk_mq_add_queue_tag_set(shared_tags):
                                     blk_mq_update_tag_set_depth(shared_tags):
				     list_for_each_entry()
                                       blk_mq_freeze_queue(q1)
                                        > percpu_ref_kill()
                                        > blk_mq_freeze_queue_wait()

  blk_cleanup_queue(q1)
   blk_mq_freeze_queue(q1)
    > percpu_ref_kill()
                  ^^^^^^ freeze_depth can't guarantee the order

                                       blk_mq_unfreeze_queue()
                                         > percpu_ref_resurrect()

    > blk_mq_freeze_queue_wait()
                  ^^^^^^ Hang here!!!!

This wrong sequence raises kernel warning:
percpu_ref_kill_and_confirm called more than once on blk_queue_usage_counter_release!
WARNING: CPU: 0 PID: 11854 at lib/percpu-refcount.c:336 percpu_ref_kill_and_confirm+0x99/0xb0

But the most unpleasant effect is a hang of a blk_mq_freeze_queue_wait(),
which waits for a zero of a q_usage_counter, which never happens
because percpu-ref was reinited (instead of being killed) and stays in
PERCPU state forever.

How to reproduce:
  - "insmod null_blk.ko shared_tags=1 nr_devices=0 queue_mode=2"
  - cpu0: python Script.py 0; taskset the corresponding process running on cpu0
  - cpu1: python Script.py 1; taskset the corresponding process running on cpu1

  Script.py:
  ------
  #!/usr/bin/python3

import os
import sys

while True:
     on = "echo 1 > /sys/kernel/config/nullb/%s/power" % sys.argv[1]
     off = "echo 0 > /sys/kernel/config/nullb/%s/power" % sys.argv[1]
     os.system(on)
     os.system(off)
------

This bug was first reported and fixed by Roman, previous discussion:
[1] Message id: 1443287365-4244-7-git-send-email-akinobu.mita@xxxxxxxxx
[2] Message id: 1443563240-29306-6-git-send-email-tj@xxxxxxxxxx
[3] https://patchwork.kernel.org/patch/9268199/

Signed-off-by: Roman Pen <roman.penyaev@xxxxxxxxxxxxxxxx>
Signed-off-by: Bob Liu <bob.liu@xxxxxxxxxx>
Reviewed-by: Ming Lei <ming.lei@xxxxxxxxxx>
Reviewed-by: Bart Van Assche <bvanassche@xxxxxxx>
---
  block/blk-core.c       |  3 ++-
  block/blk-mq.c         | 19 ++++++++++---------
  include/linux/blkdev.h |  7 ++++++-
  3 files changed, 18 insertions(+), 11 deletions(-)

Reviewed-by: Hannes Reinecke <hare@xxxxxxxx>

Cheers,

Hannes
--
Dr. Hannes Reinecke		               zSeries & Storage
hare@xxxxxxxx			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)



[Index of Archives]     [Linux RAID]     [Linux SCSI]     [Linux ATA RAID]     [IDE]     [Linux Wireless]     [Linux Kernel]     [ATH6KL]     [Linux Bluetooth]     [Linux Netdev]     [Kernel Newbies]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Device Mapper]

  Powered by Linux