[bug report] ublk_drv: hang while removing ublk character device

Hi all,

ublk_drv has now been merged into the master branch and I am running tests on it.
With the latest (master) kernel and the latest (master) ublksrv[1], the ublksrv test case generic/001 hangs:

$sudo make test_all
make -s -C ubdsrv/tests run_test_all R=10
running generic/001
        run fio with delete ublk-loop test
        run fio on ublk(uring_comp 1) with delete 1

and the dmesg shows:

[Wed Aug  3 19:07:28 2022] INFO: task ublk:44727 blocked for more than 122 seconds.
[Wed Aug  3 19:07:28 2022]       Tainted: G S          E      5.19.0 #117
[Wed Aug  3 19:07:28 2022] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Wed Aug  3 19:07:28 2022] task:ublk            state:D stack:    0 pid:44727 ppid: 44650 flags:0x00004000
[Wed Aug  3 19:07:28 2022] Call Trace:
[Wed Aug  3 19:07:28 2022]  <TASK>
[Wed Aug  3 19:07:28 2022]  __schedule+0x212/0x600
[Wed Aug  3 19:07:28 2022]  schedule+0x5d/0xd0
[Wed Aug  3 19:07:28 2022]  ublk_ctrl_del_dev+0x133/0x1c0
[Wed Aug  3 19:07:28 2022]  ? cpuacct_percpu_seq_show+0x10/0x10
[Wed Aug  3 19:07:28 2022]  ublk_ctrl_uring_cmd+0x1a7/0x1e0
[Wed Aug  3 19:07:28 2022]  ? io_uring_cmd_prep+0x30/0x30
[Wed Aug  3 19:07:28 2022]  io_uring_cmd+0x55/0xe0
[Wed Aug  3 19:07:28 2022]  io_issue_sqe+0x196/0x310
[Wed Aug  3 19:07:28 2022]  io_submit_sqes+0x116/0x370
[Wed Aug  3 19:07:28 2022]  __do_sys_io_uring_enter+0x313/0x5a0
[Wed Aug  3 19:07:28 2022]  do_syscall_64+0x35/0x80
[Wed Aug  3 19:07:28 2022]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
[Wed Aug  3 19:07:28 2022] RIP: 0033:0x7f6de1c13936
[Wed Aug  3 19:07:28 2022] RSP: 002b:00007ffcdbf42bc8 EFLAGS: 00000246 ORIG_RAX: 00000000000001aa
[Wed Aug  3 19:07:28 2022] RAX: ffffffffffffffda RBX: 0000000000442f60 RCX: 00007f6de1c13936
[Wed Aug  3 19:07:28 2022] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000004
[Wed Aug  3 19:07:28 2022] RBP: 0000000000442f60 R08: 0000000000000000 R09: 0000000000000008
[Wed Aug  3 19:07:28 2022] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[Wed Aug  3 19:07:28 2022] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[Wed Aug  3 19:07:28 2022]  </TASK>

My environment:

(1) kernel: master(head is e2b542100719a93f8cdf6d90185410d38a57a4c1)
(2) ubdsrv: master(head is 304151a7ef031413df26302e86b457eb1bad908f)
(3) liburing: 2.2 release [2]

How to reproduce:

(1) Clone the kernel master branch. Make sure that ublk_drv.c is in the drivers/block directory.
(2) Build the kernel; ublk_drv should be a module (M) or built-in (*).
(3) modprobe ublk_drv (if you chose 'M' while configuring the kernel).
(4) Clone Ming's ublksrv[1] and run make. You should use gcc-10 (or higher) and liburing (I chose 2.2[2]).
(5) Run the tests with: make test_all

You should find that the first test, generic/001, hangs and the kernel prints the message shown above.

My analysis:

(1) ublk_ctrl_del_dev+0x133 should be drivers/block/ublk_drv.c:1387, which is:
    wait_event(ublk_idr_wq, ublk_idr_freed(idx)) called in ublk_ctrl_del_dev()

(2) We hang because we wait indefinitely for an idr entry to be freed (such as idx 0 for /dev/ublkc0).

(3) This idr entry should be freed when ublk_cdev_rel() is called,
    which is set as the ->release() method for a ublk character device (such as /dev/ublkc0).

(4) I think ublk_cdev_rel() is not called correctly while removing /dev/ublkc0, so the
    wait_event() never returns.

[1] https://github.com/ming1/ubdsrv
[2] https://github.com/axboe/liburing/releases/tag/liburing-2.2



-- 
Ziyang Zhang


