On May 05, 2023 / 11:28, Ziyang Zhang wrote:
> Hi,
>
> ublk can pass through I/O requests to userspace daemons. It is very important
> to test ublk crash handling since the userspace part is not reliable.
> Especially, we should test removing the device, killing ublk daemons, and the
> user recovery feature.
>
> The first patch adds user recovery support in miniublk.
>
> The second patch adds five new tests for ublk to cover the above cases.
>
> V2:
> - Check parameters in recovery
> - Add one small delay before deleting device
> - Write informative description

Ziyang, thanks for the v2 patches, and sorry for the slow response. Please
find my comments inline.

FYI, I also ran the new test cases on kernel v6.4-rc2 and observed a failure
of ublk/001. The failure cause is the lockdep WARN [1]. The test case has
already found an issue, so it proves that the test is valuable :)

[1]
[ 204.288195] run blktests ublk/001 at 2023-05-16 17:52:14
[ 206.755085] ======================================================
[ 206.756063] WARNING: possible circular locking dependency detected
[ 206.756595] 6.4.0-rc2 #6 Not tainted
[ 206.756924] ------------------------------------------------------
[ 206.757436] iou-wrk-1070/1071 is trying to acquire lock:
[ 206.757891] ffff88811f1420a8 (&ctx->uring_lock){+.+.}-{3:3}, at: __io_req_complete_post+0x792/0xd50
[ 206.758625] but task is already holding lock:
[ 206.759166] ffff88812c3f66c0 (&ub->mutex){+.+.}-{3:3}, at: ublk_stop_dev+0x2b/0x400 [ublk_drv]
[ 206.759865] which lock already depends on the new lock.
[ 206.760623] the existing dependency chain (in reverse order) is:
[ 206.761282] -> #1 (&ub->mutex){+.+.}-{3:3}:
[ 206.761811]        __mutex_lock+0x185/0x18b0
[ 206.762192]        ublk_ch_uring_cmd+0x511/0x1630 [ublk_drv]
[ 206.762678]        io_uring_cmd+0x1ec/0x3d0
[ 206.763081]        io_issue_sqe+0x461/0xb70
[ 206.763477]        io_submit_sqes+0x794/0x1c50
[ 206.763857]        __do_sys_io_uring_enter+0x736/0x1ce0
[ 206.764368]        do_syscall_64+0x5c/0x90
[ 206.764724]        entry_SYSCALL_64_after_hwframe+0x72/0xdc
[ 206.765244] -> #0 (&ctx->uring_lock){+.+.}-{3:3}:
[ 206.765813]        __lock_acquire+0x2f25/0x5f00
[ 206.766272]        lock_acquire+0x1a9/0x4e0
[ 206.766633]        __mutex_lock+0x185/0x18b0
[ 206.767042]        __io_req_complete_post+0x792/0xd50
[ 206.767500]        io_uring_cmd_done+0x27d/0x300
[ 206.767918]        ublk_cancel_dev+0x1c6/0x410 [ublk_drv]
[ 206.768416]        ublk_stop_dev+0x2ad/0x400 [ublk_drv]
[ 206.768853]        ublk_ctrl_uring_cmd+0x14fd/0x3bf0 [ublk_drv]
[ 206.769411]        io_uring_cmd+0x1ec/0x3d0
[ 206.769772]        io_issue_sqe+0x461/0xb70
[ 206.770175]        io_wq_submit_work+0x2b5/0x710
[ 206.770600]        io_worker_handle_work+0x6b8/0x1620
[ 206.771066]        io_wq_worker+0x4ef/0xb50
[ 206.771461]        ret_from_fork+0x2c/0x50
[ 206.771817] other info that might help us debug this:
[ 206.773807]  Possible unsafe locking scenario:
[ 206.775596]        CPU0                    CPU1
[ 206.776607]        ----                    ----
[ 206.777604]   lock(&ub->mutex);
[ 206.778496]                                lock(&ctx->uring_lock);
[ 206.779601]                                lock(&ub->mutex);
[ 206.780656]   lock(&ctx->uring_lock);
[ 206.781561]  *** DEADLOCK ***
[ 206.783778] 1 lock held by iou-wrk-1070/1071:
[ 206.784697]  #0: ffff88812c3f66c0 (&ub->mutex){+.+.}-{3:3}, at: ublk_stop_dev+0x2b/0x400 [ublk_drv]
[ 206.786005] stack backtrace:
[ 206.787493] CPU: 1 PID: 1071 Comm: iou-wrk-1070 Not tainted 6.4.0-rc2 #6
[ 206.788576] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-1.fc38 04/01/2014
[ 206.789819] Call Trace:
[ 206.790617]  <TASK>
[ 206.791395]  dump_stack_lvl+0x57/0x90
[ 206.792284]  check_noncircular+0x27b/0x310
[ 206.793168]  ? __pfx_mark_lock+0x10/0x10
[ 206.794068]  ? __pfx_check_noncircular+0x10/0x10
[ 206.795017]  ? lock_acquire+0x1a9/0x4e0
[ 206.795871]  ? lockdep_lock+0xca/0x1c0
[ 206.796750]  ? __pfx_lockdep_lock+0x10/0x10
[ 206.797665]  __lock_acquire+0x2f25/0x5f00
[ 206.798569]  ? __pfx___lock_acquire+0x10/0x10
[ 206.799492]  ? try_to_wake_up+0x806/0x1a30
[ 206.800395]  ? __pfx_lock_release+0x10/0x10
[ 206.801306]  lock_acquire+0x1a9/0x4e0
[ 206.802143]  ? __io_req_complete_post+0x792/0xd50
[ 206.803092]  ? __pfx_lock_acquire+0x10/0x10
[ 206.803998]  ? lock_is_held_type+0xce/0x120
[ 206.804866]  ? find_held_lock+0x2d/0x110
[ 206.805760]  ? __pfx___might_resched+0x10/0x10
[ 206.806684]  ? lock_release+0x378/0x650
[ 206.807568]  __mutex_lock+0x185/0x18b0
[ 206.808440]  ? __io_req_complete_post+0x792/0xd50
[ 206.809379]  ? mark_held_locks+0x96/0xe0
[ 206.810359]  ? __io_req_complete_post+0x792/0xd50
[ 206.811294]  ? _raw_spin_unlock_irqrestore+0x4c/0x60
[ 206.812208]  ? lockdep_hardirqs_on+0x7d/0x100
[ 206.813078]  ? __pfx___mutex_lock+0x10/0x10
[ 206.813936]  ? __wake_up_common_lock+0xe8/0x150
[ 206.814817]  ? __pfx___wake_up_common_lock+0x10/0x10
[ 206.815736]  ? percpu_counter_add_batch+0x9f/0x160
[ 206.816643]  ? __io_req_complete_post+0x792/0xd50
[ 206.817541]  __io_req_complete_post+0x792/0xd50
[ 206.818429]  ? mark_held_locks+0x96/0xe0
[ 206.819276]  io_uring_cmd_done+0x27d/0x300
[ 206.820129]  ? kasan_quarantine_put+0xd6/0x1e0
[ 206.821015]  ? __pfx_io_uring_cmd_done+0x10/0x10
[ 206.821915]  ? per_cpu_remove_cache+0x80/0x80
[ 206.822794]  ? slab_free_freelist_hook+0x9e/0x1c0
[ 206.823697]  ublk_cancel_dev+0x1c6/0x410 [ublk_drv]
[ 206.824665]  ? kobject_put+0x190/0x4a0
[ 206.825503]  ublk_stop_dev+0x2ad/0x400 [ublk_drv]
[ 206.826410]  ublk_ctrl_uring_cmd+0x14fd/0x3bf0 [ublk_drv]
[ 206.827377]  ? __pfx_ublk_ctrl_uring_cmd+0x10/0x10 [ublk_drv]
[ 206.828376]  ? selinux_uring_cmd+0x1cc/0x260
[ 206.829268]  ? __pfx_selinux_uring_cmd+0x10/0x10
[ 206.830169]  ? lock_acquire+0x1a9/0x4e0
[ 206.831007]  io_uring_cmd+0x1ec/0x3d0
[ 206.831833]  io_issue_sqe+0x461/0xb70
[ 206.832651]  io_wq_submit_work+0x2b5/0x710
[ 206.833488]  io_worker_handle_work+0x6b8/0x1620
[ 206.834345]  io_wq_worker+0x4ef/0xb50
[ 206.835143]  ? __pfx_io_wq_worker+0x10/0x10
[ 206.835979]  ? lock_release+0x378/0x650
[ 206.836784]  ? ret_from_fork+0x12/0x50
[ 206.837586]  ? __pfx_lock_release+0x10/0x10
[ 206.838419]  ? do_raw_spin_lock+0x12e/0x270
[ 206.839250]  ? __pfx_do_raw_spin_lock+0x10/0x10
[ 206.840111]  ? __pfx_io_wq_worker+0x10/0x10
[ 206.840947]  ret_from_fork+0x2c/0x50
[ 206.841738]  </TASK>
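
For reference, this is roughly how I invoked the failing case (a minimal
sketch, assuming a blktests checkout with these v2 patches applied and the
ublk_drv module available on the test kernel):

    $ cd blktests
    $ ./check ublk/001    # runs only the new ublk/001 case

Note that the splat above only shows up on kernels built with lockdep
enabled (CONFIG_PROVE_LOCKING=y); without it the test may appear to pass.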