Re: [bug report] io_uring: fsfreeze deadlocks when performing O_DIRECT writes

Jens Axboe <axboe@xxxxxxxxx> · Thu, 31 Oct 2024 07:54:19 -0600

On 10/31/24 5:20 AM, Peter Mann wrote:
> Hello,
> 
> it appears that a deadlock is highly likely to occur when running fsfreeze on a filesystem that is currently servicing multiple io_uring O_DIRECT writes.
> 
> Steps to reproduce:
> 1. Mount an xfs or ext4 filesystem on /mnt
> 
> 2. Start writing to the filesystem. Reproducing this requires io_uring, direct I/O and iodepth > 1:
> fio --ioengine=io_uring --direct=1 --bs=4k --size=100M --rw=randwrite --loops=100000 --iodepth=32 --name=test --filename=/mnt/fio_test
> 
> 3. Run this in another shell. For me it deadlocks almost immediately:
> while true; do fsfreeze -f /mnt/; echo froze; fsfreeze -u /mnt/; echo unfroze; done
> 
> 4. fsfreeze and all tasks attempting to write to /mnt get stuck:
> At this point none of the stuck processes can be killed with SIGKILL; they are all in uninterruptible sleep.
> If you try 'touch /mnt/a', for example, the new process gets stuck in exactly the same way.
> 
> This gets printed when running 6.11.4 with some debug options enabled:
> [  539.586122] Showing all locks held in the system:
> [  539.612972] 1 lock held by khungtaskd/35:
> [  539.626204]  #0: ffffffffb3b1c100 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x32/0x1e0
> [  539.640561] 1 lock held by dmesg/640:
> [  539.654282]  #0: ffff9fd541a8e0e0 (&user->lock){+.+.}-{3:3}, at: devkmsg_read+0x74/0x2d0
> [  539.669220] 2 locks held by fio/647:
> [  539.684253]  #0: ffff9fd54fe720b0 (&ctx->uring_lock){+.+.}-{3:3}, at: __do_sys_io_uring_enter+0x5c2/0x820
> [  539.699565]  #1: ffff9fd541a8d450 (sb_writers#15){++++}-{0:0}, at: io_issue_sqe+0x9c/0x780
> [  539.715587] 2 locks held by fio/648:
> [  539.732293]  #0: ffff9fd54fe710b0 (&ctx->uring_lock){+.+.}-{3:3}, at: __do_sys_io_uring_enter+0x5c2/0x820
> [  539.749121]  #1: ffff9fd541a8d450 (sb_writers#15){++++}-{0:0}, at: io_issue_sqe+0x9c/0x780
> [  539.765484] 2 locks held by fio/649:
> [  539.781483]  #0: ffff9fd541a8f0b0 (&ctx->uring_lock){+.+.}-{3:3}, at: __do_sys_io_uring_enter+0x5c2/0x820
> [  539.798785]  #1: ffff9fd541a8d450 (sb_writers#15){++++}-{0:0}, at: io_issue_sqe+0x9c/0x780
> [  539.815466] 2 locks held by fio/650:
> [  539.831966]  #0: ffff9fd54fe740b0 (&ctx->uring_lock){+.+.}-{3:3}, at: __do_sys_io_uring_enter+0x5c2/0x820
> [  539.849527]  #1: ffff9fd541a8d450 (sb_writers#15){++++}-{0:0}, at: io_issue_sqe+0x9c/0x780
> [  539.867469] 1 lock held by fsfreeze/696:
> [  539.884565]  #0: ffff9fd541a8d450 (sb_writers#15){++++}-{0:0}, at: freeze_super+0x20a/0x600
> 
> I reproduced this bug on nvme, sata ssd, virtio disks and lvm logical volumes.
> It deadlocks on all kernels that I tried (all on amd64):
> 6.12-rc5 (compiled from kernel.org)
> 6.11.4 (compiled from kernel.org)
> 6.10.11-1~bpo12+1 (debian)
> 6.1.0-23 (debian)
> 5.14.0-427.40.1.el9_4.x86_64 (rocky linux)
> 5.10.0-33-amd64 (debian)
> 
> I tried building some older kernels to check whether it's a regression, but
> they either failed to compile or didn't boot in my VM, sorry about that.
> If you have anything specific for me to try, I'm happy to help.
> 
> I also found this issue, so it seems it's not just me:
> https://gitlab.com/qemu-project/qemu/-/issues/881
> Note that mariadb 10.6 adds support for io_uring, and that proxmox backups perform fsfreeze in the guest VM.
> 
> Originally I discovered this after a scheduled lvm snapshot of mariadb
> got stuck. It appears that lvm calls dm_suspend, which then calls
> freeze_super, so it looks like the same bug to me. I discovered the
> simpler fsfreeze/fio reproduction method when I tried to find a
> workaround.

Thanks for the report! I'm pretty sure this is due to the freezing not
allowing task_work to run, which prevents completions from being run.
Hence you end up in a situation where the IO completions that would free
up the rwsem never get run, because IO issue is stuck waiting on the
freeze that has already started.

I'll take a look...

-- 
Jens Axboe