On Fri, Nov 11, 2022 at 12:14 PM Song Liu <song@xxxxxxxxxx> wrote:
>
> On Thu, Nov 10, 2022 at 10:25 AM Zhang Tianci
> <zhangtianci.1997@xxxxxxxxxxxxx> wrote:
> > >
> > > fio -filename=testfile -ioengine=libaio -bs=16M -size=10G -numjobs=100
> > > -iodepth=1 -runtime=60
> > > -rw=write -group_reporting -name="test"
> > >
> > > Then I found the first deadlock state, but it is not the real reason.
> > >
> > > I will do a test with the latest kernel. I will report to you the result later.
> > >
> > I can reproduce the first deadlock in linux-6.1-rc4.
> > There are 26 stripe_head and 26 fio threads blocked with same backtrace:
> >
> >  #0 [ffffc9000cd0f8b0] __schedule at ffffffff818b3c3c
> >  #1 [ffffc9000cd0f940] schedule at ffffffff818b4313
> >  #2 [ffffc9000cd0f950] md_bitmap_startwrite at ffffffffc063354a [md_mod]
> >  #3 [ffffc9000cd0f9c0] __add_stripe_bio at ffffffffc064fbd6 [raid456]
> >  #4 [ffffc9000cd0fa00] raid5_make_request at ffffffffc065a84c [raid456]
> >  #5 [ffffc9000cd0fb30] md_handle_request at ffffffffc0628496 [md_mod]
> >  #6 [ffffc9000cd0fb98] __submit_bio at ffffffff813f308f
> >  #7 [ffffc9000cd0fbb8] submit_bio_noacct_nocheck at ffffffff813f3501
> >  #8 [ffffc9000cd0fc00] __block_write_full_page at ffffffff8134ca64
> >  #9 [ffffc9000cd0fc60] __writepage at ffffffff8123f4a3
> > #10 [ffffc9000cd0fc78] write_cache_pages at ffffffff8123fb57
> > #11 [ffffc9000cd0fd70] generic_writepages at ffffffff8123feef
> > #12 [ffffc9000cd0fdc0] do_writepages at ffffffff81241f12
> > #13 [ffffc9000cd0fe28] filemap_fdatawrite_wbc at ffffffff8123306b
> > #14 [ffffc9000cd0fe48] __filemap_fdatawrite_range at ffffffff81239154
> > #15 [ffffc9000cd0fec0] file_write_and_wait_range at ffffffff812393e1
> > #16 [ffffc9000cd0fef0] blkdev_fsync at ffffffff813ec223
> > #17 [ffffc9000cd0ff08] do_fsync at ffffffff81342798
> > #18 [ffffc9000cd0ff30] __x64_sys_fsync at ffffffff813427e0
> > #19 [ffffc9000cd0ff38] do_syscall_64 at ffffffff818a6114
> > #20 [ffffc9000cd0ff50] entry_SYSCALL_64_after_hwframe at ffffffff81a0009b
>
> Thanks for this information.
>
> I guess this is with COUNTER_MAX of 4? And it is slightly different to the
> issue you found?

Yes, I hacked COUNTER_MAX down to 4; I think this increases the probability
of a race on the bitmap counter. This kind of deadlock is very difficult to
hit without the hack. It just happened while I was debugging, but it helped
me arrive at a guess (the second deadlock state in the first email) about the
real cause.

>
> I will try to look into this next week (taking some time off this week).

Thanks,
Tianci