On Mon, Apr 12, 2021 at 3:23 PM Baolin Wang <baolin.wang@xxxxxxxxxxxxxxxxx> wrote:
>
> Hi Miklos,
>
> On 2021/3/27 14:36, Baolin Wang wrote:
> > We can hit the deadlock scenario below when writing back dirty pages
> > and writing files at the same time. The deadlock can be reproduced as
> > follows:
> >
> > - A writeback worker thread A tries to write a bunch of dirty pages via
> > fuse_writepages(). fuse_writepages() locks one page (page 1), adds it
> > into the rb_tree with the writeback flag set, unlocks page 1, and then
> > tries to lock the next page (page 2).
> >
> > - At the same time, a file write can be triggered by another process B,
> > which writes several pages via fuse_perform_write(). fuse_perform_write()
> > locks all required pages first, then waits for all writeback pages to
> > complete via fuse_wait_on_page_writeback().
> >
> > - Now process B has already locked page 1 and page 2, and waits for the
> > writeback of page 1 to complete (page 1 is under writeback, set by
> > process A). But process A cannot complete the writeback of page 1,
> > since it is still waiting to lock page 2, which process B already holds.
> >
> > A deadlock occurs.
> >
> > To fix this issue, we should make sure each page's writeback is
> > completed right after locking that page in fuse_fill_write_pages(),
> > and then write the pages together once they are all stable.
> >
> > [1450578.772896] INFO: task kworker/u259:6:119885 blocked for more than 120 seconds.
> > [1450578.796179] kworker/u259:6 D 0 119885 2 0x00000028
> > [1450578.796185] Workqueue: writeback wb_workfn (flush-0:78)
> > [1450578.796188] Call trace:
> > [1450578.798804] __switch_to+0xd8/0x148
> > [1450578.802458] __schedule+0x280/0x6a0
> > [1450578.806112] schedule+0x34/0xe8
> > [1450578.809413] io_schedule+0x20/0x40
> > [1450578.812977] __lock_page+0x164/0x278
> > [1450578.816718] write_cache_pages+0x2b0/0x4a8
> > [1450578.820986] fuse_writepages+0x84/0x100 [fuse]
> > [1450578.825592] do_writepages+0x58/0x108
> > [1450578.829412] __writeback_single_inode+0x48/0x448
> > [1450578.834217] writeback_sb_inodes+0x220/0x520
> > [1450578.838647] __writeback_inodes_wb+0x50/0xe8
> > [1450578.843080] wb_writeback+0x294/0x3b8
> > [1450578.846906] wb_do_writeback+0x2ec/0x388
> > [1450578.850992] wb_workfn+0x80/0x1e0
> > [1450578.854472] process_one_work+0x1bc/0x3f0
> > [1450578.858645] worker_thread+0x164/0x468
> > [1450578.862559] kthread+0x108/0x138
> > [1450578.865960] INFO: task doio:207752 blocked for more than 120 seconds.
> > [1450578.888321] doio D 0 207752 207740 0x00000000
> > [1450578.888329] Call trace:
> > [1450578.890945] __switch_to+0xd8/0x148
> > [1450578.894599] __schedule+0x280/0x6a0
> > [1450578.898255] schedule+0x34/0xe8
> > [1450578.901568] fuse_wait_on_page_writeback+0x8c/0xc8 [fuse]
> > [1450578.907128] fuse_perform_write+0x240/0x4e0 [fuse]
> > [1450578.912082] fuse_file_write_iter+0x1dc/0x290 [fuse]
> > [1450578.917207] do_iter_readv_writev+0x110/0x188
> > [1450578.921724] do_iter_write+0x90/0x1c8
> > [1450578.925598] vfs_writev+0x84/0xf8
> > [1450578.929071] do_writev+0x70/0x110
> > [1450578.932552] __arm64_sys_writev+0x24/0x30
> > [1450578.936727] el0_svc_common.constprop.0+0x80/0x1f8
> > [1450578.941694] el0_svc_handler+0x30/0x80
> > [1450578.945606] el0_svc+0x10/0x14
> >
> > Suggested-by: Peng Tao <tao.peng@xxxxxxxxxxxxxxxxx>
> > Signed-off-by: Baolin Wang <baolin.wang@xxxxxxxxxxxxxxxxx>
>
> Do you have any comments for this patch set? Thanks.
Hi,

I guess this is related:

https://lore.kernel.org/linux-fsdevel/20210209100115.GB1208880@xxxxxxxxxxxxxxxxxxxxxxxxx/

Can you verify that the patch at the above link fixes your issue?

Thanks,
Miklos
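
[Editor's note: the circular wait in the quoted report can be modelled outside the kernel. Below is a minimal user-space sketch, not fuse code: the thread and variable names, the pthread mutexes standing in for page locks, the condition variable standing in for the page-1 writeback flag, and the sleep() calls that force the reported interleaving are all illustrative assumptions. Thread A plays the writeback worker (the fuse_writepages() path), thread B the writer (the fuse_perform_write() path); running it deadlocks in the same way as the two blocked tasks above.]

/*
 * Illustrative sketch only (not fuse kernel code): model the lock ordering
 * from the report. Page locks are mutexes, the page-1 writeback flag is a
 * condition-protected boolean, and the sleep() calls only force the same
 * interleaving the report describes.
 */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t page1_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t page2_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t wb_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t wb_done = PTHREAD_COND_INITIALIZER;
static bool page1_writeback;

/* Thread A: lock page 1, mark it under writeback, unlock it, then go for page 2. */
static void *writeback_worker(void *arg)
{
	pthread_mutex_lock(&page1_lock);
	pthread_mutex_lock(&wb_lock);
	page1_writeback = true;
	pthread_mutex_unlock(&wb_lock);
	pthread_mutex_unlock(&page1_lock);

	sleep(2);			/* give thread B time to grab both pages */

	pthread_mutex_lock(&page2_lock);	/* blocks forever: B holds page 2 */

	/* Never reached: this is where page 1's writeback would be completed. */
	pthread_mutex_lock(&wb_lock);
	page1_writeback = false;
	pthread_cond_broadcast(&wb_done);
	pthread_mutex_unlock(&wb_lock);
	pthread_mutex_unlock(&page2_lock);
	return NULL;
}

/* Thread B: lock all required pages first, then wait for page 1's writeback. */
static void *file_writer(void *arg)
{
	sleep(1);			/* let A mark page 1 under writeback first */

	pthread_mutex_lock(&page1_lock);
	pthread_mutex_lock(&page2_lock);

	pthread_mutex_lock(&wb_lock);
	while (page1_writeback)		/* waits forever: A is stuck on page 2 */
		pthread_cond_wait(&wb_done, &wb_lock);
	pthread_mutex_unlock(&wb_lock);

	pthread_mutex_unlock(&page2_lock);
	pthread_mutex_unlock(&page1_lock);
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, writeback_worker, NULL);
	pthread_create(&b, NULL, file_writer, NULL);
	pthread_join(a, NULL);		/* never returns: A and B are deadlocked */
	pthread_join(b, NULL);
	return 0;
}

In terms of this model, the fix proposed in the quoted report changes thread B's ordering: wait for page 1's writeback immediately after locking page 1 and before locking page 2, so the circular wait cannot form.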