On Mon, 2022-07-11 at 15:49 +0800, Miklos Szeredi wrote:
> On Mon, 13 Jun 2022 at 11:29, Ed Tsai <ed.tsai@xxxxxxxxxxxx> wrote:
> >
> > On Mon, 2022-06-13 at 16:45 +0800, Miklos Szeredi wrote:
> > > On Fri, 10 Jun 2022 at 09:48, Ed Tsai <ed.tsai@xxxxxxxxxxxx> wrote:
> > > >
> > > > Recently, we get this deadlock issue again. fuse_flush_time_update()
> > > > use sync_inode_metadata() and it only write the metadata, so the
> > > > writeback worker could still be blocked becaused of file data.
> > > >
> > > > I try to use write_inode_now() instead of sync_inode_metadata() and
> > > > the writeback thread will not be blocked anymore. I don't think this
> > > > is a good solution, but this confirm that there is still a potential
> > > > deadlock because of file data. WDYT.
> > >
> > > I'm not sure how that happens. Normally writeback doesn't block. Can
> > > you provide the stack traces of all related tasks in the deadlock?
> > >
> > > Thanks,
> > > Miklos
> >
> > The writeback worker
> > ppid=22915 pid=22915 S cpu=6 prio=120 wait=3614s kworker/u16:21
> > vmlinux request_wait_answer + 64
> > vmlinux __fuse_request_send + 328
> > vmlinux fuse_request_send + 60
> > vmlinux fuse_simple_request + 376
> > vmlinux fuse_flush_times + 276
> > vmlinux fuse_write_inode + 104 (inode=0xFFFFFFD516CC4780, ff=0)
> > vmlinux write_inode + 384
> > vmlinux __writeback_single_inode + 960
> > vmlinux writeback_sb_inodes + 892
> > vmlinux __writeback_inodes_wb + 156
> > vmlinux wb_writeback + 512
> > vmlinux wb_check_background_flush + 600
> > vmlinux wb_do_writeback + 644
> > vmlinux wb_workfn + 756
> > vmlinux process_one_work + 628
> > vmlinux worker_thread + 708
> > vmlinux kthread + 376
> > vmlinux ret_from_fork + 16
> >
> > Thread-11
> > ppid=3961 pid=26057 D cpu=4 prio=120 wait=3614s Thread-11
> > vmlinux __inode_wait_for_writeback + 108
> > vmlinux inode_wait_for_writeback + 156
> > vmlinux evict + 160
> > vmlinux iput_final + 292
> > vmlinux iput + 600
> > vmlinux dentry_unlink_inode + 212
> > vmlinux __dentry_kill + 228
> > vmlinux shrink_dentry_list + 408
> > vmlinux prune_dcache_sb + 80
> > vmlinux super_cache_scan + 272
> > vmlinux do_shrink_slab + 944
> > vmlinux shrink_slab + 1104
> > vmlinux shrink_node + 712
> > vmlinux shrink_zones + 188
> > vmlinux do_try_to_free_pages + 348
> > vmlinux try_to_free_pages + 848
> > vmlinux __perform_reclaim + 64
> > vmlinux __alloc_pages_direct_reclaim + 64
> > vmlinux __alloc_pages_slowpath + 1296
> > vmlinux __alloc_pages_nodemask + 2004
> > vmlinux __alloc_pages + 16
> > vmlinux __alloc_pages_node + 16
> > vmlinux alloc_pages_node + 16
> > vmlinux __read_swap_cache_async + 172
> > vmlinux read_swap_cache_async + 12
> > vmlinux swapin_readahead + 328
> > vmlinux do_swap_page + 844
> > vmlinux handle_pte_fault + 268
> > vmlinux __handle_speculative_fault + 548
> > vmlinux handle_speculative_fault + 44
> > vmlinux do_page_fault + 500
> > vmlinux do_translation_fault + 64
> > vmlinux do_mem_abort + 72
> > vmlinux el0_sync + 1032
> >
> > ppid=3961 is com.google.android.providers.media.module, and it is the
> > android fuse daemon.
> >
> > So, the daemon and wb worker were wait for each other.
>
> Is commit 5c791fe1e2a4 ("fuse: make sure reclaim doesn't write the
> inode") applied to this kernel?
>
> Thanks,
> Miklos

Yes, it has been applied to our kernel. With that patch, fuse_flush_time_update()
still only writes the inode metadata to disk.
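To make the experiment concrete, here is a rough sketch of the call swap we
tried. This is hedged: it is not the exact code in our tree, the simplified
function body and includes are my own, and only the swap between
sync_inode_metadata() and write_inode_now() comes from this thread.

/*
 * Hedged sketch, not the actual kernel source; it only illustrates the
 * call swap discussed in this thread.
 *
 * sync_inode_metadata(inode, 1) writes back the inode itself only
 * (metadata, no data pages), so dirty file data can still be pending and
 * keep the flusher stuck on this inode.
 *
 * write_inode_now(inode, 1) writes the dirty data pages and then the
 * inode, so nothing is left for the writeback worker afterwards.
 */
#include <linux/fs.h>
#include <linux/writeback.h>

static int fuse_flush_time_update(struct inode *inode)
{
	int err;

	/* Current behaviour: flush timestamps by writing the inode only. */
	err = sync_inode_metadata(inode, 1);

	/*
	 * Experiment described below: flush data and inode together.
	 *
	 * err = write_inode_now(inode, 1);
	 */

	return err;
}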
Here, I tried to use write_inode_now() instead of sync_inode_metadata() (as
sketched above), and the system hang is no longer observed. So I suspect the
deadlock is caused by the inode's file data rather than its metadata.

Best,
Ed Tsai