Re: [PATCH v2 2/2] fuse: remove tmp folio for writebacks and internal rb tree

Shakeel Butt <shakeel.butt@xxxxxxxxx> · Thu, 17 Oct 2024 22:57:51 -0700

On Thu, Oct 17, 2024 at 06:30:08PM GMT, Joanne Koong wrote:
> On Tue, Oct 15, 2024 at 3:01 AM Miklos Szeredi <miklos@xxxxxxxxxx> wrote:
> >
> > On Mon, 14 Oct 2024 at 20:23, Joanne Koong <joannelkoong@xxxxxxxxx> wrote:
> >
> > > This change sets AS_NO_WRITEBACK_RECLAIM on the inode mapping so that
> > > FUSE folios are not reclaimed and waited on while in writeback, and
> > > removes the temporary folio + extra copying and the internal rb tree.
> >
> > What about sync(2)?   And page migration?
> >
> > Hopefully there are no other cases, but I think a careful review of
> > places where generic code waits for writeback is needed before we can
> > say for sure.
> 
> The places where I see this potential deadlock still being possible are:
> * page migration when handling a page fault:
>      In particular, this path: handle_mm_fault() ->
> __handle_mm_fault() -> handle_pte_fault() -> do_numa_page() ->
> migrate_misplaced_folio() -> migrate_pages() -> migrate_pages_sync()
> -> migrate_pages_batch() -> migrate_folio_unmap() ->
> folio_wait_writeback()

So, this is the NUMA fault path, and if the fuse server does not map the
fuse folios it is serving into its own address space, this is not an issue.
However, hugepage allocation on page fault can trigger compaction, which
might migrate unrelated fuse folios. So a fuse server triggering
compaction is an issue, and we need to resolve it similarly to the reclaim
codepath. (Though I think the THP path does not use MIGRATE_SYNC; I
believe that mode is only used for gigantic hugetlb pages.)
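
To make the MIGRATE_SYNC point concrete, here is a rough userspace model
of how I understand the writeback check in migrate_folio_unmap() to
behave. It is a simplified sketch, not the kernel code: struct
model_folio, model_unmap_check() and the "1 means the caller would
block" return convention are made up for illustration, and only the mode
names mirror enum migrate_mode.

```c
#include <errno.h>
#include <stdbool.h>

/* Mode names mirror the kernel's enum migrate_mode; the rest is a model. */
enum model_migrate_mode {
	MODEL_MIGRATE_ASYNC,
	MODEL_MIGRATE_SYNC_LIGHT,
	MODEL_MIGRATE_SYNC,
};

/* Hypothetical stand-in for a folio: only the writeback state matters. */
struct model_folio {
	bool under_writeback;
};

/*
 * Model of the writeback check (assumption: this matches my reading of
 * migrate_folio_unmap(), it is not copied from it).
 *
 * Returns:
 *   0       folio is not under writeback, nothing to wait for
 *   -EBUSY  folio is under writeback and the mode is not fully synchronous
 *   -EAGAIN fully synchronous mode, but the caller did not force the wait
 *   1       the caller would block in folio_wait_writeback()
 */
static inline int model_unmap_check(const struct model_folio *folio,
				    enum model_migrate_mode mode, bool force)
{
	if (!folio->under_writeback)
		return 0;

	/* Only a full synchronous migration waits on writeback. */
	if (mode != MODEL_MIGRATE_SYNC)
		return -EBUSY;

	if (!force)
		return -EAGAIN;

	/* This is the case that can deadlock against the fuse server. */
	return 1;
}
```

In this model a folio under writeback only blocks the caller for
MIGRATE_SYNC with force set; the async and sync-light modes bail out
with -EBUSY, which is why the mode used by each compaction caller
matters for the deadlock.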

> * syscalls that trigger waits on writeback, which will lead to
> deadlock if a single-threaded fuse server calls this when servicing
> requests:
>     - sync(), sync_file_range(), fsync(), fdatasync()
>     - swapoff()
>     - move_pages()
> 
> I need to analyze the page fault path more to get a clearer picture of
> what is happening, but so far this looks like a valid case for a
> correctly written fuse server to run into.
> For the syscalls however, is it valid/safe in general (disregarding
> the writeback deadlock scenario for a minute) for fuse servers to be
> invoking these syscalls in their handlers anyways?
> 
> The other places where I see a generic wait on writeback seem safe:
> * splice, page_cache_pipe_buf_try_steal() (fs/splice.c):
>    We hit this in fuse when we try to move a page from the pipe buffer
> into the page cache (fuse_try_move_page()) for the SPLICE_F_MOVE case.
> This wait seems fine, since the folio that's being waited on is the
> folio in the pipe buffer which is not a fuse folio.
> * memory failure (mm/memory_failure.c):
>    Soft offlining a page and handling page memory failure - these can
> be triggered asynchronously (memory_failure_work_func()), but this
> should be fine for the fuse use case since the server isn't blocked on
> servicing any writeback requests while memory failure handling is
> waiting on writeback
> * page truncation (mm/truncate.c):
>    Same here. These cases seem fine since the server isn't blocked on
> servicing writeback requests while truncation waits on writeback
> 
> 
> Thanks,
> Joanne
> 
> >
> > Thanks,
> > Miklos