Re: [PATCH v2 2/2] fuse: remove tmp folio for writebacks and internal rb tree

Joanne Koong <joannelkoong@xxxxxxxxx> · Fri, 18 Oct 2024 12:57:08 -0700

On Thu, Oct 17, 2024 at 10:57 PM Shakeel Butt <shakeel.butt@xxxxxxxxx> wrote:
>
> On Thu, Oct 17, 2024 at 06:30:08PM GMT, Joanne Koong wrote:
> > On Tue, Oct 15, 2024 at 3:01 AM Miklos Szeredi <miklos@xxxxxxxxxx> wrote:
> > >
> > > On Mon, 14 Oct 2024 at 20:23, Joanne Koong <joannelkoong@xxxxxxxxx> wrote:
> > >
> > > > This change sets AS_NO_WRITEBACK_RECLAIM on the inode mapping so that
> > > > FUSE folios are not reclaimed and waited on while in writeback, and
> > > > removes the temporary folio + extra copying and the internal rb tree.
> > >
> > > What about sync(2)?   And page migration?
> > >
> > > Hopefully there are no other cases, but I think a careful review of
> > > places where generic code waits for writeback is needed before we can
> > > say for sure.
> >
> > The places where I see this potential deadlock still being possible are:
> > * page migration when handling a page fault:
> >      In particular, this path: handle_mm_fault() ->
> > __handle_mm_fault() -> handle_pte_fault() -> do_numa_page() ->
> > migrate_misplaced_folio() -> migrate_pages() -> migrate_pages_sync()
> > -> migrate_pages_batch() -> migrate_folio_unmap() ->
> > folio_wait_writeback()
>
> So, this is numa fault and if fuse server is not mapping the fuse folios
> which it is serving, in its address space then this is not an issue.
> However hugepage allocation on page fault can cause compaction which
> might migrate unrelated fuse folios. So, fuse server doing compaction
> is an issue and we need to resolve similar to reclaim codepath. (Though
> I think for THP it is not doing MIGRATE_SYNC but doing for gigantic
> hugetlb pages).

Thanks for the explanation. Would you mind pointing me to the
compaction function where this triggers the migrate? Is this in
compact_zone() where it calls migrate_pages() on the cc->migratepages
list?

>
> > * syscalls that trigger waits on writeback, which will lead to
> > deadlock if a single-threaded fuse server calls this when servicing
> > requests:
> >     - sync(), sync_file_range(), fsync(), fdatasync()
> >     - swapoff()
> >     - move_pages()
> >
> > I need to analyze the page fault path more to get a clearer picture of
> > what is happening, but so far this looks like a valid case for a
> > correctly written fuse server to run into.
> > For the syscalls however, is it valid/safe in general (disregarding
> > the writeback deadlock scenario for a minute) for fuse servers to be
> > invoking these syscalls in their handlers anyways?
> >
> > The other places where I see a generic wait on writeback seem safe:
> > * splice, page_cache_pipe_buf_try_steal() (fs/splice.c):
> >    We hit this in fuse when we try to move a page from the pipe buffer
> > into the page cache (fuse_try_move_page()) for the SPLICE_F_MOVE case.
> > This wait seems fine, since the folio that's being waited on is the
> > folio in the pipe buffer which is not a fuse folio.
> > * memory failure (mm/memory_failure.c):
> >    Soft offlining a page and handling page memory failure - these can
> > be triggered asynchronously (memory_failure_work_func()), but this
> > should be fine for the fuse use case since the server isn't blocked on
> > servicing any writeback requests while memory failure handling is
> > waiting on writeback
> > * page truncation (mm/truncate.c):
> >    Same here. These cases seem fine since the server isn't blocked on
> > servicing writeback requests while truncation waits on writeback
> >
> >
> > Thanks,
> > Joanne
> >
> > >
> > > Thanks,
> > > Miklos