On Thu, Oct 17, 2024 at 10:57 PM Shakeel Butt <shakeel.butt@xxxxxxxxx> wrote: > > On Thu, Oct 17, 2024 at 06:30:08PM GMT, Joanne Koong wrote: > > On Tue, Oct 15, 2024 at 3:01 AM Miklos Szeredi <miklos@xxxxxxxxxx> wrote: > > > > > > On Mon, 14 Oct 2024 at 20:23, Joanne Koong <joannelkoong@xxxxxxxxx> wrote: > > > > > > > This change sets AS_NO_WRITEBACK_RECLAIM on the inode mapping so that > > > > FUSE folios are not reclaimed and waited on while in writeback, and > > > > removes the temporary folio + extra copying and the internal rb tree. > > > > > > What about sync(2)? And page migration? > > > > > > Hopefully there are no other cases, but I think a careful review of > > > places where generic code waits for writeback is needed before we can > > > say for sure. > > > > The places where I see this potential deadlock still being possible are: > > * page migration when handling a page fault: > > In particular, this path: handle_mm_fault() -> > > __handle_mm_fault() -> handle_pte_fault() -> do_numa_page() -> > > migrate_misplaced_folio() -> migrate_pages() -> migrate_pages_sync() > > -> migrate_pages_batch() -> migrate_folio_unmap() -> > > folio_wait_writeback() > > So, this is numa fault and if fuse server is not mapping the fuse folios > which it is serving, in its address space then this is not an issue. > However hugepage allocation on page fault can cause compaction which > might migrate unrelated fuse folios. So, fuse server doing compaction > is an issue and we need to resolve similar to reclaim codepath. (Though > I think for THP it is not doing MIGRATE_SYNC but doing for gigantic > hugetlb pages). Thanks for the explanation. Would you mind pointing me to the compaction function where this triggers the migrate? Is this in compact_zone() where it calls migrate_pages() on the cc->migratepages list? > > > * syscalls that trigger waits on writeback, which will lead to > > deadlock if a single-threaded fuse server calls this when servicing > > requests: > > - sync(), sync_file_range(), fsync(), fdatasync() > > - swapoff() > > - move_pages() > > > > I need to analyze the page fault path more to get a clearer picture of > > what is happening, but so far this looks like a valid case for a > > correctly written fuse server to run into. > > For the syscalls however, is it valid/safe in general (disregarding > > the writeback deadlock scenario for a minute) for fuse servers to be > > invoking these syscalls in their handlers anyways? > > > > The other places where I see a generic wait on writeback seem safe: > > * splice, page_cache_pipe_buf_try_steal() (fs/splice.c): > > We hit this in fuse when we try to move a page from the pipe buffer > > into the page cache (fuse_try_move_page()) for the SPLICE_F_MOVE case. > > This wait seems fine, since the folio that's being waited on is the > > folio in the pipe buffer which is not a fuse folio. > > * memory failure (mm/memory_failure.c): > > Soft offlining a page and handling page memory failure - these can > > be triggered asynchronously (memory_failure_work_func()), but this > > should be fine for the fuse use case since the server isn't blocked on > > servicing any writeback requests while memory failure handling is > > waiting on writeback > > * page truncation (mm/truncate.c): > > Same here. These cases seem fine since the server isn't blocked on > > servicing writeback requests while truncation waits on writeback > > > > > > Thanks, > > Joanne > > > > > > > > Thanks, > > > Miklos