On 10/28/24 22:58, Joanne Koong wrote: > On Fri, Oct 25, 2024 at 3:40 PM Joanne Koong <joannelkoong@xxxxxxxxx> wrote: >> >>> Same here, I need to look some more into the compaction / page >>> migration paths. I'm planning to do this early next week and will >>> report back with what I find. >>> >> >> These are my notes so far: >> >> * We hit the folio_wait_writeback() path when callers call >> migrate_pages() with mode MIGRATE_SYNC >> ... -> migrate_pages() -> migrate_pages_sync() -> >> migrate_pages_batch() -> migrate_folio_unmap() -> >> folio_wait_writeback() >> >> * These are the places where we call migrate_pages(): >> 1) demote_folio_list() >> Can ignore this. It calls migrate_pages() in MIGRATE_ASYNC mode >> >> 2) __damon_pa_migrate_folio_list() >> Can ignore this. It calls migrate_pages() in MIGRATE_ASYNC mode >> >> 3) migrate_misplaced_folio() >> Can ignore this. It calls migrate_pages() in MIGRATE_ASYNC mode >> >> 4) do_move_pages_to_node() >> Can ignore this. This calls migrate_pages() in MIGRATE_SYNC mode but >> this path is only invoked by the move_pages() syscall. It's fine to >> wait on writeback for the move_pages() syscall since the user would >> have to deliberately invoke this on the fuse server for this to apply >> to the server's fuse folios >> >> 5) migrate_to_node() >> Can ignore this for the same reason as in 4. This path is only invoked >> by the migrate_pages() syscall. >> >> 6) do_mbind() >> Can ignore this for the same reason as 4 and 5. This path is only >> invoked by the mbind() syscall. >> >> 7) soft_offline_in_use_page() >> Can skip soft offlining fuse folios (eg folios with the >> AS_NO_WRITEBACK_WAIT mapping flag set). >> The path for this is soft_offline_page() -> soft_offline_in_use_page() >> -> migrate_pages(). soft_offline_page() only invokes this for in-use >> pages in a well-defined state (see ret value of get_hwpoison_page()). >> My understanding of soft offlining pages is that it's a mitigation >> strategy for handling pages that are experiencing errors but are not >> yet completely unusable, and its main purpose is to prevent future >> issues. It seems fine to skip this for fuse folios. >> >> 8) do_migrate_range() >> 9) compact_zone() >> 10) migrate_longterm_unpinnable_folios() >> 11) __alloc_contig_migrate_range() >> >> 8 to 11 needs more investigation / thinking about. I don't see a good >> way around these tbh. I think we have to operate under the assumption >> that the fuse server running is malicious or benevolently but >> incorrectly written and could possibly never complete writeback. So we >> definitely can't wait on these but it also doesn't seem like we can >> skip waiting on these, especially for the case where the server uses >> spliced pages, nor does it seem like we can just fail these with >> -EBUSY or something. I see some code paths with -EAGAIN in migration. Could you explain why we can't just fail migration for fuse write-back pages? >> > > I'm still not seeing a good way around this. > > What about this then? We add a new fuse sysctl called something like > "/proc/sys/fs/fuse/writeback_optimization_timeout" where if the sys > admin sets this, then it opts into optimizing writeback to be as fast > as possible (eg skipping the page copies) and if the server doesn't > fulfill the writeback by the set timeout value, then the connection is > aborted. > > Alternatively, we could also repurpose > /proc/sys/fs/fuse/max_request_timeout from the request timeout > patchset [1] but I like the additional flexibility and explicitness > having the "writeback_optimization_timeout" sysctl gives. > > Any thoughts on this? I'm a bit worried that we might lock up the system until time out is reached - not ideal. Especially as timeouts are in minutes now. But even a slightly stuttering video system not be great. I think we should give users/admin the choice then, if they prefer slow page copies or fast, but possibly shortly unresponsive system. Thank, Bernd