On Thu, Dec 19, 2024 at 11:14:49AM -0500, Zi Yan wrote: > On 19 Dec 2024, at 11:09, Bernd Schubert wrote: > > > On 12/19/24 17:02, Zi Yan wrote: > >> On 19 Dec 2024, at 11:00, Zi Yan wrote: > >>> On 19 Dec 2024, at 10:56, Bernd Schubert wrote: > >>> > >>>> On 12/19/24 16:55, Zi Yan wrote: > >>>>> On 19 Dec 2024, at 10:53, Shakeel Butt wrote: > >>>>> > >>>>>> On Thu, Dec 19, 2024 at 04:47:18PM +0100, David Hildenbrand wrote: > >>>>>>> On 19.12.24 16:43, Shakeel Butt wrote: > >>>>>>>> On Thu, Dec 19, 2024 at 02:05:04PM +0100, David Hildenbrand wrote: > >>>>>>>>> On 23.11.24 00:23, Joanne Koong wrote: > >>>>>>>>>> For migrations called in MIGRATE_SYNC mode, skip migrating the folio if > >>>>>>>>>> it is under writeback and has the AS_WRITEBACK_INDETERMINATE flag set on its > >>>>>>>>>> mapping. If the AS_WRITEBACK_INDETERMINATE flag is set on the mapping, the > >>>>>>>>>> writeback may take an indeterminate amount of time to complete, and > >>>>>>>>>> waits may get stuck. > >>>>>>>>>> > >>>>>>>>>> Signed-off-by: Joanne Koong <joannelkoong@xxxxxxxxx> > >>>>>>>>>> Reviewed-by: Shakeel Butt <shakeel.butt@xxxxxxxxx> > >>>>>>>>>> --- > >>>>>>>>>> mm/migrate.c | 5 ++++- > >>>>>>>>>> 1 file changed, 4 insertions(+), 1 deletion(-) > >>>>>>>>>> > >>>>>>>>>> diff --git a/mm/migrate.c b/mm/migrate.c > >>>>>>>>>> index df91248755e4..fe73284e5246 100644 > >>>>>>>>>> --- a/mm/migrate.c > >>>>>>>>>> +++ b/mm/migrate.c > >>>>>>>>>> @@ -1260,7 +1260,10 @@ static int migrate_folio_unmap(new_folio_t get_new_folio, > >>>>>>>>>> */ > >>>>>>>>>> switch (mode) { > >>>>>>>>>> case MIGRATE_SYNC: > >>>>>>>>>> - break; > >>>>>>>>>> + if (!src->mapping || > >>>>>>>>>> + !mapping_writeback_indeterminate(src->mapping)) > >>>>>>>>>> + break; > >>>>>>>>>> + fallthrough; > >>>>>>>>>> default: > >>>>>>>>>> rc = -EBUSY; > >>>>>>>>>> goto out; > >>>>>>>>> > >>>>>>>>> Ehm, doesn't this mean that any fuse user can essentially completely block > >>>>>>>>> CMA allocations, memory compaction, memory hotunplug, memory poisoning... ?! > >>>>>>>>> > >>>>>>>>> That sounds very bad. > >>>>>>>> > >>>>>>>> The page under writeback are already unmovable while they are under > >>>>>>>> writeback. This patch is only making potentially unrelated tasks to > >>>>>>>> synchronously wait on writeback completion for such pages which in worst > >>>>>>>> case can be indefinite. This actually is solving an isolation issue on a > >>>>>>>> multi-tenant machine. > >>>>>>>> > >>>>>>> Are you sure, because I read in the cover letter: > >>>>>>> > >>>>>>> "In the current FUSE writeback design (see commit 3be5a52b30aa ("fuse: > >>>>>>> support writable mmap"))), a temp page is allocated for every dirty > >>>>>>> page to be written back, the contents of the dirty page are copied over to > >>>>>>> the temp page, and the temp page gets handed to the server to write back. > >>>>>>> This is done so that writeback may be immediately cleared on the dirty > >>>>>>> page," > >>>>>>> > >>>>>>> Which to me means that they are immediately movable again? > >>>>>> > >>>>>> Oh sorry, my mistake, yes this will become an isolation issue with the > >>>>>> removal of the temp page in-between which this series is doing. I think > >>>>>> the tradeoff is between extra memory plus slow write performance versus > >>>>>> temporary unmovable memory. > >>>>> > >>>>> No, the tradeoff is slow FUSE performance vs whole system slowdown due to > >>>>> memory fragmentation. AS_WRITEBACK_INDETERMINATE indicates it is not > >>>>> temporary. > >>>> > >>>> Is there is a difference between FUSE TMP page being unmovable and > >>>> AS_WRITEBACK_INDETERMINATE folios/pages being unmovable? > >> > >> (Fix my response location) > >> > >> Both are unmovable, but you can control where FUSE TMP page > >> can come from to avoid spread across the entire memory space. For example, > >> allocate a contiguous region as a TMP page pool. > > > > Wouldn't it make sense to have that for fuse writeback pages as well? > > Fuse tries to limit dirty pages anyway. > > Can fuse constraint the location of writeback pages? Something like what > I proposed[1], migrating pages to a location before their writeback? Will > that be a performance concern? > > In terms of the number of dirty pages, you only need one page out of 512 > pages to prevent 2MB THP from allocation. For CMA allocation, one unmovable > page can kill one contiguous range. What is the limit of fuse dirty pages? > > [1] https://lore.kernel.org/linux-mm/90C41581-179F-40B6-9801-9C9DBBEB1AF4@xxxxxxxxxx/ I think this whole concern of fuse making system memory unmovable forever is overblown. Fuse is already using a temp (unmovable) page for the writeback and is slow and is being removed in this series.