Re: [PATCH v6 4/5] mm/migrate: skip migrating folios under writeback with AS_WRITEBACK_INDETERMINATE mappings

David Hildenbrand <david@xxxxxxxxxx> · Mon, 30 Dec 2024 20:52:04 +0100

>> What sounds plausible to me is:
>>
>> a) Make this only affect the actual deadlock path: sync migration
>>    during compaction. Communicate it either using some "context"
>>    information or with a new MIGRATE_SYNC_COMPACTION.
>> b) Call it something like AS_WRITEBACK_MIGHT_DEADLOCK_ON_RECLAIM to
>>    express that very deadlock problem.
>> c) Leave all other sync migration users alone for now.

> The deadlock path is separate from sync migration. The deadlock arises
> from a corner case where cgroupv1 reclaim waits on a folio under
> writeback where that writeback itself is blocked on reclaim.

Okay, so compaction (IOW this patch) is not relevant at all to resolve 
the deadlock in any way, correct?

For a second I thought I understood how this patch here relates to the 
deadlock :)
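Proposals (a) and (c) above come down to one per-folio decision during migration. What follows is a minimal userspace sketch, not kernel code. MIGRATE_ASYNC, MIGRATE_SYNC_LIGHT and MIGRATE_SYNC mirror the existing enum migrate_mode; in mainline's mm/migrate.c, only MIGRATE_SYNC blocks on writeback. MIGRATE_SYNC_COMPACTION, the writeback_might_deadlock flag (a stand-in for the proposed AS_WRITEBACK_MIGHT_DEADLOCK_ON_RECLAIM) and all model_* and wb_* names are hypothetical.

```c
#include <stdbool.h>

/*
 * Userspace model. The first three modes mirror the kernel's enum
 * migrate_mode; MIGRATE_SYNC_COMPACTION is the hypothetical mode from (a).
 */
enum migrate_mode {
	MIGRATE_ASYNC,
	MIGRATE_SYNC_LIGHT,
	MIGRATE_SYNC,
	MIGRATE_SYNC_COMPACTION,
};

/* Hypothetical stand-in for struct address_space. */
struct model_mapping {
	/* Models the proposed AS_WRITEBACK_MIGHT_DEADLOCK_ON_RECLAIM bit (b). */
	bool writeback_might_deadlock;
};

/* Hypothetical stand-in for struct folio. */
struct model_folio {
	bool writeback;
	const struct model_mapping *mapping;
};

enum wb_action {
	WB_PROCEED,	/* not under writeback: migrate right away */
	WB_SKIP,	/* give up on this folio for now */
	WB_WAIT,	/* block until writeback completes, then migrate */
};

static enum wb_action migrate_writeback_action(const struct model_folio *folio,
					       enum migrate_mode mode)
{
	if (!folio->writeback)
		return WB_PROCEED;

	switch (mode) {
	case MIGRATE_ASYNC:
	case MIGRATE_SYNC_LIGHT:
		/* Only full sync migration waits on writeback. */
		return WB_SKIP;
	case MIGRATE_SYNC_COMPACTION:
		/* (a): only the compaction path honours the mapping flag. */
		if (folio->mapping->writeback_might_deadlock)
			return WB_SKIP;
		return WB_WAIT;
	case MIGRATE_SYNC:
	default:
		/* (c): all other sync migration users keep waiting. */
		return WB_WAIT;
	}
}
```

In this model, only the compaction path skips a flagged mapping, and every other sync caller still waits. The "ask the fs" variant below would replace the mapping-wide boolean with a per-folio query.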

>> Would that prevent the deadlock? Even *better* would be to be able to
>> ask the fs if starting writeback on a specific folio could deadlock.
>> In most cases, as I understand it, we won't actually run into the
>> deadlock and would just want to wait for writeback to complete
>> (esp. compaction).
>>
>> (I still think having folios under writeback for a long time might be a
>> problem, but that's indeed something to sort out separately in the
>> future, because I suspect NFS has similar issues. We'd want to "wait
>> with timeout" and, e.g., cancel writeback during memory
>> offlining/alloc_cma ...)

> I'm looking back at some of the discussions in v2 [1] and I'm still
> not clear on how memory fragmentation for non-movable pages differs
> from memory fragmentation for movable pages and whether one is worse
> than the other. Currently fuse uses movable temp pages (allocated with
> gfp flags GFP_NOFS | __GFP_HIGHMEM), and these can run into the same
Why are they movable? Do you also specify __GFP_MOVABLE?

If not, they are unmovable and are never allocated from
ZONE_MOVABLE/MIGRATE_CMA -- and usually only from MIGRATE_UNMOVABLE, to
group these unmovable pages.

> issue where a buggy/malicious server may never complete writeback.

If the temp pages are not allocated using __GFP_MOVABLE, they are just 
like any other kernel allocation -- unmovable. Nobody would even try 
migrating them, ever. And they are allocated from memory regions where 
that is expected.
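The placement rule described above fits in a few lines of C. This is a userspace sketch. The MODEL_GFP_* bit values are invented for illustration; the real ___GFP_* values live in include/linux/gfp_types.h. MODEL_GFP_PAGECACHE is only a rough approximation of a typical pagecache mask. The two helpers mimic the intent of the kernel's gfp_migratetype() and of the rule that ZONE_MOVABLE requires __GFP_MOVABLE combined with __GFP_HIGHMEM; they are not copies of it.

```c
#include <stdbool.h>

/* Illustrative bit values only; not the kernel's real ___GFP_* encoding. */
#define MODEL_GFP_HIGHMEM     (1u << 0)
#define MODEL_GFP_MOVABLE     (1u << 1)
#define MODEL_GFP_RECLAIMABLE (1u << 2)
#define MODEL_GFP_RECLAIM     (1u << 3)
#define MODEL_GFP_IO          (1u << 4)
#define MODEL_GFP_FS          (1u << 5)

/* Rough analogues of GFP_NOFS and of a typical pagecache allocation mask. */
#define MODEL_GFP_NOFS        (MODEL_GFP_RECLAIM | MODEL_GFP_IO)
#define MODEL_GFP_PAGECACHE   (MODEL_GFP_NOFS | MODEL_GFP_FS | \
			       MODEL_GFP_HIGHMEM | MODEL_GFP_MOVABLE)

enum model_migratetype { MT_UNMOVABLE, MT_MOVABLE, MT_RECLAIMABLE };

/* Only the mobility flags decide which kind of pageblock serves a request. */
static enum model_migratetype model_gfp_migratetype(unsigned int gfp)
{
	if (gfp & MODEL_GFP_MOVABLE)
		return MT_MOVABLE;	/* may also be served from MIGRATE_CMA */
	if (gfp & MODEL_GFP_RECLAIMABLE)
		return MT_RECLAIMABLE;
	return MT_UNMOVABLE;	/* default: not expected to ever be migrated */
}

/* ZONE_MOVABLE is only eligible with both __GFP_MOVABLE and __GFP_HIGHMEM. */
static bool model_may_use_zone_movable(unsigned int gfp)
{
	return (gfp & MODEL_GFP_MOVABLE) && (gfp & MODEL_GFP_HIGHMEM);
}
```

With the flags used for the FUSE temp pages (GFP_NOFS | __GFP_HIGHMEM), the model produces an unmovable request that can never be placed in ZONE_MOVABLE. A pagecache-style mask produces a movable one.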

> This has the same effect of fragmenting memory and has a worse memory
> cost to the system in terms of memory used. Without temp pages, though,
> pages allocated in a movable pageblock can't be compacted in this
> scenario, and that memory is fragmented.

Yes. With temp pages, they were simply grouped naturally "where they belong".

After all, pagecache pages are allocated using __GFP_MOVABLE, which
implies "this thing is movable" -- so the buddy allocator can place them
in physical memory regions that allow only movable allocations, or
otherwise place them so as to minimize fragmentation.
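A toy model shows why one stuck folio hurts more inside a movable pageblock. It rests on several assumptions. A pageblock is reduced to an array of per-page states. PAGES_PER_BLOCK is shrunk to 8 for readability; real pageblocks are much larger. Compaction is assumed to always find free target pages elsewhere. Under these assumptions, compaction can turn a pageblock into a fully free block only if every allocated page in it can be migrated. A single folio stuck under writeback pins the whole block.

```c
#include <stdbool.h>
#include <stddef.h>

#define PAGES_PER_BLOCK 8	/* toy value; real pageblocks are much larger */

enum page_state {
	PG_FREE,	/* free page */
	PG_MOVABLE,	/* allocated and migratable */
	PG_STUCK,	/* allocated but unmigratable, e.g. writeback never ends */
};

struct pageblock {
	enum page_state page[PAGES_PER_BLOCK];
};

/* Could compaction empty this block by migrating everything out of it? */
static bool block_compactable(const struct pageblock *pb)
{
	for (size_t i = 0; i < PAGES_PER_BLOCK; i++)
		if (pb->page[i] == PG_STUCK)
			return false;	/* one unmigratable page pins the whole block */
	return true;
}

/* Number of blocks that could be turned into fully free blocks. */
static size_t count_compactable(const struct pageblock *blocks, size_t nr)
{
	size_t n = 0;

	for (size_t i = 0; i < nr; i++)
		if (block_compactable(&blocks[i]))
			n++;
	return n;
}
```

With temp pages, the stuck page would instead have been an unmovable allocation inside an unmovable pageblock, and the movable block would have stayed compactable.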

> My (basic and maybe incorrect) understanding is that memory gets
> allocated through a buddy allocator and movable vs nonmovable pages
> get allocated to corresponding blocks that match their type, but
> there's no other difference. Is this understanding correct? Or is
> there some substantial difference between fragmentation for movable
> vs nonmovable blocks?

I assume not regarding fragmentation.

In general, I see two main issues:

A) We are no longer waiting on writeback, even though we expect in sane
environments that writeback will happen and it might be worthwhile to
just wait for writeback so we can migrate these folios.

B) We allow turning movable pages into unmovable ones, possibly forever
or for a long time, and there is no way to make them movable again
(e.g., cancel writeback).
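The "wait with timeout" idea mentioned earlier would give B) at least a bounded wait. It can be sketched in userspace with a mutex and a condition variable. Everything here is a hypothetical illustration, not a kernel API: struct wb_folio and its helpers stand in for a folio's writeback bit and its waitqueue. The kernel would use its own wait primitives rather than pthreads.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for a folio's writeback bit plus its waitqueue. */
struct wb_folio {
	pthread_mutex_t lock;
	pthread_cond_t done;
	bool writeback;
};

#define WB_FOLIO_INIT(wb) { .lock = PTHREAD_MUTEX_INITIALIZER, \
			    .done = PTHREAD_COND_INITIALIZER, .writeback = (wb) }

/* Writeback completion path: clear the bit and wake all waiters. */
static void wb_folio_end_writeback(struct wb_folio *f)
{
	pthread_mutex_lock(&f->lock);
	f->writeback = false;
	pthread_cond_broadcast(&f->done);
	pthread_mutex_unlock(&f->lock);
}

/*
 * Wait at most timeout_ms for writeback to complete. Returns 0 if writeback
 * is done, ETIMEDOUT otherwise, so a caller such as memory offlining or a
 * CMA allocation could give up instead of blocking forever.
 */
static int wb_folio_wait_timeout(struct wb_folio *f, long timeout_ms)
{
	struct timespec deadline;
	int rc = 0;

	/* pthread_cond_timedwait() takes an absolute CLOCK_REALTIME deadline. */
	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&f->lock);
	/* Keep waiting while rc == 0 so spurious wakeups are absorbed. */
	while (f->writeback && rc == 0)
		rc = pthread_cond_timedwait(&f->done, &f->lock, &deadline);
	/* The flag, not rc, is authoritative: it may clear right at the deadline. */
	rc = f->writeback ? ETIMEDOUT : 0;
	pthread_mutex_unlock(&f->lock);
	return rc;
}
```

This only bounds the wait. Making the folio movable again, for example by cancelling the writeback, would need a separate mechanism that the sketch does not attempt.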

I'm wondering if A) is actually a new issue introduced by this change. 
Can folios with busy temp pages (writeback cleared on folio, but temp 
pages are still around) be migrated? I will look into some details once 
I'm back from vacation.

--
Cheers,

David / dhildenb