What sounds plausible to me is:
a) Make this only affect the actual deadlock path: sync migration
during compaction. Communicate it either using some "context"
information or with a new MIGRATE_SYNC_COMPACTION.
b) Call it sth. like AS_WRITEBACK_MIGHT_DEADLOCK_ON_RECLAIM to express
that very deadlock problem.
c) Leave all other sync migration users alone for now.
The deadlock path is separate from sync migration. The deadlock arises
from a corner case where cgroupv1 reclaim waits on a folio under
writeback where that writeback itself is blocked on reclaim.
Okay, so compaction (IOW this patch) is not relevant at all to resolve
the deadlock in any way, correct?
For a second I thought I understood how this patch here relates to the
deadlock :)
Would that prevent the deadlock? Even *better* would be to be able to
ask the fs if starting writeback on a specific folio could deadlock.
Because in most cases, as I understand, we'll not actually run into the
deadlock and would just want to wait for writeback to complete
(esp. compaction).
(I still think having folios under writeback for a long time might be a
problem, but that's indeed something to sort out separately in the
future, because I suspect NFS has similar issues. We'd want to "wait
with timeout" and e.g., cancel writeback during memory
offlining/alloc_cma ...)
I'm looking back at some of the discussions in v2 [1] and I'm still
not clear on how memory fragmentation for non-movable pages differs
from memory fragmentation from movable pages and whether one is worse
than the other. Currently fuse uses movable temp pages (allocated with
gfp flags GFP_NOFS | __GFP_HIGHMEM), and these can run into the same
Why are they movable? Do you also specify __GFP_MOVABLE?
If not, they are unmovable and are never allocated from
ZONE_MOVABLE/MIGRATE_CMA -- and usually only from MIGRATE_UNMOVABLE, to
group these unmovable pages.
issue where a buggy/malicious server may never complete writeback.
If the temp pages are not allocated using __GFP_MOVABLE, they are just
like any other kernel allocation -- unmovable. Nobody would even try
migrating them, ever. And they are allocated from memory regions where
that is expected.
This has the same effect of fragmenting memory and has a worse memory
cost to the system in terms of memory used. Without temp pages, though,
pages allocated in a movable page block can't be compacted in this
scenario, and that memory is fragmented.
Yes. With temp pages, they are simply grouped naturally "where they belong".
After all, pagecache pages are allocated using __GFP_MOVABLE, which
implies "this thing is movable" -- so the buddy can place them in
physical memory regions that allow only for movable allocations or
minimize fragmentation.
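
To make the grouping argument concrete, here is a toy userspace model,
not kernel code. All TOY_* names and flag values are made up for
illustration. It assumes that only requests carrying the movable hint
are grouped as movable, and that only requests that are both
highmem-capable and movable may be placed in a movable-only zone.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy flag bits for illustration only; NOT the kernel's real values. */
#define TOY_GFP_HIGHMEM 0x1u
#define TOY_GFP_MOVABLE 0x2u

enum toy_migratetype {
	TOY_MIGRATE_UNMOVABLE,
	TOY_MIGRATE_MOVABLE,
};

/* Pick the pageblock type a request is grouped into (toy model). */
static enum toy_migratetype toy_gfp_to_migratetype(unsigned int gfp)
{
	return (gfp & TOY_GFP_MOVABLE) ? TOY_MIGRATE_MOVABLE
				       : TOY_MIGRATE_UNMOVABLE;
}

/*
 * Assumption of this sketch: a request may land in a movable-only zone
 * only if it is both highmem-capable and marked movable.
 */
static bool toy_may_use_zone_movable(unsigned int gfp)
{
	unsigned int both = TOY_GFP_HIGHMEM | TOY_GFP_MOVABLE;

	return (gfp & both) == both;
}

/* The two cases discussed in this thread, expressed in the toy model. */
static void toy_check_examples(void)
{
	/* Temp page: highmem-capable, but no movable hint. */
	unsigned int temp_page = TOY_GFP_HIGHMEM;
	/* Pagecache page: highmem-capable and marked movable. */
	unsigned int pagecache = TOY_GFP_HIGHMEM | TOY_GFP_MOVABLE;

	assert(toy_gfp_to_migratetype(temp_page) == TOY_MIGRATE_UNMOVABLE);
	assert(!toy_may_use_zone_movable(temp_page));

	assert(toy_gfp_to_migratetype(pagecache) == TOY_MIGRATE_MOVABLE);
	assert(toy_may_use_zone_movable(pagecache));
}
```

Calling toy_check_examples() from a small main() exercises both cases:
in this model the temp page is grouped with other unmovable
allocations, while the pagecache page is eligible for movable-only
regions, which is why making it effectively unmovable later hurts.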
My (basic and maybe
incorrect) understanding is that memory gets allocated through a buddy
allocator and movable vs nonmovable pages get allocated to
corresponding blocks that match their type, but there's no other
difference otherwise. Is this understanding correct? Or is there some
substantial difference between fragmentation for movable vs nonmovable
blocks?
I assume not regarding fragmentation.
In general, I see two main issues:
A) We are no longer waiting on writeback, even though we expect in sane
environments that writeback will happen and it might be worthwhile to
just wait for writeback so we can migrate these folios.
B) We allow movable pages to become unmovable, possibly forever or for a
long time, and there is no way to make them movable again (e.g., cancel
writeback).
I'm wondering if A) is actually a new issue introduced by this change.
Can folios with busy temp pages (writeback cleared on folio, but temp
pages are still around) be migrated? I will look into some details once
I'm back from vacation.
--
Cheers,
David / dhildenb