On Fri, 2025-01-10 at 22:13 +0100, David Hildenbrand wrote:
> On 10.01.25 21:28, Jeff Layton wrote:
> > On Thu, 2025-01-09 at 12:22 +0100, David Hildenbrand wrote:
> > > On 07.01.25 19:07, Shakeel Butt wrote:
> > > > On Tue, Jan 07, 2025 at 09:34:49AM +0100, David Hildenbrand wrote:
> > > > > On 06.01.25 19:17, Shakeel Butt wrote:
> > > > > > On Mon, Jan 06, 2025 at 11:19:42AM +0100, Miklos Szeredi wrote:
> > > > > > > On Fri, 3 Jan 2025 at 21:31, David Hildenbrand <david@xxxxxxxxxx> wrote:
> > > > > > > > In any case, having movable pages be turned unmovable due to persistent
> > > > > > > > writeback is something that must be fixed, not worked around. Likely a
> > > > > > > > good topic for LSF/MM.
> > > > > > >
> > > > > > > Yes, this seems a good cross fs-mm topic.
> > > > > > >
> > > > > > > So the issue discussed here is that movable pages used for fuse
> > > > > > > page-cache cause problems when memory needs to be compacted. The
> > > > > > > problem is either that
> > > > > > >
> > > > > > > - the page is skipped, leaving the physical memory block unmovable
> > > > > > >
> > > > > > > - the compaction is blocked for an unbounded time
> > > > > > >
> > > > > > > While the new AS_WRITEBACK_INDETERMINATE could potentially make things
> > > > > > > worse, the same thing happens on readahead, since the new page can be
> > > > > > > locked for an indeterminate amount of time, which can also block
> > > > > > > compaction, right?
> > > > >
> > > > > Yes, as memory hotplug + virtio-mem maintainer my bigger concern is these
> > > > > pages residing in ZONE_MOVABLE / MIGRATE_CMA areas where there *must not be
> > > > > unmovable pages ever*. Not triggered by an untrusted source, not triggered
> > > > > by a trusted source.
> > > > >
> > > > > It's a violation of core-mm principles.
> > > >
> > > > The "must not be unmovable pages ever" is a very strong statement and we
> > > > are violating it today and will keep violating it in future.
> > > > Any page/folio under lock or writeback or have reference taken or have been
> > > > isolated from their LRU is unmovable (most of the time for small period
> > > > of time).
> > >
> > > ^ this: "small period of time" is what I meant.
> > >
> > > Most of these things are known to not be problematic: retrying a couple
> > > of times makes it work, that's why migration keeps retrying.
> > >
> > > Again, as an example, we allow short-term O_DIRECT but disallow
> > > long-term page pinning. I think there were concerns at some point if
> > > O_DIRECT might also be problematic (I/O might take a while), but so far
> > > it was not a problem in practice that would make CMA allocations easily
> > > fail.
> > >
> > > vmsplice() is a known problem, because it behaves like O_DIRECT but
> > > actually triggers long-term pinning; IIRC David Howells has this on his
> > > todo list to fix. [I recall that seccomp disallows vmsplice by default
> > > right now]
> > >
> > > > These operations are being done all over the place in kernel.
> > > > Miklos gave an example of readahead.
> > >
> > > I assume you mean "unmovable for a short time", correct, or can you
> > > point me at that specific example; I think I missed that.
> > >
> > > > The per-CPU LRU caches are another
> > > > case where folios can get stuck for long period of time.
> > >
> > > Which is why memory offlining disables the lru cache. See
> > > lru_cache_disable(). Other users that care about that drain the LRU on
> > > all cpus.
> > >
> > > > Reclaim and
> > > > compaction can isolate a lot of folios that they need to have
> > > > too_many_isolated() checks. So, "must not be unmovable pages ever" is
> > > > impractical.
> > >
> > > "must only be short-term unmovable", better?
> > >
> > Still a little ambiguous.
> >
> > How short is "short-term"? Are we talking milliseconds or minutes?
>
> Usually a couple of seconds, max.
> For memory offlining, slightly longer
> times are acceptable; other things (in particular compaction or CMA
> allocations) will give up much faster.
>
> >
> > Imposing a hard timeout on writeback requests to unprivileged FUSE
> > servers might give us a better guarantee of forward-progress, but it
> > would probably have to be on the order of at least a minute or so to be
> > workable.
>
> Yes, and that might already be a bit too much, especially if stuck on
> waiting for folio writeback ... so ideally we could find a way to
> migrate these folios that are under writeback and it's not your ordinary
> disk driver that responds rather quickly.
>

That would be ideal I think. One thought:

In practice, a lot of these writeback handlers use the folio up front
and then don't need to touch it again afterward until the reply comes
in and they clear the writeback bit.

Maybe we could add a mechanism where the writeback handlers could mark
the folio as being movable after the first phase was done? When the
reply comes in, they would clear that mark and check whether it's been
moved in the interim, and fix up the appropriate pointers if so?

Implementing that sounds a bit complex though since it's effectively a
new locking scheme.

> Right now we do it via these temp pages, and I can see how that's
> undesirable.
>
> For NFS etc. we probably never ran into this, because it's all used in
> fairly well managed environments and, well, I assume NFS easily outdates
> CMA and ZONE_MOVABLE :)
>
> >
> > >
> > > > The point is that, yes we should aim to improve things but in iterations
> > > > and "must not be unmovable pages ever" is not something we can achieve
> > > > in one step.
> > >
> > > I agree with the "improve things in iterations", but as
> > > AS_WRITEBACK_INDETERMINATE has the FOLL_LONGTERM smell to it, I think we
> > > are making things worse.
> > >
> > > And as this discussion has been going on for too long, to summarize my
> > > point: there exist conditions where pages are short-term unmovable, and
> > > possibly some to be fixed that turn pages long-term unmovable (e.g.,
> > > vmsplice); that does not mean that we can freely add new conditions that
> > > turn movable pages unmovable long-term or even forever.
> > >
> > > Again, this might be a good LSF/MM topic. If I would have the capacity I
> > > would suggest a topic around which things are known to cause pages to be
> > > short-term or long-term unmovable/unsplittable, and which can be
> > > handled, which not. Maybe I'll find the time to propose that as a topic.
> > >
> >
> >
> > This does sound like great LSF/MM fodder! I predict that this session
> > will run long! ;)
>
> Heh, fully agreed! :)
>

-- 
Jeff Layton <jlayton@xxxxxxxxxx>