On 07.01.25 19:07, Shakeel Butt wrote:
On Tue, Jan 07, 2025 at 09:34:49AM +0100, David Hildenbrand wrote:
On 06.01.25 19:17, Shakeel Butt wrote:
On Mon, Jan 06, 2025 at 11:19:42AM +0100, Miklos Szeredi wrote:
On Fri, 3 Jan 2025 at 21:31, David Hildenbrand <david@xxxxxxxxxx> wrote:
In any case, having movable pages be turned unmovable due to persistent
writeback is something that must be fixed, not worked around. Likely a
good topic for LSF/MM.
Yes, this seems a good cross fs-mm topic.
So the issue discussed here is that movable pages used for fuse
page-cache cause problems when memory needs to be compacted. The
problem is either that
- the page is skipped, leaving the physical memory block unmovable
- the compaction is blocked for an unbounded time
While the new AS_WRITEBACK_INDETERMINATE could potentially make things
worse, the same thing happens on readahead, since the new page can be
locked for an indeterminate amount of time, which can also block
compaction, right?
Yes, as memory hotplug + virtio-mem maintainer my bigger concern is these
pages residing in ZONE_MOVABLE / MIGRATE_CMA areas where there *must not be
unmovable pages ever*. Not triggered by an untrusted source, not triggered
by a trusted source.
It's a violation of core-mm principles.
The "must not be unmovable pages ever" is a very strong statement and we
are violating it today and will keep violating it in the future. Any
page/folio that is under lock or writeback, has a reference taken, or has
been isolated from its LRU is unmovable (most of the time for a small
period of time).
^ this: "small period of time" is what I meant.
Most of these things are known to not be problematic: retrying a couple
of times makes it work, that's why migration keeps retrying.
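A toy userspace sketch of why bounded retrying copes with short-term conditions but not with long-term ones. This is not kernel code: struct toy_page, toy_try_migrate() and toy_migrate_with_retries() are hypothetical names made up for illustration, and the retry bound is arbitrary.

```c
#include <stdbool.h>

/* Hypothetical model of a page; not a kernel structure. */
struct toy_page {
	/*
	 * Number of upcoming attempts during which the page is still
	 * locked / under writeback / referenced. A negative value models
	 * a condition that never clears (long-term unmovable).
	 */
	int busy_attempts;
};

/* One migration attempt: fails while the page is still busy. */
static bool toy_try_migrate(struct toy_page *page)
{
	if (page->busy_attempts < 0)
		return false;		/* never becomes movable */
	if (page->busy_attempts > 0) {
		page->busy_attempts--;	/* short-term condition is clearing */
		return false;
	}
	return true;
}

/* Retry a bounded number of times, then give up. */
static bool toy_migrate_with_retries(struct toy_page *page, int max_attempts)
{
	for (int attempt = 0; attempt < max_attempts; attempt++) {
		if (toy_try_migrate(page))
			return true;
	}
	return false;
}
```

With busy_attempts = 3 and a bound of 10 the call succeeds on the fourth attempt; with a negative busy_attempts no bound is ever enough, which is the case that matters for ZONE_MOVABLE / MIGRATE_CMA.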
Again, as an example, we allow short-term O_DIRECT but disallow
long-term page pinning. I think there were concerns at some point if
O_DIRECT might also be problematic (I/O might take a while), but so far
it was not a problem in practice that would make CMA allocations easily
fail.
vmsplice() is a known problem, because it behaves like O_DIRECT but
actually triggers long-term pinning; IIRC David Howells has this on his
todo list to fix. [I recall that seccomp disallows vmsplice by default
right now]
These operations are being done all over the place in kernel.
Miklos gave an example of readahead.
I assume you mean "unmovable for a short time", correct? Or can you
point me at that specific example? I think I missed that.
The per-CPU LRU caches are another
case where folios can get stuck for a long period of time.
Which is why memory offlining disables the lru cache. See
lru_cache_disable(). Other users that care about that drain the LRU on
all cpus.
Reclaim and
compaction can isolate so many folios that they need
too_many_isolated() checks. So, "must not be unmovable pages ever" is
impractical.
"must only be short-term unmovable", better?
The point is that, yes we should aim to improve things but in iterations
and "must not be unmovable pages ever" is not something we can achieve
in one step.
I agree with the "improve things in iterations", but as
AS_WRITEBACK_INDETERMINATE has the FOLL_LONGTERM smell to it, I think we
are making things worse.
And as this discussion has been going on for too long, to summarize my
point: there exist conditions where pages are short-term unmovable, and
possibly some to be fixed that turn pages long-term unmovable (e.g.,
vmsplice); that does not mean that we can freely add new conditions that
turn movable pages unmovable long-term or even forever.
Again, this might be a good LSF/MM topic. If I had the capacity, I
would suggest a topic around which things are known to cause pages to be
short-term or long-term unmovable/unsplittable, and which can be
handled and which cannot. Maybe I'll find the time to propose that as a topic.
--
Cheers,
David / dhildenb