On 10.01.25 21:43, Jeff Layton wrote:
On Fri, 2025-01-10 at 21:20 +0100, David Hildenbrand wrote:
On 10.01.25 21:16, Jeff Layton wrote:
On Tue, 2025-01-07 at 09:34 +0100, David Hildenbrand wrote:
On 06.01.25 19:17, Shakeel Butt wrote:
On Mon, Jan 06, 2025 at 11:19:42AM +0100, Miklos Szeredi wrote:
On Fri, 3 Jan 2025 at 21:31, David Hildenbrand <david@xxxxxxxxxx> wrote:
In any case, having movable pages be turned unmovable due to persistent
writeback is something that must be fixed, not worked around. Likely a
good topic for LSF/MM.
Yes, this seems a good cross fs-mm topic.
So the issue discussed here is that movable pages used for fuse
page-cache cause problems when memory needs to be compacted. The
problem is either that
- the page is skipped, leaving the physical memory block unmovable
- the compaction is blocked for an unbounded time
While the new AS_WRITEBACK_INDETERMINATE could potentially make things
worse, the same thing happens on readahead, since the new page can be
locked for an indeterminate amount of time, which can also block
compaction, right?
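The two failure modes listed above (skip the page, or block for an unbounded time) can be illustrated with a toy model. This is a hypothetical sketch, not kernel code: the toy_* names are invented for illustration. It loosely mirrors how asynchronous migration gives up on a folio under writeback while fully synchronous migration waits for writeback to finish.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model; these are not the kernel's types. */
enum toy_migrate_mode { TOY_MIGRATE_ASYNC, TOY_MIGRATE_SYNC };

enum toy_result {
    TOY_MIGRATED, /* folio moved; its physical block can be freed */
    TOY_SKIPPED,  /* folio left in place; the block stays unmovable */
    TOY_WAITING,  /* caller must wait for writeback, possibly forever */
};

struct toy_folio {
    bool writeback; /* folio is currently under writeback */
};

/* Decide what happens to a single folio during migration. */
static enum toy_result toy_migrate_one(const struct toy_folio *f,
                                       enum toy_migrate_mode mode)
{
    if (!f->writeback)
        return TOY_MIGRATED;
    if (mode == TOY_MIGRATE_ASYNC)
        return TOY_SKIPPED; /* failure mode 1: block stays unmovable */
    return TOY_WAITING;     /* failure mode 2: unbounded wait */
}

/* True only if every folio in the block can be migrated right now. */
static bool toy_block_freeable(const struct toy_folio *folios, size_t n,
                               enum toy_migrate_mode mode)
{
    assert(folios != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (toy_migrate_one(&folios[i], mode) != TOY_MIGRATED)
            return false;
    return true;
}
```

A single folio stuck in writeback is enough to make the whole block non-freeable in either mode; the modes only differ in whether the caller gives up or waits.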
Yes, as memory hotplug + virtio-mem maintainer, my bigger concern is
these pages residing in ZONE_MOVABLE / MIGRATE_CMA areas where there
*must not be unmovable pages ever*. Not triggered by an untrusted
source, not triggered by a trusted source.
It's a violation of core-mm principles.
Even if we have a timeout of 60s, making things like alloc_contig_page()
wait for that long on writeback is broken and needs to be fixed.
And the fix is not to skip these pages, that's a workaround.
I'm hoping I can find an easy way to trigger this also with NFS.
I imagine that you can just open a file and start writing to it, pull
the plug on the NFS server, and then issue a fsync or something to
ensure some writeback occurs.
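The client side of that recipe can be sketched as a small helper. This is a minimal sketch under assumptions: write_then_fsync() and its pause_secs parameter are hypothetical, the path is expected to point at a file on the NFS mount, and stopping the server during the pause is done by hand.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: write len bytes to path (creating/truncating it),
 * sleep pause_secs seconds (time to pull the plug on the server), then
 * fsync(). Returns 0 on success or a positive errno value on failure.
 */
static int write_then_fsync(const char *path, const char *buf, size_t len,
                            unsigned pause_secs)
{
    assert(path != NULL && (buf != NULL || len == 0));

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return errno;

    /* Dirty the page cache; write() may be partial, so loop. */
    size_t done = 0;
    while (done < len) {
        ssize_t n = write(fd, buf + done, len - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            int err = errno;
            close(fd);
            return err;
        }
        done += (size_t)n;
    }

    /* Window in which the server can be made unreachable. */
    if (pause_secs > 0)
        sleep(pause_secs);

    /* Force writeback of the dirty folios and collect any error. */
    int err = 0;
    if (fsync(fd) < 0)
        err = errno;
    if (close(fd) < 0 && err == 0)
        err = errno;
    return err;
}
```

Called with a path on a hard mount and a pause long enough to stop the server, fsync() would be expected to block until the server returns; on a soft mount it would be expected to return an error once the writeback RPC times out.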
Yes, that's the plan, thanks!
Any dirty pagecache folios should be stuck in writeback at that point.
The NFS client is also very patient about waiting for the server to
come back, so it should stay that way indefinitely.
Yes; however, the default timeout for UDP is fairly small (for TCP
certainly much longer). So one thing I'd like to understand is what that
"cancel writeback -> redirty folio" on timeout does, and when it
actually triggers with TCP vs UDP timeouts.
The lifetime of the pagecache pages is not at all related to the socket
lifetimes. IOW, the client can completely lose the connection to the
server and the page will just stay dirty until the connection can be
reestablished and the server responds.
Right. It cannot get reclaimed while that is the case.
The exception here is if you mount with "-o soft" in which case, an RPC
request will time out with an error after a major RPC timeout (usually
after a minute or so). See nfs(5) for the gory details of timeouts and
retransmission. The default is "-o hard" since that's necessary for
data-integrity in the face of spotty network connections.
Once a soft mount has a writeback RPC time out, the folio is marked
clean and a writeback error is set on the mapping, so that fsync() will
return an error.
I assume that's the code I stumbled over in nfs_page_async_flush(),
where we end up calling folio_redirty_for_writepage() +
nfs_redirty_request(), unless we run into a fatal error; in that case,
we end up in nfs_write_error() where we set the mapping error and stop
writeback using nfs_page_end_writeback().
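The two outcomes described above can be modelled as a small state transition. This is a hypothetical sketch: the toy_* types and functions are invented, and the comments only point at the kernel functions named in the discussion without claiming to reproduce their exact behaviour. A non-fatal error re-dirties the folio so the data is retried later; a fatal error (such as a soft-mount timeout) records the error on the mapping and lets the folio end up clean.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of one folio after a failed writeback attempt. */
struct toy_nfs_folio {
    bool dirty;
    bool writeback;
};

struct toy_mapping {
    int wb_err; /* stands in for the error recorded on the mapping */
};

/* err is a positive errno value (e.g. EIO); fatal selects the outcome. */
static void toy_write_failed(struct toy_nfs_folio *f, struct toy_mapping *m,
                             int err, bool fatal)
{
    assert(f->writeback);
    if (!fatal) {
        /* Akin to folio_redirty_for_writepage() + nfs_redirty_request():
         * keep the data dirty so writeback is retried later. */
        f->dirty = true;
    } else {
        /* Akin to nfs_write_error(): record the error for fsync() and
         * let the folio end up clean; the data is dropped. */
        m->wb_err = err;
        f->dirty = false;
    }
    f->writeback = false; /* writeback ends in both paths */
}

/* Returns the recorded writeback error, or 0 if none. */
static int toy_fsync(const struct toy_mapping *m)
{
    return m->wb_err;
}
```

In the non-fatal path the folio stays dirty, and therefore resident and unreclaimable, until a later writeback attempt succeeds.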
--
Cheers,
David / dhildenb