On 07.11.24 21:20, Matthew Wilcox wrote:
> On Thu, Nov 07, 2024 at 05:34:40PM +0100, David Hildenbrand wrote:
>> On 07.11.24 17:09, Matthew Wilcox wrote:
>>> On Thu, Nov 07, 2024 at 04:07:08PM +0100, David Hildenbrand wrote:
>>>> I'm debugging an interesting problem: split_folio() will fail on dirty
>>>> folios on XFS, and I am not sure who will trigger the writeback in a timely
>>>> manner so code relying on the split to work at some point (in sane setups
>>>> where page pinning is not applicable) can make progress.
>>>
>>> You could call something like filemap_write_and_wait_range()?
>>
>> Thanks, have to look into some details of that.
>> Looks like the folio_clear_dirty_for_io() is buried in
>> folio_prepare_writeback(), so that part is taken care of.
>> Guess I have to go from folio to "mapping,lstart,lend" such that
>> __filemap_fdatawrite_range() would look up the folio again. Sounds doable.
>> (I assume I have to drop the folio lock+reference before calling that)
>
> I was thinking you'd do it higher in the callchain than
> gmap_make_secure(). Presumably userspace says "I want to make this
> 256MB range secure" and we can start by writing back that entire
> 256MB chunk of address space.
> That doesn't prevent anybody from dirtying it in-between, of course,
> so you can still get -EBUSY and have to loop round again.

I'm afraid that won't really work.
On the one hand, we might be allocating these pages (+disk blocks)
during the unpack operation -- where we essentially trigger page faults
first using gmap_fault() -- so the pages might not even exist before the
gmap_make_secure() during unpack. One workaround would be to
preallocate+writeback from user space, but that doesn't sound quite right.
But the bigger problem I see is that the initial "unpack" operation is
not the only case where we trigger this conversion to "secure" state.
Once the VM is running, we can see calls on arbitrary guest memory even
during page faults, when gmap_make_secure() is called via
gmap_convert_to_secure().
I'm still not sure why we see essentially no progress being made, even
though we temporarily drop the PTL, mmap lock, folio lock, folio ref ...
maybe related to us triggering a write fault that somehow ends up
setting the folio dirty :/ Or because writeback is simply too slow /
backs off.
I'll play with handling -EBUSY from split_folio() differently: if the
folio is under writeback, wait on that. If the folio is dirty, trigger
writeback. And I'll look into whether we really need a writable PTE, I
suspect not, because we are not actually "modifying" page content.
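
To make that plan a bit more concrete, here is a rough userspace model of the
retry logic; it is not kernel code and is untested against the real thing. All
model_* names are made up for illustration, PAGE_SHIFT of 12 is an assumption,
and the helpers only mimic what I expect split_folio(), folio_test_dirty(),
folio_test_writeback() and the filemap writeback helpers to do:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Hypothetical stand-in for the few folio properties this sketch needs. */
struct model_folio {
	unsigned long long index; /* page offset of the folio in the file */
	unsigned int nr_pages;    /* number of pages in the folio */
	bool dirty;               /* models folio_test_dirty() */
	bool writeback;           /* models folio_test_writeback() */
};

/* folio -> byte range; lend is inclusive, as the fdatawrite_range helpers expect. */
static void model_folio_range(const struct model_folio *f,
			      long long *lstart, long long *lend)
{
	assert(f->nr_pages > 0);
	*lstart = (long long)(f->index << PAGE_SHIFT);
	*lend = *lstart + ((long long)f->nr_pages << PAGE_SHIFT) - 1;
}

/* Models the split refusing dirty folios and folios under writeback. */
static int model_split_folio(struct model_folio *f)
{
	if (f->dirty || f->writeback)
		return -EBUSY;
	f->nr_pages = 1;
	return 0;
}

/*
 * Models starting writeback on a range: dirty -> under writeback.
 * The real code would have to drop the folio lock+reference first,
 * because the writeback path looks the folio up again by range.
 */
static void model_start_writeback(struct model_folio *f,
				  long long lstart, long long lend)
{
	(void)lstart;
	(void)lend;
	if (f->dirty) {
		f->dirty = false;
		f->writeback = true;
	}
}

/* Models waiting for writeback to finish. */
static void model_wait_writeback(struct model_folio *f)
{
	f->writeback = false;
}

/* On -EBUSY: wait if under writeback, else kick off writeback if dirty. */
static int model_split_with_writeback(struct model_folio *f, int max_tries)
{
	long long lstart, lend;
	int rc = -EBUSY;

	while (max_tries-- > 0) {
		rc = model_split_folio(f);
		if (rc != -EBUSY)
			return rc;
		if (f->writeback) {
			model_wait_writeback(f);
		} else if (f->dirty) {
			model_folio_range(f, &lstart, &lend);
			model_start_writeback(f, lstart, lend);
		}
	}
	return rc;
}
```

In this model a dirty folio needs three attempts: the first starts writeback,
the second waits for it, and the third split succeeds. Anybody dirtying the
folio again in between would simply cost another round, which is why the loop
is bounded instead of assuming progress.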
--
Cheers,
David / dhildenb