On 03.06.24 15:23, Dave Hansen wrote:
On 6/3/24 02:35, Byungchul Park wrote:
...
From luf's point of view, the points where the deferred flush should be
performed are simply:
1. when changing the vma maps, that might be luf'ed.
2. when updating data of the pages, that might be luf'ed.
It's simple, but the devil is in the details as always.
All we need to do is to identify the points:
1. when changing the vma maps, that might be luf'ed.
a) mmap and munmap, i.e. fault handler or unmap_region().
b) changing permission to writable, i.e. mprotect or fault handler.
c) what I'm missing.
I'd say it even more generally: anything that installs a PTE which is
inconsistent with the original PTE. That, of course, includes writes.
But it also includes crazy things that we do like uprobes. Take a look
at __replace_page().
I think the page_vma_mapped_walk() checks plus the ptl keep LUF at bay
there. But it needs some really thorough review.
But the bigger concern is that, if there was a problem, I can't think of
a systematic way to find it.
Fully agreed!
2. when updating data of the pages, that might be luf'ed.
a) updating files through vfs e.g. file_end_write().
b) updating files through writable maps, i.e. 1-a) or 1-b).
c) what I'm missing.
Filesystems or block devices that change content without a "write" from
the local system. Network filesystems and block devices come to mind.
I honestly don't know what all the rules are around these, but they
could certainly be troublesome.
There appear to be some interactions for NFS between file locking and
page cache flushing.
But, stepping back ...
I'd honestly be a lot more comfortable if there was even a debugging LUF
mode that enforced a rule that said:
1. A LUF'd PTE can't be rewritten until after a luf_flush() occurs
I was playing with the idea of using a PTE marker. Then it's clear for
munmap/mremap/page faults that there is an outstanding flush required.
The alternative might be a VMA flag, but with that it's harder to actually
enforce an invariant.
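
To make the marker idea a bit more concrete, here is a minimal user-space
sketch of rule 1. Everything in it is hypothetical (toy_pte_t, the
TOY_PTE_LUF_PENDING bit and the toy_* helpers are made up for illustration
and are not kernel API); it only models "a marked PTE refuses to be
rewritten until the flush has happened".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical software bit: a TLB flush is still outstanding for this PTE. */
#define TOY_PTE_LUF_PENDING (1ULL << 0)

/* Toy stand-in for a page table entry; not the kernel's pte_t. */
typedef struct { uint64_t val; } toy_pte_t;

static bool toy_pte_luf_pending(toy_pte_t pte)
{
	return (pte.val & TOY_PTE_LUF_PENDING) != 0;
}

/* Unmap without flushing: clear the mapping but leave the marker behind. */
static void toy_luf_unmap(toy_pte_t *ptep)
{
	ptep->val = TOY_PTE_LUF_PENDING;
}

/* Stand-in for luf_flush(): once the TLB is flushed, drop the marker. */
static void toy_luf_flush(toy_pte_t *ptep)
{
	ptep->val &= ~TOY_PTE_LUF_PENDING;
}

/*
 * Install a new PTE value. Returns false, leaving the PTE untouched, when
 * a flush is still pending. A debug mode could warn or BUG here instead.
 */
static bool toy_set_pte(toy_pte_t *ptep, uint64_t newval)
{
	if (toy_pte_luf_pending(*ptep))
		return false;
	ptep->val = newval & ~TOY_PTE_LUF_PENDING;
	return true;
}
```

In this model, munmap/mremap/page-fault paths would all go through
something like toy_set_pte() and trip over the marker, which is what makes
the invariant checkable per PTE rather than per VMA.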
2. A LUF'd page's position in the page cache can't be replaced until
after a luf_flush()
That's the trickiest bit. I think these are the VFS concerns, like:
1) Page migration/reclaim ends up freeing the old page. TLB not flushed.
2) write() to the new page / write from other process to the new page
3) CPU reads stale content from old page
PTE markers can't handle that.
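
The three steps above can be played through in a small user-space model.
This is only a sketch under simplifying assumptions: one page cache slot,
one stale TLB entry, and hypothetical names throughout (struct toy_page,
struct toy_state and the toy_* helpers are not kernel API).

```c
#include <assert.h>
#include <string.h>

/* Toy page with a little bit of content. */
struct toy_page { char data[16]; };

struct toy_state {
	struct toy_page *cache_slot; /* page cache entry for one file offset */
	struct toy_page *tlb_entry;  /* translation some CPU still has cached */
};

/* Step 1: migration replaces the page in the page cache, TLB not flushed. */
static void toy_migrate(struct toy_state *s, struct toy_page *newpage)
{
	memcpy(newpage->data, s->cache_slot->data, sizeof(newpage->data));
	s->cache_slot = newpage; /* tlb_entry deliberately left untouched */
}

/* Step 2: write() goes through the page cache and hits the new page. */
static void toy_write(struct toy_state *s, const char *buf)
{
	strncpy(s->cache_slot->data, buf, sizeof(s->cache_slot->data) - 1);
	s->cache_slot->data[sizeof(s->cache_slot->data) - 1] = '\0';
}

/* Step 3: the CPU reads through its (possibly stale) TLB entry. */
static const char *toy_read_via_tlb(const struct toy_state *s)
{
	return s->tlb_entry->data;
}

/* Stand-in for the deferred flush: the next access refills from the slot. */
static void toy_deferred_flush(struct toy_state *s)
{
	s->tlb_entry = s->cache_slot;
}
```

Starting with both pointers at an old page containing "old", the sequence
toy_migrate(), toy_write("new"), toy_read_via_tlb() still returns the old
content until toy_deferred_flush() runs. No PTE is rewritten between the
migration and the stale read in this model, so a per-PTE marker never gets
a chance to fire.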
--
Cheers,
David / dhildenb