On 01.08.23 14:23, Jason Gunthorpe wrote:
> On Tue, Aug 01, 2023 at 02:22:12PM +0200, David Hildenbrand wrote:
>> On 01.08.23 14:19, Jason Gunthorpe wrote:
>>> On Tue, Aug 01, 2023 at 05:32:38AM +0000, Kasireddy, Vivek wrote:
>>>>> You get another invalidate because the memfd removes the zero pages
>>>>> that hmm_range_fault installed in the PTEs before replacing them with
>>>>> actual writable pages. Then you do the move, and another
>>>>> hmm_range_fault, and basically the whole thing over again. Except this
>>>>> time instead of returning zero pages it returns actual writable
>>>>> pages.
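(For context, this is roughly the retry pattern documented in Documentation/mm/hmm.rst: an invalidation between mmu_interval_read_begin() and mmu_interval_read_retry() simply forces another walk. Untested sketch only; driver_update_lock and the "import pfns" step are made-up placeholders, and the notifier is assumed to be already registered for the range:

static DEFINE_MUTEX(driver_update_lock);

static int driver_populate_range(struct mm_struct *mm,
                                 struct mmu_interval_notifier *notifier,
                                 unsigned long start, unsigned long end,
                                 unsigned long *pfns)
{
        struct hmm_range range = {
                .notifier       = notifier,
                .start          = start,
                .end            = end,
                .hmm_pfns       = pfns,
                .default_flags  = HMM_PFN_REQ_FAULT,
        };
        int ret;

again:
        range.notifier_seq = mmu_interval_read_begin(notifier);

        mmap_read_lock(mm);
        ret = hmm_range_fault(&range);
        mmap_read_unlock(mm);
        if (ret) {
                if (ret == -EBUSY)
                        goto again;     /* an invalidation raced with the walk */
                return ret;
        }

        mutex_lock(&driver_update_lock);
        if (mmu_interval_read_retry(notifier, range.notifier_seq)) {
                /* Invalidated after the fault; these pfns are already stale. */
                mutex_unlock(&driver_update_lock);
                goto again;
        }
        /* pfns are stable here; import them into the device mapping
         * while holding the lock that the invalidate callback also takes. */
        mutex_unlock(&driver_update_lock);
        return 0;
}
)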
>>>> Ok, when I tested earlier (by registering an invalidate callback) but without
>>>> hmm_range_fault(), I did not find this additional invalidate getting triggered.
>>>> Let me try with hmm_range_fault() and see if everything works as expected.
>>>> Thank you for your help.
>>> If you do not get an invalidate then there is a pretty serious bug in
>>> the mm that needs fixing.
>>>
>>> Anything hmm_range_fault() returns must be invalidated if the
>>> underlying CPU mapping changes for any reason. Since hmm_range_fault()
>>> will populate zero pages when reading from a hole in a memfd, it must
>>> also get an invalidation when the zero pages are changed into writable
>>> pages.
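(That invalidation is delivered through the mmu_interval_notifier's ->invalidate() callback, which is also where a driver has to drop whatever it built from the earlier hmm_range_fault() results. A minimal sketch, assuming a made-up struct driver_object that embeds the notifier and an update_lock; the actual teardown is only a comment:

struct driver_object {
        struct mmu_interval_notifier notifier;
        struct mutex update_lock;
        /* ... device mapping state built from hmm_range_fault() ... */
};

static bool driver_invalidate(struct mmu_interval_notifier *mni,
                              const struct mmu_notifier_range *range,
                              unsigned long cur_seq)
{
        struct driver_object *obj = container_of(mni, struct driver_object,
                                                 notifier);

        if (!mmu_notifier_range_blockable(range))
                return false;

        mutex_lock(&obj->update_lock);
        /* Make concurrent or later hmm_range_fault() users retry. */
        mmu_interval_set_seq(mni, cur_seq);
        /* Tear down device mappings covering range->start..range->end here. */
        mutex_unlock(&obj->update_lock);
        return true;
}

static const struct mmu_interval_notifier_ops driver_mn_ops = {
        .invalidate = driver_invalidate,
};
)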
>> Can you point me at the code that returns that (shared) zero page?
> It calls handle_mm_fault() - shouldn't that do it? Same as if the CPU
> read faulted the page?
To the best of my knowledge, the shared zeropage is only used in
MAP_PRIVATE|MAP_ANON mappings and in weird DAX mappings.
If that changed, we have to fix FOLL_PIN|FOLL_LONGTERM for MAP_SHARED VMAs.
If you read-fault on a memfd hole, you should get a proper "zeroed"
pagecache page that effectively "filled that hole" -- so there is no
file hole anymore.
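That is easy to observe from userspace, too: after a read fault into a memfd
hole the file's block count goes up, because a real zero-filled pagecache page
was allocated rather than the shared zeropage being mapped. Quick demo, error
handling omitted:

#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
        struct stat st;
        int fd = memfd_create("hole-demo", 0);
        char *map;

        ftruncate(fd, 4096);                    /* one page, all hole */
        fstat(fd, &st);
        printf("blocks before read fault: %ld\n", (long)st.st_blocks);

        map = mmap(NULL, 4096, PROT_READ, MAP_SHARED, fd, 0);
        (void)*(volatile char *)map;            /* read fault into the hole */

        fstat(fd, &st);
        printf("blocks after read fault:  %ld\n", (long)st.st_blocks);

        munmap(map, 4096);
        close(fd);
        return 0;
}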
--
Cheers,
David / dhildenb