On 25.11.22 22:37, Jann Horn wrote:
> Page table walks on address ranges mapped by VMAs can be done under the
> mmap lock, the lock of an anon_vma attached to the VMA, or the lock of the
> VMA's address_space. Only one of these needs to be held, and it does not
> need to be held in exclusive mode.
>
> Under those circumstances, the rules for concurrent access to page table
> entries are:
>
>  - Terminal page table entries (entries that don't point to another page
>    table) can be arbitrarily changed under the page table lock, with the
>    exception that they must always remain consistent for hardware page
>    table walks and lockless_pages_from_mm(). This includes changing them
>    into non-terminal entries.
>  - Non-terminal page table entries (which point to another page table)
>    cannot be modified; readers are allowed to READ_ONCE() an entry, verify
>    that it is non-terminal, and then assume that its value will stay as-is.
>
> Retracting a page table involves modifying a non-terminal entry, so
> page-table-level locks are insufficient to protect against concurrent
> page table traversal. Instead, retraction requires taking, in exclusive
> mode, all of the higher-level locks under which a page walk can be
> started in the relevant range.
>
> The collapse_huge_page() path for anonymous THP already follows this rule,
> but the shmem/file THP path was getting it wrong, making it possible for
> concurrent rmap-based operations to cause corruption.
This sounds sane and correct to me. I'm no expert on file THP, though.

For anon THP, it's the mmap lock and the rmap locks. I assume the only
difference for file THP is that the rmap lock is actually the mapping
lock. Looking at rmap_walk_file(), that seems to be the case.

I wish at least PTE table removal could be done more easily ... I already
experimented with some ideas some time ago (e.g., a lock in the PMD
table's memmap), but it's all far from trivial, and space in the memmap
is scarce.
--
Thanks,
David / dhildenb