On 04/12/2023 17:26, James Houghton wrote: > It is currently possible for a userspace application to enter a page > fault loop when using HugeTLB pages implemented with contiguous PTEs > when HAFDBS is not available. This happens because: > 1. The kernel may sometimes write PTEs that are sw-dirty but hw-clean > (PTE_DIRTY | PTE_RDONLY | PTE_WRITE). Hi James, Do you know how this happens? AFAIK, this is the set of valid bit combinations, and PTE_RDONLY|PTE_WRITE|PTE_DIRTY is not one of them. Perhaps the real solution is to understand how this is happening and prevent it? /* * PTE bits configuration in the presence of hardware Dirty Bit Management * (PTE_WRITE == PTE_DBM): * * Dirty Writable | PTE_RDONLY PTE_WRITE PTE_DIRTY (sw) * 0 0 | 1 0 0 * 0 1 | 1 1 0 * 1 0 | 1 0 1 * 1 1 | 0 1 x * * When hardware DBM is not present, the sofware PTE_DIRTY bit is updated via * the page fault mechanism. Checking the dirty status of a pte becomes: * * PTE_DIRTY || (PTE_WRITE && !PTE_RDONLY) */ Thanks, Ryan > 2. If, during a write, the CPU uses a sw-dirty, hw-clean PTE in handling > the memory access on a system without HAFDBS, we will get a page > fault. > 3. HugeTLB will check if it needs to update the dirty bits on the PTE. > For contiguous PTEs, it will check to see if the pgprot bits need > updating. In this case, HugeTLB wants to write a sequence of > sw-dirty, hw-dirty PTEs, but it finds that all the PTEs it is about > to overwrite are all pte_dirty() (pte_sw_dirty() => pte_dirty()), > so it thinks no update is necessary. > > Please see this[1] reproducer. > > I think (though I may be wrong) that both step (1) and step (3) are > buggy. > > The first patch in this series fixes step (3); instead of checking if > pte_dirty is matching in __cont_access_flags_changed, check pte_hw_dirty > and pte_sw_dirty separately. > > The second patch in this series makes step (1) less likely to occur. > Without this patch, we can get the kernel to write a sw-dirty, hw-clean > PTE with the following steps (showing the relevant VMA flags and pgprot > bits): > i. Create a valid, writable contiguous PTE. > VMA vmflags: VM_SHARED | VM_READ | VM_WRITE > VMA pgprot bits: PTE_RDONLY | PTE_WRITE > PTE pgprot bits: PTE_DIRTY | PTE_WRITE > ii. mprotect the VMA to PROT_NONE. > VMA vmflags: VM_SHARED > VMA pgprot bits: PTE_RDONLY > PTE pgprot bits: PTE_DIRTY | PTE_RDONLY > iii. mprotect the VMA back to PROT_READ | PROT_WRITE. > VMA vmflags: VM_SHARED | VM_READ | VM_WRITE > VMA pgprot bits: PTE_RDONLY | PTE_WRITE > PTE pgprot bits: PTE_DIRTY | PTE_WRITE | PTE_RDONLY > > Applying either one of the two patches in this patchset will fix the > particular issue with HugeTLB pages implemented with contiguous PTEs. > It's possible that only one of these patches should be taken, or that > the right fix is something else entirely. > > [1]: https://gist.github.com/48ca/11d1e466deee032cb35aa8c2280f93b0 > > James Houghton (2): > arm64: hugetlb: Distinguish between hw and sw dirtiness in > __cont_access_flags_changed > arm64: mm: Always make sw-dirty PTEs hw-dirty in pte_modify > > arch/arm64/include/asm/pgtable.h | 6 ++++++ > arch/arm64/mm/hugetlbpage.c | 5 ++++- > 2 files changed, 10 insertions(+), 1 deletion(-) > > > base-commit: 645a9a454fdb7e698a63a275edca6a17ef97afc4