David Hildenbrand <david@xxxxxxxxxx> writes: > On 26.05.23 11:07, Karim Manaouil wrote: >> On Thu, May 25, 2023 at 02:55:30PM +0200, David Hildenbrand wrote: >>> On 25.05.23 12:06, Karim Manaouil wrote: >>>> Hi, >>>> >>>> In do_anonymous_page(), a new page is allocated and zeroed, and the >>>> corresponding page struct is initialised (setting flags PageUptodate, >>>> PageSwapBacked, etc. and initialising the various counters). >>>> >>>> Then, set_pte_at() is called directly without calling smp_wmb() to make >>>> the updates above visible on other CPUs. >>>> >>>> This could race with a page table walker. The walker can read the new pte >>>> and try to access the page struct or the page content before the changes >>>> above were made visible. >>> >>> Only after acquiring the page table lock (which the writer first has to >>> release), right? >> In many cases, the walkers don't take the page table locks (e.g. >> mm/hmm.c). > > Looks like we really should be locking the page table in > hmm_vma_walk_pmd() instead of only doing a pte_offset_map(). > > It's all very racy without that ... > > Even the !pte_present(pte) check is racy ... hmm_range_fault() on it's own is racy, but it's supposed to be used with mmu interval notifiers which provide a sequence number and a driver mutex to synchronise against pte changes. See for example dmirror_range_snapshot() in lib/test_hmm.c.