On 31.05.23 06:55, Alistair Popple wrote:
David Hildenbrand <david@xxxxxxxxxx> writes:
On 26.05.23 11:07, Karim Manaouil wrote:
On Thu, May 25, 2023 at 02:55:30PM +0200, David Hildenbrand wrote:
On 25.05.23 12:06, Karim Manaouil wrote:
Hi,
In do_anonymous_page(), a new page is allocated and zeroed, and the
corresponding page struct is initialised (setting flags PageUptodate,
PageSwapBacked, etc. and initialising the various counters).
Then, set_pte_at() is called directly without calling smp_wmb() to make
the updates above visible on other CPUs.
This could race with a page table walker. The walker can read the new pte
and try to access the page struct or the page content before the changes
above were made visible.
Only after acquiring the page table lock (which the writer first has to
release), right?
In many cases, the walkers don't take the page table locks (e.g.
mm/hmm.c).
Looks like we really should be locking the page table in
hmm_vma_walk_pmd() instead of only doing a pte_offset_map().
It's all very racy without that ...
Even the !pte_present(pte) check is racy ...
hmm_range_fault() on it's own is racy, but it's supposed to be used with
mmu interval notifiers which provide a sequence number and a driver
mutex to synchronise against pte changes. See for example
dmirror_range_snapshot() in lib/test_hmm.c.
How is this supposed to work with racing do_swap_page() that converts
!pte_present() -> pte_present() without triggering any mmu notifier AFAIKs?
--
Thanks,
David / dhildenb