Re: [PATCH v2 1/2] mm/migrate_device.c: Copy pte dirty bit to page

Peter Xu <peterx@xxxxxxxxxx> · Wed, 17 Aug 2022 15:07:09 -0400

On Wed, Aug 17, 2022 at 03:41:16PM +1000, Alistair Popple wrote:
> My primary concern with batching is ensuring a CPU write after clearing
> a clean PTE but before flushing the TLB does the "right thing" (ie. faults
> if the PTE is not present).

Fair enough.  Exactly I have that same concern. But I think Nadav replied
very recently on this in the previous thread, quotting from him [1]:

  I keep not remembering this erratum correctly. IIRC, the erratum says
  that the access/dirty might be set, but it does not mean that a write is
  possible after the PTE is cleared (i.e., the dirty/access might be set on
  the non-present PTE, but the access itself would fail). So it is not an
  issue in this case - losing A/D would not impact correctness since the
  access should fail.

I don't really know whether he means this, but I really think the hardware
should behave like that or otherwise I can't see how it can go right.

Let's assume if after pte cleared the page can still be written, then
afaict ptep_clear_flush() is not safe either, because fundamentally it is
two operations happening in sequence, of: (1) ptep_get_and_clear(), and (2)
conditionally do flush_tlb_page() when needed.

If page can be written with TLB cached but without pte present, what if
some process writes to memory during step (1) and (2)?  AFAIU that's the
same question as using raw ptep_get_and_clear() and a batched tlb flush.

IOW, I don't see how a tlb batched solution can be worse than using per-pte
ptep_clear_flush().  It may enlarge the race window but fundamentally
(iiuc) they're the same thing here as long as there's no atomic way to both
"clear pte and flush tlb".

[1] https://lore.kernel.org/lkml/E37036E0-566E-40C7-AD15-720CDB003227@xxxxxxxxx/

-- 
Peter Xu