Re: [PATCH 01/13] mm: Update ptep_get_lockless()s comment

Nadav Amit <nadav.amit@xxxxxxxxx> · Thu, 27 Oct 2022 14:44:46 -0700

On Oct 27, 2022, at 1:31 PM, Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:

> So there are two levels of tlb flush optimizations
> 
> (a) avoiding them entirely in the first place
> 
> (b) the whole "once you have to flush, keep track of lazy modes and
> TLB generations, and flush ranges"
> 
> And honestly, I think you ignored (a), and that's where we do exactly
> those kinds of "this case doesn't need to flush AT ALL" things.

I did try to avoid TLB flushes by introducing pte_needs_flush() and avoiding
flushes based on the architectural PTE changes. There are even more x86
arch-based opportunities to further avoid TLB flushes (and then only flush
the TLB if spurious #PF occurs).

Personally, I still think that making decisions on flushes based on (mostly)
only the arch state makes the code more robust against misuse (e.g., see
various confusions between mmu_gather’s fullmm and need_flush_all).

Having said that, I will follow your feedback that the extra complexity
worth the extra performance.

Anyhow, admittedly, I need to give it more thought. For instance, in respect
to the code that you mentioned (in zap_pte_range), after reading it again,
it seems strange: how is ok to defer the TLB flush after the rmap was
removed, even if it is done while the PTL is held.
folio_clear_dirty_for_io() would not sync on the PTL afterwards, so the page
might be later re-dirtied using a stale cached PTE. Oh well.