> On Mar 11, 2022, at 12:41 PM, Dave Hansen <dave.hansen@xxxxxxxxx> wrote: > > On 3/11/22 11:07, Nadav Amit wrote: >> From: Nadav Amit <namit@xxxxxxxxxx> >> >> Calls to change_protection_range() on THP can trigger, at least on x86, >> two TLB flushes for one page: one immediately, when pmdp_invalidate() is >> called by change_huge_pmd(), and then another one later (that can be >> batched) when change_protection_range() finishes. >> >> The first TLB flush is only necessary to prevent the dirty bit (and with >> a lesser importance the access bit) from changing while the PTE is >> modified. However, this is not necessary as the x86 CPUs set the >> dirty-bit atomically with an additional check that the PTE is (still) >> present. One caveat is Intel's Knights Landing that has a bug and does >> not do so. > > First of all, thank you for your diligence here. This is a super > obscure issue. I think I put handling for it in the kernel and I'm not > sure I would have even thought about this angle. > > That said, I'm not sure this is all necessary. > > Yes, the Dirty bit can get set unexpectedly in some PTEs. But, the > question is whether it is *VALUABLE* and needs to be preserved. The > current kernel code pretty much just lets the hardware set the Dirty bit > and then ignores it. If it were valuable, ignoring it would have been a > bad thing. We'd be losing data on today's kernels because the hardware > told us about a write that happened but that the kernel ignored. > > My mental model of what the microcode responsible for the erratum does > is something along these lines: > > if (write) > pte |= _PAGE_DIRTY; > if (!pte_present(pte)) > #PF > > The PTE is marked dirty, but the write never actually executes. The > thread that triggered the A/D setting *also* gets a fault. > This makes perfect sense. I guess I misunderstood or forgot the erratum. But feel free to recheck. It would allow to remove the KNL check, and probably the first patch in this series. But I don’t think it would allow to get rid of pmdp_invalidate_ad() since I do not fell comfortable just to use pmdp_establish() directly: I do not know about other architectures well enough to say that they have the same atomicity guarantees when it comes to A/D bits. > I'll double-check with some Intel folks to make sure I'm not missing > something. But, either way, I don't think we should be going to this > much trouble for the good ol' Xeon Phi. I doubt there are many still > around and I *REALLY* doubt they're running new kernels. > > *If* we need this (and I'm not convinced we do), my first instinct would > be to just do this instead: > > clear_cpu_cap(c, X86_FEATURE_PSE); > > on KNL systems. If anyone cares, they know where to find us. I think that it is not necessary and your understanding of the erratum is the right one. Let me know if you find it is not the case. BTW: Thanks for the quick response, and sorry for the time it took me to send v3.