On Wed, Jan 12, 2022 at 3:14 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote: > > On Wed, Jan 12, 2022, David Matlack wrote: > > When the TDP MMU is write-protection GFNs for page table protection (as > > opposed to for dirty logging, or due to the HVA not being writable), it > > checks if the SPTE is already write-protected and if so skips modifying > > the SPTE and the TLB flush. > > > > This behavior is incorrect because the SPTE may be write-protected for > > dirty logging. This implies that the SPTE could be locklessly be made > > writable on the next write access, and that vCPUs could still be running > > with writable SPTEs cached in their TLB. > > > > Fix this by unconditionally setting the SPTE and only skipping the TLB > > flush if the SPTE was already marked !MMU-writable or !Host-writable, > > which guarantees the SPTE cannot be locklessly be made writable and no > > vCPUs are running the writable SPTEs cached in their TLBs. > > > > Technically it would be safe to skip setting the SPTE as well since: > > > > (a) If MMU-writable is set then Host-writable must be cleared > > and the only way to set Host-writable is to fault the SPTE > > back in entirely (at which point any unsynced shadow pages > > reachable by the new SPTE will be synced and MMU-writable can > > be safetly be set again). > > > > and > > > > (b) MMU-writable is never consulted on its own. > > > > And in fact this is what the shadow MMU does when write-protecting guest > > page tables. However setting the SPTE unconditionally is much easier to > > reason about and does not require a huge comment explaining why it is safe. > > I disagree. I looked at the code+comment before reading the full changelog and > typed up a response saying the code should be: > > if (!is_writable_pte(iter.old_spte) && > !spte_can_locklessly_be_made_writable(spte)) > break; > > Then I went read the changelog and here we are :-) > > I find that much more easier to grok, e.g. in plain English: "if the SPTE isn't > writable and can't be made writable, there's nothing to do". Oh interesting. I actually find that confusing because it can easily lead to the MMU-writable bit staying set. Here we are protecting GFNs and we're opting to leave the MMU-writable bit set. It takes a lot of digging to figure out that this is safe because if MMU-writable is set and the SPTE cannot be locklessly be made writable then it implies Host-writable is clear, and Host-writable can't be reset without syncing the all shadow pages reachable by the MMU. Oh and the MMU-writable bit is never consulted on its own (e.g. We never iterate through all SPTEs to find the ones that are !MMU-writable). Maybe my understanding is horribly off since this all seems unnecessarily convoluted, and the cost of always clearing MMU-writable is just an extra bitwise-OR. The TLB flush is certainly unnecessary if the SPTE is already !Host-writable, which is what this commit does. > > Versus "unconditionally clear the writable bits because ???, but only flush if > the write was actually necessary", with a slightly opinionated translation :-) If MMU-writable is already clear we can definitely break. I had that in a previous version of the patch by checking if iter.old_spte == new_spte but it seemed unnecessary since the guts of tdp_mmu_spte_set() already optimizes for this. > > And with that, you don't need to do s/spte_set/flush. Though I would be in favor > of a separate patch to do s/spte_set/write_protected here and in the caller, to > match kvm_mmu_slot_gfn_write_protect(). I'm not sure write_protected would not be a good variable name because even if we did not write-protect the SPTE (i.e. PT_WRITABLE_MASK was already clear) we may still need a TLB flush to ensure no CPUs have a writable SPTE in their TLB. Perhaps we have different definitions for "write-protecting"?