On 23/06/2018 02:46, Junaid Shahid wrote:
>     CPU 1                       CPU 2
>     -----                       -----
> 1.2 SPTE.W = 1
>                         2.1 Guest writes a GPTE for GVA X.
>                             (GPTE being in the guest page table shadowed
>                             by the SP from CPU 1.)
>                             This reads SPTE.W during the page table walk.
>                             Since SPTE.W is read as 1, there is no fault.
>
>                         2.2 Guest issues TLB flush.
>                             That causes a VM Exit.
>
>                         2.3 kvm_mmu_sync_pages() reads sp->unsync.
>                             Since it is false, it just returns.
>
>                         2.4 Guest accesses GVA X.
>                             Since the mapping in the SP was not updated,
>                             the old mapping for GVA X gets used.
> 1.1 sp->unsync = true
>
> The outcome in 2.4 is obviously incorrect. So we need to have a
> write barrier that ensures that 1.1 happens before 1.2, which
> prevents this outcome.
>
> If we map this to the canonical example that you mentioned at the
> start, then x maps to the SPTE, y maps to sp->unsync, the smp_wmb()
> between WRITE_ONCE(y) and WRITE_ONCE(x) maps to an smp_wmb() between 1.1
> and 1.2, and the smp_rmb() between READ_ONCE(x) and READ_ONCE(y) maps to
> the implicit barrier in 2.2.

Aha, so my confusion here comes because the read of the GPTE is used
after kvm_mmu_sync_pages, but it's a cached value that was read *before*
sp->unsync.  If you take the cached read into account, it definitely
looks like the canonical example.  The two-CPU case is much easier to
follow.

So let's add both memory barriers... which, unintuitively, don't pair
with each other.  Something like:

1)
	/*
	 * When reading sp->unsync after a vmexit, the processor might have
	 * some SPTEs with SPTE.W=1 cached in the TLB.  The vmexit already
	 * ensures that sp->unsync is read after the TLB fill (x86 doesn't
	 * reorder reads, and vmentry/vmexit provides a compiler barrier);
	 * here we ensure that sp->unsync is written before the SPTE.
	 * Otherwise, a TLB flush might skip the synchronization of the pages
	 * we're write protecting:
	 *
	 * <insert example>
	 */
	smp_wmb();

2)
	/*
	 * Read sp->unsync before the spinlock.  sp->unsync is written in
	 * kvm_mmu->page_fault with mmu_lock held; this barrier pairs with
	 * spin_lock's acquire barrier, which guarantees that the spinlock
	 * is written before sp->unsync.
	 */
	smp_rmb();

Feel free to make it less verbose. :)

Paolo
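
For readers mapping the discussion above back to the canonical example,
here is a standalone C11 sketch of the same pairing.  This is not the
KVM code: the names unsync and spte_w stand in for sp->unsync and
SPTE.W, the release fence stands in for the smp_wmb() in 1), and the
acquire fence stands in for the ordering that the vmexit provides on
the reader side.  It only illustrates the store/store vs. load/load
pattern, under those assumptions.

/* Build with: cc -std=c11 -pthread barrier_sketch.c */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>
#include <pthread.h>

static atomic_bool unsync;	/* hypothetical stand-in for sp->unsync */
static atomic_int  spte_w;	/* hypothetical stand-in for SPTE.W */

static void *cpu1(void *arg)
{
	/* 1.1: mark the shadow page unsynchronized first ... */
	atomic_store_explicit(&unsync, true, memory_order_relaxed);
	/* ... release fence plays the role of smp_wmb() ... */
	atomic_thread_fence(memory_order_release);
	/* ... then 1.2: make the SPTE writable. */
	atomic_store_explicit(&spte_w, 1, memory_order_relaxed);
	return NULL;
}

static void *cpu2(void *arg)
{
	/* 2.1: the page walk observes a writable SPTE. */
	if (atomic_load_explicit(&spte_w, memory_order_relaxed)) {
		/* 2.2: acquire fence models the vmexit-side ordering. */
		atomic_thread_fence(memory_order_acquire);
		/* 2.3: with the fences, unsync cannot be seen as stale false
		 * once the new SPTE value has been observed. */
		if (atomic_load_explicit(&unsync, memory_order_relaxed))
			printf("sp is unsync, must synchronize\n");
		else
			printf("sp looks synchronized, skipping sync\n");
	}
	return NULL;
}

int main(void)
{
	pthread_t t1, t2;

	pthread_create(&t1, NULL, cpu1, NULL);
	pthread_create(&t2, NULL, cpu2, NULL);
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	return 0;
}

Dropping either fence re-creates the bad outcome in step 2.4: the
reader may see spte_w == 1 while still observing unsync == false.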