On 21/06/2018 22:17, Junaid Shahid wrote: > On 06/19/2018 10:58 PM, Paolo Bonzini wrote: >> On 20/06/2018 00:42, Junaid Shahid wrote: >>> Actually, now that I think about it again, I don't think that >>> any read barrier is needed here. I don't believe that the >>> scenario that I described in the comments in this patch needs a >>> read barrier (though please correct me if that isn't the case). >>> The code path that I actually had in mind when putting the >>> barrier here appears to already include a read barrier that I >>> missed seeing earlier. >>> >>> In any case, it seems that smp_read_barrier_depends() is a no-op >>> on x86. So we can just remove it altogether, unless you think >>> that it would be useful to keep it just as documentation. >> >> It's not entirely a no-op, it's (on x86) a compiler optimization >> barrier. >> >> In the comment above smp_wmb you should say what is the read >> barrier that is "paired" with the smp_wmb. If there is none, that >> most likely means that the smp_read_barrier_depends _is_ actually >> needed. > > As best as I can tell, there isn't any explicit read barrier needed > corresponding to the smp_wmb() in mmu_need_write_protect(). In the > example scenario described there that necessitates the wmb(), the VM > exit (that resulted in the call to kvm_mmu_sync_roots) would act as > an implicit barrier that would order the sp->unsync read after > whatever condition triggered the sync (e.g. a guest TLB flush). Even > if smp_read_barrier_depends() acts as a compiler barrier, it wouldn't > add anything there. (It is very well possible that I am missing > something. Please do let me know if that is the case.) Uhm, actually now I believe that the smp_wmb() is not needed. Remember that, in the common case where you're "passing a message" from a writer to the reader, the writer writes A before B, while the reader reads B before A. A is the "message", B is a readiness flag: r1 = READ_ONCE(x); WRITE_ONCE(y) = 1; smp_rmb(); smp_wmb(); r2 = READ_ONCE(y); WRITE_ONCE(x) = 1; BUG_ON(r1 == 1 && r2 == 0); But this is not what is happening between sp->unsync and SPTE. The important ordering on the write side cannot be "write sp->unsync before SPTE", because the ordering on the read side would be "read SPTE before sp->unsync". This is of course not the case. Rather, what we want here is for the reader to be conservative in taking the spinlock. The unsync flag doesn't only tell you that you have some work to do, it also tells you if there _could_ be a concurrent update in progress and you really need to check the spinlock. So the message is the spinlock, the flag is sp->unsync. So the writer needs to write sp->unsync after writing the spinlock, which is guaranteed by spin_lock. But the reader needs to read the spinlock after reading sp->unsync, and that requires an smp_rmb(). It's true that the reader needs to read the SPTE after sp->unsync. However that just works, through the mediation of the spinlock: - if sp->unsync is set then the reader, thanks to smp_rmb()+spin_lock(), will see the writer's spin_unlock before proceeding with the sync. Therefore the SPTE will always be written before the reader reads it. - if sp->unsync is clear, the reader cannot see *any* effect of the writer's critical section, including any SPTE write. That's because the BUG_ON in the above pattern translates to "BUG_ON(sp->unsync && !writer_took_the_spin_lock);". And fortunately we can use model-checking to validate the above. Copy this into cppmem (http://svr-pes20-cppmem.cl.cam.ac.uk/cppmem/index.html) int main() { atomic_int unsync=0; atomic_int lock=0; int data=0; {{{ { r11=unsync.load(memory_order_acquire); r12=lock.load(memory_order_acquire).readsvalue(1); r12=lock.load(memory_order_acquire).readsvalue(2); r13=data; } ||| { r21=lock.load(memory_order_relaxed).readsvalue(0); lock.store(1, memory_order_seq_cst); unsync.store(1, memory_order_relaxed); data = 1; lock.store(2, memory_order_release); } }}} return 0; } Note that above I have a relaxed store to unsync (i.e. no barriers) and an acquire store from unsync (i.e. rmb). Click "run" and you'll get "2 consistent, race free" execution. One is for the case where thread 1 reads 0 for unsync, because it executes before the store; the second is for the case where it reads 1, and then in the graph below you'll see that this execution has "data=1". There is only one snag in the reasoning. If I change the first acquire to relaxed, it still works, meaning that the smp_rmb() is unnecessary too. I am not sure why that is the case, but Documentation/memory-barriers.txt strongly suggests that we shouldn't expect that: Memory operations that occur before an ACQUIRE operation may appear to happen after it completes. So in the end I'm more confident about the smp_wmb() being unnecessary, than the smp_rmb(). (Not smp_read_barrier_depends(), since unsync->spinlock does not involve a data dependency). Paolo