On Tue, Jun 06, 2023, Mingwei Zhang wrote:
> > > >
> > > > I don't understand the need for READ_ONCE() here. That implies that
> > > > there is something tricky going on, and I don't think that's the case.
> > >
> > > READ_ONCE() is just telling the compiler not to remove the read. Since
> > > this is reading a global variable, the compiler might just read a
> > > previous copy if the value has already been read into a local
> > > variable. But that is not the case here...
> > >
> > > Note I see there is another READ_ONCE for
> > > kvm->arch.indirect_shadow_pages, so I am reusing the same thing.
> >
> > I agree with Jim, using READ_ONCE() doesn't make any sense. I suspect it may have
> > been a misguided attempt to force the memory read to be as close to the write_lock()
> > as possible, e.g. to minimize the chance of a false negative.
>
> Sean :) Your suggestion is the opposite with Jim. He is suggesting
> doing nothing, but your suggestion is doing way more than READ_ONCE().

Not really. Jim is asserting that the READ_ONCE() is pointless, and I completely
agree. I am also saying that I think there is a real memory ordering issue here,
and that it was being papered over by the READ_ONCE() in kvm_mmu_pte_write().

> > So I think this?
>
> Hmm. I agree with both points above, but below, the change seems too
> heavyweight. smp_wb() is a mfence(), i.e., serializing all
> loads/stores before the instruction. Doing that for every shadow page
> creation and destruction seems a lot.

No, the smp_*b() variants are just compiler barriers on x86.

> In fact, the case that only matters is '0->1' which may potentially
> confuse kvm_mmu_pte_write() when it reads 'indirect_shadow_count', but
> the majority of the cases are 'X => X + 1' where X != 0. So, those
> cases do not matter. So, if we want to add barriers, we only need it
> for 0->1. Maybe creating a new variable and not blocking
> account_shadow() and unaccount_shadow() is a better idea?
>
> Regardless, the above problem is related to interactions among
> account_shadow(), unaccount_shadow() and kvm_mmu_pte_write(). It has
> nothing to do with the 'reexecute_instruction()', which is what this
> patch is about. So, I think having a READ_ONCE() for
> reexecute_instruction() should be good enough. What do you think.

The reexecute_instruction() case should be fine without any fanciness, it's
nothing more than a heuristic, i.e. neither a false positive nor a false negative
will impact functional correctness, and nothing changes regardless of how many
times the compiler reads the variable outside of mmu_lock.

I was thinking that it would be better to have a single helper to locklessly
access indirect_shadow_pages, but I agree that applying the barriers to
reexecute_instruction() introduces a different kind of confusion.

Want to post a v2 of yours without a READ_ONCE(), and I'll post a separate fix
for the theoretical kvm_mmu_pte_write() race? And then Paolo can tell me that
there's no race and school me on lockless programming once more ;-)
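
For anyone following along, here is a rough userspace sketch of the pattern being discussed: a counter that is only modified with a lock held, and a lockless check of that counter used as a fast-path filter before taking the lock. This is not the KVM code. The READ_ONCE() below is approximated with a volatile cast, an atomic_flag spinlock stands in for mmu_lock, and every name (struct fake_mmu, fake_lock(), account_shadowed(), unaccount_shadowed(), pte_write(), slow_path_hits) is made up for illustration. It mirrors the kernel idiom of a plain lockless read and is not meant as strictly conforming C11.

```c
#include <stdatomic.h>

/* Userspace approximation of the kernel macro: force a single real load. */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical stand-in for the relevant bits of kvm->arch. */
struct fake_mmu {
	atomic_flag lock;                  /* stands in for mmu_lock */
	unsigned int indirect_shadow_pages;
	unsigned int slow_path_hits;       /* times the lock was actually taken */
};

void fake_lock(struct fake_mmu *mmu)
{
	/* Spin until the flag was previously clear. */
	while (atomic_flag_test_and_set_explicit(&mmu->lock, memory_order_acquire))
		;
}

void fake_unlock(struct fake_mmu *mmu)
{
	atomic_flag_clear_explicit(&mmu->lock, memory_order_release);
}

/* Writers only touch the counter with the lock held. */
void account_shadowed(struct fake_mmu *mmu)
{
	fake_lock(mmu);
	mmu->indirect_shadow_pages++;
	fake_unlock(mmu);
}

void unaccount_shadowed(struct fake_mmu *mmu)
{
	fake_lock(mmu);
	mmu->indirect_shadow_pages--;
	fake_unlock(mmu);
}

/*
 * Lockless fast path: skip the lock entirely when the counter reads zero.
 * Returns 1 if the slow path ran, 0 if it was skipped. A stale zero here
 * is the "false negative" case, i.e. the 0->1 transition is the only one
 * that can change the outcome of this check.
 */
int pte_write(struct fake_mmu *mmu)
{
	if (!READ_ONCE(mmu->indirect_shadow_pages))
		return 0;

	fake_lock(mmu);
	mmu->slow_path_hits++;
	fake_unlock(mmu);
	return 1;
}
```

The sketch only shows the shape of the check. Whether the real code needs ordering between the counter update and whatever makes the shadow page visible is exactly the open question above, and nothing here settles it.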