On Fri, 26 Aug 2022 11:50:24 +0100, Paolo Bonzini <pbonzini@xxxxxxxxxx> wrote:
>
> On 8/24/22 00:47, Marc Zyngier wrote:
> >> I definitely don't think I 100% understand all the ordering things since
> >> they're complicated.. but my understanding is that the reset procedure
> >> didn't need memory barrier (unlike pushing, where we have explicit wmb),
> >> because we assumed the userapp is not hostile so logically it should only
> >> modify the flags which is a 32bit field, assuming atomicity guaranteed.
> > Atomicity doesn't guarantee ordering, unfortunately. Take the
> > following example: CPU0 is changing a bunch of flags for GFNs A, B, C,
> > D that exist in the ring in that order, and CPU1 performs an ioctl to
> > reset the page state.
> >
> > CPU0:
> >     write_flag(A, KVM_DIRTY_GFN_F_RESET)
> >     write_flag(B, KVM_DIRTY_GFN_F_RESET)
> >     write_flag(C, KVM_DIRTY_GFN_F_RESET)
> >     write_flag(D, KVM_DIRTY_GFN_F_RESET)
> >     [...]
> >
> > CPU1:
> >     ioctl(KVM_RESET_DIRTY_RINGS)
> >
> > Since CPU0 writes do not have any ordering, CPU1 can observe the
> > writes in a sequence that have nothing to do with program order, and
> > could for example observe that GFN A and D have been reset, but not B
> > and C. This in turn breaks the logic in the reset code (B, C, and D
> > don't get reset), despite userspace having followed the spec to the
> > letter. If each was a store-release (which is the case on x86), it
> > wouldn't be a problem, but nothing calls it in the documentation.
> >
> > Maybe that's not a big deal if it is expected that each CPU will issue
> > a KVM_RESET_DIRTY_RINGS itself, ensuring that it observe its own
> > writes. But expecting this to work across CPUs without any barrier is
> > wishful thinking.
>
> Agreed, but that's a problem for userspace to solve. If userspace
> wants to reset the fields in different CPUs, it has to synchronize
> with its own invoking of the ioctl.

userspace has no choice.
It cannot order on its own the reads that the kernel will do to
*other* rings.

> That is, CPU0 must ensure that a ioctl(KVM_RESET_DIRTY_RINGS) is done
> after (in the memory-ordering sense) its last write_flag(D,
> KVM_DIRTY_GFN_F_RESET). If there's no such ordering, there's no
> guarantee that the write_flag will have any effect.

The problem isn't on CPU0. The problem is that CPU1 does observe
inconsistent data on arm64, and I don't think this difference in
behaviour is acceptable. Nothing documents this, and there is a
baked-in assumption that there is a strong ordering between writes as
well as between writes and reads.

> The main reason why I preferred a global KVM_RESET_DIRTY_RINGS ioctl
> was because it takes kvm->slots_lock so the execution would be
> serialized anyway. Turning slots_lock into an rwsem would be even
> worse because it also takes kvm->mmu_lock (since slots_lock is a
> mutex, at least two concurrent invocations won't clash with each other
> on the mmu_lock).

Whatever the reason, the behaviour should be identical on all
architectures. As it is, it only really works on x86, and I contend
this is a bug that needs fixing.

Thankfully, this can be done at zero cost for x86, and at that of a
set of load-acquires on other architectures.

	M.

-- 
Without deviation from the norm, progress is not possible.
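
For illustration only, the store-release / load-acquire pairing
discussed above could look roughly like the sketch below, written with
C11 atomics in plain userspace C. This is not the kernel code and not
the KVM uAPI: struct gfn_entry, GFN_F_RESET, mark_collected(),
entry_collected() and reset_ring() are hypothetical names made up for
the example, and the walk simply stops at the first entry that has not
been flagged, as the reset logic is described as doing.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the reset flag; not the real uAPI value. */
#define GFN_F_RESET (1u << 1)

/* Hypothetical ring entry: only the 32-bit flags field matters here. */
struct gfn_entry {
	_Atomic uint32_t flags;
	uint64_t offset;
};

/* Writer side (CPU0 in the example): flag an entry with a store-release. */
static void mark_collected(struct gfn_entry *e)
{
	atomic_store_explicit(&e->flags, GFN_F_RESET, memory_order_release);
}

/* Reader side (CPU1 in the example): check the flag with a load-acquire. */
static int entry_collected(struct gfn_entry *e)
{
	uint32_t f = atomic_load_explicit(&e->flags, memory_order_acquire);

	return (f & GFN_F_RESET) != 0;
}

/*
 * Walk the ring from the head and stop at the first entry that is not
 * flagged. Returns the number of entries that were reset.
 */
static size_t reset_ring(struct gfn_entry *ring, size_t n)
{
	size_t i;

	assert(ring != NULL);
	for (i = 0; i < n && entry_collected(&ring[i]); i++)
		atomic_store_explicit(&ring[i].flags, 0, memory_order_relaxed);
	return i;
}
```

With A and D flagged but not B and C, reset_ring() only resets A and
leaves D untouched, which is the behaviour described in the example
above.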