On Thu, Oct 28, 2021, Maxim Levitsky wrote:
> On Fri, 2021-10-08 at 19:12 -0700, Sean Christopherson wrote:
> > Use READ_ONCE() when loading the posted interrupt descriptor control
> > field to ensure "old" and "new" have the same base value.  If the
> > compiler emits separate loads, and loads into "new" before "old", KVM
> > could theoretically drop the ON bit if it were set between the loads.
> >
> > Fixes: 28b835d60fcc ("KVM: Update Posted-Interrupts Descriptor when vCPU is preempted")
> > Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx>
> > ---
> >  arch/x86/kvm/vmx/posted_intr.c | 6 +++---
> >  1 file changed, 3 insertions(+), 3 deletions(-)
> >
> > diff --git a/arch/x86/kvm/vmx/posted_intr.c b/arch/x86/kvm/vmx/posted_intr.c
> > index 414ea6972b5c..fea343dcc011 100644
> > --- a/arch/x86/kvm/vmx/posted_intr.c
> > +++ b/arch/x86/kvm/vmx/posted_intr.c
> > @@ -53,7 +53,7 @@ void vmx_vcpu_pi_load(struct kvm_vcpu *vcpu, int cpu)
> >
> >  	/* The full case.  */
> >  	do {
> > -		old.control = new.control = pi_desc->control;
> > +		old.control = new.control = READ_ONCE(pi_desc->control);
> >
> >  		dest = cpu_physical_id(cpu);
> >
> > @@ -104,7 +104,7 @@ static void __pi_post_block(struct kvm_vcpu *vcpu)
> >  	     "Wakeup handler not enabled while the vCPU was blocking");
> >
> >  	do {
> > -		old.control = new.control = pi_desc->control;
> > +		old.control = new.control = READ_ONCE(pi_desc->control);
> >
> >  		dest = cpu_physical_id(vcpu->cpu);
> >
> > @@ -160,7 +160,7 @@ int pi_pre_block(struct kvm_vcpu *vcpu)
> >  	     "Posted Interrupt Suppress Notification set before blocking");
> >
> >  	do {
> > -		old.control = new.control = pi_desc->control;
> > +		old.control = new.control = READ_ONCE(pi_desc->control);
> >
> >  		/* set 'NV' to 'wakeup vector' */
> >  		new.nv = POSTED_INTR_WAKEUP_VECTOR;
>
> I wish there was a way to mark fields in a struct, as requiring 'READ_ONCE' on them
> so that compiler would complain if this isn't done, or automatically use 'READ_ONCE'
> logic.

Hmm, I think you could make an argument that ON and thus the whole "control"
word should be volatile.  AFAICT, tagging just "on" as volatile actually works.
There's even a clause in Documentation/process/volatile-considered-harmful.rst
that calls this out as a (potentially) legitimate use case.

  - Pointers to data structures in coherent memory which might be modified
    by I/O devices can, sometimes, legitimately be volatile.

That said, I think I actually prefer forcing the use of READ_ONCE.  The
descriptor requires more protections than what volatile provides, namely that
all writes need to be atomic.  So given that volatile alone isn't sufficient,
I'd prefer to have the code itself be more self-documenting.

E.g. this compiles and doesn't mess up the expected size.

diff --git a/arch/x86/kvm/vmx/posted_intr.h b/arch/x86/kvm/vmx/posted_intr.h
index 7f7b2326caf5..149df3b18789 100644
--- a/arch/x86/kvm/vmx/posted_intr.h
+++ b/arch/x86/kvm/vmx/posted_intr.h
@@ -11,9 +11,9 @@ struct pi_desc {
 	union {
 		struct {
 				/* bit 256 - Outstanding Notification */
-			u16	on	: 1,
+			volatile u16	on	: 1;
 				/* bit 257 - Suppress Notification */
-				sn	: 1,
+			u16	sn	: 1,
 				/* bit 271:258 - Reserved */
 				rsvd_1	: 14;
 				/* bit 279:272 - Notification Vector */
@@ -23,7 +23,7 @@ struct pi_desc {
 				/* bit 319:288 - Notification Destination */
 			u32	ndst;
 		};
-		u64 control;
+		volatile u64 control;
 	};
 	u32 rsvd[6];
 } __aligned(64);
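
For illustration only, here is a rough userspace sketch of the "snapshot once,
then cmpxchg" pattern the patch is protecting.  It is not kernel code: every
name with a _sketch suffix is hypothetical, READ_ONCE_SKETCH() is a simplified
stand-in for the kernel's READ_ONCE(), the GCC/Clang __atomic builtin stands in
for the kernel's cmpxchg helpers, and the pir[] and rsvd_2 fields are assumed
from the upstream struct layout rather than taken from the hunks above.

```c
/* Userspace sketch only; all *_sketch names are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for READ_ONCE(): force exactly one load of x. */
#define READ_ONCE_SKETCH(x) (*(const volatile __typeof__(x) *)&(x))

struct pi_desc_sketch {
	uint32_t pir[8];	/* assumed: posted interrupt request bits */
	union {
		struct {
			uint16_t on : 1,	/* Outstanding Notification */
				 sn : 1,	/* Suppress Notification */
				 rsvd_1 : 14;	/* Reserved */
			uint8_t nv;		/* Notification Vector */
			uint8_t rsvd_2;		/* assumed: reserved byte */
			uint32_t ndst;		/* Notification Destination */
		};
		uint64_t control;
	};
	uint32_t rsvd[6];
} __attribute__((aligned(64)));

/* The layout must stay at 64 bytes no matter how the fields are tagged. */
static_assert(sizeof(struct pi_desc_sketch) == 64, "unexpected descriptor size");

/* Set NV without losing a concurrent update to ON; returns the new control. */
static uint64_t pi_set_nv_sketch(struct pi_desc_sketch *pi_desc, uint8_t vector)
{
	struct pi_desc_sketch old, new;

	do {
		/* A single load feeds both copies, so they share a base value. */
		old.control = new.control = READ_ONCE_SKETCH(pi_desc->control);
		new.nv = vector;
		/* Retry if the in-memory word no longer matches the snapshot. */
	} while (!__atomic_compare_exchange_n(&pi_desc->control, &old.control,
					      new.control, false,
					      __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST));

	return new.control;
}
```

With a plain (non-volatile) read, nothing stops the compiler from emitting two
loads, in which case "new" could be built from a value without ON while "old"
has ON set, and the cmpxchg would then succeed and clear the bit.  The single
forced load closes that window; the compare-and-exchange covers the "all
writes need to be atomic" half that volatile alone would not.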