On Sat, Jun 18, 2016 at 02:43:51PM +0100, Alan Jenkins wrote: > On 18/06/2016, Marcelo Tosatti <mtosatti@xxxxxxxxxx> wrote: > > > >Alan Jenkins reports hang at > >https://bugzilla.redhat.com/show_bug.cgi?id=1337667. > > > >* APIC write: expiration = 1000. > >* LAPIC tsc deadline code sets timer to 1000-30. > >* Timer fires at 970. > >* Guest writes TSC=w. > > > >Guest fails to VM-entry to process signal to perform > >"vmload" in userspace. > > > >Case 1: w > 970: > >Guest entry can be performed. > > > >Case 2: w < 970: > >Guest entry should not be performed because "An interrupt is generated > >when the logical processor’s time-stamp counter equals or exceeds the > >target value in the IA32_TSC_DEADLINE MSR." > > > >Setting of APIC test resets all APIC state, including > >any pending interrupts. > > > >Signed-off-by: Marcelo Tosatti <mtosatti@xxxxxxxxxx> > > > >diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > >index ea306ad..89be6e9 100644 > >--- a/arch/x86/kvm/x86.c > >+++ b/arch/x86/kvm/x86.c > >@@ -2991,6 +2991,7 @@ static int kvm_vcpu_ioctl_set_lapic(struct kvm_vcpu > >*vcpu, > > { > > kvm_apic_post_state_restore(vcpu, s); > > update_cr8_intercept(vcpu); > >+ vcpu->arch.apic->lapic_timer.expired_tscdeadline = 0; > > > > return 0; > > } > > Hi > > I'm happy to check my own test if this is considered a solution. > (And apologies for how my own efforts go... flailing around and not > always communicating effectively...) Your report had many details. > 1) I think the commit message could be improved. Specifically it > says the problem is due to TSC write, then that we need to fix LAPIC > reset. It doesn't say what the connection is between the two. Right. > (I.e. you would have to read the bug report, & then realize that it > _happens_ to involve a LAPIC state restore as well, because it's > triggered by reloading the entire VM state from a snapshot) > > Doing so might make the intention of this commit clearer. I've been > trying to work out a high quality fix, which prevents the lockups > revealed by this case. I think this commit would fix my bug nicely, > but it doesn't help e.g. if the LAPIC restore was omitted from the > sequence. > > (Both TSC and the value of TSC deadline are set by MSRs, not by > writing to the LAPIC. So I don't think the LAPIC state restore is > necessary to trigger the issue). > > > 2) If this approach is considered expedient, I have two comments - > > i) isn't it better to set lapic_timer.expired_tscdeadline in > lapic.c, i.e. in apic_post_state_restore() ? > > ii) then, wouldn't it be more correct to set it between > hrtimer_cancel() and start_apic_timer()? It doesn't sound right to > clear expired_tscdeadline, in case it was set immediately by > start_apic_timer() > > Alan Advancing the timer expiration should only be necessary on guest initiated writes. Lets improve the commit message. Can you confirm the patch fixes your problem? -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html