On Fri, Aug 07, 2020 at 02:36:17PM +0200, Paolo Bonzini wrote:
> On 05/08/20 02:07, Paul E. McKenney wrote:
> >
> > We are seeing occasional odd hangs, but only in cases where guest OSes
> > are being migrated.  Migrating more often makes the hangs happen more
> > frequently.
> >
> > Added debug showed that the hung CPU is stuck trying to send an IPI (e.g.,
> > smp_call_function_single()).  The hung CPU thinks that it has sent the
> > IPI, but the destination CPU has interrupts enabled (-not- disabled,
> > enabled, as in ready, willing, and able to take interrupts).  In fact,
> > the destination CPU usually is going about its business as if nothing
> > was wrong, which makes me suspect that the IPI got lost somewhere along
> > the way.
> >
> > I bumbled a bit through the qemu and KVM source, and didn't find anything
> > synchronizing IPIs and migrations, though given that I know pretty much
> > nothing about either qemu or KVM, this doesn't count for much.
>
> The code migrating the interrupt controller is in
> kvm_x86_ops.sync_pir_to_irr (which calls vmx_sync_pir_to_irr) and
> kvm_apic_get_state.  kvm_apic_get_state is called after CPUs are stopped.
>
> It's possible that we're missing a kvm_x86_ops.sync_pir_to_irr call
> somewhere.  It would be surprising but it would explain the symptoms
> very well.

Thank you for the info, Paolo!  I will see what I can find.  ;-)

							Thanx, Paul