Roman Kagan <rkagan@xxxxxxxxxxxxx> writes:

> The SynIC message delivery protocol allows the message originator to
> request, should the message slot be busy, to be notified when it's free.
>
> However, this is unnecessary and even undesirable for messages generated
> by SynIC timers in periodic mode: if the period is short enough compared
> to the time the guest spends in the timer interrupt handler, so the
> timer ticks start piling up, the excessive interactions due to this
> notification and retried message delivery only make things worse.
>
> [This was observed, in particular, with Windows L2 guests setting
> (temporarily) the periodic timer to 2 kHz, and spending hundreds of
> microseconds in the timer interrupt handler due to several L2->L1 exits;
> under some load in L0 this could exceed 500 us so the timer ticks
> started to pile up and the guest livelocked.]
>
> Relieve the situation somewhat by not retrying message delivery for
> periodic SynIC timers.  This appears to remain within the "lazy" lost
> ticks policy for SynIC timers as implemented in KVM.
>
> Note that it doesn't solve the fundamental problem of livelocking the
> guest with a periodic timer whose period is smaller than the time needed
> to process a tick, but it makes it a bit less likely to be triggered.

We seem to be ignoring the 'lazy' setting for stimers completely. Going
forward, do you think it would make sense to implement both the lazy and
non-lazy modes for stimers as they're described in the TLFS (basically,
we would need to implement the 'non-lazy' mode)? (Now I'm interested in
which mode Windows and Hyper-V actually use.)
> Signed-off-by: Roman Kagan <rkagan@xxxxxxxxxxxxx>
> ---
>  arch/x86/kvm/hyperv.c | 13 +++++++++++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
> index 17d04d2c6d4f..3f2b93ad9ccf 100644
> --- a/arch/x86/kvm/hyperv.c
> +++ b/arch/x86/kvm/hyperv.c
> @@ -557,7 +557,7 @@ static int stimer_get_count(struct kvm_vcpu_hv_stimer *stimer, u64 *pcount)
>  }
>
>  static int synic_deliver_msg(struct kvm_vcpu_hv_synic *synic, u32 sint,
> -			     struct hv_message *src_msg)
> +			     struct hv_message *src_msg, bool no_retry)
>  {
>  	struct kvm_vcpu *vcpu = synic_to_vcpu(synic);
>  	int msg_off = offsetof(struct hv_message_page, sint_message[sint]);
> @@ -584,6 +584,9 @@ static int synic_deliver_msg(struct kvm_vcpu_hv_synic *synic, u32 sint,
>  		return r;
>
>  	if (hv_hdr.message_type != HVMSG_NONE) {
> +		if (no_retry)
> +			return 0;
> +
>  		hv_hdr.message_flags.msg_pending = 1;
>  		r = kvm_vcpu_write_guest_page(vcpu, msg_page_gfn,
>  					      &hv_hdr.message_flags,
> @@ -616,11 +619,17 @@ static int stimer_send_msg(struct kvm_vcpu_hv_stimer *stimer)
>  	struct hv_message *msg = &stimer->msg;
>  	struct hv_timer_message_payload *payload =
>  		(struct hv_timer_message_payload *)&msg->u.payload;
> +	/*
> +	 * don't retry message delivery for periodic ticks, otherwise they may
> +	 * pile up
> +	 */

Capital 'D' and a hard stop at the end would be appreciated :-) Also, it
may make sense to mention our 'laziness' here.

> +	bool no_retry = stimer->config & HV_STIMER_PERIODIC;
>
>  	payload->expiration_time = stimer->exp_time;
>  	payload->delivery_time = get_time_ref_counter(vcpu->kvm);
>  	return synic_deliver_msg(vcpu_to_synic(vcpu),
> -				 HV_STIMER_SINT(stimer->config), msg);
> +				 HV_STIMER_SINT(stimer->config), msg,
> +				 no_retry);
>  }
>
>  static void stimer_expiration(struct kvm_vcpu_hv_stimer *stimer)

No matter how hard we try, missing ticks for periodic timers is
inevitable, and the TLFS states that for both the lazy and non-lazy
modes.
As we seem to be implementing the 'lazy' mode for now, I don't envision
breakages.

Reviewed-by: Vitaly Kuznetsov <vkuznets@xxxxxxxxxx>

-- 
Vitaly