Roman Kagan <rkagan@xxxxxxxxxxxxx> writes:

> The SynIC message delivery protocol allows the message originator to
> request, should the message slot be busy, to be notified when it's free.
>
> However, this is unnecessary and even undesirable for messages generated
> by SynIC timers in periodic mode: if the period is short enough compared
> to the time the guest spends in the timer interrupt handler, so the
> timer ticks start piling up, the excessive interactions due to this
> notification and retried message delivery only make things worse.
>
> [This was observed, in particular, with Windows L2 guests setting
> (temporarily) the periodic timer to 2 kHz, and spending hundreds of
> microseconds in the timer interrupt handler due to several L2->L1 exits;
> under some load in L0 this could exceed 500 us so the timer ticks
> started to pile up and the guest livelocked.]
>
> Relieve the situation somewhat by not retrying message delivery for
> periodic SynIC timers.  This appears to remain within the "lazy" lost
> ticks policy for SynIC timers as implemented in KVM.
>
> Note that it doesn't solve the fundamental problem of livelocking the
> guest with a periodic timer whose period is smaller than the time needed
> to process a tick, but it makes it a bit less likely to be triggered.

We seem to be ignoring the 'lazy' setting for stimers completely. Going
forward, do you think it would make sense to implement both the lazy and
non-lazy modes for stimers as they're described in the TLFS (basically,
we would need to implement the 'non-lazy' mode)? (Now I'm interested in
which mode Windows and Hyper-V actually use.)
> Signed-off-by: Roman Kagan <rkagan@xxxxxxxxxxxxx>
> ---
>  arch/x86/kvm/hyperv.c | 13 +++++++++++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
> index 17d04d2c6d4f..3f2b93ad9ccf 100644
> --- a/arch/x86/kvm/hyperv.c
> +++ b/arch/x86/kvm/hyperv.c
> @@ -557,7 +557,7 @@ static int stimer_get_count(struct kvm_vcpu_hv_stimer *stimer, u64 *pcount)
>  }
>
>  static int synic_deliver_msg(struct kvm_vcpu_hv_synic *synic, u32 sint,
> -			     struct hv_message *src_msg)
> +			     struct hv_message *src_msg, bool no_retry)
>  {
>  	struct kvm_vcpu *vcpu = synic_to_vcpu(synic);
>  	int msg_off = offsetof(struct hv_message_page, sint_message[sint]);
> @@ -584,6 +584,9 @@ static int synic_deliver_msg(struct kvm_vcpu_hv_synic *synic, u32 sint,
>  		return r;
>
>  	if (hv_hdr.message_type != HVMSG_NONE) {
> +		if (no_retry)
> +			return 0;
> +
>  		hv_hdr.message_flags.msg_pending = 1;
>  		r = kvm_vcpu_write_guest_page(vcpu, msg_page_gfn,
>  					      &hv_hdr.message_flags,
> @@ -616,11 +619,17 @@ static int stimer_send_msg(struct kvm_vcpu_hv_stimer *stimer)
>  	struct hv_message *msg = &stimer->msg;
>  	struct hv_timer_message_payload *payload =
>  		(struct hv_timer_message_payload *)&msg->u.payload;
> +	/*
> +	 * don't retry message delivery for periodic ticks, otherwise they may
> +	 * pile up
> +	 */

Capital 'D' and a hard stop at the end would be appreciated :-) Also, it
may make sense to mention our 'laziness' here.

> +	bool no_retry = stimer->config & HV_STIMER_PERIODIC;
>
>  	payload->expiration_time = stimer->exp_time;
>  	payload->delivery_time = get_time_ref_counter(vcpu->kvm);
>  	return synic_deliver_msg(vcpu_to_synic(vcpu),
> -				 HV_STIMER_SINT(stimer->config), msg);
> +				 HV_STIMER_SINT(stimer->config), msg,
> +				 no_retry);
>  }
>
>  static void stimer_expiration(struct kvm_vcpu_hv_stimer *stimer)

No matter how hard we try, missing ticks for periodic timers is
inevitable, and the TLFS states that for both the lazy and non-lazy
modes.
As we seem to be implementing the 'lazy' mode for now, I don't envision
breakages.

Reviewed-by: Vitaly Kuznetsov <vkuznets@xxxxxxxxxx>

-- 
Vitaly