Re: [PATCH] virtio_ring: fix packed ring event may missing

On Thu, Oct 24, 2019 at 11:50:51AM +0800, Jason Wang wrote:
> 
> > On 2019/10/24 11:26 AM, Liu, Yong wrote:
> > 
> > > -----Original Message-----
> > > From: Jason Wang [mailto:jasowang@xxxxxxxxxx]
> > > Sent: Tuesday, October 22, 2019 9:06 PM
> > > To: Liu, Yong <yong.liu@xxxxxxxxx>; mst@xxxxxxxxxx; Bie, Tiwei
> > > <tiwei.bie@xxxxxxxxx>
> > > Cc: virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx
> > > Subject: Re: [PATCH] virtio_ring: fix packed ring event may missing
> > > 
> > > 
> > > On 2019/10/22 2:48 PM, Liu, Yong wrote:
> > > > Hi Jason,
> > > > My answers are inline.
> > > > 
> > > > > -----Original Message-----
> > > > > From: Jason Wang [mailto:jasowang@xxxxxxxxxx]
> > > > > Sent: Tuesday, October 22, 2019 10:45 AM
> > > > > To: Liu, Yong <yong.liu@xxxxxxxxx>; mst@xxxxxxxxxx; Bie, Tiwei
> > > > > <tiwei.bie@xxxxxxxxx>
> > > > > Cc: virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx
> > > > > Subject: Re: [PATCH] virtio_ring: fix packed ring event may missing
> > > > > 
> > > > > 
> > > > > On 2019/10/22 1:10 AM, Marvin Liu wrote:
> > > > > > When the callback is delayed, virtio expects that vhost will kick
> > > > > > when the used index rolls over the event offset. A recheck is
> > > > > > needed because the used index may pass the event offset between
> > > > > > the status check and the driver's event update.
> > > > > > 
> > > > > > However, the flags may not have been modified if descriptors are
> > > > > > chained or the in_order feature was negotiated, so the flags at the
> > > > > > event offset may not be valid for checking the descriptor's status.
> > > > > > Fix this by checking the last used index instead. The Tx queue will
> > > > > > be stopped if there are not enough freed buffers after the recheck.
> > > > > > 
> > > > > > Signed-off-by: Marvin Liu <yong.liu@xxxxxxxxx>
> > > > > > 
> > > > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > > > index bdc08244a648..a8041e451e9e 100644
> > > > > > --- a/drivers/virtio/virtio_ring.c
> > > > > > +++ b/drivers/virtio/virtio_ring.c
> > > > > > @@ -1499,9 +1499,6 @@ static bool virtqueue_enable_cb_delayed_packed(struct virtqueue *_vq)
> > > > > >     		 * counter first before updating event flags.
> > > > > >     		 */
> > > > > >     		virtio_wmb(vq->weak_barriers);
> > > > > > -	} else {
> > > > > > -		used_idx = vq->last_used_idx;
> > > > > > -		wrap_counter = vq->packed.used_wrap_counter;
> > > > > >     	}
> > > > > > 
> > > > > >     	if (vq->packed.event_flags_shadow == VRING_PACKED_EVENT_FLAG_DISABLE) {
> > > > > > @@ -1518,7 +1515,9 @@ static bool virtqueue_enable_cb_delayed_packed(struct virtqueue *_vq)
> > > > > >     	 */
> > > > > >     	virtio_mb(vq->weak_barriers);
> > > > > > 
> > > > > > -	if (is_used_desc_packed(vq, used_idx, wrap_counter)) {
> > > > > > +	if (is_used_desc_packed(vq,
> > > > > > +				vq->last_used_idx,
> > > > > > +				vq->packed.used_wrap_counter)) {
> > > > > >     		END_USE(vq);
> > > > > >     		return false;
> > > > > >     	}
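For reference, since the fix hinges on this check: is_used_desc_packed()
decides whether one descriptor slot is used by comparing its AVAIL and USED
flag bits against the wrap counter the driver expects. A minimal standalone
sketch of that check follows; the flag bit positions are from the virtio 1.1
spec, while desc_is_used() is an illustrative name, not the kernel symbol:

    #include <stdbool.h>
    #include <stdint.h>

    #define VRING_PACKED_DESC_F_AVAIL  7   /* flag bit positions, virtio 1.1 spec */
    #define VRING_PACKED_DESC_F_USED   15

    /* A packed-ring slot is "used" when its AVAIL and USED bits are equal
     * to each other and to the wrap counter the driver expects. */
    static bool desc_is_used(uint16_t flags, bool used_wrap_counter)
    {
            bool avail = flags & (1 << VRING_PACKED_DESC_F_AVAIL);
            bool used  = flags & (1 << VRING_PACKED_DESC_F_USED);

            return avail == used && used == used_wrap_counter;
    }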
> > > > > Hi Marvin:
> > > > > 
> > > > > Two questions:
> > > > > 
> > > > > 1) Do we support IN_ORDER in kernel driver?
> > > > > 
> > > > Not supported for now. But the issue is still possible if indirect descriptors are disabled and descriptors are chained.
> > > > Because packed ring descriptor status must be checked one by one, checking an arbitrary position may give an invalid result.
> > > 
> > > 
> > > Right, then it's better to mention IN_ORDER as a future feature.
> > > 
> > > 
> > > > > 2) Should we check IN_ORDER in this case? Otherwise we may end up
> > > > > with an interrupt storm when IN_ORDER is not negotiated.
> > > > The number of interrupts will not increase here; the event offset value is calculated as before.
> > > > This just rechecks whether the newly used descriptors are enough for the next round of xmit.
> > > > If the backend is slow, the Tx queue will most likely sleep for a while until the used index goes past the event offset.
> > > 
> > > 
> > > OK, but what if the backend is almost as fast as the guest driver?
> > > E.g. in virtio-net we have:
> > > 
> > >       if (sq->vq->num_free < 2+MAX_SKB_FRAGS) {
> > >           /* No room for another worst-case skb: stop the queue. */
> > >           netif_stop_subqueue(dev, qnum);
> > >           if (!use_napi &&
> > >               unlikely(!virtqueue_enable_cb_delayed(sq->vq))) {
> > >               /* More just got used, free them then recheck. */
> > >               free_old_xmit_skbs(sq, false);
> > >               if (sq->vq->num_free >= 2+MAX_SKB_FRAGS) {
> > >                   /* Enough space again: restart, re-disable callbacks. */
> > >                   netif_start_subqueue(dev, qnum);
> > >                   virtqueue_disable_cb(sq->vq);
> > >               }
> > >           }
> > >       }
> > > 
> > > I worry that we may end up toggling the queue state repeatedly when
> > > sq->vq->num_free hovers near 2 + MAX_SKB_FRAGS.
> > > 
> > Yes, in this worst case each packet will add two extra event flags writes. Since the backend only reads this value, the cost won't be too high.
> 
> 
> For the driver, it means extra overhead: atomics, less batching, stats updating,
> etc. For the backend, the cacheline will bounce between two CPUs.
> 
> 
> > Even if we track down chained descriptors and figure out whether the descriptor indexed by the event offset is used, there is still a possibility that the flags are invalid.
> > One case is that the backend can buffer multiple descriptors by not updating the first one. We cannot guarantee that the later flags are usable until we check from the first one.
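To illustrate that point: the driver can only learn how many buffers have
completed by scanning forward from last_used_idx one slot at a time, flipping
the wrap counter at the ring boundary. A sketch, reusing desc_is_used() from
the earlier snippet; count_used() and its parameters are illustrative names,
not kernel API:

    /* Walk forward from last_used_idx and count contiguous used slots.
     * If the backend has buffered completions by leaving the first slot
     * unmarked, the scan correctly stops there, whatever any later
     * slot's flags happen to say. */
    static uint16_t count_used(const uint16_t *flags, uint16_t ring_num,
                               uint16_t last_used_idx, bool wrap_counter)
    {
            uint16_t n = 0, idx = last_used_idx;

            while (n < ring_num && desc_is_used(flags[idx], wrap_counter)) {
                    n++;
                    if (++idx == ring_num) {
                            idx = 0;
                            wrap_counter = !wrap_counter;  /* wrapped around */
                    }
            }
            return n;
    }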
> 
> 
> In this case, since we've stopped the tx queue, there are no new buffers added.
> It doesn't matter whether we get notified when 3/4 or all of the descriptors
> have been used.
> 
> Thanks

Well - checking the next descriptor will likely result in moving the
event index forward, which will conceivably reduce the number of interrupts.
So it's hard to predict which is better.  I'll apply the patch for now
as it's simple and safe.  If someone has the time to work on tuning all
this, that would be great.
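For whoever picks up the tuning work: the knob is the 3/4 threshold used when
the driver computes the event offset inside virtqueue_enable_cb_delayed_packed().
A standalone sketch of that computation under the same wrap-around rules
(struct and function names here are illustrative, not the kernel's):

    #include <stdbool.h>
    #include <stdint.h>

    struct event_suppress {
            uint16_t off;       /* descriptor index to request a notification at */
            bool wrap_counter;  /* expected used wrap counter at that index */
    };

    /* Place the event offset ~3/4 of the way through the outstanding
     * descriptors, wrapping around the ring and toggling the counter
     * if the offset passes the end of the ring. */
    static struct event_suppress event_offset(uint16_t ring_num, uint16_t num_free,
                                              uint16_t last_used_idx,
                                              bool used_wrap_counter)
    {
            struct event_suppress ev;
            uint16_t bufs = (ring_num - num_free) * 3 / 4;  /* tunable threshold */

            ev.off = last_used_idx + bufs;
            ev.wrap_counter = used_wrap_counter;
            if (ev.off >= ring_num) {
                    ev.off -= ring_num;
                    ev.wrap_counter = !ev.wrap_counter;
            }
            return ev;
    }

A chain-aware variant, as suggested earlier in the thread, would additionally
walk ev.off back to the head of the descriptor chain that covers it.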


> 
> > 
> > Regards,
> > Marvin
> > 
> > > It looks to me that the correct thing to implement is to calculate the
> > > head descriptor of the chain that sits at the 3/4 point.
> > > 
> > > Thanks
> > > 
> > > 
> > > > Thanks,
> > > > Marvin
> > > > 
> > > > > Thanks
> > > > > 