On Thu, Sep 10, 2020 at 01:13:51PM +0200, Guennadi Liakhovetski wrote:
> +int vhost_rpmsg_start_lock(struct vhost_rpmsg *vr, struct vhost_rpmsg_iter *iter,
> +			   unsigned int qid, ssize_t len)
> +	__acquires(vq->mutex)
> +{
> +	struct vhost_virtqueue *vq = vr->vq + qid;
> +	unsigned int cnt;
> +	ssize_t ret;
> +	size_t tmp;
> +
> +	if (qid >= VIRTIO_RPMSG_NUM_OF_VQS)
> +		return -EINVAL;
> +
> +	iter->vq = vq;
> +
> +	mutex_lock(&vq->mutex);
> +	vhost_disable_notify(&vr->dev, vq);
> +
> +	iter->head = vhost_rpmsg_get_msg(vq, &cnt);
> +	if (iter->head == vq->num)
> +		iter->head = -EAGAIN;
> +
> +	if (iter->head < 0) {
> +		ret = iter->head;
> +		goto unlock;
> +	}
> +
[...]
> +
> +return_buf:
> +	vhost_add_used(vq, iter->head, 0);
> +unlock:
> +	vhost_enable_notify(&vr->dev, vq);
> +	mutex_unlock(&vq->mutex);
> +
> +	return ret;
> +}

There is a race condition here.  New buffers could have been added while
notifications were disabled (between vhost_disable_notify() and
vhost_enable_notify()), so the other vhost drivers check the return
value of vhost_enable_notify() and rerun their work loops if it returns
true.  This driver doesn't do that, so it stops processing requests if
that condition hits.

Something like the below seems to fix it, but the correct fix could maybe
involve changing this API to account for this case so that it looks more
like the code in other vhost drivers.
diff --git a/drivers/vhost/rpmsg.c b/drivers/vhost/rpmsg.c
index 7c753258d42..673dd4ec865 100644
--- a/drivers/vhost/rpmsg.c
+++ b/drivers/vhost/rpmsg.c
@@ -302,8 +302,14 @@ static void handle_rpmsg_req_kick(struct vhost_work *work)
 	struct vhost_virtqueue *vq = container_of(work, struct vhost_virtqueue,
 						  poll.work);
 	struct vhost_rpmsg *vr = container_of(vq->dev, struct vhost_rpmsg, dev);
+	struct vhost_virtqueue *reqvq = vr->vq + VIRTIO_RPMSG_REQUEST;
 
-	while (handle_rpmsg_req_single(vr, vq))
+	/*
+	 * The !vhost_vq_avail_empty() check is needed since the vhost_rpmsg*
+	 * APIs don't check the return value of vhost_enable_notify() and retry
+	 * if there were buffers added while notifications were disabled.
+	 */
+	while (handle_rpmsg_req_single(vr, vq) || !vhost_vq_avail_empty(reqvq->dev, reqvq))
 		;
 }
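
To illustrate the race and the recheck pattern mentioned above, here is a
rough user-space sketch.  It is not kernel code and not a proposed patch:
struct mock_vq and all the mock_*() / drain_*() helpers are hypothetical
stand-ins, and the only behaviour assumed from vhost is that enabling
notifications reports whether buffers are already pending.  The "guest" is
modelled as adding one buffer right after the host sees an empty ring,
while notifications are still disabled.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a virtqueue: plain counters, no real ring. */
struct mock_vq {
	unsigned int avail;      /* buffers published by the guest */
	unsigned int last_seen;  /* buffers consumed by the host */
	unsigned int late_adds;  /* buffers the guest adds while notify is off */
	bool notify_enabled;
};

static void mock_disable_notify(struct mock_vq *vq)
{
	vq->notify_enabled = false;
}

/*
 * Re-enable notifications and report whether buffers are already pending,
 * i.e. whether something was added while notifications were disabled.
 */
static bool mock_enable_notify(struct mock_vq *vq)
{
	vq->notify_enabled = true;
	return vq->avail != vq->last_seen;
}

/* Consume one buffer; returns false if the ring looked empty. */
static bool mock_get_buf(struct mock_vq *vq)
{
	if (vq->last_seen == vq->avail) {
		/*
		 * Simulated race: the guest adds a buffer just after we saw
		 * an empty ring.  Notifications are off, so no kick arrives.
		 */
		if (vq->late_adds) {
			vq->late_adds--;
			vq->avail++;
		}
		return false;
	}
	vq->last_seen++;
	return true;
}

/* Ignores the enable_notify result: a late buffer is left unprocessed. */
unsigned int drain_no_recheck(struct mock_vq *vq)
{
	unsigned int handled = 0;

	mock_disable_notify(vq);
	while (mock_get_buf(vq))
		handled++;
	mock_enable_notify(vq);	/* return value dropped */
	return handled;
}

/* Checks the enable_notify result and reruns the loop if it is true. */
unsigned int drain_with_recheck(struct mock_vq *vq)
{
	unsigned int handled = 0;

	mock_disable_notify(vq);
	for (;;) {
		if (mock_get_buf(vq)) {
			handled++;
			continue;
		}
		if (mock_enable_notify(vq)) {
			/* Buffers appeared meanwhile: go around again. */
			mock_disable_notify(vq);
			continue;
		}
		break;
	}
	return handled;
}
```

Starting from two available buffers and one late add, drain_no_recheck()
returns with one buffer still pending and no kick on the way, while
drain_with_recheck() picks it up on the second pass.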