Re: [PATCH kvmtool] virtio: Fix ordering of virt_queue__should_signal()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Jul 31, 2020 at 11:14:27AM +0100, Alexandru Elisei wrote:
> The guest programs used_event in the avail queue to let the host know when

                                 nit: "avail ring"...

> it wants a notification from the device. The host notifies the guest when
> the used queue vring index passes used_event.

... and "used ring" are more consistent with the virtio spec.

> The virtio-blk guest driver, in the notification callback, sets used_event
> to the value of the current used queue vring index, which means it will get
> a notification after the next buffer is consumed by the host. It is
> possible for the guest to submit a job, and then go into uninterruptible
> sleep waiting for the notification (for example, via submit_bio_wait()).
> 
> On the host side, the virtio-blk device increments the used queue vring
> index, then compares it to used_event to decide if a notification should be
> sent.
> 
> A memory barrier between writing the new index value and reading used_event
> is needed to make sure we read the latest used_event value.  kvmtool uses a
> write memory barrier, which on arm64 translates into DMB ISHST. The
> instruction orders memory writes that have been executed in program order
> before the barrier relative to writes that have been executed in program
> order after the barrier.

Not sure this sentence is necessary.

> The barrier doesn't apply to reads, which means it
> is not enough to prevent reading a stale value for used_event. This can
> lead to the host not sending the notification, and the guest thread remains
> stuck indefinitely waiting for IO to complete.

It might be helpful to detail what the guest does, to identify which
barrier this pairs with on the guest side. I had a hard time convincing
myself that this is the right fix, but I think I got there in the end.

Kvmtool currently does:

	// virt_queue__used_idx_advance()
		vring.used.idx = idx
		wmb()
	// virtio_queue__should_signal()
	if (vring.used_event is between old and new idx)
		notify()

When simplifying Linux virtblk_done() I get:

	while (last_used != vring.used.idx) {
		...
		vring.used_event = ++last_used
		mb() // virtio_store_mb() in virtqueue_get_buf_ctx_split()
		...
		// unnecessary write+mb, here for completeness
		vring.used_event = last_used
		mb() // virtqueue_poll()
	}

(1) Kvmtool:
    writes vring.used.idx = 2
    wmb()
    reads vring.used_event = 1
    notifies the guest.

(2) Linux:
    (reads vring.used.idx = 2)
    writes vring.used_event = 2
    mb()
    reads vring.used.idx = 2
    returns from virtblk_done()

(3) Kvmtool:
    writes vring.used.idx = 3
    wmb()
    reads vring.used_event = 1
    doesn't notify, stalls the guest.

By replacing the wmb() with a full barrier we get the correct store buffer
pattern (SB+fencembonceonces.litmus).

> Using a memory barrier for reads and writes matches what the guest driver
> does when deciding to kick the host: after updating the avail queue vring
> index via virtio_queue_rq() -> virtblk_add_req() -> .. ->

Probably worth noting you're referring to Linux here.

> virtqueue_add_split(), it uses a read/write memory barrier before reading

                                or "full" memory barrier?

> avail_event from the used queue in virtio_commit_rqs() -> .. ->
> virtqueue_kick_prepare_split().
> 
> Also move the barrier in virt_queue__should_signal(), because the barrier
> is needed for notifications to work correctly, and it makes more sense to
> have it in the function that determines if the host should notify the
> guest.
> 
> Reported-by: Anvay Virkar <anvay.virkar@xxxxxxx>
> Suggested-by: Anvay Virkar <anvay.virkar@xxxxxxx>
> Signed-off-by: Alexandru Elisei <alexandru.elisei@xxxxxxx>

Regardless of the nits above, I believe this patch is correct

Reviewed-by: Jean-Philippe Brucker <jean-philippe@xxxxxxxxxx>

> ---
> This was observed by Anvay, where kvmtool reads the previous value of
> used_event and the notification is not sent.
> 
> I *think* this also fixes the VM hang reported in [1], where several
> processes in the guest were stuck in uninterruptible sleep. I am not
> familiar with the block layer, but my theory is that the threads were stuck
> in wait_for_completion_io(), from blk_execute_rq() executing a flush
> request. It would be great if Milan could give this patch a spin and see if
> the problem goes away. Don't know how reproducible it is though.
> 
> [1] https://www.spinics.net/lists/kvm/msg204543.html
> 
>  virtio/core.c | 15 ++++++++-------
>  1 file changed, 8 insertions(+), 7 deletions(-)
> 
> diff --git a/virtio/core.c b/virtio/core.c
> index f5b3c07fc100..90a661d12e14 100644
> --- a/virtio/core.c
> +++ b/virtio/core.c
> @@ -33,13 +33,6 @@ void virt_queue__used_idx_advance(struct virt_queue *queue, u16 jump)
>  	wmb();
>  	idx += jump;
>  	queue->vring.used->idx = virtio_host_to_guest_u16(queue, idx);
> -
> -	/*
> -	 * Use wmb to assure used idx has been increased before we signal the guest.
> -	 * Without a wmb here the guest may ignore the queue since it won't see
> -	 * an updated idx.
> -	 */
> -	wmb();
>  }
>  
>  struct vring_used_elem *
> @@ -194,6 +187,14 @@ bool virtio_queue__should_signal(struct virt_queue *vq)
>  {
>  	u16 old_idx, new_idx, event_idx;
>  
> +	/*
> +	 * Use mb to assure used idx has been increased before we signal the
> +	 * guest, and we don't read a stale value for used_event. Without a mb
> +	 * here we might not send a notification that we need to send, or the
> +	 * guest may ignore the queue since it won't see an updated idx.
> +	 */
> +	mb();
> +
>  	if (!vq->use_event_idx) {
>  		/*
>  		 * When VIRTIO_RING_F_EVENT_IDX isn't negotiated, interrupt the
> -- 
> 2.28.0
> 



[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux