On 3/27/24 09:14, Gavin Shan wrote:
On 3/27/24 01:46, Will Deacon wrote:
On Tue, Mar 26, 2024 at 11:43:13AM +0000, Will Deacon wrote:
Ok, long shot after eyeballing the vhost code, but does the diff below
help at all? It looks like vhost_vq_avail_empty() can advance the value
saved in 'vq->avail_idx' but without the read barrier, possibly confusing
vhost_get_vq_desc() in polling mode.
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 045f666b4f12..87bff710331a 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -2801,6 +2801,7 @@ bool vhost_vq_avail_empty(struct vhost_dev *dev, struct vhost_virtqueue *vq)
 		return false;
 	vq->avail_idx = vhost16_to_cpu(vq, avail_idx);
+	smp_rmb();
 	return vq->avail_idx == vq->last_avail_idx;
 }
 EXPORT_SYMBOL_GPL(vhost_vq_avail_empty);
Thanks, Will. I had already noticed that smp_rmb() is missing from vhost_vq_avail_empty().
The issue still occurs after adding smp_rmb() there. However, your suggestion prompted
me to recheck the code, and it seems another smp_rmb() is missing from
vhost_enable_notify().
With smp_rmb() added to both vhost_vq_avail_empty() and vhost_enable_notify(), I can no
longer reproduce the issue. I will run more iterations to make sure it is really resolved.
After that, I will post formal patches for review.
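For readers following along, the read-side ordering discussed above can be
sketched in user space with C11 atomics. This is only a minimal illustration
under stated assumptions, not the vhost code: struct ring, struct consumer,
RING_SIZE, ring_push(), ring_empty() and ring_pop() are hypothetical names,
and the acquire fence merely plays the role that smp_rmb() plays in the kernel.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 8u	/* hypothetical ring size (power of two) */

/* Hypothetical stand-in for a shared available ring; not the vhost layout. */
struct ring {
	_Atomic uint16_t idx;		/* index published by the producer */
	uint16_t entries[RING_SIZE];	/* ring entries (descriptor heads) */
};

/* Consumer state, loosely modelled on avail_idx / last_avail_idx. */
struct consumer {
	uint16_t avail_idx;		/* cached copy of ring->idx */
	uint16_t last_avail_idx;	/* next entry to consume */
};

/* Producer: write the entry first, then publish the new index. */
static void ring_push(struct ring *r, uint16_t head)
{
	uint16_t idx = atomic_load_explicit(&r->idx, memory_order_relaxed);

	r->entries[idx % RING_SIZE] = head;
	/* Write-side barrier: the entry must be visible before the index. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&r->idx, (uint16_t)(idx + 1), memory_order_relaxed);
}

/*
 * Consumer: refresh the cached index. The fence keeps the later read of
 * the ring entry from being ordered before the read of the index; without
 * it, ring_pop() could observe a stale entry for a freshly advanced index.
 */
static bool ring_empty(struct ring *r, struct consumer *c)
{
	c->avail_idx = atomic_load_explicit(&r->idx, memory_order_relaxed);
	atomic_thread_fence(memory_order_acquire);	/* stands in for smp_rmb() */
	return c->avail_idx == c->last_avail_idx;
}

/* Consumer: read the next entry; only valid after ring_empty() returned false. */
static uint16_t ring_pop(struct ring *r, struct consumer *c)
{
	assert(c->avail_idx != c->last_avail_idx);
	return r->entries[c->last_avail_idx++ % RING_SIZE];
}
```

The point of the sketch is that every path which advances the cached index
needs the barrier before a ring entry is read, which is why a single missing
barrier in one such path is enough to hit the problem.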
Thanks again, Will. The formal patches have been sent for review.
https://lkml.org/lkml/2024/3/27/40
Thanks,
Gavin