On Tue, Mar 26, 2024 at 03:49:02AM -0400, Michael S. Tsirkin wrote: > On Mon, Mar 25, 2024 at 05:34:29PM +1000, Gavin Shan wrote: > > > > On 3/20/24 17:14, Michael S. Tsirkin wrote: > > > On Wed, Mar 20, 2024 at 03:24:16PM +1000, Gavin Shan wrote: > > > > On 3/20/24 10:49, Michael S. Tsirkin wrote:> > > > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c > > > > > index 6f7e5010a673..79456706d0bd 100644 > > > > > --- a/drivers/virtio/virtio_ring.c > > > > > +++ b/drivers/virtio/virtio_ring.c > > > > > @@ -685,7 +685,8 @@ static inline int virtqueue_add_split(struct virtqueue *_vq, > > > > > /* Put entry in available array (but don't update avail->idx until they > > > > > * do sync). */ > > > > > avail = vq->split.avail_idx_shadow & (vq->split.vring.num - 1); > > > > > - vq->split.vring.avail->ring[avail] = cpu_to_virtio16(_vq->vdev, head); > > > > > + u16 headwithflag = head | (q->split.avail_idx_shadow & ~(vq->split.vring.num - 1)); > > > > > + vq->split.vring.avail->ring[avail] = cpu_to_virtio16(_vq->vdev, headwithflag); > > > > > /* Descriptors and available array need to be set before we expose the > > > > > * new available array entries. */ > > > > > > > > > Ok, Michael. I continued with my debugging code. It still looks like a > > hardware bug on NVidia's grace-hopper. I really think NVidia needs to be > > involved for the discussion, as suggested by you. > > Do you have a support contact at Nvidia to report this? > > > Firstly, I bind the vhost process and vCPU thread to CPU#71 and CPU#70. > > Note that I have only one vCPU in my configuration. > > Interesting but is guest built with CONFIG_SMP set? arm64 is always built CONFIG_SMP. > > Secondly, the debugging code is enhanced so that the available head for > > (last_avail_idx - 1) is read for twice and recorded. It means the available > > head for one specific available index is read for twice. I do see the > > available heads are different from the consecutive reads. More details > > are shared as below. > > > > From the guest side > > =================== > > > > virtio_net virtio0: output.0:id 86 is not a head! > > head to be released: 047 062 112 > > > > avail_idx: > > 000 49665 > > 001 49666 <-- > > : > > 015 49664 > > what are these #s 49665 and so on? > and how large is the ring? > I am guessing 49664 is the index ring size is 16 and > 49664 % 16 == 0 More than that, 49664 % 256 == 0 So again there seems to be an error in the vicinity of roll-over of the idx low byte, as I observed in the earlier log. Surely this is more than coincidence? -- Keir > > avail_head: > > > is this the avail ring contents? > > > 000 062 > > 001 047 <-- > > : > > 015 112 > > > What are these arrows pointing at, btw? > > > > From the host side > > ================== > > > > avail_idx > > 000 49663 > > 001 49666 <--- > > : > > > > avail_head > > 000 062 (062) > > 001 047 (047) <--- > > : > > 015 086 (112) // head 086 is returned from the first read, > > // but head 112 is returned from the second read > > > > vhost_get_vq_desc: Inconsistent head in two read (86 -> 112) for avail_idx 49664 > > > > Thanks, > > Gavin > > OK thanks so this proves it is actually the avail ring value. > > -- > MST >