* Sasha Levin <levinsasha928@xxxxxxxxx> wrote: > On Fri, 2011-04-29 at 12:22 +0200, Ingo Molnar wrote: > > If that is so then indeed the right approach might be to signal the guest every > > time we manage to readv() something - there might not come any other packet for > > a long time. But the reason is not some 'erroneous state' - all state is > > perfectly fine, this is simply a basic property of the event loop that the rx > > thread implements ... > > My idea as for 'erroneous state' was as follows: > > We have 2 virt queues: one for RX and one for TX, each on it's own > thread. > > RX Thread: > - Runs readv() and reads data. > - Calls virt_queue__set_used_elem() which starts updating the RX virt > queue. > > TX Thread: > - Runs readv() and reads data. Doesnt the TX thread do writev()? > - Calls and returns from virt_queue__set_used_elem(). > - Calls kvm__irq_line(). > > At this point, The RX queue state is being updated but since the IRQ was > signaled (same IRQ for both TX and RX) the guest virtio-net checks the > RX queue and finds a virt queue that wasn't fully updated. Well, guest side can look at the queue *anytime* - that is key to NAPI style network queue processing for example. So virt_queue__set_used_elem() needs to update the RX queue very carefully, and i think it's buggy there. Right now it does: struct vring_used_elem *virt_queue__set_used_elem(struct virt_queue *queue, uint32_t head, uint32_t len) { struct vring_used_elem *used_elem; used_elem = &queue->vring.used->ring[queue->vring.used->idx++ % queue->vring.num]; used_elem->id = head; used_elem->len = len; return used_elem; that looks dangerous - what is the right protocol for touching the ring? It sure cannot be this. Firstly, write barriers are needed (wmb() in the kernel), to make sure the fields are modified in the right order and become visible to the guest in that order as well. Secondly, and more importantly, the queue index needs to be updated *after* the ring element has been modified. This ring's vring.used index is owned by the host side, so the guest will not update it in parallel - it will only look at it. So the sequence should be something like: struct vring_used_elem *used_elem; struct vring *vr; int new_idx; vr = &vq->vring; new_idx = (vr->used->idx + 1) % vr->num; /* Fill in the new ring descriptor: */ used_elem = vr->used->ring + new_idx; used_elem->id = head; used_elem->len = len; /* Write barrier, make sure the descriptor updates are visible first: */ wmb(); /* * Update the index to make the new element visible to the guest: */ vr->used->idx = new_idx; /* Write barrier, make sure the index update is visible to the guest before any IRQ: */ wmb(); return used_elem; Note that given current IRQ signalling techniques the second wmb() is unnecesarry, i only added it for completeness. Could you try this variant without the barriers? That alone could solve your hangs. Thanks, Ingo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html