* Asias He <asias.hejun@xxxxxxxxx> wrote:

> On 04/29/2011 03:30 PM, Ingo Molnar wrote:
> >
> > * Asias He <asias.hejun@xxxxxxxxx> wrote:
> >
> >> This patch fixes virtio-net random stall
> >>
> >> host $ scp guest:/root/big.guest .
> >> big.guest     42%  440MB  67.7KB/s - stalled -
> >
> >> @@ -77,9 +77,12 @@ static void virtio_net_rx_callback(struct kvm *self, void *param)
> >>  		head = virt_queue__get_iov(vq, iov, &out, &in, self);
> >>  		len = readv(net_device.tap_fd, iov, in);
> >>  		virt_queue__set_used_elem(vq, head, len);
> >> -	}
> >>
> >> -	kvm__irq_line(self, VIRTIO_NET_IRQ, 1);
> >> +		/* We should interrupt guest right now, otherwise latency is huge. */
> >> +		mutex_lock(&net_device.mutex);
> >> +		kvm__irq_line(self, VIRTIO_NET_IRQ, 1);
> >> +		mutex_unlock(&net_device.mutex);
> >
> > The movement of the irq-line assertion call (which fixes the hangs) should
> > be separated from the mutex_lock() additions (which look unnecessary).
>
> I've sent out another patch to move this irq assertion.
>
> > Could you please check whether the addition of the mutex really is
> > necessary to solve the hang/stall?
>
> Without the mutex around the kvm__irq_line, I am still seeing the hangs
> (with the irq assertion movement patch).

Since the mutex has no effect on the other party that modifies the virtual
queue (the guest OS), it at best changes timings and hides the bug rather
than fixing it.

On the host side the queues are private to the worker thread they belong
to - but they are shared with the guest OS. So we need to document the
queueing protocol - who does what, and when - and then maybe it will become
more obvious why the hangs are occurring.

For example, what happens when the RX queue gets full? The rx thread:

	while (virt_queue__available(vq)) {

drops out of the TAP-read() loop and sleeps indefinitely. It will only be
woken up when the guest signals that the queue is available again (i.e.
there are free slots in it):

	case VIRTIO_PCI_QUEUE_NOTIFY: {
		uint16_t queue_index;

		queue_index = ioport__read16(data);
		virtio_net_handle_callback(self, queue_index);
		break;
	}

	...

	static void virtio_net_handle_callback(struct kvm *self, uint16_t queue_index)
	{
		thread_pool__signal_work(net_device.jobs[queue_index]);
	}

with queue_index == the RX vring.

Now, if this is correct, then virtio_net_rx_callback() is only ever called
with a ring that is not full. You could test this by adding an assert like
this to the head of that function:

	BUG_ON(!virt_queue__available(vq));

Does this BUG_ON() ever trigger? Pick up BUG_ON() from tools/perf/:

	tools/perf/util/include/linux/kernel.h:#define BUG_ON(cond) assert(!(cond))

Thanks,

	Ingo
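
As a quick way to see the shape of that experiment outside the tool, here
is a minimal, self-contained sketch. The struct layouts, the free_slots
field and the virt_queue__available() body are simplified stand-ins
invented for illustration; only the BUG_ON() definition and the callback
signature come from the sources quoted above:

	/* bugon-sketch.c - build with: cc -Wall -o bugon-sketch bugon-sketch.c */
	#include <assert.h>
	#include <stdbool.h>

	/* Same definition as tools/perf/util/include/linux/kernel.h */
	#define BUG_ON(cond) assert(!(cond))

	/* Hypothetical, heavily simplified stand-ins for the kvm tool types */
	struct virt_queue { int free_slots; };
	struct kvm { int unused; };

	static bool virt_queue__available(struct virt_queue *vq)
	{
		/* the real helper inspects the vring indexes instead */
		return vq->free_slots > 0;
	}

	static void virtio_net_rx_callback(struct kvm *self, void *param)
	{
		struct virt_queue *vq = param;	/* assumes param is the RX vring */

		/*
		 * If the NOTIFY -> thread_pool__signal_work() path is the only
		 * way this callback gets scheduled, the ring can never be full
		 * here, so this assertion should never fire:
		 */
		BUG_ON(!virt_queue__available(vq));

		(void)self;	/* TAP readv() loop omitted in this sketch */
	}

	int main(void)
	{
		struct kvm kvm = { 0 };
		struct virt_queue rx_vq = { .free_slots = 8 };

		virtio_net_rx_callback(&kvm, &rx_vq);	/* slots free: no trigger */
		return 0;
	}

If the assertion does trigger in the real tool, that would point at a hole
in the notify/wakeup protocol rather than at missing locking around
kvm__irq_line().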