> Two ideas:
> 1. How about writing out used, just delaying the signal?
> This way we don't have to queue separately.

This improves performance somewhat, but not as much as delaying both
used and signal. Delaying the used-buffer updates combines multiple
small copies into one large copy, which saves more CPU and gains some
bandwidth.

> 2. How about flushing out queued stuff before we exit
> the handle_tx loop? That would address most of
> the spec issue.

The performance is almost the same as with the previous patch. I will
resubmit a modified version that adds vhost_add_used_and_signal_n after
the handle_tx loop to process the pending queue.

This patch was part of a modified macvtap zero-copy series which I
haven't submitted yet. I found that it helps vhost TX in general. The
pending queue will be used later by the DMA done handling, so I put it
in the vq instead of a local variable in handle_tx.

Thanks
Shirley
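
For illustration only, here is a minimal user-space sketch of the batching idea discussed above: completed buffers are queued in the vq, published in one batch with a single signal, and whatever is left is flushed before leaving the TX loop. This is not the actual patch. Every name in it (sim_vq, sim_queue_used, sim_flush_pending, sim_handle_tx, PENDING_MAX) is hypothetical, and the batch size of 64 is an assumption chosen for the example.

```c
#include <assert.h>
#include <stddef.h>

#define PENDING_MAX 64  /* hypothetical batch size, not taken from the patch */

/* Hypothetical stand-in for a used-ring entry. */
struct used_elem {
    unsigned int id;
    unsigned int len;
};

/* Hypothetical stand-in for the per-virtqueue state. */
struct sim_vq {
    struct used_elem pending[PENDING_MAX]; /* completed, not yet published */
    size_t num_pending;
    size_t used_published;  /* entries made visible to the guest so far */
    size_t signals;         /* guest notifications sent so far */
};

/* Publish all pending entries in one go and notify the guest once. */
static void sim_flush_pending(struct sim_vq *vq)
{
    if (vq->num_pending == 0)
        return;
    vq->used_published += vq->num_pending;
    vq->num_pending = 0;
    vq->signals++;
}

/* Queue one completed buffer; flush as soon as the batch is full. */
static void sim_queue_used(struct sim_vq *vq, unsigned int id, unsigned int len)
{
    assert(vq->num_pending < PENDING_MAX);
    vq->pending[vq->num_pending].id = id;
    vq->pending[vq->num_pending].len = len;
    vq->num_pending++;
    if (vq->num_pending == PENDING_MAX)
        sim_flush_pending(vq);
}

/* Simplified TX loop: handle npkts buffers, then flush what is left. */
static void sim_handle_tx(struct sim_vq *vq, unsigned int npkts)
{
    unsigned int i;

    for (i = 0; i < npkts; i++)
        sim_queue_used(vq, i, 0);

    /* Flush before returning so no completed buffer stays hidden. */
    sim_flush_pending(vq);
}
```

Calling sim_handle_tx on a zero-initialized struct sim_vq with 150 buffers publishes all 150 entries using three signals (two full batches of 64 plus the final flush of 22), instead of one signal per buffer.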