On Thu, 2009-01-22 at 12:48 +0000, Mark McLoughlin wrote:
> On Thu, 2009-01-22 at 14:12 +0200, Avi Kivity wrote:
>
> > My worry with this change is that increases cpu utilization even more
> > than it increases bandwidth, so that our bits/cycle measure decreases.
>
> Yep, agreed it's important to watch out for this.
>
> > The descriptors (and perhaps data) are likely on the same cache as the
> > vcpu, and moving the transmit to the iothread will cause them to move to
> > the iothread's cache.
>
> We flush from the I/O thread right now.
>
> We only ever flush from the vcpu thread when the ring fills up, which
> rarely happens from what I've observed.

Sorry to have come in late to the discussion, but it seems like maybe it
needed another kick after a couple months anyway. As noted, we are
mostly (almost exclusively?) doing TX from the timer anyway, so neither
this change nor Mark's previous patch series really changes the current
cache effects. I am curious what happens to latency with Mark's series,
since the charts don't really address that; hopefully good things
without the tx_timer.

A thread per device, or perhaps even a thread per RX/TX stream, seems
like a logical goal, but these current patches do provide a worthwhile
incremental improvement. Perhaps we could affinitize the guest to do
I/O on a specific vcpu via _PXM methods in ACPI, so that we can provide
hints to the scheduler to keep a vcpu thread and its associated I/O
threads nearby.

Thanks,

Alex

--
Alex Williamson                             HP Open Source & Linux Org.