Re: [Qemu-devel] [RFC v4 00/58] Memory API

On 07/20/2011 03:10 AM, Avi Kivity wrote:
On 07/19/2011 11:51 PM, Anthony Liguori wrote:
On 07/19/2011 11:10 AM, Avi Kivity wrote:
On 07/19/2011 07:05 PM, Avi Kivity wrote:
On 07/19/2011 05:50 PM, Anthony Liguori wrote:


There's bits I don't like about the interface

Which bits are these?

Nothing I haven't already commented on. I think there's too much at
the generic level. I don't think coalesced I/O belongs here. It's a
concept that doesn't fit. I think a side-band API would be nicer.

Well, it's impossible to do it in a side band. The moment at which a
range with coalesced mmio becomes exposed is completely orthogonal to
programming the BAR register: it can happen, for example, due to another
BAR being removed or the bridge window being programmed. You can also
have a coalesced mmio region being partially clipped.

Of course, it's not really impossible, just clumsy.
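
To make the clipping case concrete, here is a minimal sketch of how the visible part of a coalesced range could be computed. The `clip` helper, the half-open range convention, and the addresses are assumptions made for illustration; none of them come from the patch series.

```python
from typing import Optional, Tuple

# Half-open [start, end) range in guest physical address space.
Range = Tuple[int, int]


def clip(region: Range, window: Range) -> Optional[Range]:
    """Return the part of `region` visible through `window`, or None."""
    start = max(region[0], window[0])
    end = min(region[1], window[1])
    return (start, end) if start < end else None


# Hypothetical numbers: a coalesced range seen through a bridge window
# that only covers its first page.
coalesced = (0xE0001000, 0xE0003000)
bridge_window = (0xE0000000, 0xE0002000)

visible = clip(coalesced, bridge_window)
print([hex(x) for x in visible])
```

Reprogramming the bridge window changes `visible` without the device's BAR being touched, which is the reason given above for tracking coalescing together with the mapping.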

There are exactly two devices that use coalesced I/O: VGA and e1000.

VGA does coalesced I/O over the legacy VGA region (0xa0000 ...
0xc0000). This region is very special in the PC and is directly routed
by the I440FX to the appropriate first PCI graphics card.

The VGA device knows exactly where this region is mapped.

The VGA device doesn't know *if* it is mapped. It can be obstructed by
the chipset and by SMM. Other chipsets we emulate may support multiple
VGA cards.

The i440fx can support multiple VGA cards just fine.

Legacy region accesses are always routed by the PCI bus to the first PCI device that identifies itself as a graphics card.

The card is well aware of whether it is getting legacy VGA accesses, because only one card can register for this area.
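
The routing rule described here can be sketched as follows. This is an illustrative model only: the `PciDevice` type and the `legacy_vga_target` function are hypothetical names, and the sketch ignores the chipset and SMM obstruction raised earlier in the thread. The one fact it relies on is that PCI base class 0x03 denotes a display controller.

```python
from dataclasses import dataclass
from typing import List, Optional

LEGACY_VGA_START = 0xA0000
LEGACY_VGA_END = 0xC0000  # exclusive
PCI_BASE_CLASS_DISPLAY = 0x03  # PCI base class code for display controllers


@dataclass
class PciDevice:
    name: str
    base_class: int


def legacy_vga_target(addr: int, devices: List[PciDevice]) -> Optional[PciDevice]:
    """Pick the device that receives a legacy VGA access, if any."""
    if not (LEGACY_VGA_START <= addr < LEGACY_VGA_END):
        return None
    # The first device in bus order that identifies as a graphics card wins.
    for dev in devices:
        if dev.base_class == PCI_BASE_CLASS_DISPLAY:
            return dev
    return None


# Hypothetical bus contents, in enumeration order.
bus = [
    PciDevice("nic", 0x02),
    PciDevice("vga0", PCI_BASE_CLASS_DISPLAY),
    PciDevice("vga1", PCI_BASE_CLASS_DISPLAY),
]
target = legacy_vga_target(0xA0000, bus)
print(target.name if target else None)
```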

The e1000 does coalesced I/O for its memory registers. But it's
dubious how much this actually matters anymore. The original claim was
a 10% boost with iperf.
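
For readers unfamiliar with the idea, coalescing means queueing register writes and handing them to the device model in batches rather than one at a time. The sketch below is a toy model under that assumption; the `CoalescedWrites` class, its `capacity` parameter, and the flush policy are hypothetical and do not mirror the actual KVM or QEMU implementation.

```python
from typing import Callable, List, Tuple


class CoalescedWrites:
    """Toy model: buffer MMIO writes and deliver them in batches."""

    def __init__(self, handler: Callable[[int, int], None], capacity: int = 4):
        self.handler = handler      # device callback for a single write
        self.capacity = capacity    # hypothetical buffer size
        self.pending: List[Tuple[int, int]] = []
        self.flushes = 0            # counts batch deliveries

    def write(self, addr: int, value: int) -> None:
        self.pending.append((addr, value))
        if len(self.pending) >= self.capacity:
            self.flush()

    def flush(self) -> None:
        if not self.pending:
            return
        # Replay the buffered writes in their original order.
        for addr, value in self.pending:
            self.handler(addr, value)
        self.pending.clear()
        self.flushes += 1


log: List[Tuple[int, int]] = []
coalescer = CoalescedWrites(lambda a, v: log.append((a, v)), capacity=4)
for i in range(10):
    coalescer.write(0x100 + 4 * i, i)
coalescer.flush()  # drain whatever is left
print(len(log), coalescer.flushes)
```

In this model ten writes reach the handler in three batches, which is the kind of saving the original 10% claim would have depended on.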

The e1000 is not performance competitive with virtio-net though so it
certainly is reasonable to assume that no one would notice if we
removed coalesced I/O from the e1000.

The e1000 NIC is the best we have for guests that don't support virtio.
It's not reasonable to reduce its performance.

So let's talk about real numbers. This is netperf with a default invocation from guest to host. All numbers are in MB/sec.

rtl8139
-------
119.45
118.12

e1000 w/coalesced mmio
----------------------
425.93
424.08

e1000 w/o coalesced mmio
------------------------
419.13
413.83

virtio-net
----------
4330.52
4419.90

So removing coalesced MMIO from the e1000 results in a massive slowdown of roughly 2% :-)

And while the e1000 is > 100% faster than the rtl8139, it's still an order of magnitude slower than the userspace virtio-net.
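
The relative differences can be recomputed from the figures above. Averaging the two runs per configuration is a choice made for this sketch, not something stated in the message, and the variable names are illustrative.

```python
from statistics import mean

# netperf results from the message, two runs per configuration (MB/sec).
runs = {
    "rtl8139": [119.45, 118.12],
    "e1000 w/ coalesced mmio": [425.93, 424.08],
    "e1000 w/o coalesced mmio": [419.13, 413.83],
    "virtio-net": [4330.52, 4419.90],
}

avg = {name: mean(values) for name, values in runs.items()}

# Throughput lost by disabling coalesced MMIO on the e1000, in percent.
slowdown_pct = 100 * (
    1 - avg["e1000 w/o coalesced mmio"] / avg["e1000 w/ coalesced mmio"]
)
# Speed ratios between device models.
e1000_vs_rtl8139 = avg["e1000 w/ coalesced mmio"] / avg["rtl8139"]
virtio_vs_e1000 = avg["virtio-net"] / avg["e1000 w/ coalesced mmio"]

print(f"slowdown without coalescing: {slowdown_pct:.1f}%")
print(f"e1000 vs rtl8139: {e1000_vs_rtl8139:.2f}x")
print(f"virtio-net vs e1000: {virtio_vs_e1000:.2f}x")
```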

I'm confident that the e1000 could be improved if someone modified it to optimally use the new netdev interfaces. But no one cares that much about the performance of the e1000. And if we dropped coalesced MMIO support for the e1000, no one would notice.

Exit costs have changed dramatically over the years. Optimizations that made sense with P4 class hardware don't necessarily make sense these days. QEMU has also changed a lot so bottlenecks are no longer where they used to be.

The point is, it's so incredibly special cased that having it as part
of such a general purpose API seems wrong. Of the hundreds of devices,
we only have one device that we know for sure really needs it and it
could easily be done independent of the memory API for that device.


We either support coalesced mmio well, or not at all. Even if the API
has only one user, that doesn't excuse doing it badly.

It's not at all that black and white. We need to carefully choose what we model and then have the flexibility to break those models in the name of performance.

If we try to make everything fit elegantly into a model, we'll end up with something that's overly complex just to accommodate a single user. That's my general concern with where we're going here.

I don't think it's too bad and as I said, I don't object to it in its current form. But I think it could be simplified. Even in its current non-simple form, it's better than what we currently have.

Regards,

Anthony Liguori
