Anthony Liguori wrote:
> Avi Kivity wrote:
>>
>> Hmm, reminds me of something I thought of a while back.
>>
>> We could implement an 'mmio hypercall' that does mmio reads/writes
>> via a hypercall instead of an mmio operation.  That will speed up
>> mmio for emulated devices (say, e1000).  It's easy to hook into
>> Linux (readl/writel), is pci-friendly, non-x86 friendly, etc.
>
> By the time you get down to userspace for an emulated device, that
> 2us difference between mmio and hypercalls is simply not going to
> make a difference.

I don't care about this path for emulated devices.  I am interested in
in-kernel vbus devices.

> I'm surprised so much effort is going into this, is there any
> indication that this is even close to a bottleneck in any
> circumstance?

Yes.  Each 1us of overhead is a 4% regression in something as trivial
as a 25us UDP/ICMP rtt "ping".

>
> We have much, much lower hanging fruit to attack.  The basic fact
> that we still copy data multiple times in the networking drivers is
> clearly more significant than a few hundred nanoseconds that should
> occur less than once per packet.

For request-response, this happens for *every* packet, since you
cannot exploit buffering/deferring.

Can you back up your claim that PPC has no difference in performance
between an MMIO exit and a "hypercall"?  (Yes, I understand PPC has no
"VT"-like instructions, but clearly there are ways to cause a trap, so
presumably we can measure the difference between a PF exit and
something more explicit.)

We need numbers before we can really decide to abandon this
optimization.  If PPC mmio has no penalty over hypercall, I am not
sure the 350ns on x86 is worth this effort (especially if I can shrink
this with some RCU fixes).  Otherwise, the margin is quite a bit
larger.

-Greg
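
For reference, here is a rough guest-side sketch of what the
readl/writel hook Avi describes could look like.  This is purely
illustrative: the KVM_HC_MMIO_WRITE number and its (gpa, val, len) ABI
are made-up placeholders, not an existing KVM interface, and a real
implementation would of course also need the matching host-side
dispatch.

/*
 * Illustrative guest-side sketch of an "mmio hypercall" write path.
 * KVM_HC_MMIO_WRITE and its (gpa, val, len) ABI are hypothetical
 * placeholders; they do not exist in KVM today.
 */
#include <linux/types.h>
#include <asm/io.h>          /* writel() fallback */
#include <asm/kvm_para.h>    /* kvm_para_available(), kvm_hypercall3() */

#define KVM_HC_MMIO_WRITE	100	/* hypothetical hypercall number */

/*
 * The driver passes the register's guest-physical address (known from
 * the PCI BAR) so the host never has to walk guest page tables or
 * decode a faulting instruction.
 */
static inline void hc_writel(u32 val, phys_addr_t gpa,
			     void __iomem *vaddr)
{
	if (kvm_para_available())
		kvm_hypercall3(KVM_HC_MMIO_WRITE, gpa, val, sizeof(val));
	else
		writel(val, vaddr);	/* ordinary trapped MMIO write */
}

The whole point is that the exit becomes an explicit vmcall instead of
a fault the host has to decode and emulate, which is where the ~350ns
delta on x86 comes from.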