Marcelo Tosatti wrote:
> On Fri, May 08, 2009 at 10:59:00AM +0300, Avi Kivity wrote:
>> Marcelo Tosatti wrote:
>>> I think comparison is not entirely fair. You're using
>>> KVM_HC_VAPIC_POLL_IRQ ("null" hypercall) and the compiler optimizes that
>>> (on Intel) to only one register read:
>>>
>>>     nr = kvm_register_read(vcpu, VCPU_REGS_RAX);
>>>
>>> Whereas in a real hypercall for (say) PIO you would need the address,
>>> size, direction and data.
>>>
>> Well, that's probably one of the reasons pio is slower, as the cpu has
>> to set these up, and the kernel has to read them.
>>
>>> Also for PIO/MMIO you're adding this unoptimized lookup to the
>>> measurement:
>>>
>>>     pio_dev = vcpu_find_pio_dev(vcpu, port, size, !in);
>>>     if (pio_dev) {
>>>         kernel_pio(pio_dev, vcpu, vcpu->arch.pio_data);
>>>         complete_pio(vcpu);
>>>         return 1;
>>>     }
>>>
>> Since there are only one or two elements in the list, I don't see how it
>> could be optimized.
>
> speaker_ioport, pit_ioport, pic_ioport and plus nulldev ioport. nulldev
> is probably the last in the io_bus list.
>
> Not sure if this one matters very much. Point is you should measure the
> exit time only, not the pio path vs hypercall path in kvm.

The problem is that the exit time in and of itself isn't all that
interesting to me. What I am interested in measuring is how long it takes
KVM to process the request and realize that I want to execute function
"X". Ultimately that is what matters in terms of execution latency, and
it is thus the more interesting data. I think the exit time is possibly an
interesting 5th data point, but it's more of a side-bar, IMO. In any case,
I suspect that both exits will be approximately the same at the VT/SVM
level.

OTOH: if there is a patch out there to improve KVM's code (say,
specifically the PIO handling logic), that is fair game here and we should
benchmark it. For instance, if you have ideas on ways to improve the
find_pio_dev performance, etc.
One item may be to replace the kvm->lock on the bus scan with an RCU or
something similar (though PIOs are very frequent, and the constant
re-entry into an RCU read-side critical section may effectively cause a
perpetual grace period and may be too prohibitive). CC'ing pmck.

FWIW: the PIOoHCs were about 140ns slower than the pure HC, so some of
that 140ns can possibly be recouped. I currently suspect the lock
acquisition in the io_bus scan is the bulk of that time, but that is
admittedly a guess. The remaining 200-250ns is elsewhere in the PIO
decode.

-Greg