On Saturday, October 13, 2018 12:31 AM, Andi Kleen wrote:
> > 4. Results
> >     - Without this optimization, the guest pmi handling time is
> >       ~4500000 ns, and the max sampling rate is reduced to 250.
> >     - With this optimization, the guest pmi handling time is ~9000 ns
> >       (i.e. 1 / 500 of the non-optimization case), and the max sampling
> >       rate remains at the original 100000.
>
> Impressive performance improvement!
>
> It's not clear to me why you're special casing PMIs here. The optimization
> should work generically, right?

Yes, seems doable. I plan to try some lazy approach for the perf event
allocation.

> Is that guaranteed to be always called on the right CPU that will run the vcpu?
>
> AFAIK there's an ioctl to set MSRs in the guest from qemu, I'm pretty sure it
> won't handle that.

Thanks, will consider that case.

Best,
Wei