On Tue, May 10, 2022 at 1:55 AM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
>
> On Tue, Apr 12, 2022, Wei Zhang wrote:
> > The profiling buffer is indexed by (pc - _stext) in do_profile_hits(),
> > which doesn't work for KVM profiling because the pc represents an address
> > in the guest kernel. readprofile is broken in this case, unless the guest
> > kernel happens to have the same _stext as the host kernel.
> >
> > This patch adds a new hypercall so guests could send its _stext to the
> > host, which will then be used to adjust the calculation for KVM profiling.
>
> Disclaimer, I know nothing about using profiling.
>
> Why not just omit the _stext adjustment and profile the raw guest RIP? It seems
> like userspace needs to know about the guest layout in order to make use of profling
> info, so why not report raw info and let host userspace do all adjustments?

It's hard to store raw IPs if we want to reuse the existing profiling
facility.

The profiling function was originally used to record the current IP at
each clock tick for the host kernel. The original design avoided the
trouble of storing raw IPs by creating a buffer array with a length of
(_etext - _stext) and doing buffer[IP - _stext]++ at each clock tick. In
user space, the readprofile command can read this buffer from
/proc/profile and, with a map file, tell us roughly how many ticks
occurred in each kernel function. (IP - _stext) has a clear meaning here
since it gives us an offset from the start of the text segment.

This gets tricky after the profile=kvm boot option was introduced
(https://github.com/torvalds/linux/commit/07031e14): the IP now belongs
to the guest kernel, so subtracting the host's _stext no longer yields a
meaningful offset.

I think raw guest IPs are easy for userspace tools to consume. But we
probably need to go with a different approach if we want to store raw
guest IPs.
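For illustration, here is a minimal user-space sketch of the bucket indexing described above. It is not the kernel code. It assumes a 64-bit build, and the text bounds, the prof_shift value, and the helpers profile_setup() and profile_hit() are all hypothetical. With prof_shift = 0 it reduces to the buffer[IP - _stext]++ scheme; the clamp to the last bucket roughly follows what the kernel's profiling path does with _stext.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical host text-segment bounds. In the kernel these are the
 * linker symbols _stext and _etext; the values here are made up. */
#define HOST_STEXT 0xffffffff81000000ULL
#define HOST_ETEXT 0xffffffff81000400ULL

static unsigned int prof_shift = 2;   /* one bucket per 4 bytes (assumed) */
static uint64_t prof_len;             /* number of buckets */
static unsigned int *prof_buffer;     /* hit counter per bucket */

/* Allocate one counter per (1 << prof_shift) bytes of host text. */
static void profile_setup(void)
{
	prof_len = (HOST_ETEXT - HOST_STEXT) >> prof_shift;
	prof_buffer = calloc(prof_len, sizeof(*prof_buffer));
	assert(prof_buffer != NULL);
}

/*
 * Record one tick at address pc and return the bucket used. The bucket is
 * the offset from text_base, scaled down by prof_shift and clamped to the
 * last bucket. An address below text_base wraps around (unsigned
 * arithmetic), so in practice it is also clamped into the last bucket.
 */
static uint64_t profile_hit(uint64_t pc, uint64_t text_base)
{
	uint64_t idx = (pc - text_base) >> prof_shift;

	if (idx > prof_len - 1)
		idx = prof_len - 1;
	prof_buffer[idx]++;
	return idx;
}
```

For example, profile_hit(HOST_STEXT + 0x10, HOST_STEXT) returns bucket 4. A guest IP of 0xffffffff80000010 from a guest whose _stext is 0xffffffff80000000 (made-up values) returns prof_len - 1 when indexed against HOST_STEXT, because the subtraction wraps and is clamped, but returns 4 when indexed against the guest's own _stext, which is the value the proposed hypercall would supply.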