On Fri, Jul 16, 2021 at 12:00 PM Liang, Kan <kan.liang@xxxxxxxxxxxxxxx> wrote:
>
> On 7/16/2021 1:02 PM, Jim Mattson wrote:
> > On Fri, Jul 16, 2021 at 1:54 AM Zhu Lingshan <lingshan.zhu@xxxxxxxxx> wrote:
> >>
> >> The guest Precise Event Based Sampling (PEBS) feature can provide an
> >> architectural state of the instruction executed after the guest instruction
> >> that exactly caused the event. It needs a new hardware facility only available
> >> on Intel Ice Lake Server platforms. This patch set enables the basic PEBS
> >> feature for KVM guests on ICX.
> >>
> >> We can use the PEBS feature on a Linux guest just as on native:
> >>
> >>    # echo 0 > /proc/sys/kernel/watchdog (on the host)
> >>    # perf record -e instructions:ppp ./br_instr a
> >>    # perf record -c 100000 -e instructions:pp ./br_instr a
> >>
> >> To emulate the guest PEBS facility for the above perf usages,
> >> we need to implement 2 code paths:
> >>
> >> 1) Fast path
> >>
> >> This is when the host-assigned physical PMC has the same index as the
> >> virtual PMC (e.g. using physical PMC0 to emulate virtual PMC0).
> >> This path is used in most common use cases.
> >>
> >> 2) Slow path
> >>
> >> This is when the host-assigned physical PMC has a different index from the
> >> virtual PMC (e.g. using physical PMC1 to emulate virtual PMC0). In this case,
> >> KVM needs to rewrite the PEBS records to change the applicable counter indexes
> >> to the virtual PMC indexes, which would otherwise contain the physical counter
> >> index written by the PEBS facility, and switch the counter reset values to the
> >> offset corresponding to the physical counter indexes in the DS data structure.
> >>
> >> The previous version [0] enables both the fast path and the slow path, which
> >> seems a bit too complex as a first step. In this patchset, we want to start with
> >> the fast path to get the basic guest PEBS enabled while keeping the slow path
> >> disabled. More focused discussion on the slow path [1] is planned for
> >> another patchset in the next step.
> >>
> >> Compared to later versions in subsequent steps, the functionality to support
> >> PEBS enabled on both host and guest and the functionality to emulate guest PEBS
> >> when the counter is cross-mapped are missing in this patch set
> >> (neither of these is a typical scenario).
> >
> > I'm not sure exactly what scenarios you're ruling out here. In our
> > environment, we always have to be able to support host-level
> > profiling, whether or not the guest is using the PMU (for PEBS or
> > anything else). Hence, for our *basic* vPMU offering, we only expose
> > two general purpose counters to the guest, so that we can keep two
> > general purpose counters for the host. In this scenario, I would
> > expect cross-mapped counters to be common. Are we going to be able to
> > use this implementation?
> >
>
> Let's say we have 4 GP counters in HW.
> Do you mean that the host owns 2 GP counters (counter 0 & 1) and the
> guest owns the other 2 GP counters (counter 2 & 3) in your environment?
> We did a similar implementation in V1, but the proposal was rejected.
> https://lore.kernel.org/kvm/20200306135317.GD12561@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/

It's the other way around. AFAIK, there is no architectural way to
specify that only counters 2 and 3 are available, so we have to give
the guest counters 0 and 1.

> For the current proposal, both guest and host can see all 4 GP counters.
> The counters are shared.

I don't understand how that can work. If the host programs two
counters, how can you give the guest four counters?

> The guest cannot know the availability of the counters. It may require
> a counter (e.g., counter 0) which may already be in use by the host. The
> host may then provide another counter (e.g., counter 1) to the guest. This
> is the case described in the slow path. For this case, we have to modify
> the guest PEBS record, because the counter index in the PEBS record is 1,
> while the guest perf driver expects 0.

If we reserve counters 0 and 1 for the guest, this is not a problem
(assuming we tell the guest it only has two counters). If we don't
statically partition the counters, I don't see how you can ensure that
the guest behaves as architected. For example, what do you do when the
guest programs four counters and the host programs two?

> If counter 0 is available, guests can use counter 0. That's the fast
> path. I think the fast path should be more common even when both host
> and guest are profiling, because, except for some specific events, we
> may move the host event to counters which are not required by the guest
> if we have enough resources.

And if you don't have enough resources?
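
For illustration only, the fast-path/slow-path distinction discussed in this
thread can be sketched as below. This is a minimal sketch under stated
assumptions, not code from the patch set or from KVM: the names
NUM_GP_COUNTERS, phys_of[], is_cross_mapped() and remap_counter_mask() are
hypothetical, the count of 4 GP counters is taken from the example in the
thread, and the "applicable counter indexes" of a PEBS record are modeled as
a plain bitmask.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 general purpose counters, as in the example in the thread. */
#define NUM_GP_COUNTERS 4

/*
 * Hypothetical mapping table: phys_of[v] is the index of the physical
 * counter the host assigned to back virtual counter v, or -1 if virtual
 * counter v is not in use.
 */

/* Returns 1 if any in-use virtual counter is backed by a physical counter
 * with a different index (slow path), 0 if all indexes match (fast path). */
static int is_cross_mapped(const int phys_of[NUM_GP_COUNTERS])
{
    for (int v = 0; v < NUM_GP_COUNTERS; v++) {
        if (phys_of[v] >= 0 && phys_of[v] != v)
            return 1;
    }
    return 0;
}

/* Slow path: translate a bitmask of physical counter indexes, as written
 * by the hardware, into the bitmask of virtual indexes the guest expects. */
static uint64_t remap_counter_mask(uint64_t phys_mask,
                                   const int phys_of[NUM_GP_COUNTERS])
{
    uint64_t virt_mask = 0;

    for (int v = 0; v < NUM_GP_COUNTERS; v++) {
        int p = phys_of[v];

        if (p < 0)
            continue;               /* virtual counter v is unused */
        assert(p < NUM_GP_COUNTERS);
        if (phys_mask & (1ULL << p))
            virt_mask |= 1ULL << v; /* report it under the virtual index */
    }
    return virt_mask;
}
```

With phys_of = {1, -1, -1, -1} (physical counter 1 backing virtual counter 0,
the case Kan describes), a record mask of 0x2 is reported to the guest as 0x1.
The sketch does not address the case raised above where there are not enough
physical counters to back every counter the guest programs.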