On Mon, Nov 05, 2018 at 07:19:01PM +0800, Wei Wang wrote:
> On 11/05/2018 05:34 PM, Peter Zijlstra wrote:
> > On Fri, Nov 02, 2018 at 05:08:31PM +0800, Wei Wang wrote:
> > > On 11/01/2018 10:52 PM, Peter Zijlstra wrote:
> > > > > @@ -723,6 +724,9 @@ static void perf_sched_init(struct perf_sched *sched, struct event_constraint **
> > > > > sched->max_weight = wmax;
> > > > > sched->max_gp = gpmax;
> > > > > sched->constraints = constraints;
> > > > > +#ifdef CONFIG_CPU_SUP_INTEL
> > > > > + sched->state.used[0] = cpuc->intel_ctrl_guest_mask;
> > > > > +#endif
> > > > NAK. This completely undermines the whole purpose of event scheduling.
> > > >
> > > Hi Peter,
> > >
> > > Could you share more details how it would affect the host side event
> > > scheduling?
> > Not all counters are equal; suppose you have one of those chips that can
> > only do PEBS on counter 0, and then hand out 0 to the guest for some
> > silly event. That means nobody can use PEBS anymore.
>
> Thanks for sharing your point.
>
> In this example (assume PEBS can only work with counter 0), how would the
> existing approach (i.e. using host event to emulate) work?
> For example, guest wants to use PEBS, host also wants to use PEBS or other
> features that only counter 0 fits, I think either guest or host will not
> work then.

The answer for PEBS is really simple; PEBS does not virtualize (Andi
tried and can tell you why; IIRC it has something to do with how the
hardware asks for a Linear Address instead of a Physical Address).

So the problem will not arise.

But there are certainly constrained events that will result in the same
problem.

The traditional approach of perf on resource contention is to share it;
you get only partial runtime and can scale up the events given the
runtime metrics provided.

We also have perf_event_attr::pinned, which is normally only available
to root, in which case we'll end up putting any contending event into an
error state.

Neither is ideal for MSR level emulation.
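For reference, the "scale up" step mentioned above can be sketched roughly as
below. This is only an illustration, assuming the usual convention of
multiplying the raw count by time_enabled / time_running; the struct and
function names are hypothetical and not kernel or perf tool API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical snapshot of a multiplexed counter. The fields mirror the
 * values perf can report alongside a count (total time enabled and total
 * time running), but this struct itself is made up for the example.
 */
struct counter_reading {
	uint64_t value;        /* raw count while the event was on the PMU */
	uint64_t time_enabled; /* ns the event was enabled */
	uint64_t time_running; /* ns the event actually had a counter */
};

/*
 * Estimate the count the event would have seen had it been scheduled for
 * the whole enabled window. Returns 0.0 if the event never ran.
 */
static double scale_count(const struct counter_reading *r)
{
	assert(r != NULL);

	if (r->time_running == 0)
		return 0.0;

	return (double)r->value * (double)r->time_enabled /
	       (double)r->time_running;
}
```

An event that only had a counter for a quarter of its enabled time gets its
raw count multiplied by four; this is an estimate, which is part of why
sharing is not a good fit for emulating an MSR that the guest expects to be
exact.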
> With the register level virtualization approach, we could further support
> that case: if guest requests to use a counter which host happens to be
> using, we can let host and guest both be satisfied by supporting counter
> context switching on guest/host switching. In this case, both guest and host
> can use counter 0. (I think this is actually a policy selection, the current
> series chooses to be guest first, we can further change it if necessary)

That can only work if the host counter has
perf_event_attr::exclude_guest=1; any counter without that must also
count when the guest is running.

(and, IIRC, normal perf tool events do not have that set by default)

> > > Would you have any suggestions?
> > I would suggest not to use virt in the first place of course ;-)
> >
> > But whatever you do; you have to keep using host events to emulate the
> > guest PMU. That doesn't mean you can't improve things; that code is
> > quite insane from what you told earlier.
>
> I agree that the host event emulation is a functional approach, but it may
> not be an effective one (also got complaints from people about today's perf
> in the guest).
> We actually have similar problems when doing network virtualization. The
> more effective approach tends to be the one that bypasses the host network
> stack. Both the network stack and perf stack seem to be too heavy to be used
> as part of the emulation.

The thing is; you cannot do blind pass-through of the PMU, some of its
features simply do not work in a guest. Also, the host perf driver
expects certain functionality that must be respected.

Those are the constraints you have to work with.

Back when we all started down this virt rathole, I proposed people do
paravirt perf, where events would be handed to the host kernel and the
host kernel would do its normal thing. But people wanted to do the MSR
based thing because of !linux guests.

Now I don't care about virt much, but I care about !linux guests even
less.
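To make the exclude_guest point concrete, here is a rough sketch of a host
event that would be eligible for the guest/host counter switching discussed
above. It only fills in a struct perf_event_attr from the UAPI header
<linux/perf_event.h>; the helper name is hypothetical, the choice of a CPU
cycles event is arbitrary, and nothing here is taken from the series under
review.

```c
#include <assert.h>
#include <string.h>
#include <linux/perf_event.h>

/*
 * Hypothetical helper: describe a host-only cycles counter. With
 * exclude_guest set the event does not count while a guest runs, so
 * its counter could in principle be handed over across VM entry/exit.
 * Without the bit the event must keep counting in guest context too.
 */
static void init_host_only_cycles(struct perf_event_attr *attr)
{
	assert(attr != NULL);

	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->type = PERF_TYPE_HARDWARE;
	attr->config = PERF_COUNT_HW_CPU_CYCLES;
	attr->exclude_guest = 1; /* do not count while in guest mode */
}
```

The attr would then be passed to perf_event_open(2) as usual. An event
opened with exclude_guest left at 0 is the case that rules out simply
giving its counter to the guest.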