Re: [PATCH v2 07/54] perf: Add generic exclude_guest support

"Liang, Kan" <kan.liang@xxxxxxxxxxxxxxx> · Mon, 17 Jun 2024 09:34:15 -0400

On 2024-06-17 3:51 a.m., Peter Zijlstra wrote:
> On Thu, Jun 13, 2024 at 02:04:36PM -0400, Liang, Kan wrote:
>>>>  static enum event_type_t get_event_type(struct perf_event *event)
>>>> @@ -3340,9 +3388,14 @@ ctx_sched_out(struct perf_event_context
>>>>  	 * would only update time for the pinned events.
>>>>  	 */
>>>>  	if (is_active & EVENT_TIME) {
>>>> +		bool stop;
>>>> +
>>>> +		stop = !((ctx->is_active & event_type) & EVENT_ALL) &&
>>>> +		       ctx == &cpuctx->ctx;
>>>> +			
>>>>  		/* update (and stop) ctx time */
>>>>  		update_context_time(ctx);
>>>> -		update_cgrp_time_from_cpuctx(cpuctx, ctx == &cpuctx->ctx);
>>>> +		update_cgrp_time_from_cpuctx(cpuctx, stop);
>>
>> For the event_type == EVENT_GUEST, the "stop" should always be the same
>> as "ctx == &cpuctx->ctx". Because the ctx->is_active never set the
>> EVENT_GUEST bit.
>> Why the stop is introduced?
> 
> Because the ctx_sched_out() for vPMU should not stop time, 

But the implementation seems stop the time.

The ctx->is_active should be (EVENT_ALL | EVENT_TIME) for most of cases.

When a vPMU is scheduling in (invoke ctx_sched_out()), the event_type
should only be EVENT_GUEST.

!((ctx->is_active & event_type) & EVENT_ALL) should be TRUE.

For a CPU context, ctx == &cpuctx->ctx is TRUE as well.

The update_cgrp_time_from_cpuctx(cpuctx, TRUE) stops the time by
deactivate the cgroup, __store_release(&info->active, 0).

If an user try to read the cgroup events when a guest is running. The
update_cgrp_time_from_event() doesn't update the cgrp time. So both time
and counter are stopped.

> only the
> 'normal' sched-out should stop time.

If the guest is the only case which we want to keep the time for, I
think we may use a straightforward check as below.

	stop = !(event_type & EVENT_GUEST) && ctx == &cpuctx->ctx;

> 
> 
>>>> @@ -3949,6 +4015,8 @@ ctx_sched_in(struct perf_event_context *
>>>>  		return;
>>>>  
>>>>  	if (!(is_active & EVENT_TIME)) {
>>>> +		/* EVENT_TIME should be active while the guest runs */
>>>> +		WARN_ON_ONCE(event_type & EVENT_GUEST);
>>>>  		/* start ctx time */
>>>>  		__update_context_time(ctx, false);
>>>>  		perf_cgroup_set_timestamp(cpuctx);
>>>> @@ -3979,8 +4047,11 @@ ctx_sched_in(struct perf_event_context *
>>>>  		 * the exclude_guest events.
>>>>  		 */
>>>>  		update_context_time(ctx);
>>>> -	} else
>>>> +		update_cgrp_time_from_cpuctx(cpuctx, false);
>>
>>
>> In the above ctx_sched_out(), the cgrp_time is stopped and the cgrp has
>> been set to inactive.
>> I think we need a perf_cgroup_set_timestamp(cpuctx) here to restart the
>> cgrp_time, Right?
> 
> So the idea was to not stop time when we schedule out for the vPMU, as
> per the above.
> 
>> Also, I think the cgrp_time is different from the normal ctx->time. When
>> a guest is running, there must be no cgroup. It's OK to disable the
>> cgrp_time. If so, I don't think we need to track the guest_time for the
>> cgrp.
> 
> Uh, the vCPU thread is/can-be part of a cgroup, and different guests
> part of different cgroups. The CPU wide 'guest' time is all time spend
> in guets, but the cgroup view of things might differ, depending on how
> the guets are arranged in cgroups, no?
> 
> As such, we need per cgroup guest tracking.

Got it.

Thanks,
Kan