On 2024-06-17 3:51 a.m., Peter Zijlstra wrote: > On Thu, Jun 13, 2024 at 02:04:36PM -0400, Liang, Kan wrote: >>>> static enum event_type_t get_event_type(struct perf_event *event) >>>> @@ -3340,9 +3388,14 @@ ctx_sched_out(struct perf_event_context >>>> * would only update time for the pinned events. >>>> */ >>>> if (is_active & EVENT_TIME) { >>>> + bool stop; >>>> + >>>> + stop = !((ctx->is_active & event_type) & EVENT_ALL) && >>>> + ctx == &cpuctx->ctx; >>>> + >>>> /* update (and stop) ctx time */ >>>> update_context_time(ctx); >>>> - update_cgrp_time_from_cpuctx(cpuctx, ctx == &cpuctx->ctx); >>>> + update_cgrp_time_from_cpuctx(cpuctx, stop); >> >> For the event_type == EVENT_GUEST, the "stop" should always be the same >> as "ctx == &cpuctx->ctx". Because the ctx->is_active never set the >> EVENT_GUEST bit. >> Why the stop is introduced? > > Because the ctx_sched_out() for vPMU should not stop time, But the implementation seems stop the time. The ctx->is_active should be (EVENT_ALL | EVENT_TIME) for most of cases. When a vPMU is scheduling in (invoke ctx_sched_out()), the event_type should only be EVENT_GUEST. !((ctx->is_active & event_type) & EVENT_ALL) should be TRUE. For a CPU context, ctx == &cpuctx->ctx is TRUE as well. The update_cgrp_time_from_cpuctx(cpuctx, TRUE) stops the time by deactivate the cgroup, __store_release(&info->active, 0). If an user try to read the cgroup events when a guest is running. The update_cgrp_time_from_event() doesn't update the cgrp time. So both time and counter are stopped. > only the > 'normal' sched-out should stop time. If the guest is the only case which we want to keep the time for, I think we may use a straightforward check as below. stop = !(event_type & EVENT_GUEST) && ctx == &cpuctx->ctx; > > >>>> @@ -3949,6 +4015,8 @@ ctx_sched_in(struct perf_event_context * >>>> return; >>>> >>>> if (!(is_active & EVENT_TIME)) { >>>> + /* EVENT_TIME should be active while the guest runs */ >>>> + WARN_ON_ONCE(event_type & EVENT_GUEST); >>>> /* start ctx time */ >>>> __update_context_time(ctx, false); >>>> perf_cgroup_set_timestamp(cpuctx); >>>> @@ -3979,8 +4047,11 @@ ctx_sched_in(struct perf_event_context * >>>> * the exclude_guest events. >>>> */ >>>> update_context_time(ctx); >>>> - } else >>>> + update_cgrp_time_from_cpuctx(cpuctx, false); >> >> >> In the above ctx_sched_out(), the cgrp_time is stopped and the cgrp has >> been set to inactive. >> I think we need a perf_cgroup_set_timestamp(cpuctx) here to restart the >> cgrp_time, Right? > > So the idea was to not stop time when we schedule out for the vPMU, as > per the above. > >> Also, I think the cgrp_time is different from the normal ctx->time. When >> a guest is running, there must be no cgroup. It's OK to disable the >> cgrp_time. If so, I don't think we need to track the guest_time for the >> cgrp. > > Uh, the vCPU thread is/can-be part of a cgroup, and different guests > part of different cgroups. The CPU wide 'guest' time is all time spend > in guets, but the cgroup view of things might differ, depending on how > the guets are arranged in cgroups, no? > > As such, we need per cgroup guest tracking. Got it. Thanks, Kan