Re: [PATCH v2 07/54] perf: Add generic exclude_guest support

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Jun 10, 2024 at 01:23:04PM -0400, Liang, Kan wrote:
> On 2024-05-07 4:58 a.m., Peter Zijlstra wrote:

> > Bah, this is a ton of copy-paste from the normal scheduling code with
> > random changes. Why ?
> > 
> > Why can't this use ctx_sched_{in,out}() ? Surely the whole
> > CAP_PASSTHROUGHT thing is but a flag away.
> >
> 
> Not just a flag. The time has to be updated as well, since the ctx->time
> is shared among PMUs. Perf shouldn't stop it while other PMUs is still
> running.

Obviously the original changelog didn't mention any of that.... :/

> A timeguest will be introduced to track the start time of a guest.
> The event->tstamp of an exclude_guest event should always keep
> ctx->timeguest while a guest is running.
> When a guest is leaving, update the event->tstamp to now, so the guest
> time can be deducted.
> 
> The below patch demonstrate how the timeguest works.
> (It's an incomplete patch. Just to show the idea.)
> 
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index 22d3e56682c9..2134e6886e22 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -953,6 +953,7 @@ struct perf_event_context {
>  	u64				time;
>  	u64				timestamp;
>  	u64				timeoffset;
> +	u64				timeguest;
> 
>  	/*
>  	 * These fields let us detect when two contexts have both
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 14fd881e3e1d..2aed56671a24 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -690,12 +690,31 @@ __perf_update_times(struct perf_event *event, u64
> now, u64 *enabled, u64 *runnin
>  		*running += delta;
>  }
> 
> +static void perf_event_update_time_guest(struct perf_event *event)
> +{
> +	/*
> +	 * If a guest is running, use the timestamp while entering the guest.
> +	 * If the guest is leaving, reset the event timestamp.
> +	 */
> +	if (!__this_cpu_read(perf_in_guest))
> +		event->tstamp = event->ctx->time;
> +	else
> +		event->tstamp = event->ctx->timeguest;
> +}

This conditional seems inverted, without a good reason. Also, in another
thread you talk about some PMUs stopping time in a guest, while other
PMUs would keep ticking. I don't think the above captures that.

>  static void perf_event_update_time(struct perf_event *event)
>  {
> -	u64 now = perf_event_time(event);
> +	u64 now;
> +
> +	/* Never count the time of an active guest into an exclude_guest event. */
> +	if (event->ctx->timeguest &&
> +	    event->pmu->capabilities & PERF_PMU_CAP_PASSTHROUGH_VPMU)
> +		return perf_event_update_time_guest(event);

Urgh, weird split. The PMU check is here. Please just inline the above
here, this seems to be the sole caller anyway.

> 
> +	now = perf_event_time(event);
>  	__perf_update_times(event, now, &event->total_time_enabled,
>  					&event->total_time_running);
> +
>  	event->tstamp = now;
>  }
> 
> @@ -3398,7 +3417,14 @@ ctx_sched_out(struct perf_event_context *ctx,
> enum event_type_t event_type)
>  			cpuctx->task_ctx = NULL;
>  	}
> 
> -	is_active ^= ctx->is_active; /* changed bits */
> +	if (event_type & EVENT_GUEST) {

Patch doesn't introduce EVENT_GUEST, lost a hunk somewhere?

> +		/*
> +		 * Schedule out all !exclude_guest events of PMU
> +		 * with PERF_PMU_CAP_PASSTHROUGH_VPMU.
> +		 */
> +		is_active = EVENT_ALL;
> +	} else
> +		is_active ^= ctx->is_active; /* changed bits */
> 
>  	list_for_each_entry(pmu_ctx, &ctx->pmu_ctx_list, pmu_ctx_entry) {
>  		if (perf_skip_pmu_ctx(pmu_ctx, event_type))

> @@ -5894,14 +5933,18 @@ void perf_guest_enter(u32 guest_lvtpc)
>  	}
> 
>  	perf_ctx_disable(&cpuctx->ctx, EVENT_GUEST);
> -	ctx_sched_out(&cpuctx->ctx, EVENT_ALL | EVENT_GUEST);
> +	ctx_sched_out(&cpuctx->ctx, EVENT_GUEST);
> +	/* Set the guest start time */
> +	cpuctx->ctx.timeguest = cpuctx->ctx.time;
>  	perf_ctx_enable(&cpuctx->ctx, EVENT_GUEST);
>  	if (cpuctx->task_ctx) {
>  		perf_ctx_disable(cpuctx->task_ctx, EVENT_GUEST);
> -		task_ctx_sched_out(cpuctx->task_ctx, EVENT_ALL | EVENT_GUEST);
> +		task_ctx_sched_out(cpuctx->task_ctx, EVENT_GUEST);
> +		cpuctx->task_ctx->timeguest = cpuctx->task_ctx->time;
>  		perf_ctx_enable(cpuctx->task_ctx, EVENT_GUEST);
>  	}
> 
>  	__this_cpu_write(perf_in_guest, true);
> @@ -5925,14 +5968,17 @@ void perf_guest_exit(void)
> 
>  	__this_cpu_write(perf_in_guest, false);
> 
>  	perf_ctx_disable(&cpuctx->ctx, EVENT_GUEST);
> -	ctx_sched_in(&cpuctx->ctx, EVENT_ALL | EVENT_GUEST);
> +	ctx_sched_in(&cpuctx->ctx, EVENT_GUEST);
> +	cpuctx->ctx.timeguest = 0;
>  	perf_ctx_enable(&cpuctx->ctx, EVENT_GUEST);
>  	if (cpuctx->task_ctx) {
>  		perf_ctx_disable(cpuctx->task_ctx, EVENT_GUEST);
> -		ctx_sched_in(cpuctx->task_ctx, EVENT_ALL | EVENT_GUEST);
> +		ctx_sched_in(cpuctx->task_ctx, EVENT_GUEST);
> +		cpuctx->task_ctx->timeguest = 0;
>  		perf_ctx_enable(cpuctx->task_ctx, EVENT_GUEST);
>  	}

I'm thinking EVENT_GUEST should cause the ->timeguest updates, no point
in having them explicitly duplicated here, hmm?




[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux