Re: [PATCH] PERF(kernel): Cleanup power events V2

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



* Thomas Renninger <trenn@xxxxxxx> wrote:

> On Tuesday 26 October 2010 09:10:20 Ingo Molnar wrote:
> > 
> > * Thomas Renninger <trenn@xxxxxxx> wrote:
> > 
> > > Changes in V2:
> > >   - Introduce PWR_EVENT_EXIT instead of 0 to mark non-power state
> > >   - Use u32 instead of u64 for cpuid, state which is by far enough
> > > 
> > > New power trace events:
> > > power:processor_idle
> > > power:processor_frequency
> > > power:machine_suspend
> > > 
> > > 
> > > C-state/idle accounting events:
> > >   power:power_start
> > >   power:power_end
> > > are replaced with:
> > >   power:processor_idle
> > > 
> > > and
> > >   power:power_frequency
> > > is replaced with:
> > >   power:processor_frequency
> > 
> > Could you please name it power:cpu_idle and power:cpu_frequency instead, for 
> > shortness? We generally use 'cpu' in the kernel and for events.
> Sure.
> > 
> > > power:machine_suspend
> > 
> > How will future PCI (or other device) power saving tracepoints be called?
> > 
> > Might be more consistent to use:
> > 
> >   power:cpu_idle
> >   power:machine_idle
> >   power:device_idle
>
> device idle is not true. Those may be low power modes
> like reduced network throughput, reduced wlan range, the device
> needs not to be idle.
> Device power states is probably the most complex area, if such
> a thing gets introduced, it should makes sense to state
> the interface experimental for some time until a wider range of
> devices uses it (in contrast to these new ones
> which should not change that soon anymore...).

Ok.

> Also machine_idle may be true, but machine_suspend sounds more
> familiar and everyone immediately knows what the event is about.

Ok - fair enough.

> > > +#define PWR_EVENT_EXIT 0xFFFFFFFF
> > 
> > Shouldnt this be part of the POWER_ enum? (and you can write -1 there)
>
> No, below enum will vanish, but -1 is nicer.

When it vanishes what will replace it?

> ...
> 
> > Plus:
> > 
> > > +DECLARE_EVENT_CLASS(processor,
> > > +
> > > +	TP_PROTO(unsigned int state, unsigned int cpu_id),
> > > +
> > > +	TP_ARGS(state, cpu_id),
> > > +
> > > +	TP_STRUCT__entry(
> > > +		__field(	u32,		state		)
> > > +		__field(	u32,		cpu_id		)
> > 
> > Trace entries can carry a cpu_id of the current processor already. Can this cpu_id 
> > ever be different from that CPU id?
>
> Yes. A core's frequency can depend on another one which
> will get switched as well (by one command/MSR).
> Compare with commit 6f4f2723d08534fd4e407e1e.
> 
> This can theoretically also be the case for sleep states.
> Afaik such HW does not exist yet, but ACPI spec already
> provides interfaces to pass these dependency from BIOS to OS.
> -> We want a stable ABI and should be prepared for such stuff.
> 
> > > +	),
> > > +
> > > +	TP_fast_assign(
> > > +		__entry->state = state;
> > > +		__entry->cpu_id = cpu_id;
> > > +	),
> > > +
> > > +	TP_printk("state=%lu cpu_id=%lu", (unsigned long)__entry->state,
> > > +		  (unsigned long)__entry->cpu_id)
> > > +);
> > > +
> > > +DEFINE_EVENT(processor, processor_idle,
> > > +
> > > +	TP_PROTO(unsigned int state, unsigned int cpu_id),
> > > +
> > > +	     TP_ARGS(state, cpu_id)
> > > +);
> > > +
> > > +#define PWR_EVENT_EXIT 0xFFFFFFFF
> > > +
> > > +DEFINE_EVENT(processor, processor_frequency,
> > > +
> > > +	TP_PROTO(unsigned int frequency, unsigned int cpu_id),
> > > +
> > > +	TP_ARGS(frequency, cpu_id)
> > > +);
> > 
> > So, we have a 'state' field in the class, which is used as 'state' by the 
> > power::cpu_idle event, and as 'frequency' by the power::cpu_freq event?
>
> Yes, is this a problem?
>
> Definitions are a bit shorter having one power processor class.
> As "frequency" is stated in frequency event definition everything should
> be obvious and this one looks like the more elegant way to me.
>  
> > Are there any architectures that track frequency in Hz, not in kHz? If yes, might 
> > there ever be a need for the frequency value to be larger than 4.29 GHz? If yes, 
> > then it wont fit into u32.
>
> drivers/cpufreq subsystem is fixed to unsigned int (cmp. include/linux/cpufreq.h):
>         unsigned int            min;    /* in kHz */
>         unsigned int            max;    /* in kHz */
>         unsigned int            cur;    /* in kHz,
>         ...
> that should be fine.

ok, good - so we should be fine up to 4 THz CPUs.

> > Also, might there be a future need to express different types of frequencies? 
> > For example, should we decide to track turbo frequencies in Intel CPUs, how 
> > would that be expressed via these events? Are there any architectures and CPUs 
> > that somehow have some extra attribute to the frequency value?
>
> I wonder whether this ever can/will work in a sane way.
> Userspace can compare with:
> /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq
> everything above is turbo. So I do not think it's ever needed.
> But adding an additional value at the end does not violate
> userspace compatibility. This has been done with the cpuid
> as well.
>  
> > > +TRACE_EVENT(machine_suspend,
> > > +
> > > +	TP_PROTO(unsigned int state),
> > > +
> > > +	TP_ARGS(state),
> > > +
> > > +	TP_STRUCT__entry(
> > > +		__field(	u32,		state		)
> > > +	),
> > 
> > Hm, this event is not used anywhere in the submitted patches. Where is the patch 
> > that adds usage, and what are the possible values for 'state'?
>
> Jean wants to make use of it on ARM.
> I also had patch for x86, I can have another look at it, Rafael
> already gave me a comment on it. But on X86 you typically realize
> when you suspend the machine (could imagine this is more useful on
> ARM driven mobile phones and similar), still I can add it..
> 
> Values probably should be (include/linux/suspend.h):
> #define PM_SUSPEND_ON       0
> #define PM_SUSPEND_STANDBY  1
> #define PM_SUSPEND_MEM      3
> #define PM_SUSPEND_MAX      4
> 
> How this strange state Arjan talked about is passed is up
> to these guys. Instead of using 0 and above pre-defined such
> arch specific special states better should get passed by:
> 
> #define X86_MOORESTOWN_STANDBY_S0   0x100
> ..                                  0x101
> #define ARM_WHATEVER_STRANGE_THING  0x200
> ...

I'd rather like to see a meaningful name to be given to these states and them being 
passed, instead of weird platform specific things. Tooling will try to be as generic 
as possible.

I dont know this stuff, but making a distinction between s2ram and s2disk events 
would seem meaningful.

Thanks,

	Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-trace-users" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux USB Development]     [Linux USB Development]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux