On 22 January 2011 03:42, Frederic Weisbecker <fweisbec@xxxxxxxxx> wrote:
> On Fri, Jan 21, 2011 at 06:41:58PM +0100, Vincent Guittot wrote:
>> On 21 January 2011 17:44, Frederic Weisbecker <fweisbec@xxxxxxxxx> wrote:
>> > On Fri, Jan 21, 2011 at 09:43:18AM +0100, Vincent Guittot wrote:
>> >> On 20 January 2011 17:11, Frederic Weisbecker <fweisbec@xxxxxxxxx> wrote:
>> >> > On Thu, Jan 20, 2011 at 09:25:54AM +0100, Vincent Guittot wrote:
>> >> >> Please find below a new proposal for adding trace events for cpu hotplug.
>> >> >> The goal is to measure the latency of each part (kernel, architecture)
>> >> >> and also to trace the cpu hotplug activity with other power events. I
>> >> >> have tested these traces events on an arm platform.
>> >> >>
>> >> >> Changes since previous version:
>> >> >> -Use cpu_hotplug for trace name
>> >> >> -Define traces for kernel core and arch parts only
>> >> >> -Use DECLARE_EVENT_CLASS and DEFINE_EVENT
>> >> >> -Use proper indentation
>> >> >>
>> >> >> Subject: [PATCH] cpu hotplug tracepoint
>> >> >>
>> >> >> this patch adds new events for cpu hotplug tracing
>> >> >>  * plug/unplug sequence
>> >> >>  * core and architecture latency measurements
>> >> >>
>> >> >> Signed-off-by: Vincent Guittot <vincent.guittot@xxxxxxxxxx>
>> >> >> ---
>> >> >>  include/trace/events/cpu_hotplug.h | 117 ++++++++++++++++++++++++++++++++++++
>> >> >
>> >> > Note we can't apply new tracepoints if they are not inserted in the code.
>> >>
>> >> I agree, i just want to have 1st feedbacks on the tracepoint interface
>> >> before providing a patch which inserts the trace in the code.
>> >>
>> >> >
>> >> >> +DEFINE_EVENT(cpu_hotplug, cpu_hotplug_arch_wait_die_start,
>> >> >> +
>> >> >> +	TP_PROTO(unsigned int cpuid),
>> >> >> +
>> >> >> +	TP_ARGS(cpuid)
>> >> >> +);
>> >> >> +
>> >> >> +DEFINE_EVENT(cpu_hotplug, cpu_hotplug_arch_wait_die_end,
>> >> >> +
>> >> >> +	TP_PROTO(unsigned int cpuid),
>> >> >> +
>> >> >> +	TP_ARGS(cpuid)
>> >> >> +);
>> >> >
>> >> > What is wait die, compared to die for example?
>> >> >
>> >>
>> >> The arch_wait_die is used to trace the process which waits for the cpu
>> >> to die (__cpu_die) and the arch_die is used to trace when the cpu dies
>> >> (cpu_die)
>> >
>> > I still can't find the difference.
>> >
>> > Having:
>> >
>> > trace_cpu_hotplug_arch_die_start(cpu)
>> > __cpu_die();
>> > trace_cpu_hotplug_arch_die_end(cpu)
>> >
>> > Is not enough to get both the information that a cpu dies
>> > and the time took to do so?
>> >
>>
>> it's quite interesting to trace the cpu_die function because the cpu
>> really dies in this one.
>
> Note in case of success, you have barely the same time between die and
> wait_die, the difference will reside in some completion wait/polling,
> noise, mostly. Probably most of the time unnoticeable and irrelevant.
>

OK, tracing only __cpu_die should be enough

> Plus if you opt for this scheme, you need to put your die hook into
> every architectures, while otherwise a simple trace_cpu_die_start()
> trace_cpu_die_stop() pair around __cpu_die() call in the generic code
> is enough.
>
>> The __cpu_die function can't return if the
>> cpu fails to die in the very last step and then wake up. But this
>> could be detected with some cpu_die traces.
>>
>>
>> for a normal use case we have something like :
>> cpu 0 enters __cpu_die
>> cpu 1 enters cpu_die
>> cpu1 acks that it is going to died
>> cpu0 returns from __cpu_die
>>
>> if the cpu 1 fails to die at the very last step, we could have:
>> cpu 0 enters __cpu_die
>> cpu 1 enters cpu_idle --> cpu_die
>> cpu1 leaves cpu_die because of some issues and comes back into cpu_idle.
>> cpu0 returns from __cpu_die after a timeout or an error ack
>
> If it fails in the hardware level, you'll certainly notice in your
> power profiling because a CPU is not supposed to take seconds to
> die. Especially with a such visual tool like pytimechart, it will
> be obvious.
>
> For the details, that's something that must be found in syslogs and
> that's it.
>
> I don't think it's a good idea to handle such buggy and unexpected case at
> the tracepoint level. You don't want to profile bugs, you want to debug them.
> So it doesn't belong to this space IMHO.
>
>> Then, cpu_die traces can be used with power traces for profiling the
>> cpu power state. May be, the power.h trace file is a better place for
>> the cpu_die traces ?
>
> Hmm, this should probably stay inside the cpu hotplug tracepoint family,
> this is where people will seek them in the first place.
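
For illustration, here is a small user-space model of the scheme agreed
on above: one start/end pair placed around the __cpu_die() call in
generic code. It is only a sketch and not kernel code. Every name in it
(model_cpu_down, model_arch_cpu_die, model_trace_die_start,
model_trace_die_end, DIE_TIMEOUT_MS) is hypothetical, the clock is
simulated, and the 5000 ms timeout is an arbitrary assumption. It is
meant to show that the same pair reports a short latency when the dying
cpu acks and a latency equal to the timeout when it never does, which is
why the failure case stands out in a profile without a dedicated
cpu_die trace.

```c
/* User-space model only: none of these names are kernel APIs. */
#include <assert.h>
#include <stdio.h>

#define DIE_TIMEOUT_MS 5000u	/* hypothetical timeout for the waiting cpu */

static unsigned long sim_clock_ms;	/* simulated clock, advanced by polling */
static unsigned long die_start_ms;	/* timestamp taken by the start hook */
static unsigned long die_latency_ms;	/* latency computed by the end hook */

/* Stand-in for the "start" tracepoint placed before __cpu_die(). */
static void model_trace_die_start(unsigned int cpu)
{
	(void)cpu;
	die_start_ms = sim_clock_ms;
}

/* Stand-in for the "end" tracepoint placed after __cpu_die(). */
static void model_trace_die_end(unsigned int cpu)
{
	assert(sim_clock_ms >= die_start_ms);
	die_latency_ms = sim_clock_ms - die_start_ms;
	printf("cpu%u: die took %lu ms\n", cpu, die_latency_ms);
}

/*
 * Stand-in for the arch __cpu_die(): poll once per simulated millisecond
 * until the dying cpu acks or the timeout expires.
 * A negative ack_after_ms models a cpu that never acks.
 */
static int model_arch_cpu_die(long ack_after_ms)
{
	unsigned long waited;

	for (waited = 0; waited < DIE_TIMEOUT_MS; waited++) {
		if (ack_after_ms >= 0 && waited >= (unsigned long)ack_after_ms)
			return 0;	/* dying cpu acked */
		sim_clock_ms++;		/* one more millisecond spent polling */
	}
	return -1;			/* timed out waiting for the ack */
}

/* Generic-code caller: a single trace pair covers every architecture. */
static int model_cpu_down(unsigned int cpu, long ack_after_ms)
{
	int ret;

	model_trace_die_start(cpu);
	ret = model_arch_cpu_die(ack_after_ms);
	model_trace_die_end(cpu);
	return ret;
}
```

Calling model_cpu_down(1, 3) models the normal sequence (cpu1 acks
after 3 simulated ms), while model_cpu_down(1, -1) models cpu1 falling
back into cpu_idle without acking, so cpu0 only returns once the
timeout has elapsed.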