Re: [PATCH] watchdog: Use a reference cycle counter to avoid scaling issues

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Michael Ellerman <mpe@xxxxxxxxxxxxxx> writes:

> On Mon, 2015-04-20 at 14:32 +0300, Alexander Shishkin wrote:
>> The problem with using cycle counter for NMI watchdog is that its
>> frequency changes with the corresponding core's frequency. This means
>> that, in particular, if the core frequency scales up, watchdog NMI will
>> arrive more frequently than what user requested through watchdog_thresh
>> and also increasing the probability of setting off the hardlockup detector,
>> because the corresponding hrtimer will keep firing at the same intervals
>> regardless of the core frequency. And, if the core can turbo to up to 2.5x
>> its base frequency (and therefore TSC) [1], we'll have the hrtimer and NMI
>> counter firing at the same frequency; anything that disables interrupts at
>> an unfortunate moment can set off the hardlockup detector then.
>> 
>> The proposed solution is to use reference cycle counter instead, which is
>> guaranteed to run at the same frequency regardless of the cpu speed. This
>> will also ensure that the watchdog_thresh timeout value is honored.
>> 
>> One issue with the reference counter is that its frequency is not the same
>> as that of tsc on older cpu models and therefore it needs calibration.
>> 
>> This patch adds code to calibrate ref-cycles counter using an hrtimer and
>> a perf counter, which runs in parallel with the rest of kernel init
>> sequence. It also removes the architecture-specific hw_nmi_* code, in favor
>> of calculating the NMI sample period in the architecture independent code,
>> using the cpu_khz variable. I notice that powerpc just added
>> HARDLOCKUP_DETECTOR support, so this change is relevant to them as well.
>
> Hi Alexander,

Hi Michael,

> Thanks for noticing that we're using this on powerpc now.
>
> We don't have PERF_COUNT_HW_REF_CPU_CYCLES defined, so this isn't going to work
> for us as-is.

If you don't define PERF_COUNT_HW_REF_CPU_CYCLES, the calibration event
will fail to create, and the error path will fall back to
PERF_COUNT_HW_REF_CPU_CYCLES (which is what you're using now, iiuc). So
that bit shouldn't break. The immediate problem would be that my patch
adds a dependency on cpu_khz, which powerpc doesn't seem to have, one
way to fix that would be to set it from ppc_cpu_freq.

> What we'd actually like is a way to use a different event entirely, a powerpc
> specific one, which counts timebase ticks. So basically an arch callback to
> setup the wd_hw_attr would work for us.

Ok, that's one possibility too. Do I understand you correctly that this
event's resolution is known upfront and it doesn't need calibration?

> Whether you want to try and accomodate that as part of this patch is up to you.
> If you don't, then I guess we'd just ask that this new logic is something we
> can opt out of.

I don't see why not take care of powerpc too while we're at it.

Let me come up with something in the next iteration of this patch.

Regards,
--
Alex
--
To unsubscribe from this list: send the line "unsubscribe stable" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]