Michael Ellerman <mpe@xxxxxxxxxxxxxx> writes: > On Mon, 2015-04-20 at 14:32 +0300, Alexander Shishkin wrote: >> The problem with using cycle counter for NMI watchdog is that its >> frequency changes with the corresponding core's frequency. This means >> that, in particular, if the core frequency scales up, watchdog NMI will >> arrive more frequently than what user requested through watchdog_thresh >> and also increasing the probability of setting off the hardlockup detector, >> because the corresponding hrtimer will keep firing at the same intervals >> regardless of the core frequency. And, if the core can turbo to up to 2.5x >> its base frequency (and therefore TSC) [1], we'll have the hrtimer and NMI >> counter firing at the same frequency; anything that disables interrupts at >> an unfortunate moment can set off the hardlockup detector then. >> >> The proposed solution is to use reference cycle counter instead, which is >> guaranteed to run at the same frequency regardless of the cpu speed. This >> will also ensure that the watchdog_thresh timeout value is honored. >> >> One issue with the reference counter is that its frequency is not the same >> as that of tsc on older cpu models and therefore it needs calibration. >> >> This patch adds code to calibrate ref-cycles counter using an hrtimer and >> a perf counter, which runs in parallel with the rest of kernel init >> sequence. It also removes the architecture-specific hw_nmi_* code, in favor >> of calculating the NMI sample period in the architecture independent code, >> using the cpu_khz variable. I notice that powerpc just added >> HARDLOCKUP_DETECTOR support, so this change is relevant to them as well. > > Hi Alexander, Hi Michael, > Thanks for noticing that we're using this on powerpc now. > > We don't have PERF_COUNT_HW_REF_CPU_CYCLES defined, so this isn't going to work > for us as-is. If you don't define PERF_COUNT_HW_REF_CPU_CYCLES, the calibration event will fail to create, and the error path will fall back to PERF_COUNT_HW_REF_CPU_CYCLES (which is what you're using now, iiuc). So that bit shouldn't break. The immediate problem would be that my patch adds a dependency on cpu_khz, which powerpc doesn't seem to have, one way to fix that would be to set it from ppc_cpu_freq. > What we'd actually like is a way to use a different event entirely, a powerpc > specific one, which counts timebase ticks. So basically an arch callback to > setup the wd_hw_attr would work for us. Ok, that's one possibility too. Do I understand you correctly that this event's resolution is known upfront and it doesn't need calibration? > Whether you want to try and accomodate that as part of this patch is up to you. > If you don't, then I guess we'd just ask that this new logic is something we > can opt out of. I don't see why not take care of powerpc too while we're at it. Let me come up with something in the next iteration of this patch. Regards, -- Alex -- To unsubscribe from this list: send the line "unsubscribe stable" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html