Hi--

On 7/12/22 07:32, Laurent Dufour wrote:
> During a LPM, while the memory transfer is in progress on the arrival side,
> some latencies is generated when accessing not yet transferred pages on the
                 are
> arrival side. Thus, the NMI watchdog may be triggered too frequently, which
> increases the risk to hit a NMI interrupt in a bad place in the kernel,
                            an NMI
> leading to a kernel panic.
>
> Disabling the Hard Lockup Watchdog until the memory transfer could be a too
> strong work around, some users would want this timeout to be eventually
> triggered if the system is hanging even during LPM.
>
> Introduce a new sysctl variable nmi_watchdog_factor. It allows to apply
> a factor to the NMI watchdog timeout during a LPM. Just before the CPU are
                                              an LPM.            the CPU is
> stopped for the switchover sequence, the NMI watchdog timer is set to
> watchdog_tresh + factor%
  watchdog_thresh
>
> A value of 0 has no effect. The default value is 200, meaning that the NMI
> watchdog is set to 30s during LPM (based on a 10s watchdog_tresh value).
                                                    watchdog_thresh
> Once the memory transfer is achieved, the factor is reset to 0.
>
> Setting this value to a high number is like disabling the NMI watchdog
> during a LPM.
         an LPM.
>
> Reviewed-by: Nicholas Piggin <npiggin@xxxxxxxxx>
> Signed-off-by: Laurent Dufour <ldufour@xxxxxxxxxxxxx>
> ---
> Documentation/admin-guide/sysctl/kernel.rst | 12 ++++++
> arch/powerpc/platforms/pseries/mobility.c | 43 +++++++++++++++++++++
> 2 files changed, 55 insertions(+)
>
> diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
> index ddccd1077462..0bb0b7f27e96 100644
> --- a/Documentation/admin-guide/sysctl/kernel.rst
> +++ b/Documentation/admin-guide/sysctl/kernel.rst
> @@ -592,6 +592,18 @@ to the guest kernel command line (see
> Documentation/admin-guide/kernel-parameters.rst).
>

This entire block should be in kernel-parameters.txt, not .rst,
and it should be formatted like everything else in the .txt file.

>
> +nmi_watchdog_factor (PPC only)
> +==================================
> +
> +Factor apply to to the NMI watchdog timeout (only when ``nmi_watchdog`` is
   Factor to apply to the NMI
> +set to 1). This factor represents the percentage added to
> +``watchdog_thresh`` when calculating the NMI watchdog timeout during a
                                                                 during an
> +LPM. The soft lockup timeout is not impacted.
> +
> +A value of 0 means no change. The default value is 200 meaning the NMI
> +watchdog is set to 30s (based on ``watchdog_thresh`` equal to 10).
> +
> +
> numa_balancing
> ==============
>

--
~Randy