Le 12/07/2022 à 11:47, Laurent Dufour a écrit : > Le 12/07/2022 à 03:46, Nicholas Piggin a écrit : >> Excerpts from Laurent Dufour's message of June 27, 2022 11:53 pm: >>> During a LPM, while the memory transfer is in progress on the arrival side, >>> some latencies is generated when accessing not yet transferred pages on the >>> arrival side. Thus, the NMI watchdog may be triggered too frequently, which >>> increases the risk to hit a NMI interrupt in a bad place in the kernel, >>> leading to a kernel panic. >>> >>> Disabling the Hard Lockup Watchdog until the memory transfer could be a too >>> strong work around, some users would want this timeout to be eventually >>> triggered if the system is hanging even during LPM. >>> >>> Introduce a new sysctl variable nmi_watchdog_factor. It allows to apply >>> a factor to the NMI watchdog timeout during a LPM. Just before the CPU are >>> stopped for the switchover sequence, the NMI watchdog timer is set to >>> watchdog_tresh + factor% >>> >>> A value of 0 has no effect. The default value is 200, meaning that the NMI >>> watchdog is set to 30s during LPM (based on a 10s watchdog_tresh value). >>> Once the memory transfer is achieved, the factor is reset to 0. >>> >>> Setting this value to a high number is like disabling the NMI watchdog >>> during a LPM. >>> >>> Signed-off-by: Laurent Dufour <ldufour@xxxxxxxxxxxxx> >>> --- >>> Documentation/admin-guide/sysctl/kernel.rst | 12 ++++++ >>> arch/powerpc/platforms/pseries/mobility.c | 43 +++++++++++++++++++++ >>> 2 files changed, 55 insertions(+) >>> >>> diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst >>> index ddccd1077462..0bb0b7f27e96 100644 >>> --- a/Documentation/admin-guide/sysctl/kernel.rst >>> +++ b/Documentation/admin-guide/sysctl/kernel.rst >>> @@ -592,6 +592,18 @@ to the guest kernel command line (see >>> Documentation/admin-guide/kernel-parameters.rst). >>> >>> >>> +nmi_watchdog_factor (PPC only) >>> +================================== >>> + >>> +Factor apply to to the NMI watchdog timeout (only when ``nmi_watchdog`` is >>> +set to 1). This factor represents the percentage added to >>> +``watchdog_thresh`` when calculating the NMI watchdog timeout during a >>> +LPM. The soft lockup timeout is not impacted. >> >> Could "LPM" or "mobility" be a bit more prominent in the parameter name >> and documentation? Something else might want to add a factor as well, >> one day. > > In the V2 version, Nathan suggested "making the user-visible > name more generic (e.g. "nmi_watchdog_factor") in case it makes sense to > apply this to other contexts in the future." > > So I made the change to a more generic name. I think this is a good option > since the documentation is explicit about the LPM particular case. > If in the future this factor needs to apply during an other operation that > name will be generic enough. > > Do you agree ? Nick and I discussed that. Nick prefers to have LPM in the tunable names, and thinks we can add a new tunable if a separate user came up which required it. We agree that 'nmi_wd_lpm_factor' is a good name. I'll send a v5 updating that name. >> >> Otherwise the code looks okay. >> >> Reviewed-by: Nicholas Piggin <npiggin@xxxxxxxxx> >> >>> + >>> +A value of 0 means no change. The default value is 200 meaning the NMI >>> +watchdog is set to 30s (based on ``watchdog_thresh`` equal to 10). >>> + >>> + >>> numa_balancing >>> ============== >>> >>> diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c >>> index 907a779074d6..649155faafc2 100644 >>> --- a/arch/powerpc/platforms/pseries/mobility.c >>> +++ b/arch/powerpc/platforms/pseries/mobility.c >>> @@ -48,6 +48,39 @@ struct update_props_workarea { >>> #define MIGRATION_SCOPE (1) >>> #define PRRN_SCOPE -2 >>> >>> +#ifdef CONFIG_PPC_WATCHDOG >>> +static unsigned int nmi_wd_factor = 200; >>> + >>> +#ifdef CONFIG_SYSCTL >>> +static struct ctl_table nmi_wd_factor_ctl_table[] = { >>> + { >>> + .procname = "nmi_watchdog_factor", >>> + .data = &nmi_wd_factor, >>> + .maxlen = sizeof(int), >>> + .mode = 0644, >>> + .proc_handler = proc_douintvec_minmax, >>> + }, >>> + {} >>> +}; >>> +static struct ctl_table nmi_wd_factor_sysctl_root[] = { >>> + { >>> + .procname = "kernel", >>> + .mode = 0555, >>> + .child = nmi_wd_factor_ctl_table, >>> + }, >>> + {} >>> +}; >>> + >>> +static int __init register_nmi_wd_factor_sysctl(void) >>> +{ >>> + register_sysctl_table(nmi_wd_factor_sysctl_root); >>> + >>> + return 0; >>> +} >>> +device_initcall(register_nmi_wd_factor_sysctl); >>> +#endif /* CONFIG_SYSCTL */ >>> +#endif /* CONFIG_PPC_WATCHDOG */ >>> + >>> static int mobility_rtas_call(int token, char *buf, s32 scope) >>> { >>> int rc; >>> @@ -702,13 +735,20 @@ static int pseries_suspend(u64 handle) >>> static int pseries_migrate_partition(u64 handle) >>> { >>> int ret; >>> + unsigned int factor = 0; >>> >>> +#ifdef CONFIG_PPC_WATCHDOG >>> + factor = nmi_wd_factor; >>> +#endif >>> ret = wait_for_vasi_session_suspending(handle); >>> if (ret) >>> return ret; >>> >>> vas_migration_handler(VAS_SUSPEND); >>> >>> + if (factor) >>> + watchdog_nmi_set_lpm_factor(factor); >>> + >>> ret = pseries_suspend(handle); >>> if (ret == 0) { >>> post_mobility_fixup(); >>> @@ -716,6 +756,9 @@ static int pseries_migrate_partition(u64 handle) >>> } else >>> pseries_cancel_migration(handle, ret); >>> >>> + if (factor) >>> + watchdog_nmi_set_lpm_factor(0); >>> + >>> vas_migration_handler(VAS_RESUME); >>> >>> return ret; >>> -- >>> 2.36.1 >>> >>> >