On 17/02/2025 22:22, Ahmad Fatoum wrote:
Hello Matti,
On 22.01.25 12:28, Matti Vaittinen wrote:
On 13/01/2025 18:25, Ahmad Fatoum wrote:
hw_protection_shutdown() will kick off an orderly shutdown and if that
takes longer than a configurable amount of time, an emergency shutdown
will occur.
Recently, hw_protection_reboot() was added for those systems that don't
implement a proper shutdown and are better served by rebooting and
having the boot firmware worry about doing something about the critical
condition.
On timeout of the orderly reboot of hw_protection_reboot(), the system
would go into shutdown, instead of reboot. This is not a good idea, as
going into shutdown was explicitly not asked for.
Fix this by always doing an emergency reboot if hw_protection_reboot()
is called and the orderly reboot takes too long.
Fixes: 79fa723ba84c ("reboot: Introduce thermal_zone_device_critical_reboot()")
Signed-off-by: Ahmad Fatoum <a.fatoum@xxxxxxxxxxxxxx>
---
kernel/reboot.c | 70 ++++++++++++++++++++++++++++++++++++++++-----------------
1 file changed, 49 insertions(+), 21 deletions(-)
diff --git a/kernel/reboot.c b/kernel/reboot.c
index 847ac5d17a659981c6765699eac323f5e87f48c1..222b63dfd31020d0e2bc1b1402dbfa82adc71990 100644
--- a/kernel/reboot.c
+++ b/kernel/reboot.c
@@ -932,48 +932,76 @@ void orderly_reboot(void)
}
EXPORT_SYMBOL_GPL(orderly_reboot);
+static const char *hw_protection_action_str(enum hw_protection_action action)
+{
+ switch (action) {
+ case HWPROT_ACT_SHUTDOWN:
+ return "shutdown";
+ case HWPROT_ACT_REBOOT:
+ return "reboot";
+ default:
+ return "undefined";
+ }
+}
+
+static enum hw_protection_action hw_failure_emergency_action;
nit: Do we have a (theoretical) possibility that two emergency restarts get scheduled with different actions? Should the action be allocated (maybe not) for each caller, or should there be a check if an operation with conflicting action is already scheduled?
If this was already considered and thought it is not an issue:
Reviewed-by: Matti Vaittinen <mazziesaccount@xxxxxxxxx>
__hw_protection_trigger (née __hw_protection_shutdown) has this at its start:
static atomic_t allow_proceed = ATOMIC_INIT(1);
/* Shutdown should be initiated only once. */
if (!atomic_dec_and_test(&allow_proceed))
return;
It's thus not possible to have a later emergency restart race against the first.
Ah, indeed. I missed this. Thanks for the clarification! :)
Yours,
-- Matti