On Thu, Mar 20, 2025 at 02:40:09AM +0000, Pasha Tatashin wrote: > Introduces the Live Update Orchestrator (LUO), a new kernel subsystem > designed to facilitate live updates. Live update is a method to reboot > the kernel while attempting to keep selected devices alive across the > reboot boundary, minimizing downtime. > > The primary use case is cloud environments, allowing hypervisor updates > without fully disrupting running virtual machines. VMs can be suspended > while the hypervisor kernel reboots, and devices attached to these VM > are kept operational by the LUO. > > Features introduced: > > - Core orchestration logic for managing the live update process. > - A state machine (NORMAL, PREPARED, UPDATED, *_FAILED) to track > the progress of live updates. > - Notifier chains for subsystems (device layer, interrupts, KVM, IOMMU, > etc.) to register callbacks for different live update events: > - LIVEUPDATE_PREPARE: Prepare for reboot (before blackout). > - LIVEUPDATE_REBOOT: Final serialization before kexec (blackout). > - LIVEUPDATE_FINISH: Cleanup after update (after blackout). > - LIVEUPDATE_CANCEL: Rollback actions on failure or user request. > - A sysfs interface (/sys/kernel/liveupdate/) for user-space control: > - `prepare`: Initiate preparation (write 1) or reset (write 0). > - `finish`: Finalize update in new kernel (write 1). > - `cancel`: Abort ongoing preparation or reboot (write 1). > - `reset`: Force state back to normal (write 1). > - `state`: Read-only view of the current LUO state. > - `enabled`: Read-only view of whether live update is enabled. > - Integration with KHO to pass orchestrator state to the new kernel. > - Version checking during startup of the new kernel to ensure > compatibility with the previous kernel's live update state. > > This infrastructure allows various kernel subsystems to coordinate and > participate in the live update process, serializing and restoring device > state across a kernel reboot. ... > +Date: March 2025 > +KernelVersion: 6.14.0 This is way too optimistic, it even won't make v6.15. And date can be chosen either v6.16-rc1 or v6.16 release in accordance with prediction tool ... > +#ifndef _LINUX_LIVEUPDATE_H > +#define _LINUX_LIVEUPDATE_H > +#include <linux/compiler.h> > +#include <linux/notifier.h> This is semi-random list of inclusions. Try to follow IWYU principle. See below. ... > +bool liveupdate_state_updated(void); Where bool is defined? ... > +/** > + * LIVEUPDATE_DECLARE_NOTIFIER - Declare a live update notifier with default > + * structure. > + * @_name: A base name used to generate the names of the notifier block > + * (e.g., ``_name##_liveupdate_notifier_block``) and the callback function > + * (e.g., ``_name##_liveupdate``). > + * @_priority: The priority of the notifier, specified using the > + * ``enum liveupdate_cb_priority`` values > + * (e.g., ``LIVEUPDATE_CB_PRIO_BEFORE_DEVICES``). > + * > + * This macro declares a static struct notifier_block and a corresponding > + * notifier callback function for use with the live update orchestrator. > + * It simplifies the process by automatically handling the dispatching of > + * live update events to separate handler functions for prepare, reboot, > + * finish, and cancel. > + * > + * This macro expects the following functions to be defined: > + * > + * ``_name##_liveupdate_prepare()``: Called on LIVEUPDATE_PREPARE. > + * ``_name##_liveupdate_reboot()``: Called on LIVEUPDATE_REBOOT. > + * ``_name##_liveupdate_finish()``: Called on LIVEUPDATE_FINISH. > + * ``_name##_liveupdate_cancel()``: Called on LIVEUPDATE_CANCEL. > + * > + * The generated callback function handles the switch statement for the > + * different live update events and calls the appropriate handler function. > + * It also includes warnings if the finish or cancel handlers return an error. > + * > + * For example, declartion can look like this: > + * > + * ``static int foo_liveupdate_prepare(void) { ... }`` > + * > + * ``static int foo_liveupdate_reboot(void) { ... }`` > + * > + * ``static int foo_liveupdate_finish(void) { ... }`` > + * > + * ``static int foo_liveupdate_cancel(void) { ... }`` > + * > + * ``LIVEUPDATE_DECLARE_NOTIFIER(foo, LIVEUPDATE_CB_PRIO_WITH_DEVICES);`` > + * Hmm... Have you run kernel-doc validator? There is missing Return section and it will warn about that. > + */ ... > + WARN_ONCE(rv, "cancel failed[%d]\n", rv); \ + bug.h ... > + #undef pr_fmt > + #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt Leftover from the development? > +#undef pr_fmt Not needed as long as pr_fmt9) is at the top of the file. > +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt ... > +#include <linux/kernel.h> What for? Can you follow IWYU, please? Here again semi-random list of inclusions. > +#include <linux/sysfs.h> > +#include <linux/string.h> > +#include <linux/rwsem.h> > +#include <linux/err.h> > +#include <linux/liveupdate.h> > +#include <linux/cpufreq.h> > +#include <linux/kexec_handover.h> Can you keep them ordered which will be easier to read and maintain? ... > +static int __init early_liveupdate_param(char *buf) > +{ > + return kstrtobool(buf, &luo_enabled); > +} > + Redundant blank line. > +early_param("liveupdate", early_liveupdate_param); ... > +/* Show the current live update state */ > +static ssize_t state_show(struct kobject *kobj, > + struct kobj_attribute *attr, It is still enough room even for the strict 80 limit case. > + char *buf) > +{ > + return sysfs_emit(buf, "%s\n", LUO_STATE_STR); > +} ... > + ret = blocking_notifier_call_chain(&luo_notify_list, > + event, > + NULL); There is room on the previous lines. Ditto for the similar cases all over the code. ... > +{ > + int ret; > + > + if (down_write_killable(&luo_state_rwsem)) { > + pr_warn(" %s, change state canceled by user\n", __func__); Why __func__ is so important in _this_ message? And why leading space? Ditto for the similar cases. > + return -EAGAIN; > + } > + > + if (!IS_STATE(LIVEUPDATE_STATE_NORMAL)) { > + pr_warn("Can't switch to [%s] from [%s] state\n", > + luo_state_str[LIVEUPDATE_STATE_PREPARED], > + LUO_STATE_STR); > + up_write(&luo_state_rwsem); > + > + return -EINVAL; > + } > + > + ret = luo_notify(LIVEUPDATE_PREPARE); > + if (!ret) > + luo_set_state(LIVEUPDATE_STATE_PREPARED); > + > + up_write(&luo_state_rwsem); > + > + return ret; > +} ... > +static ssize_t prepare_store(struct kobject *kobj, > + struct kobj_attribute *attr, > + const char *buf, > + size_t count) > +{ > + ssize_t ret; > + long val; > + if (kstrtol(buf, 0, &val) < 0) > + return -EINVAL; Shadower error code. > + if (val != 1 && val != 0) > + return -EINVAL; What's wrong with using kstrtobool() from the beginning? > + > + if (val) > + ret = luo_prepare(); > + else > + ret = luo_cancel(); > + if (!ret) > + ret = count; > + > + return ret; Can we go with the usual pattern "check for errors first"? if (ret) return ret; ... > +} ... > +static ssize_t finish_store(struct kobject *kobj, > + struct kobj_attribute *attr, > + const char *buf, > + size_t count) Same comments as per above. ... > +static struct attribute *luo_attrs[] = { > + &state_attribute.attr, > + &prepare_attribute.attr, > + &finish_attribute.attr, > + NULL, No comma for the terminator entry. > +}; ... > +static int __init luo_init(void) > +{ > + int ret; > + > + if (!luo_enabled || !kho_is_enabled()) { > + pr_info("disabled by user\n"); > + luo_enabled = false; > + > + return 0; > + } Can be written like if (!kho_is_enabled()) luo_enabled = false; if (!luo_enabled) { pr_info("disabled by user\n"); return 0; } > + ret = sysfs_create_group(kernel_kobj, &luo_attr_group); > + if (ret) > + pr_err("Failed to create group\n"); > + > + luo_sysfs_initialized = true; > + pr_info("Initialized\n"); > + > + return ret; Something is odd here between (non-)checking for errors and printed messages. > +} ... > +EXPORT_SYMBOL_GPL(liveupdate_state_normal); No namespace? ... > --- a/kernel/reboot.c > +++ b/kernel/reboot.c > @@ -18,6 +18,7 @@ > #include <linux/syscalls.h> > #include <linux/syscore_ops.h> > #include <linux/uaccess.h> > +#include <linux/liveupdate.h> Can oyu preserve order (with given context at least)? -- With Best Regards, Andy Shevchenko