Re: [RFC v1 1/3] luo: Live Update Orchestrator

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Mar 20, 2025 at 02:40:09AM +0000, Pasha Tatashin wrote:
> Introduces the Live Update Orchestrator (LUO), a new kernel subsystem
> designed to facilitate live updates. Live update is a method to reboot
> the kernel while attempting to keep selected devices alive across the
> reboot boundary, minimizing downtime.
> 
> The primary use case is cloud environments, allowing hypervisor updates
> without fully disrupting running virtual machines. VMs can be suspended
> while the hypervisor kernel reboots, and devices attached to these VM
> are kept operational by the LUO.
> 
> Features introduced:
> 
> - Core orchestration logic for managing the live update process.
> - A state machine (NORMAL, PREPARED, UPDATED, *_FAILED) to track
>   the progress of live updates.
> - Notifier chains for subsystems (device layer, interrupts, KVM, IOMMU,
>   etc.) to register callbacks for different live update events:
>     - LIVEUPDATE_PREPARE: Prepare for reboot (before blackout).
>     - LIVEUPDATE_REBOOT: Final serialization before kexec (blackout).
>     - LIVEUPDATE_FINISH: Cleanup after update (after blackout).
>     - LIVEUPDATE_CANCEL: Rollback actions on failure or user request.
> - A sysfs interface (/sys/kernel/liveupdate/) for user-space control:
>     - `prepare`: Initiate preparation (write 1) or reset (write 0).
>     - `finish`: Finalize update in new kernel (write 1).
>     - `cancel`: Abort ongoing preparation or reboot (write 1).
>     - `reset`: Force state back to normal (write 1).
>     - `state`: Read-only view of the current LUO state.
>     - `enabled`: Read-only view of whether live update is enabled.
> - Integration with KHO to pass orchestrator state to the new kernel.
> - Version checking during startup of the new kernel to ensure
>   compatibility with the previous kernel's live update state.
> 
> This infrastructure allows various kernel subsystems to coordinate and
> participate in the live update process, serializing and restoring device
> state across a kernel reboot.

...

> +Date:		March 2025
> +KernelVersion:	6.14.0

This is way too optimistic, it even won't make v6.15.
And date can be chosen either v6.16-rc1 or v6.16 release
in accordance with prediction tool

...

> +#ifndef _LINUX_LIVEUPDATE_H
> +#define _LINUX_LIVEUPDATE_H

> +#include <linux/compiler.h>
> +#include <linux/notifier.h>

This is semi-random list of inclusions. Try to follow IWYU principle.
See below.

...

> +bool liveupdate_state_updated(void);

Where bool is defined?

...

> +/**
> + * LIVEUPDATE_DECLARE_NOTIFIER - Declare a live update notifier with default
> + * structure.
> + * @_name: A base name used to generate the names of the notifier block
> + * (e.g., ``_name##_liveupdate_notifier_block``) and the callback function
> + * (e.g., ``_name##_liveupdate``).
> + * @_priority: The priority of the notifier, specified using the
> + * ``enum liveupdate_cb_priority`` values
> + * (e.g., ``LIVEUPDATE_CB_PRIO_BEFORE_DEVICES``).
> + *
> + * This macro declares a static struct notifier_block and a corresponding
> + * notifier callback function for use with the live update orchestrator.
> + * It simplifies the process by automatically handling the dispatching of
> + * live update events to separate handler functions for prepare, reboot,
> + * finish, and cancel.
> + *
> + * This macro expects the following functions to be defined:
> + *
> + * ``_name##_liveupdate_prepare()``:  Called on LIVEUPDATE_PREPARE.
> + * ``_name##_liveupdate_reboot()``:   Called on LIVEUPDATE_REBOOT.
> + * ``_name##_liveupdate_finish()``:   Called on LIVEUPDATE_FINISH.
> + * ``_name##_liveupdate_cancel()``:   Called on LIVEUPDATE_CANCEL.
> + *
> + * The generated callback function handles the switch statement for the
> + * different live update events and calls the appropriate handler function.
> + * It also includes warnings if the finish or cancel handlers return an error.
> + *
> + * For example, declartion can look like this:
> + *
> + * ``static int foo_liveupdate_prepare(void) { ... }``
> + *
> + * ``static int foo_liveupdate_reboot(void) { ... }``
> + *
> + * ``static int foo_liveupdate_finish(void) { ... }``
> + *
> + * ``static int foo_liveupdate_cancel(void) { ... }``
> + *
> + * ``LIVEUPDATE_DECLARE_NOTIFIER(foo, LIVEUPDATE_CB_PRIO_WITH_DEVICES);``
> + *

Hmm... Have you run kernel-doc validator? There is missing Return section and
it will warn about that.

> + */

...

> +		WARN_ONCE(rv, "cancel failed[%d]\n", rv);		\

+ bug.h

...

> + #undef pr_fmt
> + #define pr_fmt(fmt)	KBUILD_MODNAME ": " fmt

Leftover from the development?

> +#undef pr_fmt

Not needed as long as pr_fmt9) is at the top of the file.

> +#define pr_fmt(fmt)	KBUILD_MODNAME ": " fmt

...

> +#include <linux/kernel.h>

What for? Can you follow IWYU, please? Here again semi-random list of
inclusions.

> +#include <linux/sysfs.h>
> +#include <linux/string.h>
> +#include <linux/rwsem.h>
> +#include <linux/err.h>
> +#include <linux/liveupdate.h>
> +#include <linux/cpufreq.h>
> +#include <linux/kexec_handover.h>

Can you keep them ordered which will be easier to read and maintain?

...

> +static int __init early_liveupdate_param(char *buf)
> +{
> +	return kstrtobool(buf, &luo_enabled);
> +}

> +

Redundant blank line.

> +early_param("liveupdate", early_liveupdate_param);

...

> +/* Show the current live update state */
> +static ssize_t state_show(struct kobject *kobj,
> +			  struct kobj_attribute *attr,

It is still enough room even for the strict 80 limit case.

> +			  char *buf)
> +{
> +	return sysfs_emit(buf, "%s\n", LUO_STATE_STR);
> +}

...

> +		ret = blocking_notifier_call_chain(&luo_notify_list,
> +						   event,
> +						   NULL);

There is room on the previous lines. Ditto for the similar cases all over
the code.

...

> +{
> +	int ret;
> +
> +	if (down_write_killable(&luo_state_rwsem)) {
> +		pr_warn(" %s, change state canceled by user\n", __func__);

Why __func__ is so important in _this_ message? And why leading space?
Ditto for the similar cases.

> +		return -EAGAIN;
> +	}
> +
> +	if (!IS_STATE(LIVEUPDATE_STATE_NORMAL)) {
> +		pr_warn("Can't switch to [%s] from [%s] state\n",
> +			luo_state_str[LIVEUPDATE_STATE_PREPARED],
> +			LUO_STATE_STR);
> +		up_write(&luo_state_rwsem);
> +
> +		return -EINVAL;
> +	}
> +
> +	ret = luo_notify(LIVEUPDATE_PREPARE);
> +	if (!ret)
> +		luo_set_state(LIVEUPDATE_STATE_PREPARED);
> +
> +	up_write(&luo_state_rwsem);
> +
> +	return ret;
> +}

...

> +static ssize_t prepare_store(struct kobject *kobj,
> +			     struct kobj_attribute *attr,
> +			     const char *buf,
> +			     size_t count)
> +{
> +	ssize_t ret;
> +	long val;

> +	if (kstrtol(buf, 0, &val) < 0)
> +		return -EINVAL;

Shadower error code.


> +	if (val != 1 && val != 0)
> +		return -EINVAL;

What's wrong with using kstrtobool() from the beginning?

> +
> +	if (val)
> +		ret = luo_prepare();
> +	else
> +		ret = luo_cancel();

> +	if (!ret)
> +		ret = count;
> +
> +	return ret;

Can we go with the usual pattern "check for errors first"?

	if (ret)
		return ret;

	...

> +}

...

> +static ssize_t finish_store(struct kobject *kobj,
> +			    struct kobj_attribute *attr,
> +			    const char *buf,
> +			    size_t count)

Same comments as per above.

...

> +static struct attribute *luo_attrs[] = {
> +	&state_attribute.attr,
> +	&prepare_attribute.attr,
> +	&finish_attribute.attr,

> +	NULL,

No comma for the terminator entry.

> +};

...

> +static int __init luo_init(void)
> +{
> +	int ret;
> +
> +	if (!luo_enabled || !kho_is_enabled()) {
> +		pr_info("disabled by user\n");
> +		luo_enabled = false;
> +
> +		return 0;
> +	}

Can be written like

	if (!kho_is_enabled())
		luo_enabled = false;
	if (!luo_enabled) {
		pr_info("disabled by user\n");
		return 0;
	}

> +	ret = sysfs_create_group(kernel_kobj, &luo_attr_group);
> +	if (ret)
> +		pr_err("Failed to create group\n");
> +
> +	luo_sysfs_initialized = true;
> +	pr_info("Initialized\n");
> +
> +	return ret;

Something is odd here between (non-)checking for errors and printed messages.

> +}

...

> +EXPORT_SYMBOL_GPL(liveupdate_state_normal);

No namespace?

...

> --- a/kernel/reboot.c
> +++ b/kernel/reboot.c
> @@ -18,6 +18,7 @@
>  #include <linux/syscalls.h>
>  #include <linux/syscore_ops.h>
>  #include <linux/uaccess.h>
> +#include <linux/liveupdate.h>

Can oyu preserve order (with given context at least)?

-- 
With Best Regards,
Andy Shevchenko






[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux