Main changes from v1 [1]:
 - Stop abusing crashkernel and implement a proper way to pass memory
   to the new kernel
 - Lots of misc cleanups/refactorings.

kstate (kernel state) is a mechanism to describe some part of the
internal kernel state, save it into memory, and restore the state
after kexec in the new kernel.

The end goal and the main use case is to be able to update the host
kernel underneath VMs with VFIO pass-through devices running on that
host. Since we are still pretty far from that end goal, this series
only establishes some basic infrastructure to describe and migrate
complex in-kernel states.

The idea behind KSTATE resembles QEMU's migration framework [2], which
solves a quite similar problem: migrating the state of a VM/emulated
devices across different versions of QEMU.

This is an alternative to Kexec Hand Over (KHO [3]).

So, why not KHO?

 - The main reason is that KHO doesn't provide a simple and convenient
   internal API for drivers/subsystems to preserve internal data.
   E.g. let's consider we have some variable of type 'struct a' that
   needs to be preserved:

	struct a {
		int i;
		unsigned long *p_ulong;
		char s[10];
		struct page *page;
	};

   The KHO way requires the driver/subsystem to have a bunch of code
   dealing with FDT stuff, something like

	a_kho_write()
	{
		...
		fdt_property(fdt, "i", &a.i, sizeof(a.i));
		fdt_property(fdt, "ulong", a.p_ulong, sizeof(*a.p_ulong));
		fdt_property(fdt, "s", &a.s, sizeof(a.s));
		if (err)
		...
	}

	a_kho_restore()
	{
		...
		a.i = fdt_getprop(fdt, offset, "i", &len);
		if (!a.i || len != sizeof(a.i))
			goto err;
		*a.p_ulong = fdt_getprop....
	}

   Each driver/subsystem has to solve this problem in its own way.
   Also, using FDT properties for individual fields might be wasteful
   in terms of memory, as these properties use strings as keys.

   KSTATE solves the same problem in a more elegant way, with this:

	struct kstate_description a_state = {
		.name = "a_struct",
		.version_id = 1,
		.id = KSTATE_TEST_ID,
		.state_list = LIST_HEAD_INIT(a_state.state_list),
		.fields = (const struct kstate_field[]) {
			KSTATE_BASE_TYPE(i, struct a, int),
			KSTATE_BASE_TYPE(s, struct a, char [10]),
			KSTATE_POINTER(p_ulong, struct a),
			KSTATE_PAGE(page, struct a),
			KSTATE_END_OF_LIST()
		},
	};

	{
		static unsigned long ulong;
		static struct a a_data = { .p_ulong = &ulong };

		kstate_register(&a_state, &a_data);
	}

   The driver only needs a proper 'kstate_description' and a call to
   kstate_register() to save/restore a_data. Basically, 'struct
   kstate_description' provides instructions on how to save/restore
   'struct a', and kstate_register() does all the save/restore work
   under the hood.

 - Another bonus point: kstate can preserve migratable memory, which
   is required to preserve guest memory.

So now to the part about how this works.

The state of kernel data (usually some struct) is described by a
'struct kstate_description' containing an array of individual field
descriptions, 'struct kstate_field'. Each field has a set of bits in
->flags which instruct how to save/restore that field of the struct.
E.g.:

 - KS_BASE_TYPE flag tells that the field can be just copied by value.
 - KS_POINTER means that the struct member is a pointer to the actual
   data, so it needs to be dereferenced before saving/restoring data
   to/from the kstate data stream.
 - KS_STRUCT - the field contains another struct; field->ksd must
   point to another 'struct kstate_description'.
 - KS_CUSTOM - some non-trivial field that requires custom
   kstate_field->save() and ->restore() callbacks to save/restore
   data.
 - KS_ARRAY_OF_POINTER - array of pointers; the size of the array is
   determined by the field->count() callback.
 - KS_ADDRESS - field is a pointer to either the vmemmap area (struct
   page) or a linear address. Store offset.
 - KS_END - special flag indicating the end of migration stream data.

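The flag-driven walk described above can be illustrated with a small
userspace sketch. This is not code from the series: the demo_* names,
the DEMO_* flags and the field table layout are hypothetical stand-ins,
only the by-value and pointer cases are modelled, and the 'struct page'
member is left out. It just shows how a table of (offset, size, flags)
entries is enough to copy an object into a flat stream and back:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flags modelled on the list above, not the series' own. */
enum { DEMO_BASE_TYPE = 1 << 0, DEMO_POINTER = 1 << 1, DEMO_END = 1 << 2 };

struct demo_field {
	size_t offset;		/* offset of the member inside the object */
	size_t size;		/* number of bytes that go into the stream */
	unsigned int flags;
};

struct a {
	int i;
	unsigned long *p_ulong;
	char s[10];
};

static const struct demo_field a_fields[] = {
	{ offsetof(struct a, i), sizeof(int), DEMO_BASE_TYPE },
	{ offsetof(struct a, s), sizeof(char[10]), DEMO_BASE_TYPE },
	{ offsetof(struct a, p_ulong), sizeof(unsigned long), DEMO_POINTER },
	{ 0, 0, DEMO_END },
};

/* Return the address of the data that a field entry describes. */
static void *demo_field_addr(void *obj, const struct demo_field *f)
{
	char *member = (char *)obj + f->offset;
	void *target;

	if (!(f->flags & DEMO_POINTER))
		return member;
	/* Pointer field: the data lives where the member points to. */
	memcpy(&target, member, sizeof(target));
	return target;
}

/* Copy every described field into the stream; returns bytes written. */
static size_t demo_save(void *obj, const struct demo_field *f,
			unsigned char *stream)
{
	size_t pos = 0;

	for (; !(f->flags & DEMO_END); f++) {
		memcpy(stream + pos, demo_field_addr(obj, f), f->size);
		pos += f->size;
	}
	return pos;
}

/* Copy the stream back into an object, field by field. */
static size_t demo_restore(void *obj, const struct demo_field *f,
			   const unsigned char *stream)
{
	size_t pos = 0;

	for (; !(f->flags & DEMO_END); f++) {
		memcpy(demo_field_addr(obj, f), stream + pos, f->size);
		pos += f->size;
	}
	return pos;
}

/* Save one object, restore it into another; returns the stream size. */
static size_t demo_roundtrip(void)
{
	unsigned long old_val = 42, new_val = 0;
	struct a old_a = { .i = 7, .p_ulong = &old_val, .s = "kstate" };
	struct a new_a = { .p_ulong = &new_val };
	unsigned char stream[64];
	size_t n = demo_save(&old_a, a_fields, stream);

	demo_restore(&new_a, a_fields, stream);
	assert(new_a.i == 7 && new_val == 42);
	assert(strcmp(new_a.s, "kstate") == 0);
	return n;
}
```

Calling demo_roundtrip() from a main() exercises the whole path. As in
the a_data example above, the destination object must already have
p_ulong pointing at valid storage before the restore, because the
pointer case writes through the pointer rather than replacing it.
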
kstate_register() accepts a kstate_description along with an instance
of an object and registers it in the global 'states' list.

During the kexec reboot phase we go through the list of
'kstate_description's, and each instance of kstate_description forms a
'struct kstate_entry' which is saved into kstate's data stream.

The 'kstate_entry' contains information such as the ID of the
kstate_description, its version, the size of the migration data, and
the data itself. The ->data is formed in accordance with the
kstate_fields of the corresponding kstate_description.

After the reboot, when kstate_register() is called, it parses the
migration stream, finds the appropriate 'kstate_entry', and restores
the contents of the object in accordance with the kstate_description
and its ->fields.

[1] https://lkml.kernel.org/r/20241002160722.20025-1-arbn@xxxxxxxxxxxxxxx
[2] https://www.qemu.org/docs/master/devel/migration/main.html#vmstate
[3] https://lkml.kernel.org/r/20250206132754.2596694-1-rppt@xxxxxxxxxx

Andrey Ryabinin (7):
  kstate: Add kstate - a mechanism to describe and migrate kernel state
    across kexec
  kstate, kexec, x86: transfer kstate data across kexec
  kexec: exclude control pages from the destination addresses
  kexec, kstate: delay loading of kexec segments
  x86, kstate: Add the ability to preserve memory pages across kexec.
  kexec, kstate: save kstate data before kexec'ing
  kstate, test: add test module for testing kstate subsystem.

 arch/x86/Kconfig                  |   1 +
 arch/x86/kernel/kexec-bzimage64.c |   4 +
 arch/x86/kernel/setup.c           |   2 +
 include/linux/kexec.h             |   3 +
 include/linux/kstate.h            | 216 ++++++++++++++
 kernel/Kconfig.kexec              |  13 +
 kernel/Makefile                   |   1 +
 kernel/kexec_core.c               |  30 ++
 kernel/kexec_file.c               | 159 +++++++----
 kernel/kexec_internal.h           |   9 +
 kernel/kstate.c                   | 458 ++++++++++++++++++++++++++++++
 lib/Makefile                      |   2 +
 lib/test_kstate.c                 |  86 ++++++
 13 files changed, 925 insertions(+), 59 deletions(-)
 create mode 100644 include/linux/kstate.h
 create mode 100644 kernel/kstate.c
 create mode 100644 lib/test_kstate.c

-- 
2.45.3