On Fri, Feb 14, 2025 at 5:33 PM Harry (Hyeonggon) Yoo <42.hyeyoo@xxxxxxxxx> wrote:
>
> On Thu, Feb 13, 2025 at 11:20:22AM +0800, Huacai Chen wrote:
> > Hi, Harry,
> >
> > On Wed, Feb 12, 2025 at 11:39 PM Harry (Hyeonggon) Yoo
> > <42.hyeyoo@xxxxxxxxx> wrote:
> > > On Wed, Feb 12, 2025 at 11:17 PM Huacai Chen <chenhuacai@xxxxxxxxxxx> wrote:
> > > >
> > > > Hibernation assumes the memory layout after resume be the same as that
> > > > before sleep, but CONFIG_RANDOM_KMALLOC_CACHES breaks this assumption.
> > >
> > > Could you please elaborate what do you mean by
> > > hibernation assumes 'the memory layout' after resume be the same as that
> > > before sleep?
> > >
> > > I don't understand how updating random_kmalloc_seed breaks resuming from
> > > hibernation. Changing random_kmalloc_seed affects which kmalloc caches
> > > newly allocated objects are from, but it should not affect the objects that are
> > > already allocated (before hibernation).
> >
> > When resuming, the booting kernel should switch to the target kernel,
> > if the address of switch code (from the booting kernel) is the
> > effective data of the target kernel, then the switch code may be
> > overwritten.
>
> Hmm... I'm still missing some pieces.
> How is the kernel binary overwritten when slab allocations are randomized?
>
> Also, I'm not sure if it's even safe to assume that the memory layout is the
> same across boots. But I'm not an expert on swsusp anyway...
>
> It'd be really helpful for linux-pm folks to clarify 1) what are the
> (architecture-independent) assumptions are for swsusp to work, and
> 2) how architectures dealt with other randomization features like kASLR...

I'm sorry to confuse you. Binary overwriting is indeed caused by kASLR, so
at least on LoongArch we should disable kASLR for hibernation.

Random kmalloc is another story: on LoongArch it breaks smpboot when
resuming. The details are:

1. LoongArch uses the kmalloc() family to allocate idle_task's
   stack/thread_info and other data structures.
2. If random kmalloc is enabled, the memory used for idle_task's stack in
   the booting kernel may be used for other things in the target kernel.
3. When CPU0 executes the switch code, the other CPUs are executing
   idle_task, and their stacks may be corrupted by the switch code.

So in experiments we can fix hibernation merely by moving the
random_kmalloc_seed initialization after smp_init(). But obviously, moving
it after all initcalls is harmless and safer.

Huacai

>
> > For LoongArch there is an additional problem: the regular kernel
> > function uses absolute address to call exception handlers, this means
> > the code calls to exception handlers should at the same address for
> > booting kernel and target kernel.
>
> --
> Harry
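
To make the second point of the reply above more concrete, here is a small
user-space sketch of how a per-boot seed can send the same kmalloc() call
site to a different cache copy. It is only a model, loosely based on how
kmalloc_type() mixes the caller address with random_kmalloc_seed; the names
hash_64_model(), kmalloc_cache_index() and NR_KMALLOC_COPIES are made up for
this illustration, and the count of 16 copies is an assumption, not
something stated in this thread.

```c
#include <stdint.h>

/* 64-bit golden-ratio constant used by the kernel's multiplicative hash. */
#define GOLDEN_RATIO_64 0x61C8864680B583EBull

/* Assumption for this sketch: 16 cache copies in total, so 4 hash bits. */
#define NR_KMALLOC_COPIES 16
#define KMALLOC_HASH_BITS 4

/* Hypothetical model of hash_64(): keep the top `bits` bits of val * golden ratio. */
static unsigned int hash_64_model(uint64_t val, unsigned int bits)
{
    return (unsigned int)((val * GOLDEN_RATIO_64) >> (64 - bits));
}

/*
 * Hypothetical helper: which cache copy an allocation from `caller` lands in
 * for a given seed. The booting kernel and the target kernel each have their
 * own seed, so the same caller can map to different copies in the two kernels.
 */
static unsigned int kmalloc_cache_index(uint64_t caller, uint64_t seed)
{
    return hash_64_model(caller ^ seed, KMALLOC_HASH_BITS);
}
```

With a caller value of 5, a seed of 4 gives index 6 and a seed of 7 gives
index 12, so the "same" allocation comes from different caches under the two
seeds. That is why an early allocation such as an idle task's stack need not
sit at the same address in the booting kernel and in the target kernel.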