On Fri, Feb 14, 2025 at 06:02:52PM +0800, Huacai Chen wrote:
> On Fri, Feb 14, 2025 at 5:33 PM Harry (Hyeonggon) Yoo
> <42.hyeyoo@xxxxxxxxx> wrote:
> >
> > On Thu, Feb 13, 2025 at 11:20:22AM +0800, Huacai Chen wrote:
> > > Hi, Harry,
> > >
> > > On Wed, Feb 12, 2025 at 11:39 PM Harry (Hyeonggon) Yoo
> > > <42.hyeyoo@xxxxxxxxx> wrote:
> > > > On Wed, Feb 12, 2025 at 11:17 PM Huacai Chen <chenhuacai@xxxxxxxxxxx> wrote:
> > > > >
> > > > > Hibernation assumes the memory layout after resume be the same as that
> > > > > before sleep, but CONFIG_RANDOM_KMALLOC_CACHES breaks this assumption.
> > > >
> > > > Could you please elaborate what do you mean by
> > > > hibernation assumes 'the memory layout' after resume be the same as that
> > > > before sleep?
> > > >
> > > > I don't understand how updating random_kmalloc_seed breaks resuming from
> > > > hibernation. Changing random_kmalloc_seed affects which kmalloc caches
> > > > newly allocated objects are from, but it should not affect the objects that are
> > > > already allocated (before hibernation).
> > >
> > > When resuming, the booting kernel should switch to the target kernel,
> > > if the address of switch code (from the booting kernel) is the
> > > effective data of the target kernel, then the switch code may be
> > > overwritten.
> >
> > Hmm... I'm still missing some pieces.
> > How is the kernel binary overwritten when slab allocations are randomized?
> >
> > Also, I'm not sure if it's even safe to assume that the memory layout is the
> > same across boots. But I'm not an expert on swsusp anyway...
> >
> > It'd be really helpful for linux-pm folks to clarify 1) what are the
> > (architecture-independent) assumptions are for swsusp to work, and
> > 2) how architectures dealt with other randomization features like kASLR...

[+Cc few more people that worked on slab hardening]

> I'm sorry to confuse you.
> Binary overwriting is indeed caused by
> kASLR, so at least on LoongArch we should disable kASLR for
> hibernation.

Understood.

> Random kmalloc is another story, on LoongArch it breaks smpboot when
> resuming, the details are:
> 1, LoongArch uses kmalloc() family to allocate idle_task's
> stack/thread_info and other data structures.
> 2, If random kmalloc is enabled, idle_task's stack in the booting
> kernel may be other things in the target kernel.

Slab hardening features try so hard to prevent such predictability.
For example, SLAB_FREELIST_RANDOM could also randomize the address
kmalloc objects are allocated at.

Rather than hacking CONFIG_RANDOM_KMALLOC_CACHES like this, we could
have a single option to disable slab hardening features that makes
the address unpredictable.

It'd be nice to have something like ARCH_SUPPORTS_SLAB_RANDOM which
some hardening features depend on. And then let some arches conditionally
not select ARCH_SUPPORTS_SLAB_RANDOM if hibernation's enabled
(at cost of less hardening)?

--
Harry

> 3, When CPU0 executes the switch code, other CPUs are executing
> idle_task, and their stacks may be corrupted by the switch code.
>
> So in experiments we can fix hibernation only by moving
> random_kmalloc_seed initialization after smp_init(). But obviously,
> moving it after all initcalls is harmless and safer.
>
>
> Huacai
>
> > > For LoongArch there is an additional problem: the regular kernel
> > > function uses absolute address to call exception handlers, this means
> > > the code calls to exception handlers should at the same address for
> > > booting kernel and target kernel.