On Wed, Jul 12, 2023 at 10:43 AM Palmer Dabbelt <palmer@xxxxxxxxxxxx> wrote:
>
> On Tue, 11 Jul 2023 04:07:22 PDT (-0700), alex@xxxxxxxx wrote:
> > Hi Guo,
> >
> >
> > On 10/07/2023 07:40, guoren@xxxxxxxxxx wrote:
> >> From: Guo Ren <guoren@xxxxxxxxxxxxxxxxx>
> >>
> >> The machine_kexec() uses set_memory_x to modify the direct mapping
> >> attributes from RW to RWX. But set_memory_x only changes the init_mm's
> >> attributes, not current->active_mm, so when kexec jumps into
> >> control_buffer, the instruction page fault happens, and there is no
> >> minor_pagefault for it, then panic.
> >
> >
> > I think it needs more details like this:
> >
> > "The current implementation of set_memory_x does not split hugepages in
> > the linear mapping and then when a PGD mapping is used, the whole PGD is
> > marked as executable. But changing the permissions at the PGD level must
> > be propagated to all the page tables."
> >
> >
> >>
> >> The bug is found on an MMU_sv39 machine, and the direct mapping used a
> >> 1GB PUD, the pgd entries. Here is the bug output:
> >>
> >> kexec_core: Starting new kernel
> >> Will call new kernel at 00300000 from hart id 0
> >> FDT image at 747c7000
> >> Bye...
> >> Unable to handle kernel paging request at virtual address ffffffda23b0d000
> >> Oops [#1]
> >> Modules linked in:
> >> CPU: 0 PID: 53 Comm: uinit Not tainted 6.4.0-rc6 #15
> >> Hardware name: Sophgo Mango (DT)
> >> epc : 0xffffffda23b0d000
> >> ra : machine_kexec+0xa6/0xb0
> >> epc : ffffffda23b0d000 ra : ffffffff80008272 sp : ffffffc80c173d10
> >> gp : ffffffff8150e1e0 tp : ffffffd9073d2c40 t0 : 0000000000000000
> >> t1 : 0000000000000042 t2 : 6567616d69205444 s0 : ffffffc80c173d50
> >> s1 : ffffffd9076c4800 a0 : ffffffd9076c4800 a1 : 0000000000300000
> >> a2 : 00000000747c7000 a3 : 0000000000000000 a4 : ffffffd800000000
> >> a5 : 0000000000000000 a6 : ffffffd903619c40 a7 : ffffffffffffffff
> >> s2 : ffffffda23b0d000 s3 : 0000000000300000 s4 : 00000000747c7000
> >> s5 : 0000000000000000 s6 : 0000000000000000 s7 : 0000000000000000
> >> s8 : 0000000000000000 s9 : 0000000000000000 s10: 0000000000000000
> >> s11: 0000003f940001a0 t3 : ffffffff815351af t4 : ffffffff815351af
> >> t5 : ffffffff815351b0 t6 : ffffffc80c173b50
> >> status: 0000000200000100 badaddr: ffffffda23b0d000 cause: 000000000000000c
> >>
> >> The solution is to fix machine_kexec() to remap control code page outside
> >> the linear mapping.
> >
> >
> > "Given the current flaw in the set_memory_x implementation, the simplest
> > solution is to ..."
> >
> >
> >>
> >> Fixes: 3335068f8721 ("riscv: Use PUD/P4D/PGD pages for the linear mapping")
> >> Signed-off-by: Guo Ren <guoren@xxxxxxxxxxxxxxxxx>
> >> Signed-off-by: Guo Ren <guoren@xxxxxxxxxx>
> >> Cc: Alexandre Ghiti <alex@xxxxxxxx>
> >> ---
> >> Changelog:
> >> V2:
> >>  - Use vm_map_ram instead of modifying set_memory_x
> >>  - Correct Fixes tag
> >> ---
> >>  arch/riscv/include/asm/kexec.h    |  1 +
> >>  arch/riscv/kernel/machine_kexec.c | 14 ++++++++++----
> >>  2 files changed, 11 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/arch/riscv/include/asm/kexec.h b/arch/riscv/include/asm/kexec.h
> >> index 2b56769cb530..17456e91476e 100644
> >> --- a/arch/riscv/include/asm/kexec.h
> >> +++ b/arch/riscv/include/asm/kexec.h
> >> @@ -41,6 +41,7 @@ crash_setup_regs(struct pt_regs *newregs,
> >>  struct kimage_arch {
> >>  	void *fdt; /* For CONFIG_KEXEC_FILE */
> >>  	unsigned long fdt_addr;
> >> +	void *control_code_buffer;
> >>  };
> >>
> >>  extern const unsigned char riscv_kexec_relocate[];
> >> diff --git a/arch/riscv/kernel/machine_kexec.c b/arch/riscv/kernel/machine_kexec.c
> >> index 2d139b724bc8..eeb209775107 100644
> >> --- a/arch/riscv/kernel/machine_kexec.c
> >> +++ b/arch/riscv/kernel/machine_kexec.c
> >> @@ -86,7 +86,14 @@ machine_kexec_prepare(struct kimage *image)
> >>
> >>  	/* Copy the assembler code for relocation to the control page */
> >>  	if (image->type != KEXEC_TYPE_CRASH) {
> >> -		control_code_buffer = page_address(image->control_code_page);
> >> +		control_code_buffer = vm_map_ram(&image->control_code_page,
> >> +					KEXEC_CONTROL_PAGE_SIZE/PAGE_SIZE,
> >> +					NUMA_NO_NODE);
> >> +		if (control_code_buffer == NULL) {
> >> +			pr_err("Failed to vm_map control page\n");
> >> +			return -ENOMEM;
> >> +		}
> >> +
> >>  		control_code_buffer_sz = page_size(image->control_code_page);
> >>
> >>  		if (unlikely(riscv_kexec_relocate_size > control_code_buffer_sz)) {
> >> @@ -97,8 +104,7 @@ machine_kexec_prepare(struct kimage *image)
> >>  		memcpy(control_code_buffer, riscv_kexec_relocate,
> >>  			riscv_kexec_relocate_size);
> >>
> >> -		/* Mark the control page executable */
> >> -		set_memory_x((unsigned long) control_code_buffer, 1);
> >> +		internal->control_code_buffer = control_code_buffer;
> >
> >
> > Where is this mapping marked as executable? I see that vm_map_ram() maps
> > the pages as PAGE_KERNEL, which does not set PAGE_EXEC.
> >
> >
> >>  	}
> >>
> >>  	return 0;
> >> @@ -211,7 +217,7 @@ machine_kexec(struct kimage *image)
> >>  	unsigned long this_cpu_id = __smp_processor_id();
> >>  	unsigned long this_hart_id = cpuid_to_hartid_map(this_cpu_id);
> >>  	unsigned long fdt_addr = internal->fdt_addr;
> >> -	void *control_code_buffer = page_address(image->control_code_page);
> >> +	void *control_code_buffer = internal->control_code_buffer;
> >>  	riscv_kexec_method kexec_method = NULL;
> >>
> >>  #ifdef CONFIG_SMP
> >
> >
> > Otherwise, you can add:
> >
> > Reviewed-by: Alexandre Ghiti <alexghiti@xxxxxxxxxxxx>
> >
> > Thanks,
> >
> > Alex
>
> Thanks for looking at this. Guo: do you have a re-spin that fixes the
> issues Alex pointed out? Sorry if I just missed it.

Sorry for the late reply. Here is v3 of the patch:
https://lore.kernel.org/linux-riscv/20230713150758.2956316-1-guoren@xxxxxxxxxx/

--
Best Regards
 Guo Ren