On Wed, Jul 5, 2023 at 3:01 PM Alexandre Ghiti <alex@xxxxxxxx> wrote: > > > On 04/07/2023 04:25, Guo Ren wrote: > > On Mon, Jul 3, 2023 at 6:17 PM Alexandre Ghiti <alex@xxxxxxxx> wrote: > >> Hi Guo, > >> > >> On 29/06/2023 10:20, guoren@xxxxxxxxxx wrote: > >>> From: Guo Ren <guoren@xxxxxxxxxxxxxxxxx> > >>> > >>> The machine_kexec() uses set_memory_x to add the executable attribute to the > >>> page table entry of control_code_buffer. It only modifies the init_mm but not > >>> the current->active_mm. The current kexec process won't use init_mm directly, > >>> and it depends on minor_pagefault, which is removed by commit 7d3332be011e4 > >> > >> Is the removal of minor_pagefault an issue? I'm not sure I understand > >> this part of the changelog. > > I use two different work-around patches to answer your question: > > 1st: > > ----- > > diff --git a/arch/riscv/mm/fault.c b/arch/riscv/mm/fault.c > > index 705d63a59aec..b8b200c81606 100644 > > --- a/arch/riscv/mm/fault.c > > +++ b/arch/riscv/mm/fault.c > > @@ -249,7 +249,7 @@ void handle_page_fault(struct pt_regs *regs) > > * only copy the information from the master page table, > > * nothing more. > > */ > > - if (unlikely((addr >= VMALLOC_START) && (addr < VMALLOC_END))) { > > + if (unlikely(addr >= 0x8000000000000000UL)) { > > vmalloc_fault(regs, code, addr); > > return; > > } > > ------ > > > > 2nd: > > ------ > > diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c > > index 8e65f0a953e5..270f50852886 100644 > > --- a/arch/riscv/mm/init.c > > +++ b/arch/riscv/mm/init.c > > @@ -1387,7 +1387,7 @@ static void __init create_linear_mapping_page_table(void) > > if (end >= __pa(PAGE_OFFSET) + memory_limit) > > end = __pa(PAGE_OFFSET) + memory_limit; > > > > - create_linear_mapping_range(start, end, 0); > > + create_linear_mapping_range(start, end, PMD_SIZE); > > } > > > > #ifdef CONFIG_STRICT_KERNEL_RWX > > ----- > > > > The removal of minor_pagefault could be an issue, but in this case > > it's the VMALLOC_START/END which prevents the minor_pagefault at > > first. I didn't say commit 7d3332be011e4 is the problem. > > > Sorry I still don't understand what you mean here and why you mention > the minor pagefault, could you explain again please? > > > >> > >>> ("riscv: mm: Pre-allocate PGD entries for vmalloc/modules area") of 64BIT. So, > >>> when it met pud mapping on an MMU_SV39 machine, it caused the following: > >>> > >>> kexec_core: Starting new kernel > >>> Will call new kernel at 00300000 from hart id 0 > >>> FDT image at 747c7000 > >>> Bye... > >>> Unable to handle kernel paging request at virtual address ffffffda23b0d000 > >>> Oops [#1] > >>> Modules linked in: > >>> CPU: 0 PID: 53 Comm: uinit Not tainted 6.4.0-rc6 #15 > >>> Hardware name: Sophgo Mango (DT) > >>> epc : 0xffffffda23b0d000 > >>> ra : machine_kexec+0xa6/0xb0 > >>> epc : ffffffda23b0d000 ra : ffffffff80008272 sp : ffffffc80c173d10 > >>> gp : ffffffff8150e1e0 tp : ffffffd9073d2c40 t0 : 0000000000000000 > >>> t1 : 0000000000000042 t2 : 6567616d69205444 s0 : ffffffc80c173d50 > >>> s1 : ffffffd9076c4800 a0 : ffffffd9076c4800 a1 : 0000000000300000 > >>> a2 : 00000000747c7000 a3 : 0000000000000000 a4 : ffffffd800000000 > >>> a5 : 0000000000000000 a6 : ffffffd903619c40 a7 : ffffffffffffffff > >>> s2 : ffffffda23b0d000 s3 : 0000000000300000 s4 : 00000000747c7000 > >>> s5 : 0000000000000000 s6 : 0000000000000000 s7 : 0000000000000000 > >>> s8 : 0000000000000000 s9 : 0000000000000000 s10: 0000000000000000 > >>> s11: 0000003f940001a0 t3 : ffffffff815351af t4 : ffffffff815351af > >>> t5 : ffffffff815351b0 t6 : ffffffc80c173b50 > >>> status: 0000000200000100 badaddr: ffffffda23b0d000 cause: 000000000000000c > >>> > >>> Yes, Using set_memory_x API after boot has the limitation, and at least we > >>> should synchronize the current->active_mm to fix the problem. > >>> > >>> Fixes: d3ab332a5021 ("riscv: add ARCH_HAS_SET_MEMORY support") > >>> Signed-off-by: Guo Ren <guoren@xxxxxxxxxxxxxxxxx> > >>> Signed-off-by: Guo Ren <guoren@xxxxxxxxxx> > >>> --- > >>> arch/riscv/mm/pageattr.c | 7 +++++++ > >>> 1 file changed, 7 insertions(+) > >>> > >>> diff --git a/arch/riscv/mm/pageattr.c b/arch/riscv/mm/pageattr.c > >>> index ea3d61de065b..23d169c4ee81 100644 > >>> --- a/arch/riscv/mm/pageattr.c > >>> +++ b/arch/riscv/mm/pageattr.c > >>> @@ -123,6 +123,13 @@ static int __set_memory(unsigned long addr, int numpages, pgprot_t set_mask, > >>> &masks); > >>> mmap_write_unlock(&init_mm); > >>> > >>> + if (current->active_mm != &init_mm) { > >>> + mmap_write_lock(current->active_mm); > >>> + ret = walk_page_range_novma(current->active_mm, start, end, > >>> + &pageattr_ops, NULL, &masks); > >>> + mmap_write_unlock(current->active_mm); > >>> + } > >>> + > >>> flush_tlb_kernel_range(start, end); > >>> > >>> return ret; > >> > >> I don't understand: any page table inherits the entries of > >> swapper_pg_dir (see pgd_alloc()), so any kernel page table entry is > >> "automatically" synchronized, so why should we synchronize one 4K entry > >> explicitly? A PGD entry would need to be synced, but not a PTE entry. > > The purpose of the second walk_page_range_novma() is for pgd's entries > > synchronization. I'm a bit lazy here, I agree, it's unnecessary to > > write lower level entries again. So I would use a simple pgd entries > > synchronization from vmalloc_fault in the next version of patch, all > > right? > > > But vmalloc_fault was removed by commit 7d3332be011e4 for CONFIG_64BIT, > so I don't get it: why would we need to synchronize a PGD entry in your > case? Where does this new PGD come from? And the trap address is > ffffffda23b0d000, which lies in the direct mapping, so why do you > mention vmalloc_fault at all? The machine_kexec() uses set_memory_x to modify the direct mapping attributes from RW to RWX. But set_memory_x only changes the init_mm's attributes, not current->active_mm, so when kexec jumps into control_buffer, the instruction page fault happens, and there is no minor_pagefault for it, then panic. I found the bug on an MMU_sv39 machine, and the direct mapping used a 1GB PUD, the pgd entries. This patch could solve the problem by synchronizing a PGD entry between init_mm and current->active_mm. > > Sorry if I'm missing something, hope you can clarify things! > > Thanks, > > Alex > > > > > > > > -- > > Best Regards > > Guo Ren > > > > _______________________________________________ > > linux-riscv mailing list > > linux-riscv@xxxxxxxxxxxxxxxxxxx > > http://lists.infradead.org/mailman/listinfo/linux-riscv -- Best Regards Guo Ren