On Tue, Jun 11, 2024 at 02:21:42PM +0000, Edgecombe, Rick P wrote: > [EXTERNAL MAIL] > > On Tue, 2024-06-11 at 21:13 +0800, Leo Yu-Chi Liang wrote: > > The previous reset procedure is > > 1. Set direct map attribute to invalid > > 2. Flush TLB > > 3. Reset direct map attribute to default > > > > It is possible that kernel forks another process > > on another core that access the invalid mappings after > > sync_kernel_mappings. > > > > We could reproduce this scenario by running LTP/bpf_prog > > multiple times on RV32 kernel on QEMU. > > > > Therefore, the following procedure is proposed > > to avoid mappings being invalid. > > 1. Reset direct map attribute to default > > 2. Flush TLB > > Can you explain more about what is happening in this scenario? Looking briefly, > riscv is doing something unique around sync_kernel_mappings(). If a RO mapping > is copied instead of a NP/invalid mapping, how is the problem avoided? Hi Edgecombe, Sorry for the late reply and thank you for taking a look. What we are seeing at first is that running LTP bpf_prog03 test fails randomly on RV32 SMP QEMU with kernel 6.1 and the failed cause is a load page fault. After a bit of inspection, we found that the faulting page is a part of kernel's page table and the valid bit of that page's PTE is cleared due to this reset procedure. The scenario of this fault is suspected to be the following: 1. Running bpf_prog03: Creates kernel pages with elevated 'X' permission so that bpf program can be executed. 2. Finishing bpf_prog03: vfree code path to reset permission to default: a. Set the pages to invalid first b. Unmap the pages and flush TLB c. Reset them to default permission 3. Other core forkes new processes: sync_kernel_mappings copies the kernel page table. If the 3rd step happens during 2a, then we get a kernel mapping with invalid PTE permission, Therefore, if the invalid page is accessed, we'd get a page fault exception and the kernel panics. But despite all of the above conjecture, we still are wondering if setting the mappings to be invalid first is necessary. IMHO, "set to invalid --> unmap & flush TLB --> set to default" is identical to "set to default --> unmap & flush TLB". Could we not just reset them to default first and then flush TLB & free memory? Best regards, Leo