Hello,

> On 15.08.22 16:22, Will Deacon wrote:
>>
>>> [...]
>>>
>>>
>>> [20:47:09] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
>>> [20:48:46] BUG: Bad page map in process projecta pte:1110111111111111 pmd:800000001c40003
>>> [20:48:46] addr:0000007fa1c00000 vm_flags:00100073 anon_vma:ffffff805bf80d08 mapping:0000000000000000 index:7fa1c00
>>> [20:48:46] file:(null) fault:0x0 mmap:0x0 read_folio:0x0
>
>> I hate to say it, but this all looks like memory corruption hitting the
>> page table and possibly the 'struct page' array to me :/
>
> Perhaps a note on the occurrence: across devices, the "bad page map" differs at pte, but somehow is mostly consistent at pmd:800000001c40003 (though I have seen 800000001c02003 and 800000001c40003). Is this some "magic value"? If not, I think it is highly unlikely to be the hardware.
>
> It is not only my program that has the problem, I have seen
>
> [Sun Aug 14 17:30:38 2022] BUG: Bad page map in process llvmpipe-3 pte:262d2626292a2627 pmd:800000001c01003
>
> and
> [Sat Aug 13 11:53:43 2022] BUG: Bad page map in process Xorg:disk$1 pte:a098a09aa29ea8a4 pmd:800000001c01003
> [Sat Aug 13 11:53:43 2022] addr:00000055a961e000 vm_flags:200100073 anon_vma:ffffff804c07d8f8 mapping:0000000000000000 index:55a961e
> [Sat Aug 13 11:53:43 2022] file:(null) fault:0x0 mmap:0x0 read_folio:0x0
>
> [..]

I am able to reproduce this on 6.0.0-rc1. It looks like vm_normal_page() does not recognize the page as being "normal" (?).

(mm/memory.c)
>                 if (likely(!pte_special(pte)))
>                         goto check_pfn;
>                 if (vma->vm_ops && vma->vm_ops->find_special_page)
>                         return vma->vm_ops->find_special_page(vma, addr);
>                 if (vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP))
>                         return NULL;
>                 if (is_zero_pfn(pfn))
>                         return NULL;
>                 if (pte_devmap(pte))
>[...]
>                         return NULL;
>
>                 print_bad_pte(vma, addr, pte, NULL);

What would be helpful to do next? Is the KASAN warning a consequence of the bad page map or its cause?
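One observation that may bear on the second question: the faulting address in the oops below seems to follow arithmetically from the second bad pte value. The sketch below is only that, a sketch. It assumes 4K pages with the output address in pte bits [47:12], a 64-byte struct page (suggested by the "lsl x19, x19, #6" in the disassembly), and that x23 (fffffffe00000000) is the base added by pfn_to_page(). The helper names are made up for illustration and are not the kernel's.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Assumptions, none verified against the kernel sources for this build:
 *  - 4K pages, output address in pte bits [47:12]
 *  - sizeof(struct page) == 64 (the "lsl x19, x19, #6" in the oops)
 *  - x23 (0xfffffffe00000000) is the base that pfn_to_page() adds to
 */
#define ASSUMED_PAGE_SHIFT        12
#define ASSUMED_PTE_ADDR_MASK     0x0000fffffffff000ULL
#define ASSUMED_STRUCT_PAGE_SHIFT 6
#define ASSUMED_VMEMMAP_BASE      0xfffffffe00000000ULL

/* Hypothetical helper: page frame number encoded in a raw pte value. */
uint64_t pte_to_pfn(uint64_t pte)
{
	return (pte & ASSUMED_PTE_ADDR_MASK) >> ASSUMED_PAGE_SHIFT;
}

/* Hypothetical helper: struct page address for a pfn; wraps modulo 2^64. */
uint64_t pfn_to_page_addr(uint64_t pfn)
{
	return ASSUMED_VMEMMAP_BASE + (pfn << ASSUMED_STRUCT_PAGE_SHIFT);
}

/* Print the derivation for one pte value taken from the log. */
void report(uint64_t pte)
{
	uint64_t pfn = pte_to_pfn(pte);

	printf("pte %016" PRIx64 " -> pfn %" PRIx64
	       " -> struct page at %016" PRIx64 "\n",
	       pte, pfn, pfn_to_page_addr(pfn));
}
```

Calling report(0x2626262023222323ULL) with the second bad pte (also visible in x27) gives pfn 262023222 and a struct page address of 00000096808c8880, which is the address in the KASAN report and in x22. If the assumptions hold, the KASAN report and the subsequent oops would be a consequence of the garbage pte rather than its cause.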
[ 18:42:59]
[ 18:44:17] BUG: Bad page map in process projecta pte:212725231f242323 pmd:800000001c01003
[ 18:44:17] addr:0000007fa1000000 vm_flags:00100073 anon_vma:ffffff8054090c38 mapping:0000000000000000 index:7fa1000
[ 18:44:17] file:(null) fault:0x0 mmap:0x0 read_folio:0x0
[ 18:44:17] CPU: 3 PID: 1135 Comm: projecta Tainted: G C 6.0.0-rc1-v8-gc8f41281d1f4 #2
[ 18:44:17] Hardware name: Raspberry Pi Compute Module 4 Rev 1.0 (DT)
[ 18:44:17] Call trace:
[ 18:44:17] dump_backtrace.part.0 (arch/arm64/kernel/stacktrace.c:184)
[ 18:44:17] show_stack (arch/arm64/kernel/stacktrace.c:191)
[ 18:44:17] dump_stack_lvl (lib/dump_stack.c:107 (discriminator 4))
[ 18:44:17] dump_stack (lib/dump_stack.c:114)
[ 18:44:17] print_bad_pte (mm/memory.c:567 (discriminator 12))
[ 18:44:17] vm_normal_page (mm/memory.c:638)
[ 18:44:17] copy_page_range (mm/memory.c:951 mm/memory.c:1085 mm/memory.c:1171 mm/memory.c:1208 mm/memory.c:1232 mm/memory.c:1330)
[ 18:44:17] dup_mm (kernel/fork.c:699 kernel/fork.c:1524)
[ 18:44:17] copy_process (kernel/fork.c:1576 kernel/fork.c:2256)
[ 18:44:17] kernel_clone (kernel/fork.c:2673)
[ 18:44:17] __do_sys_clone (kernel/fork.c:2808)
[ 18:44:17] __arm64_sys_clone (kernel/fork.c:2775)
[ 18:44:17] invoke_syscall (arch/arm64/kernel/syscall.c:38 arch/arm64/kernel/syscall.c:52)
[ 18:44:17] el0_svc_common.constprop.0 (./arch/arm64/include/asm/daifflags.h:28 arch/arm64/kernel/syscall.c:150)
[ 18:44:17] do_el0_svc (arch/arm64/kernel/syscall.c:207)
[ 18:44:17] el0_svc (arch/arm64/kernel/entry-common.c:133 arch/arm64/kernel/entry-common.c:142 arch/arm64/kernel/entry-common.c:625)
[ 18:44:17] el0t_64_sync_handler (arch/arm64/kernel/entry-common.c:643)
[ 18:44:17] el0t_64_sync (arch/arm64/kernel/entry.S:581)
[ 18:44:17] Disabling lock debugging due to kernel taint
[ 18:44:17] BUG: Bad page map in process projecta pte:2626262023222323 pmd:800000001c01003
[ 18:44:17] addr:0000007fa1001000 vm_flags:00100073 anon_vma:ffffff8054090c38 mapping:0000000000000000 index:7fa1001
[ 18:44:17] file:(null) fault:0x0 mmap:0x0 read_folio:0x0
[ 18:44:17] CPU: 3 PID: 1135 Comm: projecta Tainted: G B C 6.0.0-rc1-v8-gc8f41281d1f4 #2
[ 18:44:17] Hardware name: Raspberry Pi Compute Module 4 Rev 1.0 (DT)
[ 18:44:17] Call trace:
[ 18:44:17] dump_backtrace.part.0 (arch/arm64/kernel/stacktrace.c:184)
[ 18:44:17] show_stack (arch/arm64/kernel/stacktrace.c:191)
[ 18:44:17] dump_stack_lvl (lib/dump_stack.c:107 (discriminator 4))
[ 18:44:17] dump_stack (lib/dump_stack.c:114)
[ 18:44:17] print_bad_pte (mm/memory.c:567 (discriminator 12))
[ 18:44:17] vm_normal_page (mm/memory.c:638)
[ 18:44:17] copy_page_range (mm/memory.c:951 mm/memory.c:1085 mm/memory.c:1171 mm/memory.c:1208 mm/memory.c:1232 mm/memory.c:1330)
[ 18:44:17] dup_mm (kernel/fork.c:699 kernel/fork.c:1524)
[ 18:44:17] copy_process (kernel/fork.c:1576 kernel/fork.c:2256)
[ 18:44:17] kernel_clone (kernel/fork.c:2673)
[ 18:44:17] __do_sys_clone (kernel/fork.c:2808)
[ 18:44:17] __arm64_sys_clone (kernel/fork.c:2775)
[ 18:44:17] invoke_syscall (arch/arm64/kernel/syscall.c:38 arch/arm64/kernel/syscall.c:52)
[ 18:44:17] el0_svc_common.constprop.0 (./arch/arm64/include/asm/daifflags.h:28 arch/arm64/kernel/syscall.c:150)
[ 18:44:17] do_el0_svc (arch/arm64/kernel/syscall.c:207)
[ 18:44:17] el0_svc (arch/arm64/kernel/entry-common.c:133 arch/arm64/kernel/entry-common.c:142 arch/arm64/kernel/entry-common.c:625)
[ 18:44:17] el0t_64_sync_handler (arch/arm64/kernel/entry-common.c:643)
[ 18:44:17] el0t_64_sync (arch/arm64/kernel/entry.S:581)
[ 18:44:17] ==================================================================
[ 18:44:17] BUG: KASAN: wild-memory-access in __sync_icache_dcache (./include/asm-generic/bitops/generic-non-atomic.h:127 arch/arm64/mm/flush.c:62)
[ 18:44:17] Read of size 8 at addr 00000096808c8880 by task projecta/1135
[ 18:44:17] CPU: 3 PID: 1135 Comm: projecta Tainted: G B C 6.0.0-rc1-v8-gc8f41281d1f4 #2
[ 18:44:17] Hardware name: Raspberry Pi Compute Module 4 Rev 1.0 (DT)
[ 18:44:17] Call trace:
[ 18:44:17] dump_backtrace.part.0 (arch/arm64/kernel/stacktrace.c:184)
[ 18:44:17] show_stack (arch/arm64/kernel/stacktrace.c:191)
[ 18:44:17] dump_stack_lvl (lib/dump_stack.c:107 (discriminator 4))
[ 18:44:17] print_report (mm/kasan/report.c:438)
[ 18:44:17] kasan_report (mm/kasan/report.c:162 mm/kasan/report.c:497)
[ 18:44:17] __asan_load8 (mm/kasan/generic.c:256)
[ 18:44:17] __sync_icache_dcache (./include/asm-generic/bitops/generic-non-atomic.h:127 arch/arm64/mm/flush.c:62)
[ 18:44:17] copy_page_range (./arch/arm64/include/asm/pgtable.h:327 ./arch/arm64/include/asm/pgtable.h:358 mm/memory.c:994 mm/memory.c:1085 mm/memory.c:1171 mm/memory.c:1208 mm/memory.c:1232 mm/memory.c:1330)
[ 18:44:17] dup_mm (kernel/fork.c:699 kernel/fork.c:1524)
[ 18:44:17] copy_process (kernel/fork.c:1576 kernel/fork.c:2256)
[ 18:44:17] kernel_clone (kernel/fork.c:2673)
[ 18:44:17] __do_sys_clone (kernel/fork.c:2808)
[ 18:44:17] __arm64_sys_clone (kernel/fork.c:2775)
[ 18:44:17] invoke_syscall (arch/arm64/kernel/syscall.c:38 arch/arm64/kernel/syscall.c:52)
[ 18:44:17] el0_svc_common.constprop.0 (./arch/arm64/include/asm/daifflags.h:28 arch/arm64/kernel/syscall.c:150)
[ 18:44:17] do_el0_svc (arch/arm64/kernel/syscall.c:207)
[ 18:44:17] el0_svc (arch/arm64/kernel/entry-common.c:133 arch/arm64/kernel/entry-common.c:142 arch/arm64/kernel/entry-common.c:625)
[ 18:44:17] el0t_64_sync_handler (arch/arm64/kernel/entry-common.c:643)
[ 18:44:17] el0t_64_sync (arch/arm64/kernel/entry.S:581)
[ 18:44:17] ==================================================================
[ 18:44:17] Unable to handle kernel paging request at virtual address 00000096808c8880
[ 18:44:17] Mem abort info:
[ 18:44:17] ESR = 0x0000000096000004
[ 18:44:17] EC = 0x25: DABT (current EL), IL = 32 bits
[ 18:44:17] SET = 0, FnV = 0
[ 18:44:17] EA = 0, S1PTW = 0
[ 18:44:17] FSC = 0x04: level 0 translation fault
[ 18:44:17] Data abort info:
[ 18:44:17] ISV = 0, ISS = 0x00000004
[ 18:44:17] CM = 0, WnR = 0
[ 18:44:17] [00000096808c8880] address between user and kernel address ranges
[ 18:44:17] Internal error: Oops: 96000004 [#1] PREEMPT SMP
[ 18:44:17] Modules linked in: rtc_pcf85063 regmap_i2c ov9281 rfkill bcm2835_unicam v4l2_dv_timings v4l2_fwnode v3d bcm2835_v4l2(C) v4l2_async bcm2835_codec(C) bcm2835_isp(C) videobuf2_vmalloc rpivid_hevc(C) v4l2_mem2mem drm_shmem_helper bcm2835_mmal_vchiq(C) gpu_sched videobuf2_dma_contig videobuf2_memops i2c_mux_pinctrl videobuf2_v4l2 videobuf2_common raspberrypi_hwmon i2c_mux videodev i2c_brcmstb i2c_bcm2835 vc_sm_cma(C) mc uio_pdrv_genirq nvmem_rmem uio drm fuse drm_panel_orientation_quirks backlight ipv6
[ 18:44:17] CPU: 3 PID: 1135 Comm: projecta Tainted: G B C 6.0.0-rc1-v8-gc8f41281d1f4 #2
[ 18:44:17] Hardware name: Raspberry Pi Compute Module 4 Rev 1.0 (DT)
[ 18:44:17] pstate: 40000005 (nZcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 18:44:17] pc : __sync_icache_dcache (./include/asm-generic/bitops/generic-non-atomic.h:127 arch/arm64/mm/flush.c:62)
[ 18:44:17] lr : __sync_icache_dcache (./include/asm-generic/bitops/generic-non-atomic.h:127 arch/arm64/mm/flush.c:62)
[ 18:44:17] sp : ffffffc00d067630
[ 18:44:17] x29: ffffffc00d067630 x28: 0400000000000001 x27: 2626262023222323
[ 18:44:17] x26: 0000007fa1001000 x25: fffffffe010f2ce8 x24: 0000000000000000
[ 18:44:17] x23: fffffffe00000000 x22: 00000096808c8880 x21: 1ffffff801a0cece
[ 18:44:17] x20: 0000000000000000 x19: 00000098808c8880 x18: 0000000000000000
[ 18:44:17] x17: 3d3d3d3d3d3d3d3d x16: 3d3d3d3d3d3d3d3d x15: 3d3d3d3d3d3d3d3d
[ 18:44:17] x14: 3d3d3d3d3d3d3d3d x13: 3d3d3d3d3d3d3d3d x12: ffffffb8014cd81d
[ 18:44:17] x11: 1ffffff8014cd81c x10: ffffffb8014cd81c x9 : dfffffc000000000
[ 18:44:17] x8 : ffffffc00a66c0e7 x7 : 00000047feb327e4 x6 : 0000000000000001
[ 18:44:17] x5 : ffffffc00a66c0e0 x4 : ffffffb8014cd81d x3 : ffffffc0080b68e4
[ 18:44:17] x2 : 0000000000000000 x1 : ffffff804f3e0040 x0 : 0000000000000001
[ 18:44:17] Call trace:
[ 18:44:17] __sync_icache_dcache (./include/asm-generic/bitops/generic-non-atomic.h:127 arch/arm64/mm/flush.c:62)
[ 18:44:17] copy_page_range (./arch/arm64/include/asm/pgtable.h:327 ./arch/arm64/include/asm/pgtable.h:358 mm/memory.c:994 mm/memory.c:1085 mm/memory.c:1171 mm/memory.c:1208 mm/memory.c:1232 mm/memory.c:1330)
[ 18:44:17] dup_mm (kernel/fork.c:699 kernel/fork.c:1524)
[ 18:44:17] copy_process (kernel/fork.c:1576 kernel/fork.c:2256)
[ 18:44:17] kernel_clone (kernel/fork.c:2673)
[ 18:44:17] __do_sys_clone (kernel/fork.c:2808)
[ 18:44:17] __arm64_sys_clone (kernel/fork.c:2775)
[ 18:44:17] invoke_syscall (arch/arm64/kernel/syscall.c:38 arch/arm64/kernel/syscall.c:52)
[ 18:44:17] el0_svc_common.constprop.0 (./arch/arm64/include/asm/daifflags.h:28 arch/arm64/kernel/syscall.c:150)
[ 18:44:17] do_el0_svc (arch/arm64/kernel/syscall.c:207)
[ 18:44:17] el0_svc (arch/arm64/kernel/entry-common.c:133 arch/arm64/kernel/entry-common.c:142 arch/arm64/kernel/entry-common.c:625)
[ 18:44:17] el0t_64_sync_handler (arch/arm64/kernel/entry-common.c:643)
[ 18:44:17] el0t_64_sync (arch/arm64/kernel/entry.S:581)
[ 18:44:17] Code: d37ae673 8b170276 aa1603e0 940f8ac1 (f8776a60)
All code
========
   0:   d37ae673    lsl   x19, x19, #6
   4:   8b170276    add   x22, x19, x23
   8:   aa1603e0    mov   x0, x22
   c:   940f8ac1    bl    0x3e2b10
  10:*  f8776a60    ldr   x0, [x19, x23]    <-- trapping instruction

Code starting with the faulting instruction
===========================================
   0:   f8776a60    ldr   x0, [x19, x23]
[ 18:44:17] ---[ end trace 0000000000000000 ]---
[ 18:44:17] note: projecta[1135] exited with preempt_count 2

Thanks,
Max