On Mon, 12 Dec 2022 14:31:35 +1000 "Nicholas Piggin" <npiggin@xxxxxxxxx> wrote: > On Thu Dec 1, 2022 at 7:44 AM AEST, Erhard F. wrote: > > Getting this at boot sometimes, but not always (PowerMac G4 DP, kernel 6.0.9): > > > > [...] > > Freeing unused kernel image (initmem) memory: 1328K > > Checked W+X mappings: passed, no W+X pages found > > rodata_test: all tests were successful > > Run /sbin/init as init process > > _swap_info_get: Bad swap file entry 24c0ab68 > > BUG: Bad page map in process init pte:c0ab684c pmd:01182000 > > Have you run memtest on the system? Are the messages related to a > kernel upgrade? This and your KASAN bugs look possibly like random > corruption. > > Although with that KASAN one it's strange that kernfs_node_cache > was involved both times, it's strange that page tables are pointing > to that same slab memory. It could be a page table page use-after > -free maybe? Maybe with the page table fragment code. I'm sure other > people would have hit that before though, so I don't know what to > suggest. > > Thanks, > Nick Revisited the issue on kernel v6.8-rc6 and I can still reproduce it. Short summary as my last post was over a year ago: (x) I get this memory corruption only when CONFIG_VMAP_STACK=y and CONFIG_SMP=y is enabled. (x) I don't get this memory corruption when only one of the above is enabled. ^^ (x) memtester says the 2 GiB RAM in my G4 DP are fine. (x) I don't get this issue on my G5 11,2 or Talos II. (x) "stress -m 2 --vm-bytes 965M" provokes the issue in < 10 secs. (https://salsa.debian.org/debian/stress) For the test I used CONFIG_KASAN_INLINE=y for v6.8-rc6 and debug_pagealloc=on, page_owner=on and got this dmesg: [...] pagealloc: memory corruption f5fcfff0: 00 00 00 00 .... CPU: 1 PID: 1788 Comm: stress Tainted: G B 6.8.0-rc6-PMacG4 #15 Hardware name: PowerMac3,6 7455 0x80010303 PowerMac Call Trace: [f3bfbac0] [c162a8e8] dump_stack_lvl+0x60/0x94 (unreliable) [f3bfbae0] [c04edf9c] __kernel_unpoison_pages+0x1e0/0x1f0 [f3bfbb30] [c04a8aa0] post_alloc_hook+0xe0/0x174 [f3bfbb60] [c04a8b58] prep_new_page+0x24/0xbc [f3bfbb80] [c04abcc4] get_page_from_freelist+0xcd0/0xf10 [f3bfbc50] [c04aecd8] __alloc_pages+0x204/0xe2c [f3bfbda0] [c04b07a8] __folio_alloc+0x18/0x88 [f3bfbdc0] [c0461a10] vma_alloc_zeroed_movable_folio.isra.0+0x2c/0x6c [f3bfbde0] [c046bb90] handle_mm_fault+0x91c/0x19ac [f3bfbec0] [c0047b8c] ___do_page_fault+0x93c/0xc14 [f3bfbf10] [c0048278] do_page_fault+0x28/0x60 [f3bfbf30] [c000433c] DataAccess_virt+0x124/0x17c --- interrupt: 300 at 0xbe30d8 NIP: 00be30d8 LR: 00be30b4 CTR: 00000000 REGS: f3bfbf40 TRAP: 0300 Tainted: G B (6.8.0-rc6-PMacG4) MSR: 0000d032 <EE,PR,ME,IR,DR,RI> CR: 20882464 XER: 00000000 DAR: 88c7a010 DSISR: 42000000 GPR00: 00be30b4 af8397d0 a78436c0 6b2ee010 3c500000 20224462 fe77f7e1 00b00264 GPR08: 1d98d000 1d98c000 00000000 40ae256a 20882262 00bffff4 00000000 00000000 GPR16: 00000000 00000002 00000000 0000005a 40802262 80002262 40002262 00c000a4 GPR24: ffffffff ffffffff 3c500000 00000000 00000000 6b2ee010 00c07d64 00001000 NIP [00be30d8] 0xbe30d8 LR [00be30b4] 0xbe30b4 --- interrupt: 300 page:ef4bd92c refcount:1 mapcount:0 mapping:00000000 index:0x1 pfn:0x310b3 flags: 0x80000000(zone=2) page_type: 0xffffffff() raw: 80000000 00000100 00000122 00000000 00000001 00000000 ffffffff 00000001 raw: 00000000 page dumped because: pagealloc: corrupted page details page_owner info is not present (never set?) swapper/1: page allocation failure: order:0, mode:0x820(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0 CPU: 1 PID: 0 Comm: swapper/1 Tainted: G B 6.8.0-rc6-PMacG4 #15 Hardware name: PowerMac3,6 7455 0x80010303 PowerMac Call Trace: [f101b9d0] [c162a8e8] dump_stack_lvl+0x60/0x94 (unreliable) [f101b9f0] [c04ae948] warn_alloc+0x154/0x2e0 [f101bab0] [c04af030] __alloc_pages+0x55c/0xe2c SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC) cache: skbuff_head_cache, object size: 176, buffer size: 288, default order: 0, min order: 0 node 0: slabs: 509, objs: 7126, free: 0 SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC) cache: skbuff_head_cache, object size: 176, buffer size: 288, default order: 0, min order: 0 node 0: slabs: 509, objs: 7126, free: 0 SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC) cache: skbuff_head_cache, object size: 176, buffer size: 288, default order: 0, min order: 0 node 0: slabs: 509, objs: 7126, free: 0 SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC) cache: skbuff_head_cache, object size: 176, buffer size: 288, default order: 0, min order: 0 node 0: slabs: 509, objs: 7126, free: 0 SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC) cache: skbuff_head_cache, object size: 176, buffer size: 288, default order: 0, min order: 0 node 0: slabs: 509, objs: 7126, free: 0 SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC) cache: skbuff_head_cache, object size: 176, buffer size: 288, default order: 0, min order: 0 node 0: slabs: 509, objs: 7126, free: 0 SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC) cache: skbuff_head_cache, object size: 176, buffer size: 288, default order: 0, min order: 0 node 0: slabs: 509, objs: 7126, free: 0 SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC) cache: skbuff_head_cache, object size: 176, buffer size: 288, default order: 0, min order: 0 node 0: slabs: 509, objs: 7126, free: 0 SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC) cache: skbuff_head_cache, object size: 176, buffer size: 288, default order: 0, min order: 0 node 0: slabs: 509, objs: 7126, free: 0 SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC) cache: skbuff_head_cache, object size: 176, buffer size: 288, default order: 0, min order: 0 node 0: slabs: 509, objs: 7126, free: 0 [...] New findings: (x) The page corruption only shows up the 1st time I run "stress -m 2 --vm-bytes 965M". When I quit and restart stress no additional page corruption shows up. (x) The page corruption shows up shortly after I run "stress -m 2 --vm-bytes 965M" but no additional page corruption shows up afterwards, even if left running for 30min. For additional testing I thought it would be a good idea to try "modprobe test_vmalloc" but this remained inconclusive. Sometimes a 'BUG: Unable to handle kernel data access on read at 0xe0000000' like this shows up but not always: [...] vmap allocation for size 8192 failed: use vmalloc=<size> to increase size vmalloc_test/0: vmalloc error: size 4096, vm_struct allocation failed, mode:0xcc0(GFP_KERNEL), nodemask=(null),cpuset=/,mems_allowed=0 CPU: 1 PID: 1792 Comm: vmalloc_test/0 Tainted: G B 6.8.0-rc6-PMacG4 #12 Hardware name: PowerMac3,6 7455 0x80010303 PowerMac Call Trace: [f3bdbcc0] [c162a8c8] dump_stack_lvl+0x60/0x94 (unreliable) [f3bdbce0] [c04ae928] warn_alloc+0x154/0x2e0 [f3bdbda0] [c04a02d0] __vmalloc_node_range+0x168/0x96c [f3bdbe60] [c04a0bc0] __vmalloc_node+0xec/0x114 [f3bdbea0] [bfa3d308] kvfree_rcu_2_arg_vmalloc_test+0x50/0xb4 [test_vmalloc] [f3bdbec0] [bfa3e254] test_func+0x490/0x568 [test_vmalloc] [f3bdbfc0] [c00ff1fc] kthread+0x384/0x394 [f3bdbff0] [c002a304] start_kernel_thread+0x10/0x14 Mem-Info: active_anon:13098 inactive_anon:0 isolated_anon:0 active_file:31626 inactive_file:9834 isolated_file:0 unevictable:0 dirty:1 writeback:0 slab_reclaimable:1935 slab_unreclaimable:23175 mapped:15912 shmem:148 pagetables:260 sec_pagetables:0 bounce:0 kernel_misc_reclaimable:0 free:379850 free_pcp:188 free_cma:0 Node 0 active_anon:52392kB inactive_anon:0kB active_file:126504kB inactive_file:39336kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:63648kB dirty:4kB writeback:0kB shmem:592kB writeback_tmp:0kB kernel_stack:1984kB pagetables:1040kB sec_pagetables:0kB all_unreclaimable? no DMA free:545332kB boost:0kB min:3312kB low:4140kB high:4968kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:4kB inactive_file:20864kB unevictable:0kB writepending:0kB present:786432kB managed:687812kB mlocked:0kB bounce:0kB free_pcp:752kB local_pcp:192kB free_cma:0kB lowmem_reserve[]: 0 0 1184 1184 DMA: 3*4kB (ME) 15*8kB (UME) 5*16kB (ME) 7*32kB (UME) 2*64kB (UE) 2*128kB (ME) 1*256kB (M) 1*512kB (U) 5*1024kB (UM) 5*2048kB (UM) 129*4096kB (M) = 545332kB 41608 total pagecache pages 0 pages in swap cache Free swap = 8388604kB Total swap = 8388604kB 524288 pages RAM 327680 pages HighMem/MovableOnly 49231 pages reserved vmap allocation for size 8192 failed: use vmalloc=<size> to increase size vmap allocation for size 8192 failed: use vmalloc=<size> to increase size vmap allocation for size 147456 failed: use vmalloc=<size> to increase size vmap allocation for size 8192 failed: use vmalloc=<size> to increase size vmap allocation for size 4194304 failed: use vmalloc=<size> to increase size BUG: Unable to handle kernel data access on read at 0xe0000000 Faulting instruction address: 0xbfa3dc58 Oops: Kernel access of bad area, sig: 11 [#1] BE PAGE_SIZE=4K MMU=Hash SMP NR_CPUS=2 DEBUG_PAGEALLOC PowerMac Modules linked in: test_vmalloc(+) af_packet ipv6 b43legacy mac80211 libarc4 cfg80211 snd_aoa_codec_tas snd_aoa_fabric_layout snd_aoa rfkill therm_windtunnel firewire_ohci firewire_core crc_itu_t snd_aoa_i2sbus snd_aoa_soundbus snd_pcm snd_timer snd soundcore ssb mac_hid pcmcia pcmcia_core radeon hwmon drm_suballoc_helper i2c_algo_bit drm_ttm_helper ttm drm_display_helper backlight ohci_pci CPU: 1 PID: 1792 Comm: vmalloc_test/0 Tainted: G B 6.8.0-rc6-PMacG4 #12 Hardware name: PowerMac3,6 7455 0x80010303 PowerMac NIP: bfa3dc58 LR: bfa3dc48 CTR: 00000000 REGS: f3bdbdd0 TRAP: 0300 Tainted: G B (6.8.0-rc6-PMacG4) MSR: 00009032 <EE,ME,IR,DR,RI> CR: 44000004 XER: 00000000 DAR: e0000000 DSISR: 40000000 GPR00: 00000000 f3bdbe90 c3fe1820 00000000 00000000 00000000 00000000 00000000 GPR08: 00000000 e0000000 00000000 f3bdbdf0 00000000 00000000 18baab26 00000000 GPR16: c5d55930 f227d040 00000000 0000001e 00000007 f2276260 f22762b4 000000a0 GPR24: c5d55928 00000061 f2280000 00000000 00000000 00000001 c4832ea8 00000001 NIP [bfa3dc58] vm_map_ram_test+0xdc/0x194 [test_vmalloc] LR [bfa3dc48] vm_map_ram_test+0xcc/0x194 [test_vmalloc] Call Trace: [f3bdbe90] [bfa3dc48] vm_map_ram_test+0xcc/0x194 [test_vmalloc] (unreliable) [f3bdbec0] [bfa3e254] test_func+0x490/0x568 [test_vmalloc] [f3bdbfc0] [c00ff1fc] kthread+0x384/0x394 [f3bdbff0] [c002a304] start_kernel_thread+0x10/0x14 Code: 48a7bc49 307fffff 7c63f910 4bffff7c 38a0ffff 7fa4eb78 7fc3f378 48a608c9 5469e8fe 5468077e 7c7c1b78 3d29e000 <89290000> 7d2a0774 5529cffe 7d0a4010 ---[ end trace 0000000000000000 ]--- note: vmalloc_test/0[1792] exited with irqs disabled [...] So this may be a different issue as the memory corruption or just the test gone wrong? Full dmesg + kernel .config attached. The KASAN hit in the dmesg earlier 'BUG: KASAN: vmalloc-out-of-bounds in memset32' Christophe already reported as https://lore.kernel.org/all/2000a30f-214a-4b20-b0b5-348e987d6a0e@xxxxxxxxxx/ Regards, Erhard
Attachment:
config_68-rc6_g4--
Description: Binary data
Attachment:
dmesg_68-rc6_g4--
Description: Binary data