Re: BUG: Bad page map in process init pte:c0ab684c pmd:01182000 (on a PowerMac G4 DP)

On Thu, 20 Jun 2024 00:42:37 +0200, Erhard Furtner wrote:

>> On 29/02/2024 at 02:09, Erhard Furtner wrote:
>> > 
>> > Revisited the issue on kernel v6.8-rc6 and I can still reproduce it.
>> > 
>> > Short summary as my last post was over a year ago:
>> >   (x) I get this memory corruption only when CONFIG_VMAP_STACK=y and CONFIG_SMP=y is enabled.
>> >   (x) I don't get this memory corruption when only one of the above is enabled. ^^
>> >   (x) memtester says the 2 GiB RAM in my G4 DP are fine.
>> >   (x) I don't get this issue on my G5 11,2 or Talos II.
>> >   (x) "stress -m 2 --vm-bytes 965M" provokes the issue in < 10 secs. (https://salsa.debian.org/debian/stress)
>> > 
> The "pagealloc: memory corruption" remains however as of kernel v6.10-rc4.

I've reproduced the bug on similar hardware, also a dual-processor Power
Mac G4 with 2 GiB RAM.

With the 6.6.30 kernel without extra debugging options, the system was
stable and could e.g. compile GCC or the kernel without an issue. That
doesn't mean there wasn't silent corruption going on, of course. :-)
Running the `stress` program as listed above did, however, cause the
system to get into an unstable state where heavier workloads, such as
compiling the kernel, would randomly fail.

I updated the kernel to 6.10.3, built it with SLUB_DEBUG, PAGE_POISONING
and DEBUG_PAGEALLOC, and turned them on at boot time with slub_debug=FZ
page_poison=on debug_pagealloc=on.
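
For anyone trying to reproduce this, a quick sanity check that the three
boot parameters actually reached the kernel may save a wasted run. The
following is only a minimal sketch: the helper name `missing_boot_params`
and the sample command line are hypothetical, and on a live system the
string would come from /proc/cmdline.

```python
# Boot parameters named in this report.
REQUIRED = ("slub_debug=FZ", "page_poison=on", "debug_pagealloc=on")


def missing_boot_params(cmdline, required=REQUIRED):
    """Return the required parameters that are absent from the command line."""
    present = set(cmdline.split())
    return [param for param in required if param not in present]


# Hypothetical command line for illustration only; on a real system use
# open("/proc/cmdline").read() instead.
sample = "root=/dev/sda4 slub_debug=FZ page_poison=on"
print(missing_boot_params(sample))  # prints ['debug_pagealloc=on']
```

An empty list means all three parameters are present.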

The updated kernel exhibits the same symptoms as described by Erhard:
running `stress -m 2 --vm-bytes 965M` almost immediately causes memory
corruption, with the following messages in dmesg:

```
pagealloc: memory corruption
fffcfff0: 00 00 00 00                                      ....
CPU: 1 PID: 1845 Comm: stress Tainted: G                T  6.10.3-gentoo #1
Hardware name: PowerMac3,6 7455 0x80010303 PowerMac
Call Trace:
[f2d05ca0] [c08ff18c] dump_stack_lvl+0x60/0xbc (unreliable)
[f2d05cc0] [c01db7e0] __kernel_unpoison_pages+0x128/0x1f0
[f2d05d10] [c01bc6c4] get_page_from_freelist+0xeb0/0xf6c
[f2d05db0] [c01bcf7c] __alloc_pages_noprof+0x160/0xdf0
[f2d05e70] [c01be388] __folio_alloc_noprof+0x14/0x44
[f2d05e80] [c0199690] handle_mm_fault+0x99c/0xdac
[f2d05f00] [c00218c8] do_page_fault+0x264/0x73c
[f2d05f30] [c000433c] DataAccess_virt+0x124/0x17c
--- interrupt: 300 at 0x7c2db0
NIP:  007c2db0 LR: 007c2d90 CTR: 00000000
REGS: f2d05f40 TRAP: 0300   Tainted: G                T   (6.10.3-gentoo)
MSR:  0000d032 <EE,PR,ME,IR,DR,RI>  CR: 20882004  XER: 00000000
DAR: 8fe18020 DSISR: 42000000
GPR00: 007c2d90 afb6a160 a7a00100 6b416020 ffffffa0 00000000 a7916ffc 00000000
GPR08: 24a03000 24a02000 00000000 404347fa 404344c7 00000000 00000000 0000005a
GPR16: 6b416020 00000002 00000000 00000000 ffffffff 00000000 40882002 007e0004
GPR24: 00000001 ffffffff ffffffff 3c500000 00000000 66b7cd68 007e7cf8 00001000
NIP [007c2db0] 0x7c2db0
LR [007c2d90] 0x7c2d90
--- interrupt: 300
page: refcount:1 mapcount:0 mapping:00000000 index:0x0 pfn:0x31069
flags: 0x80000000(zone=2)
raw: 80000000 00000100 00000122 00000000 00000000 00000000 ffffffff 00000001
page dumped because: pagealloc: corrupted page details
```
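
To compare reports across runs it can help to turn the `pfn:` value from
the page dump into a physical address. The sketch below does that for
any log text; the function name `corrupted_page_addresses` is
hypothetical, and the 4 KiB page size is an assumption about this kernel
configuration rather than something stated in the log.

```python
import re

PAGE_SIZE = 4096  # assumption: 4 KiB pages on this 32-bit kernel

# Matches the "pfn:0x..." field of the "page:" line in the dump.
PFN_RE = re.compile(r"\bpfn:0x([0-9a-fA-F]+)")


def corrupted_page_addresses(log_text, page_size=PAGE_SIZE):
    """Return the physical base address of every page whose pfn appears."""
    return [int(m.group(1), 16) * page_size for m in PFN_RE.finditer(log_text)]


# The "page:" line from the dump above.
sample = "page: refcount:1 mapcount:0 mapping:00000000 index:0x0 pfn:0x31069"
for addr in corrupted_page_addresses(sample):
    print(f"0x{addr:08x}")  # prints 0x31069000
```

Feeding it the full dmesg output from several runs would show whether
the corrupted pages cluster in one physical range.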

Other activity can also trigger it: compiling larger programs with
`make -j2` does so within an hour, typically resulting in an ICE
(internal compiler error).

When booted with the `maxcpus=0` kernel parameter, the corruptions do
not occur.




