Hi Sebastian,

Thanks a lot for your quick reply!

CONFIG_HAVE_PREEMPT_LAZY=y
CONFIG_PREEMPT_NONE=y
# CONFIG_PREEMPT_VOLUNTARY is not set
# CONFIG_PREEMPT__LL is not set
# CONFIG_PREEMPT_RTB is not set
# CONFIG_PREEMPT_RT_FULL is not set
CONFIG_SLAB=y
# CONFIG_SLUB is not set
# CONFIG_SLOB is not set

Neither CONFIG_PREEMPT_RT_FULL nor CONFIG_SLUB is enabled, so I think the
issue is not related to the one fixed in f1aca90802af9 ("Revert "slub:
delay ctor until the object is requested"").

We share the kernel source code but use different configurations on
different products. The applications on this product are non-RT
applications.

This issue was reported on different nodes, so it does not seem to be
caused by bad RAM. I'm checking whether other CPUs in the AMP setup could
be overwriting the memory.

I will consider your suggestions to disable memory compaction and to
enable list debugging.

I sincerely appreciate your support!

B.R.
Yimin

Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx> wrote on Thu, May 12, 2022 at 00:18:
>
> On 2022-05-09 15:40:43 [+0800], Yimin Deng wrote:
> > Hi
> Hi,
>
> > I encountered an oops in isolate_pcp_pages() and a bad page in
> > get_page_from_freelist().
> >
> > linux: 3.12.37-rt51 (CONFIG_PREEMPT_RT_BASE not enabled)
> > arch: PowerPC (e500)
> …
> What do you mean by "CONFIG_PREEMPT_RT_BASE not enabled"? Is
> CONFIG_PREEMPT_RT_FULL enabled, or none of those options?
>
> > Any suggestions will be appreciated!
> >
> > [18857088.953420] Unable to handle kernel paging request for data at
> > address 0x00100104
> > [18857089.046143] Faulting instruction address: 0xc0075624
> …
> > [18857090.073578] NIP [c0075624] isolate_pcp_pages+0x84/0xc4
> > [18857090.138173] LR [c0078f24] free_hot_cold_page+0x124/0x174
> …
>
> I can't even tell whether I have seen a report like yours before. I do
> remember seeing "bad page state" reports earlier, but I don't
> remember how they went away.
> I know that I had two 8572DS systems and
> one started to report all kinds of errors (including "bad page
> state"), but this was (probably) due to bad RAM, since the other system
> never had this error even though both had the same configuration.
>
> Your kernel is kind of old. The latest v3.12 is v3.12.74-rt99, which
> contains a few bug fixes, including commit
> f1aca90802af9 ("Revert "slub: delay ctor until the object is requested""),
> which is probably not what you are seeing, but it fixes a possible crash.
> You could disable memory compaction and so on, but as far as I remember
> those features can lead to higher latencies in some cases, not to a crash.
> You could enable list debugging in case an entry is added/removed
> multiple times.
> The e500 support is quite good upstream, so you could upgrade to a later
> kernel (one of the current LTS kernels).
>
> > B.R.
> > Yimin
>
> Sebastian
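To illustrate what the suggested list debugging would catch, here is a minimal userspace sketch in C. It is not kernel code. It assumes a 32-bit layout like e500's and models the kernel's list poisoning. After list_del(), an entry's next/prev are set to LIST_POISON1/LIST_POISON2, which are 0x00100100/0x00200200 when POISON_POINTER_DELTA is 0. A second, unchecked delete of that entry executes next->prev = prev with next == LIST_POISON1. That write goes to 0x00100100 + 4 = 0x00100104 on a 32-bit kernel, which matches the data address in the quoted oops. This is consistent with a double delete or a reuse of an already-deleted page, but it does not prove either.

The names SKETCH_POISON1, SKETCH_POISON2 and list_del_checked() are hypothetical. The check is modeled loosely on the kernel's CONFIG_DEBUG_LIST validation, which warns and skips the unlink instead of faulting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical poison values, modeled on the kernel's LIST_POISON1/2 for a
 * 32-bit build with POISON_POINTER_DELTA == 0. */
#define SKETCH_POISON1 ((struct list_head *)(uintptr_t)0x00100100)
#define SKETCH_POISON2 ((struct list_head *)(uintptr_t)0x00200200)

/* Same layout as the kernel's struct list_head: next first, then prev
 * (offset 4 on a 32-bit target). */
struct list_head {
    struct list_head *next, *prev;
};

static void list_init(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Insert entry right after head. */
static void list_add(struct list_head *entry, struct list_head *head)
{
    entry->next = head->next;
    entry->prev = head;
    head->next->prev = entry;
    head->next = entry;
}

/* Checked delete in the spirit of CONFIG_DEBUG_LIST: validate the entry and
 * its neighbours before touching them. Returns false and leaves everything
 * unchanged if the entry looks already deleted or corrupted. */
static bool list_del_checked(struct list_head *entry)
{
    struct list_head *prev = entry->prev;
    struct list_head *next = entry->next;

    /* An unchecked delete would write next->prev here, i.e. to
     * 0x00100104 on 32-bit, and fault. */
    if (next == SKETCH_POISON1 || prev == SKETCH_POISON2) {
        fprintf(stderr, "list_del corruption: %p already deleted\n",
                (void *)entry);
        return false;
    }
    if (prev->next != entry || next->prev != entry) {
        fprintf(stderr, "list_del corruption: neighbours of %p are inconsistent\n",
                (void *)entry);
        return false;
    }

    /* Unlink, then poison so any later use hits a recognizable address. */
    next->prev = prev;
    prev->next = next;
    entry->next = SKETCH_POISON1;
    entry->prev = SKETCH_POISON2;
    return true;
}
```

In the kernel itself, the relevant options are CONFIG_DEBUG_LIST for list debugging and CONFIG_COMPACTION for memory compaction. Deleting the same entry twice with list_del_checked() reports the problem on the second call and returns false, instead of dereferencing the poisoned pointer.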