On 19.11.21 at 14:13, Matthew Wilcox wrote:
On Fri, Nov 19, 2021 at 08:48:11AM +0100, Uwe Sauter wrote:
[ 1132.645038] BUG: unable to handle page fault for address: 0000000000400000
[ 1132.645045] #PF: supervisor instruction fetch in kernel mode
[ 1132.645047] #PF: error_code(0x0010) - not-present page
[ 1132.645050] PGD 0 P4D 0
[ 1132.645053] Oops: 0010 [#1] PREEMPT SMP PTI
[ 1132.645057] CPU: 7 PID: 429941 Comm: rsync Tainted: P OE 5.15.2-arch1-1 #1 e3bfbeb633edc604ba956e06f24d5659e31c294f
[ 1132.645061] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./C226 WS, BIOS P3.40 06/25/2018
[ 1132.645063] RIP: 0010:0x400000
Your computer was trying to execute instructions at 0x400000. This
smells very much like a single bit flip, i.e. there was a function
pointer which should have been NULL but actually had one bit flipped,
and so the CPU jumped to somewhere that doesn't have any memory
backing it.
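The single-bit-flip reasoning can be checked with simple arithmetic: 0x400000 equals 2^22 (4,194,304), so it differs from NULL (0) in exactly one bit, bit 22. The following Python sketch illustrates that check; the helper names `flipped_bits` and `is_single_bit_flip` are hypothetical and not part of any kernel tooling.

```python
# Hypothetical illustration: does a faulting address differ from an
# expected value (here NULL, i.e. 0) in exactly one bit?

def flipped_bits(expected: int, observed: int) -> list[int]:
    """Return the indices of the bits that differ between two values."""
    diff = expected ^ observed
    return [i for i in range(diff.bit_length()) if (diff >> i) & 1]

def is_single_bit_flip(expected: int, observed: int) -> bool:
    """True if the two values differ in exactly one bit position."""
    diff = expected ^ observed
    # A nonzero power of two has exactly one set bit.
    return diff != 0 and diff & (diff - 1) == 0

NULL = 0x0
RIP = 0x400000  # address from the oops: "RIP: 0010:0x400000"

print(flipped_bits(NULL, RIP))        # [22]
print(is_single_bit_flip(NULL, RIP))  # True
```

XOR-ing the expected and observed values isolates the differing bits; this only shows the address is consistent with a one-bit corruption of a NULL pointer, not that a bit flip actually occurred.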
Can you run memtest86, or whatever the current flavour of memory testing
software is?
As I mentioned in the description, the host is equipped with ECC memory, and dmesg didn't show any of the memory errors I would
expect from a bit flip in RAM.
The hardware is:
* ASRock Rack C226 WS mainboard
* Intel Xeon E3-1245 v3
* 4x Kingston 9965525-055.A00LF 8GB ECC memory.
Also, after the bug triggered, the host kept running fine for 1.5 h, and today it has run fine again for 7 h.