Hi,

I noticed a regression report on bugzilla [1]. Quoting from it:

> Hello.
>
> (I apologize if I chose the wrong "Product" and "Component".)
>
> On two of my systems, I see strange "bug" when running 6+ kernels (below is a recent one):
>
> ```
> May 14 14:48:07 smoon7.bkoty.ru kernel: RIP: 0010:__filemap_get_folio+0xbf/0x6a0
> May 14 14:48:07 smoon7.bkoty.ru kernel: Code: ef e8 c5 60 c3 00 48 89 c7 48 3d 02 04 00 00 74 e4 48 3d 06 04 00 00 74 dc 48 85 c0 0f 84 6a 04 00 00 a8 01 0f 85 6c 04 00 00 <8b> 40 34 85 c0 74 c4 8d 50 01 4c 8d 47 34 f0 0f b1 57 34 75 ee 48
> May 14 14:48:07 smoon7.bkoty.ru kernel: RSP: 0000:ffffa7800b1dfbf8 EFLAGS: 00010246
> May 14 14:48:07 smoon7.bkoty.ru kernel: RAX: 0000000000000002 RBX: 0000000000000000 RCX: 0000000000000004
> May 14 14:48:07 smoon7.bkoty.ru kernel: RDX: ffffa7800b1dfc50 RSI: ffff9a2413646910 RDI: 0000000000000002
> May 14 14:48:07 smoon7.bkoty.ru kernel: RBP: 0000000000000000 R08: ffffffffffffffc0 R09: 00007f862b600000
> May 14 14:48:07 smoon7.bkoty.ru kernel: R10: 00007f8659246f48 R11: ffff9a21c1494a0c R12: 000000000002dc46
> May 14 14:48:07 smoon7.bkoty.ru kernel: R13: ffffa7800b1dfc50 R14: ffff9a21e2cb82b0 R15: 00007f8659246f48
> May 14 14:48:07 smoon7.bkoty.ru kernel: FS: 00007f87fcff96c0(0000) GS:ffff9a295e280000(0000) knlGS:0000000000000000
> May 14 14:48:07 smoon7.bkoty.ru kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> May 14 14:48:07 smoon7.bkoty.ru kernel: CR2: 0000000000000036 CR3: 0000000105b2c003 CR4: 00000000003706e0
> May 14 14:48:07 smoon7.bkoty.ru kernel: Call Trace:
> May 14 14:48:07 smoon7.bkoty.ru kernel: <TASK>
> May 14 14:48:07 smoon7.bkoty.ru kernel: ? psi_group_change+0x274/0x430
> May 14 14:48:07 smoon7.bkoty.ru kernel: filemap_fault+0x6f/0xfd0
> May 14 14:48:07 smoon7.bkoty.ru kernel: ? filemap_map_pages+0x15f/0x640
> May 14 14:48:07 smoon7.bkoty.ru kernel: __do_fault+0x30/0x130
> May 14 14:48:07 smoon7.bkoty.ru kernel: do_fault+0x1d7/0x400
> May 14 14:48:07 smoon7.bkoty.ru kernel: handle_mm_fault+0xb48/0x1450
> May 14 14:48:07 smoon7.bkoty.ru kernel: do_user_addr_fault+0x1c7/0x740
> May 14 14:48:07 smoon7.bkoty.ru kernel: exc_page_fault+0x7c/0x180
> May 14 14:48:07 smoon7.bkoty.ru kernel: asm_exc_page_fault+0x26/0x30
> May 14 14:48:07 smoon7.bkoty.ru kernel: RIP: 0033:0x7f881a56cb0d
> May 14 14:48:07 smoon7.bkoty.ru kernel: Code: 00 00 00 00 00 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 f3 0f 1e fa 48 89 f8 48 83 fa 20 72 23 <c5> fe 6f 06 48 83 fa 40 0f 87 a5 00 00 00 c5 fe 6f 4c 16 e0 c5 fe
> May 14 14:48:07 smoon7.bkoty.ru kernel: RSP: 002b:00007f87fcff72c8 EFLAGS: 00010202
> May 14 14:48:07 smoon7.bkoty.ru kernel: RAX: 00007f87dc02a700 RBX: 00007f87fcff8308 RCX: 00007f87fcff7500
> May 14 14:48:07 smoon7.bkoty.ru kernel: RDX: 0000000000004000 RSI: 00007f8659246f48 RDI: 00007f87dc02a700
> May 14 14:48:07 smoon7.bkoty.ru kernel: RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
> May 14 14:48:07 smoon7.bkoty.ru kernel: R10: 0000000000000008 R11: 0000000000000246 R12: 0000000000000000
> May 14 14:48:07 smoon7.bkoty.ru kernel: R13: 00007f87dc001370 R14: 0000000000000009 R15: 00005645d0719a70
> May 14 14:48:07 smoon7.bkoty.ru kernel: </TASK>
> ```
>
> I've seen these errors since the very first kernel of the 6 series, while I see no problem with 5.15 on the same hardware.
>
> These two systems have the same CPU (Intel(R) Core(TM) i5-10500 CPU @ 3.10GHz) but slightly different motherboards, same amount of memory (same manufacturer, I tested it when plugged in).
>
> The hosts in question don't show this "bug" immediately, but after some time while having "heavy" disk load (torrents). The "bug" shows up whether I use `mitigations=off` or not (at first I thought the "bug" might be related to `mitigations=off`, but I got the above output when I removed that setting from the kernel command line).
>
> What puzzles me is that I don't see these errors on the other hosts (but they don't have "heavy" disk loads), they work just fine. On the other hand, they have different CPUs (not i5-10500). Sometimes (less often than this error) I saw the following in the kernel log (dmesg):
>
> ```
> May 14 08:09:09 smoon7.bkoty.ru kernel: mce: [Hardware Error]: Machine check events logged
> May 14 08:09:09 smoon7.bkoty.ru kernel: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 0: 9000004000010005
> May 14 08:09:09 smoon7.bkoty.ru kernel: mce: [Hardware Error]: TSC 95596a63008b
> May 14 08:09:09 smoon7.bkoty.ru kernel: mce: [Hardware Error]: PROCESSOR 0:a0653 TIME 1684022949 SOCKET 0 APIC 0 microcode f6
> May 14 08:11:39 smoon7.bkoty.ru kernel: mce: [Hardware Error]: Machine check events logged
> May 14 08:11:39 smoon7.bkoty.ru kernel: mce: [Hardware Error]: CPU 5: Machine Check: 0 Bank 0: 9000004000010005
> May 14 08:11:39 smoon7.bkoty.ru kernel: mce: [Hardware Error]: TSC 95c56b82abf0
> May 14 08:11:39 smoon7.bkoty.ru kernel: mce: [Hardware Error]: PROCESSOR 0:a0653 TIME 1684023099 SOCKET 0 APIC a microcode f6
> ```
>
> So now I'm thinking of buying a new CPU (same socket) and see if I will see the same error.

For the full thread, see bugzilla.

FYI, filemap_get_folio() was introduced in 3f0c6a07fee6a1 ("mm/filemap: Add filemap_get_folio").

Anyway, I'm adding this to regzbot:

#regzbot introduced: v5.15..v6.0 https://bugzilla.kernel.org/show_bug.cgi?id=217441
#regzbot title: NULL pointer dereference on filemap_get_folio() on Intel Core i5-10500

Thanks.

[1]: https://bugzilla.kernel.org/show_bug.cgi?id=217441

-- 
An old man doll... just what I always wanted! - Clara