On Sat, May 28, 2022 at 06:41:55PM +0200, Anders Roxell wrote:
> On Tue, 17 May 2022 at 16:33, Anders Roxell <anders.roxell@xxxxxxxxxx> wrote:
> >
> > On Tue, 17 May 2022 at 16:02, Pasha Tatashin <pasha.tatashin@xxxxxxxxxx> wrote:
> > >
> > > On Tue, May 17, 2022 at 9:54 AM Anders Roxell <anders.roxell@xxxxxxxxxx> wrote:
> > > >
> > > > On 2022-05-07 11:01, Tong Tiangen wrote:
> > > > > From: Kefeng Wang <wangkefeng.wang@xxxxxxxxxx>
> > > > >
> > > > > As commit d283d422c6c4 ("x86: mm: add x86_64 support for page table check")
> > > > > , enable ARCH_SUPPORTS_PAGE_TABLE_CHECK on arm64.
> > > > >
> > > > > Add additional page table check stubs for page table helpers, these stubs
> > > > > can be used to check the existing page table entries.
> > > > >
> > > > > Signed-off-by: Kefeng Wang <wangkefeng.wang@xxxxxxxxxx>
> > > > > Signed-off-by: Tong Tiangen <tongtiangen@xxxxxxxxxx>
> > > > > Reviewed-by: Pasha Tatashin <pasha.tatashin@xxxxxxxxxx>
> > > >
> > > > When building and booting an arm64 allmodconfig kernel on the next tree, branch next-20220516,
> > > > see the following kernel oops when booting in QEMU [1]:
> > > >
> > > > T35] ------------[ cut here ]------------
> > > > [ 578.695796][ T35] kernel BUG at mm/page_table_check.c:82!
That seems to be: BUG_ON(atomic_dec_return(&ptc->file_map_count) < 0);
> > > > [ 578.697292][ T35] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
> > > > [ 578.704318][ T35] Modules linked in:
> > > > [ 578.705907][ T35] CPU: 0 PID: 35 Comm: khugepaged Tainted: G T 5.18.0-rc6-next-20220513 #1 893498a5d8159d9fb26e12492a93c07e83dd4b7f
> > > > [ 578.711170][ T35] Hardware name: linux,dummy-virt (DT)
> > > > [ 578.713315][ T35] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> > > > [ 578.716398][ T35] pc : page_table_check_clear.constprop.0+0x1f4/0x280
> > > > [ 578.719107][ T35] lr : page_table_check_clear.constprop.0+0x1cc/0x280
> > > > [ 578.721781][ T35] sp : ffff80000f3778b0
> > > > [ 578.723446][ T35] x29: ffff80000f3778b0 x28: ffff80000b891218 x27: ffff000012dd55f0
> > > > [ 578.726667][ T35] x26: 0000000000000008 x25: ffff80000c38cd80 x24: 0000000000000000
> > > > [ 578.729870][ T35] x23: ffff80000c38c9c0 x22: 0000000000000000 x21: 0000000000000200
> > > > [ 578.733079][ T35] x20: ffff000007bae000 x19: ffff000007bae008 x18: 0000000000000000
> > > > [ 578.736299][ T35] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
> > > > [ 578.739505][ T35] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
> > > > [ 578.742735][ T35] x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000
> > > > [ 578.745925][ T35] x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000
> > > > [ 578.749145][ T35] x5 : 0000000000000000 x4 : 0000000000000000 x3 : ffff000007bae00c
> > > > [ 578.752348][ T35] x2 : 0000000000000000 x1 : 0000000000000001 x0 : 00000000ffffffff
> > > > [ 578.755556][ T35] Call trace:
> > > > [ 578.756877][ T35] page_table_check_clear.constprop.0+0x1f4/0x280
> > > > [ 578.759446][ T35] __page_table_check_pmd_clear+0xc4/0x140
> > > > [ 578.761757][ T35] pmdp_collapse_flush+0xa4/0x1c0
> > > > [ 578.763771][ T35] collapse_huge_page+0x4e4/0xb00
> > > > [ 578.765778][ T35] khugepaged_scan_pmd+0xc18/0xd00
> > > > [ 578.767840][ T35] khugepaged_scan_mm_slot+0x580/0x780
> > > > [ 578.770018][ T35] khugepaged+0x2dc/0x400
> > > > [ 578.771786][ T35] kthread+0x164/0x180
> > > > [ 578.773430][ T35] ret_from_fork+0x10/0x20
> > > > [ 578.775253][ T35] Code: 52800021 91001263 14000388 36f80040 (d4210000)
> > > > [ 578.777990][ T35] ---[ end trace 0000000000000000 ]---
> > > > [ 578.778021][ T35] Kernel panic - not syncing: Oops - BUG: Fatal exception
> > > > [ 578.782934][ T35] Kernel Offset: disabled
> > > > [ 578.784642][ T35] CPU features: 0x000,00100010,00001086
> > > > [ 578.786848][ T35] Memory Limit: none
> > > > [ 578.788433][ T35] ---[ end Kernel panic - not syncing: Oops - BUG: Fatal exception ]---
>
> Now I see this oops on the mainline kernel too when I'm building and booting an
> arm64 allmodconfig kernel, sha
> 9d004b2f4fea ("Merge tag 'cxl-for-5.19' of
> git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl").
>
> building and booting an arm64 allmodconfig kernel.
>
> When I revert 42b2547137f5 ("arm64/mm: enable
> ARCH_SUPPORTS_PAGE_TABLE_CHECK") I'm able to boot.
> The kernel boots fine.

I don't think disabling the check is the right thing to do, and I'm not
really seeing anything arm64-specific from the information here either.
It's more likely that one of the many other options (or combination of
options) enabled in allmodconfig is causing the problem.

Are you able to reproduce on x86?

Anshuman -- any ideas?

Will