Hi, > On Mon, Apr 12, 2021 at 3:07 AM Wang Yugui <wangyugui@xxxxxxxxxxxx> wrote: > > > > Hi, > > > > kernel BUG at mm/huge_memory.c:2736(linux 5.10.29) is triggered > > by some files write test. > > > > mm/huge_memory.c: > > if (IS_ENABLED(CONFIG_DEBUG_VM) && mapcount) { > > pr_alert("total_mapcount: %u, page_count(): %u\n", > > mapcount, count); > > if (PageTail(page)) > > dump_page(head, NULL); > > dump_page(page, "total_mapcount(head) > 0"); > > L2736: BUG(); > > } > > We just can tell the mapcount of the page is not zero from the current > log, it might mean the unmap_page() call is failed. It seems you have > CONFIG_DEBUG_VM enabled, could you please paste more log? There is > "VM_BUG_ON_PAGE(!unmap_success, page)" in unmap_page(). It should be > able to tell us if unmap_page() is failed or not, or something else > happened. This is the full dmesg output [63080.331513] huge_memory: total_mapcount: 511, page_count(): 512 [63080.332167] page:00000000d2e1a982 refcount:512 mapcount:0 mapping:0000000000000000 index:0x7fe260582 pfn:0x676a00 [63080.332167] head:00000000d2e1a982 order:9 compound_mapcount:0 compound_pincount:0 [63080.332167] anon flags: 0x17ffffc009001d(locked|uptodate|dirty|lru|head|swapbacked) [63080.332167] raw: 0017ffffc009001d ffffc93cda0d0008 ffffc93cd9ab0008 ffff8f21be9f0cb9 [63080.332167] raw: 00000007fe260582 0000000000000000 00000200ffffffff ffff8f1021810000 [63080.332167] page->mem_cgroup:ffff8f1021810000 [63080.332167] page:00000000bc78ac24 refcount:512 mapcount:1 mapping:0000000000000000 index:0x7fe260584 pfn:0x676a02 [63080.332167] head:00000000d2e1a982 order:9 compound_mapcount:0 compound_pincount:0 [63080.332167] anon flags: 0x17ffffc009001d(locked|uptodate|dirty|lru|head|swapbacked) [63080.332167] raw: 0017ffffc0000000 ffffc93cd9da8001 dead000000000000 ffffc93d428d0098 [63080.332167] raw: ffffa002cd183bf0 0000000000000000 0000000000000000 0000000000000000 [63080.332167] head: 0017ffffc009001d ffffc93cda0d0008 ffffc93cd9ab0008 ffff8f21be9f0cb9 [63080.332167] head: 00000007fe260582 0000000000000000 00000200ffffffff ffff8f1021810000 [63080.332167] page dumped because: total_mapcount(head) > 0 [63080.332167] ------------[ cut here ]------------ [63080.332167] kernel BUG at mm/huge_memory.c:2736! [63080.332167] invalid opcode: 0000 [#1] SMP NOPTI [63080.332167] CPU: 8 PID: 376 Comm: kswapd0 Tainted: G S 5.10.31-1.el7.x86_64 #1 [63080.332167] Hardware name: Dell Inc. Precision T7610/0NK70N, BIOS A18 09/11/2019 [63080.332167] RIP: 0010:split_huge_page_to_list.cold.86+0x19/0x1b [63080.332167] Code: 3a bc e8 8f 86 ff ff b8 f4 ff ff ff e9 43 7f 83 ff 31 f6 4c 89 e7 e8 bd dc 7d ff 48 c7 c6 4f f1 3a bc 48 89 ef e8 ae dc 7d ff <0f> 0b 48 8b 34 24 4c 89 e2 48 c7 c7 28 f5 3a bc e8 57 86 ff ff 31 [63080.332167] RSP: 0018:ffffa002cd183b10 EFLAGS: 00010086 [63080.332167] RAX: 0000000000000000 RBX: ffff8f1021810ae0 RCX: 0000000000000027 [63080.332167] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8f2eefa18a88 [63080.332167] RBP: ffffc93cd9da8080 R08: 0000000000000000 R09: c0000000ffffbfff [63080.332167] R10: 0000000000000001 R11: ffffa002cd1837e8 R12: ffffc93cd9da8000 [63080.332167] R13: 0000000000000000 R14: ffff8f21be9f0cb8 R15: 00000000000001ff [63080.332167] FS: 0000000000000000(0000) GS:ffff8f2eefa00000(0000) knlGS:0000000000000000 [63080.332167] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [63080.332167] CR2: 00007f8e24fabd20 CR3: 00000007eaa10005 CR4: 00000000001706e0 [63080.332167] Call Trace: [63080.332167] ? irq_exit_rcu+0x4f/0xe0 [63080.332167] ? asm_sysvec_apic_timer_interrupt+0x12/0x20 [63080.332167] deferred_split_scan+0x1ca/0x320 [63080.332167] do_shrink_slab+0x11f/0x250 [63080.332167] shrink_slab+0x20f/0x2c0 [63080.332167] shrink_node+0x24b/0x6d0 [63080.332167] balance_pgdat+0x2db/0x550 [63080.332167] kswapd+0x201/0x390 [63080.332167] ? finish_wait+0x80/0x80 [63080.332167] ? balance_pgdat+0x550/0x550 [63080.332167] kthread+0x116/0x130 [63080.332167] ? kthread_park+0x80/0x80 [63080.332167] ret_from_fork+0x1f/0x30 [63080.332167] Modules linked in: binfmt_misc rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache rfkill rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib rdma_ucm ib_umad snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg soundwire_intel soundwire_generic_allocation intel_rapl_msr intel_rapl_common snd_soc_core snd_compress snd_pcm_dmaengine soundwire_cadence snd_hda_codec sb_edac x86_pkg_temp_thermal snd_hda_core intel_powerclamp coretemp ac97_bus kvm_intel snd_hwdep snd_seq iTCO_wdt snd_seq_device dcdbas intel_pmc_bxt mei_wdt mei_hdcp iTCO_vendor_support snd_pcm dell_smm_hwmon kvm irqbypass snd_timer rapl mei_me intel_cstate snd i2c_i801 intel_uncore i2c_smbus lpc_ich mei soundcore nvme_rdma nvme_fabrics rdma_cm iw_cm ib_cm rdmavt rdma_rxe nfsd ib_uverbs ip6_udp_tunnel udp_tunnel ib_core auth_rpcgss nfs_acl lockd grace nfs_ssc ip_tables xfs r adeon [63080.332167] i2c_algo_bit ttm drm_kms_helper cec bnx2x crct10dif_pclmul nvme crc32_pclmul drm crc32c_intel mpt3sas ghash_clmulni_intel e1000e pcspkr mdio nvme_core raid_class scsi_transport_sas wmi dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua btrfs xor raid6_pq sunrpc i2c_dev [63080.332167] ---[ end trace 35ee9d9fcf3c4757 ]--- [63080.332167] RIP: 0010:split_huge_page_to_list.cold.86+0x19/0x1b [63080.332167] Code: 3a bc e8 8f 86 ff ff b8 f4 ff ff ff e9 43 7f 83 ff 31 f6 4c 89 e7 e8 bd dc 7d ff 48 c7 c6 4f f1 3a bc 48 89 ef e8 ae dc 7d ff <0f> 0b 48 8b 34 24 4c 89 e2 48 c7 c7 28 f5 3a bc e8 57 86 ff ff 31 [63080.332167] RSP: 0018:ffffa002cd183b10 EFLAGS: 00010086 [63080.332167] RAX: 0000000000000000 RBX: ffff8f1021810ae0 RCX: 0000000000000027 [63080.332167] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8f2eefa18a88 [63080.332167] RBP: ffffc93cd9da8080 R08: 0000000000000000 R09: c0000000ffffbfff [63080.332167] R10: 0000000000000001 R11: ffffa002cd1837e8 R12: ffffc93cd9da8000 [63080.332167] R13: 0000000000000000 R14: ffff8f21be9f0cb8 R15: 00000000000001ff [63080.332167] FS: 0000000000000000(0000) GS:ffff8f2eefa00000(0000) knlGS:0000000000000000 [63080.332167] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [63080.332167] CR2: 00007f8e24fabd20 CR3: 00000007eaa10005 CR4: 00000000001706e0 [63080.332167] Kernel panic - not syncing: Fatal exception [63080.332167] Shutting down cpus with NMI [63080.332167] Kernel Offset: 0x3a000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) [63080.332167] ---[ end Kernel panic - not syncing: Fatal exception ]--- Best Regards Wang Yugui (wangyugui@xxxxxxxxxxxx) 2021/04/17