https://bugzilla.kernel.org/show_bug.cgi?id=212123 Bug ID: 212123 Summary: kernel BUG at mm/filemap.c:1499, invalid opcode: 0000 Product: File System Version: 2.5 Kernel Version: 5.11.1-2.g39e04be Hardware: Intel OS: Linux Tree: Mainline Status: NEW Severity: normal Priority: P1 Component: XFS Assignee: filesystem_xfs@xxxxxxxxxxxxxxxxxxxxxx Reporter: tm@xxxxxxxxxxx Regression: No Unfortunately I didn't have the chance yet to test this with something else than XFS, but we see this across several hosts with different Intel SSDs (SATA and NVMe, gets triggered faster on NVMe) and AMD CPUs, different mount options (discard or periodic trim) and at least since Kernel 5.7. This makes the corresponding user space process hang and finally a hard reset of the machine is needed. What seems to trigger the issue more rapidly (besides a faster drive) is the fill state (usually >90%) of the filesystem and/or the load on the machine (those are HPC-esque nodes). The following stacktrace also contains an error "Bad page state in process", this is not always the case. 2021-02-26T11:49:29.803049+01:00 tcopt11 kernel: [156009.300579] ------------[ cut here ]------------ 2021-02-26T11:49:29.803063+01:00 tcopt11 kernel: [156009.300584] kernel BUG at mm/filemap.c:1499! 2021-02-26T11:49:29.803065+01:00 tcopt11 kernel: [156009.300591] invalid opcode: 0000 [#1] SMP NOPTI 2021-02-26T11:49:29.803066+01:00 tcopt11 kernel: [156009.305213] CPU: 51 PID: 832 Comm: kworker/51:1 Not tainted 5.11.1-2.g39e04be-default #1 openSUSE Tumbleweed (unreleased) 2021-02-26T11:49:29.803067+01:00 tcopt11 kernel: [156009.316246] Hardware name: GIGABYTE R181-Z91-00/MZ91-FS0-00, BIOS F23a 08/14/2020 2021-02-26T11:49:29.803069+01:00 tcopt11 kernel: [156009.323812] Workqueue: xfs-conv/nvme0n1p1 xfs_end_io [xfs] 2021-02-26T11:49:29.803069+01:00 tcopt11 kernel: [156009.329486] RIP: 0010:end_page_writeback+0xbb/0xc0 2021-02-26T11:49:29.803070+01:00 tcopt11 kernel: [156009.334365] Code: 34 83 e0 07 83 f8 04 75 e8 48 8b 45 08 8b 40 68 83 e8 01 83 f8 01 77 d9 48 89 ef 5d e9 fe e4 00 00 48 89 ef 5d e9 45 e4 00 00 <0f> 0b 0f 1f 00 0f 1f 44 00 00 41 54 55 48 89 fd 53 89 d3 40 84 f6 2021-02-26T11:49:29.803071+01:00 tcopt11 kernel: [156009.353203] RSP: 0018:ffffbf67cf38fd80 EFLAGS: 00010246 2021-02-26T11:49:29.803073+01:00 tcopt11 kernel: [156009.358519] RAX: 0000000000000000 RBX: 000000000000001d RCX: 0000000000000005 2021-02-26T11:49:29.803074+01:00 tcopt11 kernel: [156009.365739] RDX: 0000000000000000 RSI: ffff9b3893433130 RDI: 0000000000000000 2021-02-26T11:49:29.803074+01:00 tcopt11 kernel: [156009.372959] RBP: fffff9a5643ea1c0 R08: 0000000000006000 R09: 0000000000000001 2021-02-26T11:49:29.803075+01:00 tcopt11 kernel: [156009.380177] R10: ffff9b3e2ff6a000 R11: 0000000000000020 R12: 0000000000000000 2021-02-26T11:49:29.803095+01:00 tcopt11 kernel: [156009.387398] R13: 0000000000008000 R14: ffff9b0f0294b240 R15: fffff9a5643ea1c0 2021-02-26T11:49:29.803096+01:00 tcopt11 kernel: [156009.394617] FS: 0000000000000000(0000) GS:ffff9b160f040000(0000) knlGS:0000000000000000 2021-02-26T11:49:29.803098+01:00 tcopt11 kernel: [156009.402789] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 2021-02-26T11:49:29.803099+01:00 tcopt11 kernel: [156009.408622] CR2: 0000558ea6c00870 CR3: 000000183b5b6000 CR4: 00000000003506e0 2021-02-26T11:49:29.803099+01:00 tcopt11 kernel: [156009.415841] Call Trace: 2021-02-26T11:49:29.803100+01:00 tcopt11 kernel: [156009.418383] iomap_finish_ioend+0x127/0x250 2021-02-26T11:49:29.803101+01:00 tcopt11 kernel: [156009.422654] iomap_finish_ioends+0x41/0x90 2021-02-26T11:49:29.803101+01:00 tcopt11 kernel: [156009.426842] xfs_end_ioend+0x6d/0x110 [xfs] 2021-02-26T11:49:29.803102+01:00 tcopt11 kernel: [156009.431193] xfs_end_io+0xac/0xe0 [xfs] 2021-02-26T11:49:29.803103+01:00 tcopt11 kernel: [156009.435195] process_one_work+0x1df/0x370 2021-02-26T11:49:29.803104+01:00 tcopt11 kernel: [156009.439293] worker_thread+0x50/0x400 2021-02-26T11:49:29.803105+01:00 tcopt11 kernel: [156009.443047] ? process_one_work+0x370/0x370 2021-02-26T11:49:29.803106+01:00 tcopt11 kernel: [156009.447320] kthread+0x11b/0x140 2021-02-26T11:49:29.803107+01:00 tcopt11 kernel: [156009.450638] ? __kthread_bind_mask+0x60/0x60 2021-02-26T11:49:29.803107+01:00 tcopt11 kernel: [156009.454999] ret_from_fork+0x22/0x30 2021-02-26T11:49:29.803108+01:00 tcopt11 kernel: [156009.458665] Modules linked in: joydev st sr_mod cdrom rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace sunrpc nfs_ssc fscache fuse af_packet xt_tcpudp ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security ebtable_filter ebtables iscsi_ibft iscsi_boot_sysfs ip6table_filter ip6_tables rfkill iptable_filter ip_tables x_tables bpfilter dmi_sysfs msr ipmi_ssif i40iw ib_uverbs intel_rapl_msr ib_core intel_rapl_common amd64_edac_mod edac_mce_amd xfs nls_iso8859_1 nls_cp437 kvm_amd vfat fat kvm nvme irqbypass acpi_ipmi crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel igb ipmi_si crypto_simd sp5100_tco cryptd glue_helper i40e pcspkr efi_pstore nvme_core ipmi_devintf k10temp i2c_piix4 dca ccp ipmi_msghandler tiny_power_button button 2021-02-26T11:49:29.803109+01:00 tcopt11 kernel: [156009.458745] acpi_cpufreq btrfs blake2b_generic libcrc32c xor ast i2c_algo_bit drm_vram_helper raid6_pq drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec rc_core drm_ttm_helper xhci_pci ttm xhci_pci_renesas crc32c_intel xhci_hcd drm usbcore sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua efivarfs 2021-02-26T11:49:29.803116+01:00 tcopt11 kernel: [156009.574325] ---[ end trace bd7bd5d623d48cd8 ]--- 2021-02-26T11:49:29.944226+01:00 tcopt11 kernel: [156009.710597] RIP: 0010:end_page_writeback+0xbb/0xc0 2021-02-26T11:49:29.944229+01:00 tcopt11 kernel: [156009.710604] Code: 34 83 e0 07 83 f8 04 75 e8 48 8b 45 08 8b 40 68 83 e8 01 83 f8 01 77 d9 48 89 ef 5d e9 fe e4 00 00 48 89 ef 5d e9 45 e4 00 00 <0f> 0b 0f 1f 00 0f 1f 44 00 00 41 54 55 48 89 fd 53 89 d3 40 84 f6 2021-02-26T11:49:29.968386+01:00 tcopt11 kernel: [156009.734336] RSP: 0018:ffffbf67cf38fd80 EFLAGS: 00010246 2021-02-26T11:49:29.968388+01:00 tcopt11 kernel: [156009.734340] RAX: 0000000000000000 RBX: 000000000000001d RCX: 0000000000000005 2021-02-26T11:49:29.982842+01:00 tcopt11 kernel: [156009.746884] RDX: 0000000000000000 RSI: ffff9b3893433130 RDI: 0000000000000000 2021-02-26T11:49:29.982845+01:00 tcopt11 kernel: [156009.746888] RBP: fffff9a5643ea1c0 R08: 0000000000006000 R09: 0000000000000001 2021-02-26T11:49:29.982846+01:00 tcopt11 kernel: [156009.746890] R10: ffff9b3e2ff6a000 R11: 0000000000000020 R12: 0000000000000000 2021-02-26T11:49:30.004527+01:00 tcopt11 kernel: [156009.768566] R13: 0000000000008000 R14: ffff9b0f0294b240 R15: fffff9a5643ea1c0 2021-02-26T11:49:30.004528+01:00 tcopt11 kernel: [156009.768569] FS: 0000000000000000(0000) GS:ffff9b160f040000(0000) knlGS:0000000000000000 2021-02-26T11:49:30.013210+01:00 tcopt11 kernel: [156009.783977] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 2021-02-26T11:49:30.013212+01:00 tcopt11 kernel: [156009.783980] CR2: 0000558ea6c00870 CR3: 000000183b5b6000 CR4: 00000000003506e0 2021-02-26T11:49:32.035969+01:00 tcopt11 kernel: [156011.521366] BUG: Bad page state in process cp2k.ssmp pfn:290fa87 2021-02-26T11:49:32.035989+01:00 tcopt11 kernel: [156011.527556] page:000000000121bb86 refcount:1 mapcount:0 mapping:0000000000000000 index:0x1 pfn:0x290fa87 2021-02-26T11:49:32.035991+01:00 tcopt11 kernel: [156011.537118] flags: 0x2affff800000000() 2021-02-26T11:49:32.035992+01:00 tcopt11 kernel: [156011.540959] raw: 02affff800000000 dead000000000100 dead000000000122 0000000000000000 2021-02-26T11:49:32.035993+01:00 tcopt11 kernel: [156011.548785] raw: 0000000000000001 0000000000000000 00000001ffffffff 0000000000000000 2021-02-26T11:49:32.035995+01:00 tcopt11 kernel: [156011.556610] page dumped because: nonzero _refcount 2021-02-26T11:49:32.035996+01:00 tcopt11 kernel: [156011.561488] Modules linked in: joydev st sr_mod cdrom rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace sunrpc nfs_ssc fscache fuse af_packet xt_tcpudp ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security ebtable_filter ebtables iscsi_ibft iscsi_boot_sysfs ip6table_filter ip6_tables rfkill iptable_filter ip_tables x_tables bpfilter dmi_sysfs msr ipmi_ssif i40iw ib_uverbs intel_rapl_msr ib_core intel_rapl_common amd64_edac_mod edac_mce_amd xfs nls_iso8859_1 nls_cp437 kvm_amd vfat fat kvm nvme irqbypass acpi_ipmi crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel igb ipmi_si crypto_simd sp5100_tco cryptd glue_helper i40e pcspkr efi_pstore nvme_core ipmi_devintf k10temp i2c_piix4 dca ccp ipmi_msghandler tiny_power_button button 2021-02-26T11:49:32.036009+01:00 tcopt11 kernel: [156011.561566] acpi_cpufreq btrfs blake2b_generic libcrc32c xor ast i2c_algo_bit drm_vram_helper raid6_pq drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec rc_core drm_ttm_helper xhci_pci ttm xhci_pci_renesas crc32c_intel xhci_hcd drm usbcore sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua efivarfs 2021-02-26T11:49:32.036013+01:00 tcopt11 kernel: [156011.677099] CPU: 56 PID: 89417 Comm: cp2k.ssmp Tainted: G D 5.11.1-2.g39e04be-default #1 openSUSE Tumbleweed (unreleased) 2021-02-26T11:49:32.036015+01:00 tcopt11 kernel: [156011.689437] Hardware name: GIGABYTE R181-Z91-00/MZ91-FS0-00, BIOS F23a 08/14/2020 2021-02-26T11:49:32.036016+01:00 tcopt11 kernel: [156011.697002] Call Trace: 2021-02-26T11:49:32.036018+01:00 tcopt11 kernel: [156011.699545] dump_stack+0x6b/0x83 2021-02-26T11:49:32.036019+01:00 tcopt11 kernel: [156011.702956] bad_page.cold+0x9b/0xa0 2021-02-26T11:49:32.036020+01:00 tcopt11 kernel: [156011.706623] get_page_from_freelist+0x74b/0x1380 2021-02-26T11:49:32.036021+01:00 tcopt11 kernel: [156011.711331] __alloc_pages_nodemask+0x161/0x310 2021-02-26T11:49:32.036023+01:00 tcopt11 kernel: [156011.715947] alloc_pages_vma+0x80/0x260 2021-02-26T11:49:32.036024+01:00 tcopt11 kernel: [156011.719873] handle_mm_fault+0xef9/0x1960 2021-02-26T11:49:32.036025+01:00 tcopt11 kernel: [156011.723974] ? schedule+0x46/0xb0 2021-02-26T11:49:32.036028+01:00 tcopt11 kernel: [156011.727380] do_user_addr_fault+0x19f/0x480 2021-02-26T11:49:32.036029+01:00 tcopt11 kernel: [156011.731653] exc_page_fault+0x5d/0x120 2021-02-26T11:49:32.036031+01:00 tcopt11 kernel: [156011.735490] ? asm_exc_page_fault+0x8/0x30 2021-02-26T11:49:32.036034+01:00 tcopt11 kernel: [156011.739677] asm_exc_page_fault+0x1e/0x30 2021-02-26T11:49:32.036036+01:00 tcopt11 kernel: [156011.743777] RIP: 0033:0xc9b9d8 2021-02-26T11:49:32.036039+01:00 tcopt11 kernel: [156011.746923] Code: b0 08 20 00 00 49 8d 90 08 22 08 00 4d 8d 24 08 4d 89 f7 48 89 55 a0 49 89 c6 4d 01 ec 0f 1f 44 00 00 4f 8d 8c 3d f8 dd f7 ff <43> c7 84 3c f8 fd f7 ff 01 00 00 00 48 89 d8 49 8d 79 08 49 c7 01 2021-02-26T11:49:32.036041+01:00 tcopt11 kernel: [156011.765756] RSP: 002b:00007ffe41c5df80 EFLAGS: 00010287 2021-02-26T11:49:32.036043+01:00 tcopt11 kernel: [156011.771069] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 2021-02-26T11:49:32.036045+01:00 tcopt11 kernel: [156011.778289] RDX: 000000000a5386b0 RSI: 0000000000000000 RDI: 000000000a53a6c0 2021-02-26T11:49:32.036047+01:00 tcopt11 kernel: [156011.785507] RBP: 00007ffe41c5e000 R08: 00007f9291c55b00 R09: 00007f92785bd140 2021-02-26T11:49:32.036050+01:00 tcopt11 kernel: [156011.792727] R10: ffffffffffff0c08 R11: 0000000000000070 R12: 00007f9278571010 2021-02-26T11:49:32.036053+01:00 tcopt11 kernel: [156011.799946] R13: 00007f9278571010 R14: 00007f9291c55f30 R15: 00000000000ce338 2021-02-26T11:50:18.309353+01:00 tcopt11 kernel: [156058.072317] BUG: workqueue lockup - pool cpus=51 node=0 flags=0x0 nice=0 stuck for 48s! 2021-02-26T11:50:18.309380+01:00 tcopt11 kernel: [156058.080446] Showing busy workqueues and worker pools: 2021-02-26T11:50:18.319118+01:00 tcopt11 kernel: [156058.085750] workqueue mm_percpu_wq: flags=0x8 2021-02-26T11:50:18.330121+01:00 tcopt11 kernel: [156058.090215] pwq 102: cpus=51 node=0 flags=0x0 nice=0 active=1/256 refcnt=2 2021-02-26T11:50:18.330126+01:00 tcopt11 kernel: [156058.097355] pending: vmstat_update 2021-02-26T11:50:18.347571+01:00 tcopt11 kernel: [156058.101828] workqueue xfs-conv/nvme0n1p1: flags=0xc 2021-02-26T11:50:18.347576+01:00 tcopt11 kernel: [156058.106805] pwq 102: cpus=51 node=0 flags=0x0 nice=0 active=1/256 refcnt=2 2021-02-26T11:50:18.347577+01:00 tcopt11 kernel: [156058.113940] in-flight: 832:xfs_end_io [xfs] 2021-02-26T11:50:18.355454+01:00 tcopt11 kernel: [156058.118877] pool 102: cpus=51 node=0 flags=0x0 nice=0 hung=48s workers=2 idle: 330 [...] -- You may reply to this email to add a comment. You are receiving this mail because: You are watching the assignee of the bug.