Hi,

On 2023/08/23 3:26, Yiyi Hu wrote:
Hi, after upgrading to kernel 6.1.46 I hit this bug at random, after the machine has been running for about a day, or within 2 days. Actually, I first hit it with 6.1.0, then waited for 6.1.7, 6.1.15 and 6.1.27. I waited that long, using the old 5.15.x series kernel in the meantime, and no one else seems to have hit this. So I decided to report the bug after trying 6.1.46.

Kernel error log:

# tested on 2023-08-18 with 6.1.46; the bug still exists.
[ 3258.356331] BUG: kernel NULL pointer dereference, address: 0000000000000050
[ 3258.356340] #PF: supervisor read access in kernel mode
[ 3258.356343] #PF: error_code(0x0000) - not-present page
[ 3258.356345] PGD 0 P4D 0
[ 3258.356348] Oops: 0000 [#1] PREEMPT SMP
[ 3258.356351] CPU: 2 PID: 3956 Comm: md127_raid6 Tainted: G S 6.1.46-gentoo #1
[ 3258.356355] Hardware name: To Be Filled By O.E.M. X370 Killer SLI/X370 Killer SLI, BIOS P7.10 05/10/2022
[ 3258.356358] RIP: 0010:blk_cgroup_bio_start+0x46/0xa0
[ 3258.356364] Code: 00 00 0f 45 c2 89 c5 e8 98 26 b3 ff 48 c7 c7 e0 bf 36 82 e8 2c 65 4b 00 48 8b 43 48 0f b7 4b 14 65 8b 35 9d 88 aa 7e 48 63 d6 <48> 8b 40 50 48 03 04 d5 60 e8 38 82 48 63 d5 f6 c5 01 75 0e 80 cd
[ 3258.356368] RSP: 0018:ffffc90000c4fcd8 EFLAGS: 00010286
[ 3258.356371] RAX: 0000000000000000 RBX: ffff88810ec530b8 RCX: 0000000000000000
[ 3258.356373] RDX: 0000000000000002 RSI: 0000000000000002 RDI: ffffffff823354de
[ 3258.356375] RBP: 0000000000000001 R08: 0000000000040001 R09: ffff888108f2e8a8
[ 3258.356377] R10: 0000000000000000 R11: 0000000000000000 R12: ffff888108f2e8a8
[ 3258.356379] R13: 8000000000000000 R14: 0000000000000003 R15: ffffc90000c4fd58
[ 3258.356382] FS: 0000000000000000(0000) GS:ffff889fbe880000(0000) knlGS:0000000000000000
[ 3258.356384] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3258.356386] CR2: 0000000000000050 CR3: 00000009705ad000 CR4: 0000000000350ee0
[ 3258.356389] Call Trace:
[ 3258.356391] <TASK>
[ 3258.356394] ? __die_body.cold+0x1a/0x1f
[ 3258.356399] ? page_fault_oops+0xae/0x280
[ 3258.356403] ? do_user_addr_fault+0x61/0x4c0
[ 3258.356406] ? exc_page_fault+0x5c/0x120
[ 3258.356409] ? asm_exc_page_fault+0x22/0x30
[ 3258.356414] ? blk_cgroup_bio_start+0x46/0xa0
[ 3258.356417] ? blk_cgroup_bio_start+0x34/0xa0
[ 3258.356420] submit_bio_noacct_nocheck+0x38/0x380
[ 3258.356424] ? bio_init+0x6d/0xb0
[ 3258.356428] ? submit_bio_noacct+0x52/0x300
[ 3258.356434] handle_active_stripes.constprop.0+0x2cc/0x4b0 [raid456]
[ 3258.356447] raid5d+0x359/0x5b0 [raid456]

Looks like this is the same as the following:

https://lore.kernel.org/all/7c57f3a8-36e9-4805-b1ea-a4fd3406f7bb@xxxxxxxxxx/

And this is fixed by:

https://git.kernel.org/pub/scm/linux/kernel/git/song/md.git/commit/?h=md-next&id=0d0bd28c500173bfca78aa840f8f36d261ef1765

Thanks,
Kuai

[ 3258.356453] ? common_interrupt+0xc6/0xd0
[ 3258.356457] ? schedule_timeout+0x10a/0x140
[ 3258.356459] ? preempt_count_add+0x62/0x90
[ 3258.356463] ? md_free_disk+0x80/0x80
[ 3258.356467] md_thread+0xa4/0x150
[ 3258.356470] ? destroy_sched_domains_rcu+0x30/0x30
[ 3258.356475] kthread+0xe6/0x110
[ 3258.356477] ? kthread_complete_and_exit+0x20/0x20
[ 3258.356480] ret_from_fork+0x1f/0x30
[ 3258.356484] </TASK>
[ 3258.356485] Modules linked in: target_core_user uio target_core_pscsi target_core_file target_core_iblock iscsi_target_mod rpcsec_gss_krb5 nfsv4 nfs fscache netfs dm_cache_smq dm_cache dm_persistent_data dm_bio_prison dm_bufio dm_queue_length macvtap macvlan ipip dummy bridge stp llc nf_tables sch_fq_codel virtio_vdpa vduse vdpa rdma_rxe ip6_udp_tunnel udp_tunnel nvme_rdma nvmet_rdma raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq raid1 raid0 linear bcache nvmet nvme_tcp nvme_fabrics dm_writecache msr thermal rpcrdma rt2800usb rdma_ucm rt2x00usb ib_iser rt2800lib rt2x00lib ib_umad rdma_cm ib_ipoib iw_cm mac80211 snd_hda_codec_realtek snd_hda_codec_generic ib_cm ledtrig_audio snd_hda_intel snd_intel_dspcfg cfg80211 snd_hda_codec rfkill nf_conntrack_tftp snd_hda_core nf_conntrack_netbios_ns nf_conntrack_broadcast snd_hwdep nf_nat_ftp nf_conntrack_ftp wmi_bmof mxm_wmi snd_pcm nf_nat evdev nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 snd_timer snd kvm_amd
[ 3258.356537] nct6775 rapl soundcore nct6775_core pcspkr hwmon_vid vhost_net mlx4_ib ocrdma rtc_cmos tun wmi vhost ib_uverbs vhost_iotlb tap button ib_core acpi_cpufreq kvm irqbypass k10temp nfsd loop fuse drm auth_rpcgss nfs_acl lockd grace dmi_sysfs mlx4_en crct10dif_pclmul crc32_pclmul igb nvme crc32c_intel ghash_clmulni_intel sha512_ssse3 aesni_intel crypto_simd be2net mlx4_core i2c_algo_bit nvme_core sd_mod hwmon xhci_pci t10_pi crc64_rocksoft_generic xhci_hcd crc64_rocksoft crc64 sunrpc dm_mirror dm_region_hash dm_log be2iscsi iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi dm_multipath dm_mod dax efivarfs
[ 3258.356597] CR2: 0000000000000050
[ 3258.356599] ---[ end trace 0000000000000000 ]---
[ 3258.356601] RIP: 0010:blk_cgroup_bio_start+0x46/0xa0
[ 3258.356604] Code: 00 00 0f 45 c2 89 c5 e8 98 26 b3 ff 48 c7 c7 e0 bf 36 82 e8 2c 65 4b 00 48 8b 43 48 0f b7 4b 14 65 8b 35 9d 88 aa 7e 48 63 d6 <48> 8b 40 50 48 03 04 d5 60 e8 38 82 48 63 d5 f6 c5 01 75 0e 80 cd
[ 3258.356608] RSP: 0018:ffffc90000c4fcd8 EFLAGS: 00010286
[ 3258.356610] RAX: 0000000000000000 RBX: ffff88810ec530b8 RCX: 0000000000000000
[ 3258.356613] RDX: 0000000000000002 RSI: 0000000000000002 RDI: ffffffff823354de
[ 3258.356614] RBP: 0000000000000001 R08: 0000000000040001 R09: ffff888108f2e8a8
[ 3258.356616] R10: 0000000000000000 R11: 0000000000000000 R12: ffff888108f2e8a8
[ 3258.356618] R13: 8000000000000000 R14: 0000000000000003 R15: ffffc90000c4fd58
[ 3258.356621] FS: 0000000000000000(0000) GS:ffff889fbe880000(0000) knlGS:0000000000000000
[ 3258.356626] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3258.356630] CR2: 0000000000000050 CR3: 00000009705ad000 CR4: 0000000000350ee0
[ 3258.356633] note: md127_raid6[3956] exited with irqs disabled
[ 3258.356642] note: md127_raid6[3956] exited with preempt_count 1
[ 3258.356646] ------------[ cut here ]------------
[ 3258.356647] WARNING: CPU: 2 PID: 3956 at kernel/exit.c:814 do_exit+0x8b2/0xa50
[ 3258.356652] Modules linked in: target_core_user uio target_core_pscsi target_core_file target_core_iblock iscsi_target_mod rpcsec_gss_krb5 nfsv4 nfs fscache netfs dm_cache_smq dm_cache dm_persistent_data dm_bio_prison dm_bufio dm_queue_length macvtap macvlan ipip dummy bridge stp llc nf_tables sch_fq_codel virtio_vdpa vduse vdpa rdma_rxe ip6_udp_tunnel udp_tunnel nvme_rdma nvmet_rdma raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq raid1 raid0 linear bcache nvmet nvme_tcp nvme_fabrics dm_writecache msr thermal rpcrdma rt2800usb rdma_ucm rt2x00usb ib_iser rt2800lib rt2x00lib ib_umad rdma_cm ib_ipoib iw_cm mac80211 snd_hda_codec_realtek snd_hda_codec_generic ib_cm ledtrig_audio snd_hda_intel snd_intel_dspcfg cfg80211 snd_hda_codec rfkill nf_conntrack_tftp snd_hda_core nf_conntrack_netbios_ns nf_conntrack_broadcast snd_hwdep nf_nat_ftp nf_conntrack_ftp wmi_bmof mxm_wmi snd_pcm nf_nat evdev nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 snd_timer snd kvm_amd
[ 3258.356693] nct6775 rapl soundcore nct6775_core pcspkr hwmon_vid vhost_net mlx4_ib ocrdma rtc_cmos tun wmi vhost ib_uverbs vhost_iotlb tap button ib_core acpi_cpufreq kvm irqbypass k10temp nfsd loop fuse drm auth_rpcgss nfs_acl lockd grace dmi_sysfs mlx4_en crct10dif_pclmul crc32_pclmul igb nvme crc32c_intel ghash_clmulni_intel sha512_ssse3 aesni_intel crypto_simd be2net mlx4_core i2c_algo_bit nvme_core sd_mod hwmon xhci_pci t10_pi crc64_rocksoft_generic xhci_hcd crc64_rocksoft crc64 sunrpc dm_mirror dm_region_hash dm_log be2iscsi iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi dm_multipath dm_mod dax efivarfs
[ 3258.356741] CPU: 2 PID: 3956 Comm: md127_raid6 Tainted: G S D 6.1.46-gentoo #1
[ 3258.356744] Hardware name: To Be Filled By O.E.M. X370 Killer SLI/X370 Killer SLI, BIOS P7.10 05/10/2022
[ 3258.356747] RIP: 0010:do_exit+0x8b2/0xa50
[ 3258.356749] Code: 1c ff ff ff 48 89 df e8 ac 96 0c 00 e9 8e f9 ff ff 0f 0b e9 9a f7 ff ff 4c 89 e6 bf 05 06 00 00 e8 43 ea 00 00 e9 6a f8 ff ff <0f> 0b e9 bd f7 ff ff 48 8b bb f8 06 00 00 e8 cb dc ff ff 48 85 c0
[ 3258.356753] RSP: 0018:ffffc90000c4fed8 EFLAGS: 00010286
[ 3258.356756] RAX: 0000000080000000 RBX: ffff88811121be00 RCX: 0000000000000000
[ 3258.356758] RDX: 0000000000000001 RSI: 0000000000002710 RDI: 00000000ffffffff
[ 3258.356760] RBP: ffff88810fcb5100 R08: 0000000000009ffb R09: 00000000ffffdfff
[ 3258.356762] R10: ffff88a03f23f240 R11: ffff88a03f23f240 R12: 0000000000000009
[ 3258.356764] R13: ffff8881101e7380 R14: 0000000000000000 R15: 0000000000000000
[ 3258.356766] FS: 0000000000000000(0000) GS:ffff889fbe880000(0000) knlGS:0000000000000000
[ 3258.356768] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3258.356770] CR2: 0000000000000050 CR3: 00000009705ad000 CR4: 0000000000350ee0
[ 3258.356772] Call Trace:
[ 3258.356774] <TASK>
[ 3258.356775] ? __warn+0x7d/0xc0
[ 3258.356779] ? do_exit+0x8b2/0xa50
[ 3258.356782] ? report_bug+0xe2/0x170
[ 3258.356785] ? handle_bug+0x3c/0x60
[ 3258.356788] ? exc_invalid_op+0x13/0x60
[ 3258.356790] ? asm_exc_invalid_op+0x16/0x20
[ 3258.356794] ? do_exit+0x8b2/0xa50
[ 3258.356796] ? do_exit+0x68/0xa50
[ 3258.356798] make_task_dead+0x89/0x90
[ 3258.356800] rewind_stack_and_make_dead+0x17/0x20
[ 3258.356803] RIP: 0000:0x0
[ 3258.356807] Code: Unable to access opcode bytes at 0xffffffffffffffd6.
[ 3258.356809] RSP: 0000:0000000000000000 EFLAGS: 00000000 ORIG_RAX: 0000000000000000
[ 3258.356811] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 3258.356813] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 3258.356815] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[ 3258.356817] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[ 3258.356818] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 3258.356821] </TASK>
[ 3258.356823] ---[ end trace 0000000000000000 ]---

I'm using Gentoo on RAID6:

# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid6] [raid5] [raid4]
md126 : active raid1 dm-18[2] dm-10[0]
      8379392 blocks super 1.2 [2/2] [UU]

md127 : active raid6 md126[0](J) sdb2[1] sda2[6] sdd2[2] sde2[3] sdc2[5] sdf2[4]
      23441016832 blocks super 1.2 level 6, 512k chunk, algorithm 2 [6/6] [UUUUUU]

# mdadm --detail /dev/md126
/dev/md126:
           Version : 1.2
     Creation Time : Mon Nov 28 23:35:53 2022
        Raid Level : raid1
        Array Size : 8379392 (7.99 GiB 8.58 GB)
     Used Dev Size : 8379392 (7.99 GiB 8.58 GB)
      Raid Devices : 2
     Total Devices : 2
       Persistence : Superblock is persistent

       Update Time : Wed Aug 23 03:12:09 2023
             State : clean
    Active Devices : 2
   Working Devices : 2
    Failed Devices : 0
     Spare Devices : 0

Consistency Policy : resync

              Name : m:r6_wjournal (local to host m)
              UUID : b26db5f9:4632ad63:ad25132e:6fbac76a
            Events : 21830666

    Number   Major   Minor   RaidDevice State
       0     253       10        0      active sync   /dev/dm-10
       2     253       18        1      active sync   /dev/dm-18

# mdadm --detail /dev/md127
/dev/md127:
           Version : 1.2
     Creation Time : Tue Nov 29 00:24:51 2022
        Raid Level : raid6
        Array Size : 23441016832 (21.83 TiB 24.00 TB)
     Used Dev Size : 5860254208 (5.46 TiB 6.00 TB)
      Raid Devices : 6
     Total Devices : 7
       Persistence : Superblock is persistent

       Update Time : Wed Aug 23 03:12:39 2023
             State : clean
    Active Devices : 6
   Working Devices : 7
    Failed Devices : 0
     Spare Devices : 0

            Layout : left-symmetric
        Chunk Size : 512K

Consistency Policy : journal

              Name : m:m_r6_pv (local to host m)
              UUID : 14a8576e:8050f86a:e7e29ac7:07d1ddf1
            Events : 19741955

    Number   Major   Minor   RaidDevice State
       1       8       18        0      active sync   /dev/sdb2
       2       8       50        1      active sync   /dev/sdd2
       3       8       66        2      active sync   /dev/sde2
       4       8       82        3      active sync   /dev/sdf2
       5       8       34        4      active sync   /dev/sdc2
       6       8        2        5      active sync   /dev/sda2

       0       9      126        -      journal   /dev/md/m:r6_wjournal

Is this a raid module bug, or should I report it to another mailing list?