Re: General Protection Fault in md raid10

Hi,

On 2024/04/29 3:41, Colgate Minuette wrote:
Hello all,

I am trying to set up an md raid-10 array spanning 8 disks using the following
command:

mdadm --create /dev/md64 --level=10 --layout=o2 -n 8 /dev/sd[efghijkl]1

The raid is created successfully, but the moment that the newly created raid
starts initial sync, a general protection fault is issued. This fault happens
on kernels 6.1.85, 6.6.26, and 6.8.5 using mdadm version 4.3. The raid is then
completely unusable. After the fault, if I try to stop the raid using

mdadm --stop /dev/md64

mdadm hangs indefinitely.

I have tried raid levels 0 and 6, and both work as expected without any errors
on these same 8 drives. I also have a working md raid-10 on the system already
with 4 disks (not related to this 8-disk array).

Other things I have tried include creating/syncing the raid from a Debian
live environment and using near/far/offset layouts, but both came back with
the same protection fault. I also ran a memory test on the computer, which
showed no errors after 10 passes.

Below is the output from the general protection fault. Let me know if there is
anything else to try, or any log information that would help diagnose this.

[   10.965542] md64: detected capacity change from 0 to 120021483520
[   10.965593] md: resync of RAID array md64
[   10.999289] general protection fault, probably for non-canonical address 0xd071e7fff89be: 0000 [#1] PREEMPT SMP NOPTI
[   11.000842] CPU: 4 PID: 912 Comm: md64_raid10 Not tainted 6.1.85-1-MANJARO #1 44ae6c380f5656fa036749a28fdade8f34f2f9ce
[   11.001192] Hardware name: ASUS System Product Name/TUF GAMING X670E-PLUS WIFI, BIOS 1618 05/18/2023
[   11.001482] RIP: 0010:bio_copy_data_iter+0x187/0x260
[   11.001756] Code: 29 f1 4c 29 f6 48 c1 f9 06 48 c1 fe 06 48 c1 e1 0c 48 c1 e6 0c 48 01 e9 48 01 ee 48 01 d9 4c 01 d6 83 fa 08 0f 82 b0 fe ff ff <48> 8b 06 48 89 01 89 d0 48 8b 7c 06 f8 48 89 7c 01 f8 48 8d 79 08
[   11.002045] RSP: 0018:ffffa838124ffd28 EFLAGS: 00010216
[   11.002336] RAX: ffffca0a84195a80 RBX: 0000000000000000 RCX: ffff89be8656a000
[   11.002628] RDX: 0000000000000642 RSI: 000d071e7fff89be RDI: ffff89beb4039df8
[   11.002922] RBP: ffff89bd80000000 R08: ffffa838124ffd74 R09: ffffa838124ffd60
[   11.003217] R10: 00000000000009be R11: 0000000000002000 R12: ffff89be8bbff400
[   11.003522] R13: ffff89beb4039a00 R14: ffffca0a80000000 R15: 0000000000001000
[   11.003825] FS:  0000000000000000(0000) GS:ffff89c5b8700000(0000) knlGS:0000000000000000
[   11.004126] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   11.004429] CR2: 0000563308baac38 CR3: 000000012e900000 CR4: 0000000000750ee0
[   11.004737] PKRU: 55555554
[   11.005040] Call Trace:
[   11.005342]  <TASK>
[   11.005645]  ? __die_body.cold+0x1a/0x1f
[   11.005951]  ? die_addr+0x3c/0x60
[   11.006256]  ? exc_general_protection+0x1c1/0x380
[   11.006562]  ? asm_exc_general_protection+0x26/0x30
[   11.006865]  ? bio_copy_data_iter+0x187/0x260
[   11.007169]  bio_copy_data+0x5c/0x80
[   11.007474]  raid10d+0xcad/0x1c00 [raid10 1721e6c9d579361bf112b0ce400eec9240452da1]
Can you try using addr2line or gdb to find which line of code this
corresponds to?
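
For reference, a minimal sketch of how the faulting frame could be resolved
with gdb. It assumes an unstripped vmlinux with debug info built from the same
source and config as the faulting kernel; the ./vmlinux default and the
VMLINUX variable are placeholders, not something from this thread.

```shell
#!/bin/sh
# Frame copied from the RIP line of the oops: symbol+offset/function-size.
frame='bio_copy_data_iter+0x187/0x260'

# gdb only needs symbol+offset, so strip the "/0x260" size suffix.
loc="${frame%/*}"

# Assumption: VMLINUX points at an unstripped kernel image with debug info
# matching the faulting kernel; ./vmlinux is only a placeholder default.
vmlinux="${VMLINUX:-./vmlinux}"

if [ -f "$vmlinux" ] && command -v gdb >/dev/null 2>&1; then
    # Print the source lines around the faulting instruction.
    gdb -batch -ex "list *(${loc})" "$vmlinux"
else
    # No image or no gdb available: just show the command that would be run.
    echo "gdb -batch -ex 'list *(${loc})' ${vmlinux}"
fi
```

From a kernel source tree, scripts/faddr2line vmlinux
bio_copy_data_iter+0x187/0x260 should give similar information. The
raid10d+0xcad/0x1c00 frame lives in a module, so it would need raid10.ko
built with debug info in place of vmlinux.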

I have never seen a problem like this before... It would be great if you
could bisect this, since you can reproduce the problem easily.

Thanks,
Kuai

[   11.007788]  ? srso_alias_return_thunk+0x5/0x7f
[   11.008099]  ? srso_alias_return_thunk+0x5/0x7f
[   11.008408]  ? prepare_to_wait_event+0x60/0x180
[   11.008720]  ? unregister_md_personality+0x70/0x70 [md_mod 64c55bfe07bb9f714eafd175176a02873a443cb7]
[   11.009039]  md_thread+0xab/0x190 [md_mod 64c55bfe07bb9f714eafd175176a02873a443cb7]
[   11.009359]  ? sched_energy_aware_handler+0xb0/0xb0
[   11.009681]  kthread+0xdb/0x110
[   11.009996]  ? kthread_complete_and_exit+0x20/0x20
[   11.010319]  ret_from_fork+0x1f/0x30
[   11.010325]  </TASK>
[   11.010326] Modules linked in: platform_profile libarc4 snd_hda_core
snd_hwdep i8042 realtek kvm cfg80211 snd_pcm sp5100_tco mdio_devres serio
snd_timer raid10 irqbypass wmi_bmof pcspkr k10temp i2c_piix4 rapl rfkill
libphy snd soundcore md_mod gpio_amdpt acpi_cpufreq gpio_generic mac_hid
uinput i2c_dev sg crypto_user fuse loop nfnetlink bpf_preload ip_tables
x_tables ext4 crc32c_generic crc16 mbcache jbd2 usbhid dm_crypt cbc
encrypted_keys trusted asn1_encoder tee dm_mod crct10dif_pclmul crc32_pclmul
crc32c_intel polyval_clmulni polyval_generic gf128mul ghash_clmulni_intel
sha512_ssse3 sha256_ssse3 sha1_ssse3 nvme aesni_intel crypto_simd mpt3sas
nvme_core cryptd ccp nvme_common xhci_pci raid_class xhci_pci_renesas
scsi_transport_sas amdgpu drm_ttm_helper ttm video wmi gpu_sched drm_buddy
drm_display_helper cec
[   11.012188] ---[ end trace 0000000000000000 ]---


