Hello everyone,

I've been working on a system with a software RAID for the last couple of weeks. I ran through the process of creating the array as RAID5 using /dev/nvme0n1p1, /dev/nvme1n1p1, and /dev/nvme2n1p1. I then created the filesystem, updated mdadm.conf, and ran update-initramfs -u. The array and filesystem were created successfully; the array came up as /dev/md127. I mounted it and could write data to it. /etc/fstab has also been updated.

After rebooting the machine, the system enters Emergency Mode. Commenting out the newly created device and rebooting the machine brings it back to Emergency Mode. I can also skip Emergency Mode by adding the nofail option to the mount point in /etc/fstab.

Today, I walked through recreating the array. Once it was created, I ran mkfs.ext4 again. This time, I noticed that the command found an existing ext4 filesystem. To try to repair it, I ran fsck -y against /dev/md127. The end of the fsck output noted that the resize inode (re)creation failed: Inode checksum does not match inode. Mounting failed, so we made the filesystem again. It's worth noting that there's NO data on this array at this time, which is why we were able to go through with making the filesystem again.

I made sure to gather all of the info noted within the mdadm wiki and I've included that below. The only thing not included is mdadm --detail of each of the partitions, because the system doesn't recognize them as being part of an md. Also, md0 hosts the root volume and isn't a part of the output below.

As far as troubleshooting is concerned, I've tried the following:

1. mdadm --manage /dev/md127 --run
2. echo "clean" > /sys/block/md127/md/array_state, then ran command 1
3. mdadm --assemble --force /dev/md127 /dev/nvme0n1p1 /dev/nvme1n1p1 /dev/nvme2n1p1, then ran command 1

I've also pored over logs. At one point, the kernel logs indicated that nvme2n1p1 wasn't being recognized. To rule that out as the issue, I created a RAID1 between nvme0n1p1 and nvme1n1p1.
This still didn't work. Looking through journalctl -xb, I found an error about a missing package, libblockdev-mdraid2. Installing that package didn't help either.

Lastly, I've included the output of wipefs at the behest of a colleague.

Any support you can provide will be greatly appreciated.

Regards,
Ryan E.

____________________________________________
Start of the mdadm bug report log file.

Date: Tue Aug 6 02:42:59 PM PDT 2024
uname: Linux REDACTED 5.15.0-117-generic #127-Ubuntu SMP Fri Jul 5 20:13:28 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
command line flags:

____________________________________________
mdadm --version

mdadm - v4.2 - 2021-12-30

____________________________________________
cat /proc/mdstat

Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md0 : active raid1 sdb2[1] sda2[0]
      1874715648 blocks super 1.2 [2/2] [UU]
      bitmap: 8/14 pages [32KB], 65536KB chunk

unused devices: <none>

____________________________________________
mdadm --examine /dev/nvme0n1p1 /dev/nvme1n1p1 /dev/nvme2n1p1

/dev/nvme0n1p1:
   MBR Magic : aa55
Partition[0] : 4294967295 sectors at 1 (type ee)
/dev/nvme1n1p1:
   MBR Magic : aa55
Partition[0] : 4294967295 sectors at 1 (type ee)
/dev/nvme2n1p1:
   MBR Magic : aa55
Partition[0] : 4294967295 sectors at 1 (type ee)

____________________________________________
mdadm --detail /dev/nvme0n1p1 /dev/nvme1n1p1 /dev/nvme2n1p1

mdadm: /dev/nvme0n1p1 does not appear to be an md device
mdadm: /dev/nvme1n1p1 does not appear to be an md device
mdadm: /dev/nvme2n1p1 does not appear to be an md device

____________________________________________
smartctl --xall /dev/nvme0n1p1

smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-117-generic] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number: SAMSUNG MZQL23T8HCLS-00A07
Serial Number: S64HNS0TC05245
Firmware Version: GDC5602Q
PCI Vendor/Subsystem ID: 0x144d
IEEE OUI Identifier: 0x002538
Total NVM Capacity: 3,840,755,982,336 [3.84 TB]
Unallocated NVM Capacity: 0
Controller ID: 6
NVMe Version: 1.4
Number of Namespaces: 32
Namespace 1 Size/Capacity: 3,840,755,982,336 [3.84 TB]
Namespace 1 Utilization: 71,328,116,736 [71.3 GB]
Namespace 1 Formatted LBA Size: 512
Local Time is: Tue Aug 6 15:16:08 2024 PDT
Firmware Updates (0x17): 3 Slots, Slot 1 R/O, no Reset required
Optional Admin Commands (0x005f): Security Format Frmw_DL NS_Mngmt Self_Test MI_Snd/Rec
Optional NVM Commands (0x005f): Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Log Page Attributes (0x0e): Cmd_Eff_Lg Ext_Get_Lg Telmtry_Lg
Maximum Data Transfer Size: 512 Pages
Warning Comp. Temp. Threshold: 80 Celsius
Critical Comp. Temp. Threshold: 83 Celsius
Namespace 1 Features (0x1a): NA_Fields No_ID_Reuse NP_Fields

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +   25.00W   14.00W        -    0  0  0  0       70      70
 1 +    8.00W    8.00W        -    1  1  1  1       70      70

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0
 1 -    4096       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning: 0x00
Temperature: 35 Celsius
Available Spare: 100%
Available Spare Threshold: 10%
Percentage Used: 0%
Data Units Read: 31,574,989 [16.1 TB]
Data Units Written: 304,488 [155 GB]
Host Read Commands: 36,420,064
Host Write Commands: 3,472,342
Controller Busy Time: 63
Power Cycles: 11
Power On Hours: 5,582
Unsafe Shutdowns: 9
Media and Data Integrity Errors: 0
Error Information Log Entries: 0
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0
Temperature Sensor 1: 35 Celsius
Temperature Sensor 2: 44 Celsius

Error Information (NVMe Log 0x01, 16 of 64 entries)
No Errors Logged

____________________________________________
smartctl --xall /dev/nvme1n1p1

smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-117-generic] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number: SAMSUNG MZQL23T8HCLS-00A07
Serial Number: S64HNS0TC05241
Firmware Version: GDC5602Q
PCI Vendor/Subsystem ID: 0x144d
IEEE OUI Identifier: 0x002538
Total NVM Capacity: 3,840,755,982,336 [3.84 TB]
Unallocated NVM Capacity: 0
Controller ID: 6
NVMe Version: 1.4
Number of Namespaces: 32
Namespace 1 Size/Capacity: 3,840,755,982,336 [3.84 TB]
Namespace 1 Utilization: 71,324,651,520 [71.3 GB]
Namespace 1 Formatted LBA Size: 512
Local Time is: Tue Aug 6 15:16:22 2024 PDT
Firmware Updates (0x17): 3 Slots, Slot 1 R/O, no Reset required
Optional Admin Commands (0x005f): Security Format Frmw_DL NS_Mngmt Self_Test MI_Snd/Rec
Optional NVM Commands (0x005f): Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Log Page Attributes (0x0e): Cmd_Eff_Lg Ext_Get_Lg Telmtry_Lg
Maximum Data Transfer Size: 512 Pages
Warning Comp. Temp. Threshold: 80 Celsius
Critical Comp. Temp. Threshold: 83 Celsius
Namespace 1 Features (0x1a): NA_Fields No_ID_Reuse NP_Fields

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +   25.00W   14.00W        -    0  0  0  0       70      70
 1 +    8.00W    8.00W        -    1  1  1  1       70      70

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0
 1 -    4096       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning: 0x00
Temperature: 34 Celsius
Available Spare: 100%
Available Spare Threshold: 10%
Percentage Used: 0%
Data Units Read: 24,073,787 [12.3 TB]
Data Units Written: 7,805,460 [3.99 TB]
Host Read Commands: 29,506,475
Host Write Commands: 10,354,117
Controller Busy Time: 64
Power Cycles: 11
Power On Hours: 5,582
Unsafe Shutdowns: 9
Media and Data Integrity Errors: 0
Error Information Log Entries: 0
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0
Temperature Sensor 1: 34 Celsius
Temperature Sensor 2: 44 Celsius

Error Information (NVMe Log 0x01, 16 of 64 entries)
No Errors Logged

____________________________________________
smartctl --xall /dev/nvme2n1p1

smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-117-generic] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number: SAMSUNG MZQL23T8HCLS-00A07
Serial Number: S64HNS0TC05244
Firmware Version: GDC5602Q
PCI Vendor/Subsystem ID: 0x144d
IEEE OUI Identifier: 0x002538
Total NVM Capacity: 3,840,755,982,336 [3.84 TB]
Unallocated NVM Capacity: 0
Controller ID: 6
NVMe Version: 1.4
Number of Namespaces: 32
Namespace 1 Size/Capacity: 3,840,755,982,336 [3.84 TB]
Namespace 1 Utilization: 3,840,514,523,136 [3.84 TB]
Namespace 1 Formatted LBA Size: 512
Local Time is: Tue Aug 6 15:16:33 2024 PDT
Firmware Updates (0x17): 3 Slots, Slot 1 R/O, no Reset required
Optional Admin Commands (0x005f): Security Format Frmw_DL NS_Mngmt Self_Test MI_Snd/Rec
Optional NVM Commands (0x005f): Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Log Page Attributes (0x0e): Cmd_Eff_Lg Ext_Get_Lg Telmtry_Lg
Maximum Data Transfer Size: 512 Pages
Warning Comp. Temp. Threshold: 80 Celsius
Critical Comp. Temp. Threshold: 83 Celsius
Namespace 1 Features (0x1a): NA_Fields No_ID_Reuse NP_Fields

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +   25.00W   14.00W        -    0  0  0  0       70      70
 1 +    8.00W    8.00W        -    1  1  1  1       70      70

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0
 1 -    4096       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning: 0x00
Temperature: 33 Celsius
Available Spare: 100%
Available Spare Threshold: 10%
Percentage Used: 0%
Data Units Read: 33,340 [17.0 GB]
Data Units Written: 24,215,921 [12.3 TB]
Host Read Commands: 812,460
Host Write Commands: 31,463,496
Controller Busy Time: 50
Power Cycles: 12
Power On Hours: 5,582
Unsafe Shutdowns: 9
Media and Data Integrity Errors: 0
Error Information Log Entries: 0
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0
Temperature Sensor 1: 33 Celsius
Temperature Sensor 2: 42 Celsius

Error Information (NVMe Log 0x01, 16 of 64 entries)
No Errors Logged

____________________________________________
lsdrv

PCI [nvme] 22:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller PM9A1/PM9A3/980PRO
└nvme nvme0 SAMSUNG MZQL23T8HCLS-00A07 {S64HNS0TC05245}
 └nvme0n1 3.49t [259:0] Partitioned (gpt)
  └nvme0n1p1 3.49t [259:1] Partitioned (gpt)
PCI [nvme] 23:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller PM9A1/PM9A3/980PRO
└nvme nvme1 SAMSUNG MZQL23T8HCLS-00A07 {S64HNS0TC05241}
 └nvme1n1 3.49t [259:2] Partitioned (gpt)
  └nvme1n1p1 3.49t [259:3] Partitioned (gpt)
PCI [nvme] 24:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller PM9A1/PM9A3/980PRO
└nvme nvme2 SAMSUNG MZQL23T8HCLS-00A07 {S64HNS0TC05244}
 └nvme2n1 3.49t [259:4] Partitioned (gpt)
  └nvme2n1p1 3.49t [259:5] Partitioned (gpt)
PCI [ahci] 64:00.0 SATA controller: ASMedia Technology Inc. ASM1062 Serial ATA Controller (rev 02)
├scsi 0:0:0:0 ATA SAMSUNG MZ7L31T9 {S6ESNS0W416204}
│└sda 1.75t [8:0] Partitioned (gpt)
│ ├sda1 512.00m [8:1] vfat {B0FD-2869}
│ │└Mounted as /dev/sda1 @ /boot/efi
│ └sda2 1.75t [8:2] MD raid1 (0/2) (w/ sdb2) in_sync 'ubuntu-server:0' {2bcfa20a-e221-299c-d3e6-f4cf8124e265}
│  └md0 1.75t [9:0] MD v1.2 raid1 (2) active {2bcfa20a:-e221-29:9c-d3e6-:f4cf8124e265}
│   │ Partitioned (gpt)
│   └md0p1 1.75t [259:6] ext4 {81b5ccee-9c72-4cac-8579-3b9627a8c1b6}
│    └Mounted as /dev/md0p1 @ /
└scsi 1:0:0:0 ATA SAMSUNG MZ7L31T9 {S6ESNS0W416208}
 └sdb 1.75t [8:16] Partitioned (gpt)
  ├sdb1 512.00m [8:17] vfat {B11F-39A7}
  └sdb2 1.75t [8:18] MD raid1 (1/2) (w/ sda2) in_sync 'ubuntu-server:0' {2bcfa20a-e221-299c-d3e6-f4cf8124e265}
   └md0 1.75t [9:0] MD v1.2 raid1 (2) active {2bcfa20a:-e221-29:9c-d3e6-:f4cf8124e265}
      Partitioned (gpt)
PCI [ahci] 66:00.0 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 91)
└scsi 2:x:x:x [Empty]
PCI [ahci] 66:00.1 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 91)
└scsi 10:x:x:x [Empty]
PCI [ahci] 04:00.0 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 91)
└scsi 18:x:x:x [Empty]
PCI [ahci] 04:00.1 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 91)
└scsi 26:x:x:x [Empty]
Other Block Devices
├loop0 0.00k [7:0] Empty/Unknown
├loop1 0.00k [7:1] Empty/Unknown
├loop2 0.00k [7:2] Empty/Unknown
├loop3 0.00k [7:3] Empty/Unknown
├loop4 0.00k [7:4] Empty/Unknown
├loop5 0.00k [7:5] Empty/Unknown
├loop6 0.00k [7:6] Empty/Unknown
└loop7 0.00k [7:7] Empty/Unknown

____________________________________________
wipefs /dev/nvme0n1p1

DEVICE    OFFSET        TYPE UUID LABEL
nvme0n1p1 0x200         gpt
nvme0n1p1 0x37e38900000 gpt
nvme0n1p1 0x1fe         PMBR

____________________________________________
wipefs /dev/nvme1n1p1

DEVICE    OFFSET        TYPE UUID LABEL
nvme1n1p1 0x200         gpt
nvme1n1p1 0x37e38900000 gpt
nvme1n1p1 0x1fe         PMBR

____________________________________________
wipefs /dev/nvme2n1p1

DEVICE    OFFSET        TYPE UUID LABEL
nvme2n1p1 0x200         gpt
nvme2n1p1 0x37e38900000 gpt
nvme2n1p1 0x1fe         PMBR
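
In case it helps anyone reading the wipefs output above, here is a minimal sketch for summarising it. It is not part of the bug report script, the function name list_partition_table_sigs is made up for illustration, and it assumes the column layout shown above (DEVICE, OFFSET, TYPE, then optional UUID and LABEL). It only reads text on stdin and never touches a device.

```shell
#!/bin/sh
# Sketch only: list_partition_table_sigs is a hypothetical helper name.
# Reads `wipefs <device>` listing output on stdin (columns: DEVICE OFFSET TYPE ...)
# and prints, once each, every device that carries a gpt or PMBR signature.
list_partition_table_sigs() {
    awk '$1 != "DEVICE" && ($3 == "gpt" || $3 == "PMBR") { print $1 }' | sort -u
}

# Example using the nvme0n1p1 rows from the report above.
list_partition_table_sigs <<'EOF'
DEVICE    OFFSET        TYPE UUID LABEL
nvme0n1p1 0x200         gpt
nvme0n1p1 0x37e38900000 gpt
nvme0n1p1 0x1fe         PMBR
EOF
# prints: nvme0n1p1
```

On a live system the same function could be fed directly, for example `wipefs /dev/nvme0n1p1 | list_partition_table_sigs`; without -a or -o, wipefs only lists signatures. As far as I understand, a partition that is an active md member would normally show a linux_raid_member signature there instead, which is why the gpt/PMBR rows stood out to us.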