RAID missing post reboot

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hello everyone,

I've been working on a system with a software RAID for the last couple
of weeks. I ran through the process of creating the array as RAID5
using /dev/nvme0n1p1, /dev/nvme1n1p1, and /dev/nvme2n1p1. I then
create the filesystem, update mdadm.conf, and run update-initramfs -u.

The array and file system are created successfully. It's created as
/dev/md127. I mount it to the system and I can write data to it.
/etc/fstab has also been updated.

After rebooting the machine, the system enters Emergency Mode.
Commenting out the newly created device and rebooting the machine
brings it back to Emergency Mode. I can also skip EM by adding the
nofail option to the mount point in /etc/fstab.

Today, I walked through recreating the array. Once created, I ran
mkfs.ext4 again. This time, I noticed that the command found an ext4
file system. To try and repair it, I ran fsck -y against /dev/md127.
The end of the fsck noted that a resize of the inode (re)creation
failed: Inode checksum does not match inode. Mounting failed, so we
made the filesystem again.

It's worth noting that there's NO data on this array at this time.
Hence why we were able to go through with making the filesystem again.
I made sure to gather all of the info noted within the mdadm wiki and
I've included that below. The only thing not included is mdadm
--detail of each of the partitions because the system doesn't
recognize them as being part of an md. Also, md0 hosts the root volume
and isn't a part of the output below.

As far as troubleshooting is concerned, I've tried the following:
1. mdadm --manage /dev/md127 --run
2. echo "clean" > /sys/block/md127/md/array_state & then run command 1
3. mdadm --assemble --force /dev/md127 /dev/nvme0n1p1 /dev/nvme1n1p1
/dev/nvme2n1p1 & then run command 1

I've also poured over logs. Once, I noticed that nvme2n1p1 wasn't
being recognized as a part of the kernel logs. To rule that out as the
issue, I created a RAID1 between nvme0n1p1 & nvme1n1p1. This still
didn't work.

Looking through journalctl -xb, I found an error noting a package that
was missing. The package is named ibblockdev-mdraid2. Installing that
package still didn't help.

Lastly, I included the output of wipefs at the behest of a colleague.
Any support you can provide will be greatly appreciated.

Regards,
Ryan E.


____________________________________________

Start of the mdadm bug report log file.

Date: Tue Aug  6 02:42:59 PM PDT 2024
uname: Linux REDACTED 5.15.0-117-generic #127-Ubuntu SMP Fri Jul 5
20:13:28 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
command line flags:

____________________________________________

mdadm --version

mdadm - v4.2 - 2021-12-30

____________________________________________

cat /proc/mdstat

Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5]
[raid4] [raid10]
md0 : active raid1 sdb2[1] sda2[0] 1874715648 blocks super 1.2 [2/2]
[UU] bitmap: 8/14 pages [32KB], 65536KB chunk unused devices: <none>

____________________________________________

mdadm --examine /dev/nvme0n1p1 /dev/nvme1n1p1 /dev/nvme2n1p1

/dev/nvme0n1p1: MBR Magic : aa55 Partition[0] : 4294967295 sectors at
1 (type ee)
/dev/nvme1n1p1: MBR Magic : aa55 Partition[0] : 4294967295 sectors at
1 (type ee)
/dev/nvme2n1p1: MBR Magic : aa55 Partition[0] : 4294967295 sectors at
1 (type ee)

____________________________________________

mdadm --detail /dev/nvme0n1p1 /dev/nvme1n1p1 /dev/nvme2n1p1

mdadm: /dev/nvme0n1p1 does not appear to be an md device
mdadm: /dev/nvme1n1p1 does not appear to be an md device
mdadm: /dev/nvme2n1p1 does not appear to be an md device

____________________________________________

smartctl --xall /dev/nvme0n1p1

smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-117-generic] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       SAMSUNG MZQL23T8HCLS-00A07
Serial Number:                      S64HNS0TC05245
Firmware Version:                   GDC5602Q
PCI Vendor/Subsystem ID:            0x144d
IEEE OUI Identifier:                0x002538
Total NVM Capacity:                 3,840,755,982,336 [3.84 TB]
Unallocated NVM Capacity:           0
Controller ID:                      6
NVMe Version:                       1.4
Number of Namespaces:               32
Namespace 1 Size/Capacity:          3,840,755,982,336 [3.84 TB]
Namespace 1 Utilization:            71,328,116,736 [71.3 GB]
Namespace 1 Formatted LBA Size:     512
Local Time is:                      Tue Aug  6 15:16:08 2024 PDT
Firmware Updates (0x17):            3 Slots, Slot 1 R/O, no Reset required
Optional Admin Commands (0x005f):   Security Format Frmw_DL NS_Mngmt
Self_Test MI_Snd/Rec
Optional NVM Commands (0x005f):     Comp Wr_Unc DS_Mngmt Wr_Zero
Sav/Sel_Feat Timestmp
Log Page Attributes (0x0e):         Cmd_Eff_Lg Ext_Get_Lg Telmtry_Lg
Maximum Data Transfer Size:         512 Pages
Warning  Comp. Temp. Threshold:     80 Celsius
Critical Comp. Temp. Threshold:     83 Celsius
Namespace 1 Features (0x1a):        NA_Fields No_ID_Reuse NP_Fields

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +    25.00W   14.00W       -    0  0  0  0       70      70
 1 +     8.00W    8.00W       -    1  1  1  1       70      70

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0
 1 -    4096       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        35 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    0%
Data Units Read:                    31,574,989 [16.1 TB]
Data Units Written:                 304,488 [155 GB]
Host Read Commands:                 36,420,064
Host Write Commands:                3,472,342
Controller Busy Time:               63
Power Cycles:                       11
Power On Hours:                     5,582
Unsafe Shutdowns:                   9
Media and Data Integrity Errors:    0
Error Information Log Entries:      0
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Temperature Sensor 1:               35 Celsius
Temperature Sensor 2:               44 Celsius

Error Information (NVMe Log 0x01, 16 of 64 entries)
No Errors Logged

____________________________________________

smartctl --xall /dev/nvme1n1p1

smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-117-generic] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       SAMSUNG MZQL23T8HCLS-00A07
Serial Number:                      S64HNS0TC05241
Firmware Version:                   GDC5602Q
PCI Vendor/Subsystem ID:            0x144d
IEEE OUI Identifier:                0x002538
Total NVM Capacity:                 3,840,755,982,336 [3.84 TB]
Unallocated NVM Capacity:           0
Controller ID:                      6
NVMe Version:                       1.4
Number of Namespaces:               32
Namespace 1 Size/Capacity:          3,840,755,982,336 [3.84 TB]
Namespace 1 Utilization:            71,324,651,520 [71.3 GB]
Namespace 1 Formatted LBA Size:     512
Local Time is:                      Tue Aug  6 15:16:22 2024 PDT
Firmware Updates (0x17):            3 Slots, Slot 1 R/O, no Reset required
Optional Admin Commands (0x005f):   Security Format Frmw_DL NS_Mngmt
Self_Test MI_Snd/Rec
Optional NVM Commands (0x005f):     Comp Wr_Unc DS_Mngmt Wr_Zero
Sav/Sel_Feat Timestmp
Log Page Attributes (0x0e):         Cmd_Eff_Lg Ext_Get_Lg Telmtry_Lg
Maximum Data Transfer Size:         512 Pages
Warning  Comp. Temp. Threshold:     80 Celsius
Critical Comp. Temp. Threshold:     83 Celsius
Namespace 1 Features (0x1a):        NA_Fields No_ID_Reuse NP_Fields

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +    25.00W   14.00W       -    0  0  0  0       70      70
 1 +     8.00W    8.00W       -    1  1  1  1       70      70

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0
 1 -    4096       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        34 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    0%
Data Units Read:                    24,073,787 [12.3 TB]
Data Units Written:                 7,805,460 [3.99 TB]
Host Read Commands:                 29,506,475
Host Write Commands:                10,354,117
Controller Busy Time:               64
Power Cycles:                       11
Power On Hours:                     5,582
Unsafe Shutdowns:                   9
Media and Data Integrity Errors:    0
Error Information Log Entries:      0
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Temperature Sensor 1:               34 Celsius
Temperature Sensor 2:               44 Celsius

Error Information (NVMe Log 0x01, 16 of 64 entries)
No Errors Logged

____________________________________________

smartctl --xall /dev/nvme2n1p1

smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-117-generic] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       SAMSUNG MZQL23T8HCLS-00A07
Serial Number:                      S64HNS0TC05244
Firmware Version:                   GDC5602Q
PCI Vendor/Subsystem ID:            0x144d
IEEE OUI Identifier:                0x002538
Total NVM Capacity:                 3,840,755,982,336 [3.84 TB]
Unallocated NVM Capacity:           0
Controller ID:                      6
NVMe Version:                       1.4
Number of Namespaces:               32
Namespace 1 Size/Capacity:          3,840,755,982,336 [3.84 TB]
Namespace 1 Utilization:            3,840,514,523,136 [3.84 TB]
Namespace 1 Formatted LBA Size:     512
Local Time is:                      Tue Aug  6 15:16:33 2024 PDT
Firmware Updates (0x17):            3 Slots, Slot 1 R/O, no Reset required
Optional Admin Commands (0x005f):   Security Format Frmw_DL NS_Mngmt
Self_Test MI_Snd/Rec
Optional NVM Commands (0x005f):     Comp Wr_Unc DS_Mngmt Wr_Zero
Sav/Sel_Feat Timestmp
Log Page Attributes (0x0e):         Cmd_Eff_Lg Ext_Get_Lg Telmtry_Lg
Maximum Data Transfer Size:         512 Pages
Warning  Comp. Temp. Threshold:     80 Celsius
Critical Comp. Temp. Threshold:     83 Celsius
Namespace 1 Features (0x1a):        NA_Fields No_ID_Reuse NP_Fields

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +    25.00W   14.00W       -    0  0  0  0       70      70
 1 +     8.00W    8.00W       -    1  1  1  1       70      70

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0
 1 -    4096       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        33 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    0%
Data Units Read:                    33,340 [17.0 GB]
Data Units Written:                 24,215,921 [12.3 TB]
Host Read Commands:                 812,460
Host Write Commands:                31,463,496
Controller Busy Time:               50
Power Cycles:                       12
Power On Hours:                     5,582
Unsafe Shutdowns:                   9
Media and Data Integrity Errors:    0
Error Information Log Entries:      0
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Temperature Sensor 1:               33 Celsius
Temperature Sensor 2:               42 Celsius

Error Information (NVMe Log 0x01, 16 of 64 entries)
No Errors Logged

____________________________________________

lsdrv

PCI [nvme] 22:00.0 Non-Volatile memory controller: Samsung Electronics
Co Ltd NVMe SSD Controller PM9A1/PM9A3/980PRO
└nvme nvme0 SAMSUNG MZQL23T8HCLS-00A07               {S64HNS0TC05245}
 └nvme0n1 3.49t [259:0] Partitioned (gpt)
  └nvme0n1p1 3.49t [259:1] Partitioned (gpt)
PCI [nvme] 23:00.0 Non-Volatile memory controller: Samsung Electronics
Co Ltd NVMe SSD Controller PM9A1/PM9A3/980PRO
└nvme nvme1 SAMSUNG MZQL23T8HCLS-00A07               {S64HNS0TC05241}
 └nvme1n1 3.49t [259:2] Partitioned (gpt)
  └nvme1n1p1 3.49t [259:3] Partitioned (gpt)
PCI [nvme] 24:00.0 Non-Volatile memory controller: Samsung Electronics
Co Ltd NVMe SSD Controller PM9A1/PM9A3/980PRO
└nvme nvme2 SAMSUNG MZQL23T8HCLS-00A07               {S64HNS0TC05244}
 └nvme2n1 3.49t [259:4] Partitioned (gpt)
  └nvme2n1p1 3.49t [259:5] Partitioned (gpt)
PCI [ahci] 64:00.0 SATA controller: ASMedia Technology Inc. ASM1062
Serial ATA Controller (rev 02)
├scsi 0:0:0:0 ATA      SAMSUNG MZ7L31T9 {S6ESNS0W416204}
│└sda 1.75t [8:0] Partitioned (gpt)
│ ├sda1 512.00m [8:1] vfat {B0FD-2869}
│ │└Mounted as /dev/sda1 @ /boot/efi
│ └sda2 1.75t [8:2] MD raid1 (0/2) (w/ sdb2) in_sync 'ubuntu-server:0'
{2bcfa20a-e221-299c-d3e6-f4cf8124e265}
│  └md0 1.75t [9:0] MD v1.2 raid1 (2) active
{2bcfa20a:-e221-29:9c-d3e6-:f4cf8124e265}
│   │               Partitioned (gpt)
│   └md0p1 1.75t [259:6] ext4 {81b5ccee-9c72-4cac-8579-3b9627a8c1b6}
│    └Mounted as /dev/md0p1 @ /
└scsi 1:0:0:0 ATA      SAMSUNG MZ7L31T9 {S6ESNS0W416208}
 └sdb 1.75t [8:16] Partitioned (gpt)
  ├sdb1 512.00m [8:17] vfat {B11F-39A7}
  └sdb2 1.75t [8:18] MD raid1 (1/2) (w/ sda2) in_sync
'ubuntu-server:0' {2bcfa20a-e221-299c-d3e6-f4cf8124e265}
   └md0 1.75t [9:0] MD v1.2 raid1 (2) active
{2bcfa20a:-e221-29:9c-d3e6-:f4cf8124e265}
                    Partitioned (gpt)
PCI [ahci] 66:00.0 SATA controller: Advanced Micro Devices, Inc. [AMD]
FCH SATA Controller [AHCI mode] (rev 91)
└scsi 2:x:x:x [Empty]
PCI [ahci] 66:00.1 SATA controller: Advanced Micro Devices, Inc. [AMD]
FCH SATA Controller [AHCI mode] (rev 91)
└scsi 10:x:x:x [Empty]
PCI [ahci] 04:00.0 SATA controller: Advanced Micro Devices, Inc. [AMD]
FCH SATA Controller [AHCI mode] (rev 91)
└scsi 18:x:x:x [Empty]
PCI [ahci] 04:00.1 SATA controller: Advanced Micro Devices, Inc. [AMD]
FCH SATA Controller [AHCI mode] (rev 91)
└scsi 26:x:x:x [Empty]
Other Block Devices
├loop0 0.00k [7:0] Empty/Unknown
├loop1 0.00k [7:1] Empty/Unknown
├loop2 0.00k [7:2] Empty/Unknown
├loop3 0.00k [7:3] Empty/Unknown
├loop4 0.00k [7:4] Empty/Unknown
├loop5 0.00k [7:5] Empty/Unknown
├loop6 0.00k [7:6] Empty/Unknown
└loop7 0.00k [7:7] Empty/Unknown

____________________________________________

wipefs /dev/nvme0n1p1

DEVICE    OFFSET        TYPE UUID LABEL
nvme0n1p1 0x200         gpt
nvme0n1p1 0x37e38900000 gpt
nvme0n1p1 0x1fe         PMBR

____________________________________________

wipefs /dev/nvme1n1p1

DEVICE    OFFSET        TYPE UUID LABEL
nvme1n1p1 0x200         gpt
nvme1n1p1 0x37e38900000 gpt
nvme1n1p1 0x1fe         PMBR

____________________________________________

wipefs /dev/nvme2n1p1

DEVICE    OFFSET        TYPE UUID LABEL
nvme2n1p1 0x200         gpt
nvme2n1p1 0x37e38900000 gpt
nvme2n1p1 0x1fe         PMBR





[Index of Archives]     [Linux RAID Wiki]     [ATA RAID]     [Linux SCSI Target Infrastructure]     [Linux Block]     [Linux IDE]     [Linux SCSI]     [Linux Hams]     [Device Mapper]     [Device Mapper Cryptographics]     [Kernel]     [Linux Admin]     [Linux Net]     [GFS]     [RPM]     [git]     [Yosemite Forum]


  Powered by Linux