Re: raid1 show clean but md0 will not assemble

On Thu, 7 Mar 2024 16:45:49 -0800
Stewart Andreason <sandreas41@xxxxxxxxx> wrote:

> Hi Roman,
> 
> Does this board have rules about replying to everyone, or not?

Hello,

Yes, replying to everyone is better: it lets people know the issue is now solved,
so nobody else needs to spend time analyzing the original report.

> I'll go with not until advised otherwise.
> 
> 
> >> $ sudo mdadm --assemble --verbose /dev/md0 /dev/sdc1 /dev/sdd1
> >> mdadm: looking for devices for /dev/md0
> >> mdadm: /dev/sdc1 is identified as a member of /dev/md0, slot 0.
> >> mdadm: /dev/sdd1 is identified as a member of /dev/md0, slot 1.
> >> mdadm: added /dev/sdc1 to /dev/md0 as 0 (possibly out of date)
> >> mdadm: added /dev/sdd1 to /dev/md0 as 1
> >> mdadm: /dev/md0 has been started with 1 drive (out of 2).
> > Please include "dmesg" output that's printed after running this command.
> 
> 
> Certainly.
> 
> http://seahorsecorral.org/bugreport/Roxy10-dmesg-20240307-clip.txt.gz
> 
> 
> 
> > See the "Event" counters, one drive indeed has less than the other.
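
For reference, those counters can be compared side by side straight from the member
superblocks, e.g. (device names as in this report):

$ sudo mdadm --examine /dev/sdc1 /dev/sdd1 | grep -E '^/dev|Events'

The member with the lower count is the stale one.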
> 
> 
> When I first opened these drives in January, they went into a different 
> enclosure, an Acasis EC-7352.
> 
> All, or 99%, of the errors are from that month, both the events and the 3 
> serious errors in the SMART log. That enclosure was first configured for 
> hardware RAID, but proved to have several issues, including not turning the 
> fan back on after waking up. It crashed a few times, even in JBOD mode, so 
> I started over in new individual enclosures.
> 
> It is hard to tell what was responsible for those crashes, since the most 
> recent ones froze up the whole OS, so no dmesg could be retrieved. That was 
> Jan. 29, and it was the end of Acasis in my rating.
> 
> 
> > As for the actual steps, when you are in the state shown in your report, I'd try:
> >
> >    mdadm --re-add /dev/md0 /dev/sdc1
> 
> I powered up drive 0; as expected it came up as sdc, Device Role : Active device 0. OK.
> 
> I powered up drive 1; as expected, sdd, Active device 1.
> 
> $ sudo mdadm --detail /dev/md0
> /dev/md0:
>             Version : 1.2
>       Creation Time : Sat Jan 27 12:07:27 2024
>          Raid Level : raid1
>          Array Size : 5860388864 (5588.90 GiB 6001.04 GB)
>       Used Dev Size : 5860388864 (5588.90 GiB 6001.04 GB)
>        Raid Devices : 2
>       Total Devices : 1
>         Persistence : Superblock is persistent
> 
>       Intent Bitmap : Internal
> 
>         Update Time : Sat Mar  2 18:50:02 2024
>               State : clean, degraded
>      Active Devices : 1
>     Working Devices : 1
>      Failed Devices : 0
> 
>                Name : roxy10-debian11-x64:0  (local to host roxy10-debian11-x64)
>                UUID : 1e3f7f7e:23a5b75f:6f76abf5:88f5e704
>              Events : 21691
> 
>      Number   Major   Minor   RaidDevice State
>         0       8       33        0      active sync   /dev/sdc1
>         -       0        0        1      removed
> 
> Why is sdc the active one this time?
> 
> $ lsblk
> 
> sdc       8:32   0   5.5T  0 disk
> └─sdc1    8:33   0   5.5T  0 part
>    └─md0   9:0    0   5.5T  0 raid1
> sdd       8:48   0   5.5T  0 disk
> └─sdd1    8:49   0   5.5T  0 part
> 
> I keep getting confused about which drive is the bad one, and I repeated my 
> steps before posting my question. Now maybe I was not imagining it.
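
With USB enclosures, sdc/sdd can swap between boots depending on which one enumerates
first, so it is easy to lose track. Matching the letters to the physical drives by
serial number avoids the guesswork, for example:

$ lsblk -o NAME,SIZE,MODEL,SERIAL /dev/sdc /dev/sdd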
> 
> I only got the slot numbers verified by reassembling it, so I'll do that 
> step again.
> 
> $ sudo mdadm --stop /dev/md0
> mdadm: stopped /dev/md0
> $ sudo mdadm --assemble --verbose /dev/md0 /dev/sdc1 /dev/sdd1
> mdadm: looking for devices for /dev/md0
> mdadm: /dev/sdc1 is identified as a member of /dev/md0, slot 0.
> mdadm: /dev/sdd1 is identified as a member of /dev/md0, slot 1.
> mdadm: added /dev/sdc1 to /dev/md0 as 0 (possibly out of date)
> mdadm: added /dev/sdd1 to /dev/md0 as 1
> mdadm: /dev/md0 has been started with 1 drive (out of 2).
> 
> Huh. Well, onward then. I'll just include what changed:
> 
> $ sudo mdadm --detail /dev/md0
>      Number   Major   Minor   RaidDevice State
>         -       0        0        0      removed
>         1       8       49        1      active sync   /dev/sdd1
> 
> $ sudo mdadm --re-add /dev/md0 /dev/sdc1
> mdadm: re-added /dev/sdc1
> 
> $ cat /proc/mdstat
> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
> md0 : active raid1 sdc1[0] sdd1[1]
>        5860388864 blocks super 1.2 [2/2] [UU]
>        bitmap: 1/44 pages [4KB], 65536KB chunk
> 
> $ sudo mdadm --detail /dev/md0
> 
>      Number   Major   Minor   RaidDevice State
>         0       8       33        0      active sync   /dev/sdc1
>         1       8       49        1      active sync   /dev/sdd1
> 
> $ dmesg
> 
> [37737.530811] md: kicking non-fresh sdc1 from array!
> [37737.556503] md/raid1:md0: active with 1 out of 2 mirrors
> [37737.561908] md0: detected capacity change from 0 to 6001038196736
> [37818.049342] md: recovery of RAID array md0
> [37818.319780] md: md0: recovery done.
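
The recovery finishing almost instantly is the internal write-intent bitmap doing its
job: only the regions dirtied while sdc1 was out of the array had to be copied. If you
ever want to see that state, the bitmap on a member can be dumped with:

$ sudo mdadm --examine-bitmap /dev/sdd1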
> 
> Fixed. I'm so glad I asked the right forum. Thank you!
> 
> 
> > But to me it is puzzling why it got removed to begin with.
> >
> I intended to make a backup of my primary OS while it was unmounted. After 
> researching the safe commands to reassemble the RAID in a different Linux 
> OS, I rebooted into System-Rescue-10, copied the mdadm.conf to /etc over 
> the existing template, and attempted to mount /dev/md0.
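
For next time: once a correct mdadm.conf is in /etc, the usual sequence in a rescue
environment is assemble-by-scan, check the result, then mount. A sketch (the mount
point /mnt is just an example):

$ sudo mdadm --assemble --scan --verbose
$ cat /proc/mdstat
$ sudo mount /dev/md0 /mnt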
> 
> I got only one drive up. Cue the mild panic, because when does doing the 
> research first ever mean things go perfectly?
> 
> Now, with a few more days' experience, I get to try again.
> 
> Thanks again,
> 
> Stewart
> 


-- 
With respect,
Roman




