Re: RAID5 Phantom Drive Appeared while Reshaping Four Drive Array (HARDLOCK)

Hi

The commands that I used were straightforward, I believe.

Here goes; hopefully this helps clarify (best I can recall, since the terminal history
was destroyed):

sync; sudo umount /MEGARAID
sudo mdadm -S /dev/md480
sudo mdadm --add /dev/md480 /dev/sde1 ## the new drive was prepped earlier and initially came up as /dev/sde
sudo mdadm --grow --raid-devices=4 /dev/md480
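
In hindsight, a sanity check right after starting the grow would have been worth keeping
(a sketch from memory, not a transcript of anything I actually ran):

sudo mdadm --detail /dev/md480 | grep -E 'Raid Devices|Delta Devices|Reshape Status|State'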

I have a function in .bashrc: watchraid(){ watch -c -d -n 1 cat /proc/mdstat ;}
So that is running all the time, and it reported ~4-5 days to complete.

I paused it successfully using either
echo "frozen" > /sys/block/md480/md/sync_action
or
echo "idle" > /sys/block/md480/md/sync_action
(no shell history available to say which).
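
Either way, a check like the following would have shown which value actually took and how far
along the reshape was (assuming the usual md sysfs attributes):

cat /sys/block/md480/md/sync_action      # reports frozen/idle/reshape/resync/etc.
cat /sys/block/md480/md/sync_completed   # sectors done / total for the current pass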

I used rsync to copy ~450GB from a network share onto the idling RAID.
When rsync completed successfully I attempted to rename the rsync log file using Thunar.
It flashed back to the file listing without renaming the file.
So I tried the exact same name again. Same failure.

Within seconds, every open application and terminal window stopped responding.
Reshaping was at ~30% and stalled, and X was completely unresponsive.
(Looking back, I probably should have tried a TTY console.) Instead I hit the reset switch.

The reboot was absolutely normal. The RAID was operational and the filesystem intact.
Here's where it gets fuzzy; panic was setting in because the drives didn't look right in
the watch terminal. There were five drives now, one showing failed/removed. I can't recall
whether the reshape restarted automatically; it shouldn't have. I'm now beginning to believe
that I made a sloppy typo while restarting the reshape manually. In any event, I continued
with routine daily activities while the RAID reshaped. A lot happens in the 3-5 days a
reshape takes.

At ~80% the reshape speed dropped to 0.
sudo umount /MEGARAID
sudo mdadm -S /dev/md480
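
In hindsight, before stopping it I could have checked whether the reshape was genuinely stuck
or just throttled, with something like this (a sketch, assuming the standard md sysfs attributes):

cat /proc/mdstat                           # current speed and position
cat /sys/block/md480/md/sync_action        # should report "reshape"
cat /sys/block/md480/md/sync_completed     # sectors done / total
cat /sys/block/md480/md/reshape_position   # where the reshape front currently sits
cat /sys/block/md480/md/sync_speed_min     # per-array throttle floor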

I had to add a new card to the machine: a 1TB NVMe on a PCIe adapter.
Shut down, added the card, booted; everything looked normal.
However, the filesystem on the RAID couldn't auto-mount.

watch -c -d -n 1 cat /proc/mdstat

All previously active drives were now listed as spares. So, life goes on, and here I am now.
The 1TB NVMe that I added is formatted, fully functional, and hasn't interfered.
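
If it helps, I can pull the per-member superblock view with something like this
(a sketch; the field names are the ones mdadm --examine prints for 1.2 metadata):

sudo mdadm --examine /dev/sd[cdef]1 | grep -E 'Device Role|Array State|Events|Reshape pos'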

Anyhow, after I create a RAID, I prefer to let mdadm handle itself. In other words, I avoid
using the --force option. mdadm has been very robust over the years.

I can only conclude that I made a sloppy typo. Right now I'm just trying to deal with the
current state of the array.

I appreciate that your free time has great value. Thank you.
SA

Oh, a bit of a silver lining that may be of interest: I found that the ~7,000 scanned documents
were cataloged/thumbnailed in a photo album within the digiKam program on a non-RAID partition.
Most thumbnails are absolutely legible, so I at least have a good record of the originals that
are lost.

Cheers for a few lucky breaks.

On Mon, 2023-05-22 at 18:50 -0500, Roger Heflin wrote:
> Given what the array is reporting I would doubt that is going to fix
> anything.    The array being in the middle of a reshape makes it
> likely that neither n nor n-1 is the right raid config for at least 1/2
> of the data, so it is likely the filesystem will be completely broken.
> 
> Right now the array reports it is a 5 disk array, and the array data
> says it was going from 4 disks to 5.
> 
> What was the command you used to add the 4th disk?     No one is sure
> based on what you are saying how exactly the array got into this
> state.   The data being shown disagrees with what you are reporting,
> and given that no one knows what actually happened.
> 
> On Mon, May 22, 2023 at 3:18 PM raid <raid@electrons.cloud> wrote:
> > Hi
> > 
> > Thanks for your time so far ! Final questions before I rebuild this RAID from scratch.
> > 
> > BTW I created detailed notes when I created this array (as I have for eight other RAIDs that I maintain).
> >     These notes may be applicable later... Here's why.
> > 
> > Do you think that zeroing the drives (as is done for initial drive prep) and then recreating the
> > RAID5 using the initial settings (originally three drives, NOW four drives) could possibly offer
> > a greater chance to recover files? As in, more complete file recovery if the striping aligns
> > correctly? Technically, I've had to write off the files that aren't currently backed up.
> > 
> > However, I'm still willing to make an attempt if you think the idea above might yield something
> > better than one or two stripes of data per file?
> > 
> > And/Or any other tips for this final attempt? Setting ReadOnly if possible?
> > 
> > Thanks Again
> > SA
> > 
> > ---
> > Detailed Notes:
> > ============================================================
> >   2021.10.26 0200P NEW RAID MD480 (48TB) 3x 16TB HITACHI
> > ============================================================ PREPARATION ==
> > 
> > watch -c -d -n 1 cat /proc/mdstat  ############## OPEN A TERMINAL AND MONITOR STATUS ##
> > 
> > sudo lsblk && sudo blkid  ########################################### VERIFY DEVICES ##
> > 
> > sudo umount /MEGARAID                         # Unmount if filesystem is mounted
> > sudo mdadm --stop /dev/md480                  # Stop the RAID/md480 device
> > sudo mdadm --zero-superblock /dev/sd[cdf]1    # Zero  the   superblock(s)  on
> >                                               #      all members of the array
> > sudo mdadm --remove /dev/md480                # Remove the RAID/md480
> > 
> > Edit  ########################################## OPTIONAL FINALIZE PERMANENT REMOVAL ##
> > /etc/fstab
> > /etc/mdadm/mdadm.conf
> > Removing references to the mounting and the definition of the RAID/MD480 device(s)
> > NOTE: Some fstab CFG settings allow skipping devices when unavailable at boot. (nofail)
> > 
> > sudo update-initramfs -uv       # -uv  update ; verbose  ########### RESET INITRAMFS ##
> > 
> > ============================================================ CREATE RAID & ADD FILESYSTEM ==
> >   MEGARAID 2021.10.26 0200P
> > ##############  RAID5 ARRAY MD480 32TB (32,001,527,644,160 bytes) Available (3x16TB) ##
> > 
> > sudo mdadm --create --verbose /dev/md480 --level=5 --raid-devices=3 --uuid=2021102502005a7a5a7abeefcafebabe /dev/sd[cdf]1
> > 
> > 31,251,491,840 BLOCKS CREATED IN ~20 HOURS
> > 
> > ############################################################  CREATE FILESYSTEM EXT4 ##
> >  -v VERBOSE
> >  -L DISK LABEL
> >  -U UUID FORMATTED AS 8CHARS-4CHARS-4CHARS-4CHARS-12CHARS
> >  -m OVERFLOW PROTECTION PERCENTAGE IE. .025 OF 24,576GB IS ~615MB FREE IS CONSIDERED FULL
> >  -b BLOCK SIZE 1/4 OF STRIDE= OFFERS BEST OVERALL PERFORMANCE
> >  -E STRIDE= MULTIPLE OF 8
> >     STRIPE-WIDTH= STRIDE X 2
> > 
> > sudo mkfs.ext4 -v -L MEGARAID -U 20211028-0500-5a7a-5a7a-beefcafebabe -m .025 -b 4096 -E stride=32,stripe-width=64 /dev/md480
> > 
> > sudo mkdir  /MEGARAID  ; sudo chown adminx:adminx -R /MEGARAID
> > 
> > ##############################################################  SET CORRECT HOMEHOST ##
> > 
> > sudo umount /MEGARAID
> > sudo mdadm --stop /dev/md480
> > sudo mdadm --assemble --update=homehost --homehost=GRANDSLAM /dev/md480 /dev/sd[cdf]1
> > sudo blkid
> > 
> > /dev/sdc1: UUID="20211025-0200-5a7a-5a7a-beefcafebabe"
> >            UUID_SUB="8f0835db-3ea2-4540-2ab4-232d6203d1b7"
> >            LABEL="GRANDSLAM:480" TYPE="linux_raid_member"
> >            PARTLABEL="HIT*16TB*001*RAID5"
> >            PARTUUID="3b68fe63-35d0-404d-912e-dfe1127f109b"
> > 
> > /dev/sdd1: UUID="20211025-0200-5a7a-5a7a-beefcafebabe"
> >            UUID_SUB="b4660f49-867b-9f1e-ecad-0acec7119c37"
> >            LABEL="GRANDSLAM:480" TYPE="linux_raid_member"
> >            PARTLABEL="HIT*16TB*002*RAID5"
> >            PARTUUID="32c50f4f-f6ce-4309-b8e4-facdb6e05ba8"
> > 
> > /dev/sdf1: UUID="20211025-0200-5a7a-5a7a-beefcafebabe"
> >            UUID_SUB="79a3dff4-c53f-9071-f9c1-c262403fbc10"
> >            LABEL="GRANDSLAM:480" TYPE="linux_raid_member"
> >            PARTLABEL="HIT*16TB*003*RAID5"
> >            PARTUUID="7ec27f96-2275-4e09-9013-ac056f11ebfb"
> > 
> > /dev/md480: LABEL="MEGARAID" UUID="20211028-0500-5a7a-5a7a-beefcafebabe" TYPE="ext4"
> > 
> > ############################################################### ENTRY FOR /ETC/FSTAB ##
> > 
> > /dev/md480              /MEGARAID               ext4            nofail,noatime,nodiratime,relatime,errors=remount-ro  0  2
> > 
> > #################################################### ENTRY FOR /ETC/MDADM/MDADM.CONF ##
> > 
> > ARRAY /dev/md480 metadata=1.2 name=GRANDSLAM:480 UUID=20211025:02005a7a:5a7abeef:cafebabe
> > 
> > #######################################################################################
> > 
> > sudo update-initramfs -uv       # -uv  update ; verbose
> > sudo mount -a
> > sudo chown adminx:adminx -R /MEGARAID
> > 
> > ############################################################### END 2021.10.28 0545A ##
> > 
> > 
> > 
> > 
> > 
> > 
> > On Mon, 2023-05-22 at 15:51 +0800, Yu Kuai wrote:
> > > Hi,
> > > 
> > > 在 2023/05/22 14:56, raid 写道:
> > > > Hi,
> > > > Thanks for the guidance as the current state has at least changed somewhat.
> > > > 
> > > > BTW Sorry about Life getting in the way of tech. =) Reason for my delayed response.
> > > > 
> > > > -sudo mdadm -I /dev/sdc1
> > > > mdadm: /dev/sdc1 attached to /dev/md480, not enough to start (1).
> > > > -sudo mdadm -D /dev/md480
> > > > /dev/md480:
> > > >             Version : 1.2
> > > >          Raid Level : raid0
> > > >       Total Devices : 1
> > > >         Persistence : Superblock is persistent
> > > > 
> > > >               State : inactive
> > > >     Working Devices : 1
> > > > 
> > > >       Delta Devices : 1, (-1->0)
> > > >           New Level : raid5
> > > >          New Layout : left-symmetric
> > > >       New Chunksize : 512K
> > > > 
> > > >                Name : GRANDSLAM:480
> > > >                UUID : 20211025:02005a7a:5a7abeef:cafebabe
> > > >              Events : 78714
> > > > 
> > > >      Number   Major   Minor   RaidDevice
> > > > 
> > > >         -       8       33        -        /dev/sdc1
> > > > -sudo mdadm -I /dev/sdd1
> > > > mdadm: /dev/sdd1 attached to /dev/md480, not enough to start (2).
> > > > -sudo mdadm -D /dev/md480
> > > > /dev/md480:
> > > >             Version : 1.2
> > > >          Raid Level : raid0
> > > >       Total Devices : 2
> > > >         Persistence : Superblock is persistent
> > > > 
> > > >               State : inactive
> > > >     Working Devices : 2
> > > > 
> > > >       Delta Devices : 1, (-1->0)
> > > >           New Level : raid5
> > > >          New Layout : left-symmetric
> > > >       New Chunksize : 512K
> > > > 
> > > >                Name : GRANDSLAM:480
> > > >                UUID : 20211025:02005a7a:5a7abeef:cafebabe
> > > >              Events : 78714
> > > > 
> > > >      Number   Major   Minor   RaidDevice
> > > > 
> > > >         -       8       49        -        /dev/sdd1
> > > >         -       8       33        -        /dev/sdc1
> > > > -sudo mdadm -I /dev/sde1
> > > > mdadm: /dev/sde1 attached to /dev/md480, not enough to start (2).
> > > > -sudo mdadm -D /dev/md480
> > > > /dev/md480:
> > > >             Version : 1.2
> > > >          Raid Level : raid0
> > > >       Total Devices : 3
> > > >         Persistence : Superblock is persistent
> > > > 
> > > >               State : inactive
> > > >     Working Devices : 3
> > > > 
> > > >       Delta Devices : 1, (-1->0)
> > > >           New Level : raid5
> > > >          New Layout : left-symmetric
> > > >       New Chunksize : 512K
> > > > 
> > > >                Name : GRANDSLAM:480
> > > >                UUID : 20211025:02005a7a:5a7abeef:cafebabe
> > > >              Events : 78712
> > > > 
> > > >      Number   Major   Minor   RaidDevice
> > > > 
> > > >         -       8       65        -        /dev/sde1
> > > >         -       8       49        -        /dev/sdd1
> > > >         -       8       33        -        /dev/sdc1
> > > > -sudo mdadm -I /dev/sdf1
> > > > mdadm: /dev/sdf1 attached to /dev/md480, not enough to start (3).
> > > > -sudo mdadm -D /dev/md480
> > > > /dev/md480:
> > > >             Version : 1.2
> > > >          Raid Level : raid0
> > > >       Total Devices : 4
> > > >         Persistence : Superblock is persistent
> > > > 
> > > >               State : inactive
> > > >     Working Devices : 4
> > > > 
> > > >       Delta Devices : 1, (-1->0)
> > > >           New Level : raid5
> > > >          New Layout : left-symmetric
> > > >       New Chunksize : 512K
> > > > 
> > > >                Name : GRANDSLAM:480
> > > >                UUID : 20211025:02005a7a:5a7abeef:cafebabe
> > > >              Events : 78714
> > > > 
> > > >      Number   Major   Minor   RaidDevice
> > > > 
> > > >         -       8       81        -        /dev/sdf1
> > > >         -       8       65        -        /dev/sde1
> > > >         -       8       49        -        /dev/sdd1
> > > >         -       8       33        -        /dev/sdc1
> > > > -sudo mdadm -R /dev/md480
> > > > mdadm: failed to start array /dev/md480: Input/output error
> > > > ---
> > > > NOTE: Of additional interest...
> > > > ---
> > > > -sudo mdadm -D /dev/md480
> > > > /dev/md480:
> > > >             Version : 1.2
> > > >       Creation Time : Tue Oct 26 14:06:53 2021
> > > >          Raid Level : raid5
> > > >       Used Dev Size : 18446744073709551615
> > > >        Raid Devices : 5
> > > >       Total Devices : 3
> > > >         Persistence : Superblock is persistent
> > > > 
> > > >         Update Time : Thu May  4 14:39:03 2023
> > > >               State : active, FAILED, Not Started
> > > >      Active Devices : 3
> > > >     Working Devices : 3
> > > >      Failed Devices : 0
> > > >       Spare Devices : 0
> > > > 
> > > >              Layout : left-symmetric
> > > >          Chunk Size : 512K
> > > > 
> > > > Consistency Policy : unknown
> > > > 
> > > >       Delta Devices : 1, (4->5)
> > > > 
> > > >                Name : GRANDSLAM:480
> > > >                UUID : 20211025:02005a7a:5a7abeef:cafebabe
> > > >              Events : 78714
> > > > 
> > > >      Number   Major   Minor   RaidDevice State
> > > >         -       0        0        0      removed
> > > >         -       0        0        1      removed
> > > >         -       0        0        2      removed
> > > >         -       0        0        3      removed
> > > >         -       0        0        4      removed
> > > > 
> > > >         -       8       81        3      sync   /dev/sdf1
> > > >         -       8       49        1      sync   /dev/sdd1
> > > >         -       8       33        0      sync   /dev/sdc1
> > > 
> > > So the reason that this array can't start is that /dev/sde1 is not
> > > recognized as RaidDevice 2, and there are two RaidDevices missing for
> > > a raid5.
> > > 
> > > Sadly I have no idea how to work around this; the sb metadata seems to be broken.
> > > 
> > > Thanks,
> > > Kuai
> > > > ---
> > > > -watch -c -d -n 1 cat /proc/mdstat
> > > > ---
> > > > Every 1.0s: cat /proc/mdstat                             OAK2023: Mon May 22 01:48:24 2023
> > > > 
> > > > Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
> > > > md480 : inactive sdf1[4] sdd1[1] sdc1[0]
> > > >        46877239294 blocks super 1.2
> > > > 
> > > > unused devices: <none>
> > > > ---
> > > > Hopeful that is some progress towards an array start? It's definitely unexpected output to me.
> > > > I/O Error starting md480
> > > > 
> > > > Thanks!
> > > > SA
> > > > 
> > > > On Thu, 2023-05-18 at 11:15 +0800, Yu Kuai wrote:
> > > > 
> > > > > I have no idea why the other disks show that device 2 is missing, and what
> > > > > device 4 is.
> > > > > 
> > > > > Anyway, can you try the following?
> > > > > 
> > > > > mdadm -I /dev/sdc1
> > > > > mdadm -D /dev/mdxxx
> > > > > 
> > > > > mdadm -I /dev/sdd1
> > > > > mdadm -D /dev/mdxxx
> > > > > 
> > > > > mdadm -I /dev/sde1
> > > > > mdadm -D /dev/mdxxx
> > > > > 
> > > > > mdadm -I /dev/sdf1
> > > > > mdadm -D /dev/mdxxx
> > > > > 
> > > > > If above works well, you can try:
> > > > > 
> > > > > mdadm -R /dev/mdxxx, and see if the array can be started.
> > > > > 
> > > > > Thanks,
> > > > > Kuai
> > > > 
> > > > .
> > > > 



