RE: Failed adadm RAID array after aborted Grown operation

Thanks Roger.

My apologies for not replying earlier. By the time I read this I already
had a reshape underway to reduce the size of the array back to the original
30 disks. So far it seems to be progressing OK, although the ETA is around
10 days, which is why I didn’t respond sooner: I’ve been busy dealing with
the fallout from this.

Do I understand that you would recommend upgrading our installation of Linux
once the repair is complete, or are you advising downloading and compiling a
new kernel as part of the repair? Or are you suggesting that it was the fact
that we’re on such an old version of CentOS that caused this mess? I ask
because once this is repaired (assuming it does complete successfully), I
would like to extend the array to the full 45 drives of which this server is
capable.

Thanks,
Bob

From: Roger Heflin <rogerheflin@xxxxxxxxx>
Sent: Monday, 9 May 2022 9:05 PM
To: Wols Lists <antlists@xxxxxxxxxxxxxxx>
Cc: Bob Brand <brand@xxxxxxxxxxxxxxx>; Linux RAID 
<linux-raid@xxxxxxxxxxxxxxx>; Phil Turmel <philip@xxxxxxxxxx>; NeilBrown 
<neilb@xxxxxxxx>
Subject: Re: Failed adadm RAID array after aborted Grown operation

In the short term, the easiest way to get onto a new kernel might be this.

Download a Fedora 35 live CD and boot from it. That lets you start the
array, reshape it, or abort the reshape using the Fedora 35 kernel and mdadm
tools. All of this has to be done manually from the GUI or the command
line, though, so it will be somewhat of a pain.
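
Before assembling or reshaping anything from the live environment, it may be
worth confirming which kernel and mdadm you are actually running and looking
at the current sync state. The following is only a rough sketch of read-only
checks, not a recovery procedure. The MDSTAT variable and the md_progress
helper are names made up for this illustration; /proc/mdstat, uname -r and
mdadm --version are the standard interfaces.

```shell
#!/bin/sh
# Read-only checks; nothing here modifies the array.
# MDSTAT is a hypothetical override so the helper can be pointed at a copy
# of the status file; by default it reads the kernel's own /proc/mdstat.
MDSTAT="${MDSTAT:-/proc/mdstat}"

# Print any in-progress reshape/recovery/resync lines (progress, ETA, speed),
# or a short note when the status file reports none.
md_progress() {
    grep -E 'reshape|recovery|resync' "$MDSTAT" || echo "no sync activity reported"
}

# Running kernel version.
uname -r

# mdadm tool version, if mdadm is installed in this environment.
if command -v mdadm >/dev/null 2>&1; then
    mdadm --version
fi

# Current sync state, if the md driver is loaded.
if [ -r "$MDSTAT" ]; then
    md_progress
fi
```

If the kernel and mdadm versions reported here are the old CentOS ones rather
than the live CD's, you have booted the wrong media.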

The other choice is to download, compile, and install a current kernel from
http://kernel.org. This takes some time (you have to install the compiler
and header rpms). Follow this guide:
https://docs.rockylinux.org/guides/custom-linux-kernel/
It is written for Rocky Linux, which is a Red Hat clone, so the instructions
should carry over. How long the build takes depends on the number of CPUs
your machine has and the value you pass as -j<cpustouse>.

The biggest issue will likely be compile errors caused by a missing tool or
devel package. After that you would still need to download the newest mdadm
and compile and install it.

These steps take longer, but they get your system onto a new kernel and new
tools. The process of compiling and installing a kernel has for the most
part not changed in a long time; I have been doing this on and off for 20+
years. Running a newer kernel on an older userspace is common among kernel
developers, so it is generally well tested, and in my experience it just
works with minimal trouble.
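
As a rough outline of what the build involves, here is a dry-run sketch that
only prints the usual steps rather than running them. KVER, JOBS and the
kernel_build_plan function are placeholders invented for this example, and
the tarball name assumes you fetched a linux-<version>.tar.xz from
kernel.org. The linked guide remains the thing to follow for the package
prerequisites and bootloader details.

```shell
#!/bin/sh
# Dry-run sketch: prints the build steps instead of executing them.
# KVER is a hypothetical placeholder; substitute a real version from kernel.org.
KVER="${KVER:-X.Y.Z}"
# Default the -j value to the number of online CPUs, falling back to 1.
JOBS="${JOBS:-$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)}"

kernel_build_plan() {
    cat <<EOF
tar xf linux-${KVER}.tar.xz
cd linux-${KVER}
cp /boot/config-\$(uname -r) .config
make olddefconfig
make -j${JOBS}
sudo make modules_install
sudo make install
EOF
}

kernel_build_plan
```

Starting from the running kernel's config and letting "make olddefconfig"
accept defaults for new options is one common approach; it is an assumption
here, not something specific to this server.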



On Mon, May 9, 2022 at 5:24 AM Wols Lists <antlists@xxxxxxxxxxxxxxx> wrote:
On 09/05/2022 01:09, Bob Brand wrote:
> Hi Wol,
>
> My apologies for continually bothering you but I have a couple of 
> questions:

Did you read the links I sent you?
>
> 1. How do I overcome the error message "mount: /dev/md125: can't read
> superblock."  Do I use fsck?
>
> 2. The removed disk is showing as "   -   0   0   30   removed". Is it
> safe to use "mdadm /dev/md2 -r detached" or "mdadm /dev/md2 -r failed" to
> overcome this?

I don't know :-( This is getting a bit out of my depth. But I'm
SERIOUSLY concerned you're still futzing about with CentOS 7!!!

Why didn't you download CentOS 8.5? Why didn't you download RHEL 8.5, or
the latest Fedora? Why didn't you download SUSE SLES 15?

Any and all CentOS 7 will come with either an out-of-date mdadm, or a
Frankenkernel. NEITHER are a good idea.

Go back to the links I gave you, download and run lsdrv, and post the
output here. Hopefully somebody will tell you the next steps. I will do
my best.
>
> Thank you!
>
Cheers,
Wol
>
> -----Original Message-----
> From: Bob Brand <brand@xxxxxxxxxxxxxxx>
> Sent: Monday, 9 May 2022 9:33 AM
> To: Bob Brand <brand@xxxxxxxxxxxxxxx>; Wol <antlists@xxxxxxxxxxxxxxx>;
> linux-raid@xxxxxxxxxxxxxxx
> Cc: Phil Turmel <philip@xxxxxxxxxx>
> Subject: RE: Failed adadm RAID array after aborted Grown operation
>
> I just tried it again with the --invalid-backup switch and it's now
> showing the State as "clean, degraded", and it's showing all the disks
> except for the suspect one that I removed.
>
> I'm unable to mount it and see the contents. I get the error "mount:
> /dev/md125: can't read superblock."
>
> Is there more that I need to do?
>
> Thanks
>
>
> -----Original Message-----
> From: Bob Brand <brand@xxxxxxxxxxxxxxx>
> Sent: Monday, 9 May 2022 9:02 AM
> To: Bob Brand <brand@xxxxxxxxxxxxxxx>; Wol <antlists@xxxxxxxxxxxxxxx>;
> linux-raid@xxxxxxxxxxxxxxx
> Cc: Phil Turmel <philip@xxxxxxxxxx>
> Subject: RE: Failed adadm RAID array after aborted Grown operation
>
> Hi Wol,
>
> I've booted to the installation media and I've run the following command:
>
> mdadm /dev/md125 --assemble --update=revert-reshape
>   --backup-file=/mnt/sysimage/grow_md125.bak
>   --verbose --uuid=f9b65f55:5f257add:1140ccc0:46ca6c19 /dev/md125
>
> mdadm --assemble --update=revert-reshape --backup-file=/grow_md125.bak
>   --verbose --uuid=f9b65f55:5f257add:1140ccc0:46ca6c19
>
> But I'm still getting the error:
>
> mdadm: /dev/md125 has an active reshape - checking if critical section
> needs to be restored
> mdadm: No backup metadata on /mnt/sysimage/grow_md125.back
> mdadm: Failed to find backup of critical section
> mdadm: Failed to restore critical section for reshape, sorry.
>
>
> Should I try the --invalid-backup switch or --force?
>
> Thanks,
> Bob
>
>
> -----Original Message-----
> From: Bob Brand <brand@xxxxxxxxxxxxxxx>
> Sent: Monday, 9 May 2022 8:19 AM
> To: Wol <antlists@xxxxxxxxxxxxxxx>; linux-raid@xxxxxxxxxxxxxxx
> Cc: Phil Turmel <philip@xxxxxxxxxx>
> Subject: RE: Failed adadm RAID array after aborted Grown operation
>
> OK.  I've downloaded a CentOS 7 - 2009 ISO from http://centos.org - that
> seems to be the most recent they have.
>
>
> -----Original Message-----
> From: Wol <antlists@xxxxxxxxxxxxxxx>
> Sent: Monday, 9 May 2022 8:16 AM
> To: Bob Brand <brand@xxxxxxxxxxxxxxx>; linux-raid@xxxxxxxxxxxxxxx
> Cc: Phil Turmel <philip@xxxxxxxxxx>
> Subject: Re: Failed adadm RAID array after aborted Grown operation
>
> How old is CentOS 7? With that kernel I guess it's quite old?
>
> Try and get a CentOS 8.5 disk. At the end of the day, the version of linux
> doesn't matter. What you need is an up-to-date rescue disk.
> Distro/whatever is unimportant - what IS important is that you are using
> the latest mdadm, and a kernel that matches.
>
> The problem you have sounds like a long-standing but now-fixed bug. An
> original CentOS disk might be okay (with matched kernel and mdadm), but
> almost certainly has what I consider to be a "dodgy" version of mdadm.
>
> If you can afford the downtime, after you've reverted the reshape, I'd try
> starting it again with the rescue disk. It'll probably run fine. Let it
> complete and then your old CentOS 7 will be fine with it.
>
> Cheers,
> Wol
>
> On 08/05/2022 23:04, Bob Brand wrote:
>> Thanks Wol.
>>
>> Should I use a CentOS 7 disk or a CentOS disk?
>>
>> Thanks
>>
>> -----Original Message-----
>> From: Wols Lists <antlists@xxxxxxxxxxxxxxx>
>> Sent: Monday, 9 May 2022 1:32 AM
>> To: Bob Brand <brand@xxxxxxxxxxxxxxx>; linux-raid@xxxxxxxxxxxxxxx
>> Cc: Phil Turmel <philip@xxxxxxxxxx>
>> Subject: Re: Failed adadm RAID array after aborted Grown operation
>>
>> On 08/05/2022 14:18, Bob Brand wrote:
>>> If you’ve stuck with me and read all this way, thank you and I hope
>>> you can help me.
>>
>> https://raid.wiki.kernel.org/index.php/Linux_Raid
>>
>> Especially
>> https://raid.wiki.kernel.org/index.php/Linux_Raid#When_Things_Go_Wrogn
>>
>> What you need to do is revert the reshape. I know what may have
>> happened, and what bothers me is your kernel version, 3.10.
>>
>> The first thing to try is to boot from up-to-date rescue media and see
>> if an mdadm --revert works from there. If it does, your Centos should
>> then bring everything back no problem.
>>
>> (You've currently got what I call a Frankensetup: a very old kernel, a
>> pretty new mdadm, and a whole bunch of patches that do who knows what.
>> You really need a matching kernel and mdadm, and your frankenkernel
>> won't match anything ...)
>>
>> Let us know how that goes ...
>>
>> Cheers,
>> Wol
>>
>>
>>
>> CAUTION!!! This E-mail originated from outside of WMA Water. Do not
>> click links or open attachments unless you recognize the sender and
>> know the content is safe.
>>
>>



