Re: [Solved] Re: FC5 S/W Raid Rebuilding to Infinity (and beyond!)

Nigel Wade <nmw@xxxxxxxxxxxx> · Thu, 23 Nov 2006 09:27:26 +0000

Sean Bruno wrote:
You have found yourself in the same situation I found myself in recently. 
Actually my situation was slightly different, but the resulting problem is the 
same. In my case at re-boot md decided that one partition of a mirror was out of 
sync, and so initiated a re-sync with the other partition. However, the 
partition which was active contained a bad sector, so the re-sync failed, over 
and over and over..., just like yours is doing.

In order to fix my system I used the following steps.

The first step was to take the offending filesystem offline. Then I copied the 
existing partition onto the good disk using dd, with the noerror option so it 
would continue past read errors. In my case I knew that the read error was not 
part of the actual filesystem in use because it passed fsck. When the copy was 
complete I ran fsck on the new filesystem just to be sure it had copied ok.
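
As a rough sketch of the copy step (not necessarily the exact command used
above): the helper name copy_past_errors, the 4096-byte block size and the
sync option are assumptions; only noerror comes from the description.
conv=noerror tells dd to carry on after a read error, and sync pads a failed
or short read with zeros so that the data following it keeps its offset.

```shell
#!/bin/sh
# Hypothetical helper: copy SRC ($1) to DST ($2), continuing past read errors.
# noerror: do not stop at a read error.
# sync:    pad failed or short reads with zeros up to bs (an addition here,
#          not mentioned in the original description).
copy_past_errors() {
    dd if="$1" of="$2" bs=4096 conv=noerror,sync
}

# Example with the device names from this thread (destructive; run it only
# with the filesystem unmounted and the arguments double-checked):
#   copy_past_errors /dev/sdb3 /dev/sda3
#   fsck -f /dev/sda3
```

Note that with sync the last block is also padded if the source size is not
a multiple of bs, so the destination must be at least that large.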

After this I created a new RAID consisting of just the good partition (in my 
case the RAID was md1 and the new partition was sda3):
  # mdadm -C /dev/md1 --force -n 1 -l 1 /dev/sda3

As a temporary fix, until a new disk arrived, I ran:
  # e2fsck -c -d -f /dev/sdb3
to mark bad blocks (sdb3 was the failing partition).
Then I ran:
  # mdadm --zero-superblock /dev/sdb3
to remove the md superblock from the partition so it was no longer part of a RAID.

Finally, I used mdadm to add the dodgy partition back into the RAID:
  # mdadm -a /dev/md1 /dev/sdb3
and to grow the RAID to 2 partitions:
  # mdadm --grow -n 2 /dev/md1
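
Once the partition has been added, the rebuild can be watched through
/proc/mdstat. The following is only a sketch: the helper name md_rebuilding
is made up, and the pattern match is deliberately crude (it also matches
states such as a delayed or pending resync).

```shell
#!/bin/sh
# Hypothetical helper: exit 0 while the given mdstat text still reports a
# resync or recovery. Reads /proc/mdstat unless a file is given as $1.
md_rebuilding() {
    grep -Eq 'recovery|resync' "${1:-/proc/mdstat}"
}

# Example usage (device name from this thread):
#   cat /proc/mdstat
#   mdadm --detail /dev/md1
#   while md_rebuilding; do sleep 60; done; echo "rebuild finished"
```

If the loop never finishes, that is the symptom this thread started with,
and the kernel log is the place to look for read errors on the source
partition.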

Thanks for the assistance with this, Nigel.  I was able to recover from
this 'double' failure with your procedure.  I had purchased 2 new disks
in order to replace the failed drives and I am back up at this time.

Sean

You may want to do some additional testing to verify the status of the new 
filesystem. In my original message I implied that fsck was sufficient, but as 
Tony quite rightly pointed out, it isn't. On my failing disk I knew that the bad 
block wasn't part of the active filesystem, so a simple copy/fsck was 
sufficient. During the copy there were no errors, and a comparison of the two 
filesystems showed no discrepancies.
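
One way such a comparison could be done is at file level with diff. This is
a sketch under assumptions: the helper name trees_match and the mount points
are made up, and it is not necessarily the comparison that was actually run.

```shell
#!/bin/sh
# Hypothetical helper: compare two directory trees file by file.
# Prints the names of files that differ; exit status 0 means no differences.
trees_match() {
    diff -r -q "$1" "$2"
}

# Example, assuming both filesystems are mounted read-only at these
# made-up mount points:
#   mount -o ro /dev/sdb3 /mnt/old
#   mount -o ro /dev/sda3 /mnt/new
#   trees_match /mnt/old /mnt/new && echo "no discrepancies"
```

Reading the old filesystem this way may itself hit the bad sector, in which
case diff reports a read error for the affected file.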

When you copied your filesystem, did the system generate any error messages? If 
so, you will probably want to find out which file the bad block belonged to, 
work out what impact a corrupted copy of that file might have, and check 
whether you can restore it from a backup.
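
A sketch of how the bad block might be traced to a file on an ext2/ext3
filesystem. The helper name sector_to_fs_block and all the numbers are
made up for illustration, and 512-byte sectors are assumed. The idea is to
turn the failing disk sector from the kernel log into a filesystem block
number, then ask debugfs which inode owns that block (icheck) and which
path names that inode (ncheck).

```shell
#!/bin/sh
# Hypothetical helper: convert an absolute disk sector into a filesystem
# block number within a partition.
# Arguments: bad sector, partition start sector, fs block size in bytes.
# Assumes 512-byte sectors.
sector_to_fs_block() {
    echo $(( ($1 - $2) * 512 / $3 ))
}

# Example with made-up numbers: sector 1000, partition starting at
# sector 63, 4096-byte blocks:
#   sector_to_fs_block 1000 63 4096
# Then (BLOCK and INODE are placeholders for the values found):
#   debugfs -R "icheck BLOCK" /dev/sdb3
#   debugfs -R "ncheck INODE" /dev/sdb3
```

The partition start sector can be read from fdisk -l -u, and the block size
from dumpe2fs -h on the filesystem. If icheck reports no owning inode, the
block is not in use by any file.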

--
Nigel Wade, System Administrator, Space Plasma Physics Group,
            University of Leicester, Leicester, LE1 7RH, UK
E-mail :    nmw@xxxxxxxxxxxx
Phone :     +44 (0)116 2523548, Fax : +44 (0)116 2523555
