Re: Question about recovery via mdadm

On Sunday February 16, qralston+ml.linux-raid@andrew.cmu.edu wrote:
> On 2003-02-17 at 09:09:53+1100 Neil Brown <neilb@cse.unsw.edu.au> wrote:
> 
> > > https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=82815
> > 
> > I think that bug should be fixed by the following patch, which has
> > been submitted and accepted and should be in 2.4.21.
> 
> I already tried backporting the md driver from 2.4.21-pre3 (which
> contains the patch you included).  Unfortunately, not only does it not
> fix the problem, but it makes it worse: with the patch applied, after
> the Oops occurs, touching the md device in any way hangs.  This
> includes the "md: stopping all md devices" which occurs at shutdown,
> so as a result, at shutdown, the entire machine hangs, and you have to
> go physically reset or power cycle the machine.
> 
> I've appended the Oops I generated using the md driver from
> 2.4.21-pre3 in Red Hat's kernel-2.4.18-19.8.0.  This is how I produced
> it:
> 
>     $ mdadm --create /dev/md0 --verbose --level=mirror --raid-devices=2 /dev/sdb1 /dev/sdc1
>     <wait for sync>
>     $ mdadm /dev/md0 -f /dev/sdc1 -r /dev/sdc1 -a /dev/sdc1
>     <wait for sync>
>     $ mdadm /dev/md0 -f /dev/sdb1 -r /dev/sdb1 -a /dev/sdb1
>     <mdrecovery generates Oops>
> 
> As I said before, I'm at a loss to figure out where the bug is, but if
> you have any further things to try, I'd be happy to give them a
> whirl...

Hmmm... you can probably alleviate the symptoms with:

diff ./drivers/md/md.c~current~ ./drivers/md/md.c
--- ./drivers/md/md.c~current~  2003-02-17 13:24:11.000000000 +1100
+++ ./drivers/md/md.c   2003-02-17 13:24:59.000000000 +1100
@@ -1048,7 +1048,7 @@ repeat:
                        printk("(skipping faulty ");
                if (rdev->alias_device)
                        printk("(skipping alias ");
-               if (disk_faulty(&rdev->sb->this_disk)) {
+               if (!rdev->faulty && disk_faulty(&rdev->sb->this_disk)) {
                        printk("(skipping new-faulty %s )\n",
                               partition_name(rdev->dev));
                        continue;

but the real problem is a lack of locking.  A lot of work went into
2.5 to get the locking right in the md driver and it resulted in a
substantial shake-up of the code.  I do have a patch that does better
locking for 2.4, but it is rather ugly.... maybe I should revisit it.
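
For anyone reading the one-line change above in isolation, here is a
minimal standalone sketch of what the new test does. The types and
function names (sketch_rdev, sketch_sb, sketch_disk_faulty, and so on)
are simplified stand-ins invented for illustration, not the real 2.4 md
structures. The only point it makes is a plain C one: because && short
circuits, the superblock copy is not looked at for a device whose
in-memory faulty flag is already set.

```c
#include <stddef.h>

/* Simplified stand-ins for illustration only; NOT the real md types. */
struct sketch_disk {
    int faulty_bit;             /* hypothetical "faulty" state bit */
};

struct sketch_sb {
    struct sketch_disk this_disk;
};

struct sketch_rdev {
    int faulty;                 /* in-memory flag: already known faulty */
    struct sketch_sb *sb;       /* superblock copy for this device */
};

/* Stand-in for the disk_faulty() check used in the patch. */
int sketch_disk_faulty(const struct sketch_disk *d)
{
    return d->faulty_bit != 0;
}

/* Old test: always dereferences rdev->sb. */
int skip_new_faulty_old(const struct sketch_rdev *rdev)
{
    return sketch_disk_faulty(&rdev->sb->this_disk);
}

/*
 * New test: if rdev->faulty is set, && short circuits and rdev->sb is
 * never read, so the device is not reported as "new-faulty".
 */
int skip_new_faulty_new(const struct sketch_rdev *rdev)
{
    return !rdev->faulty && sketch_disk_faulty(&rdev->sb->this_disk);
}
```

In this sketch a device that is already flagged faulty falls through
the "new-faulty" test whatever state its superblock copy is in, while a
device that is not yet flagged behaves exactly as before. It says
nothing about the locking problem itself.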

NeilBrown