Re: [PATCH] md: Fix bug where new drives added to an md array sometimes don't sync properly.

Thanks Neil,

I just gave this patched module a shot on four systems. So far I haven't seen the device number inappropriately increment, though as I mentioned in a mail a short while ago, that already seemed to be remedied (for reasons I don't understand) by using the 1.2 superblock. However, the patch appears to have introduced one new issue, and another remains unresolved:
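
For what it's worth, as I read the diff the change only touches how raid10's run() counts degraded devices at assembly time (the In_sync test was being applied to whatever 'rdev' was left pointing at after the earlier setup loop, rather than to disk->rdev), so I wouldn't expect it to change the remove/re-add behaviour below, but I'm noting my understanding in case it matters:

  /* The patched check in raid10's run(), as I read it (comments are mine,
   * not from the source): */
  for (i = 0; i < conf->raid_disks; i++) {
          disk = conf->mirrors + i;

          /* Before the patch this tested 'rdev', a stale pointer from the
           * earlier setup loop, instead of the device actually in slot i,
           * so mddev->degraded could be counted against the wrong device. */
          if (!disk->rdev ||
              !test_bit(In_sync, &disk->rdev->flags)) {
                  disk->head_position = 0;
                  mddev->degraded++;
          }
  }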



// BUG 1
The single-command syntax to fail and remove a drive is still failing; I don't know whether this is somehow contributing to the further (new) issues below:

  [root@gtmp06 tmp]# mdadm /dev/md0 --fail /dev/dm-0 --remove /dev/dm-0
  mdadm: set /dev/dm-0 faulty in /dev/md0
  mdadm: hot remove failed for /dev/dm-0: Device or resource busy

  [root@gtmp06 tmp]# mdadm /dev/md0 --remove /dev/dm-0
  mdadm: hot removed /dev/dm-0
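
My rough reading of the hot-remove path in drivers/md/md.c (paraphrased from memory, so the real code certainly differs in detail) is that the HOT_REMOVE ioctl refuses to touch anything that still occupies a raid slot, and the slot is only released once md's recovery thread has processed the failure. That would explain why the back-to-back fail/remove gets EBUSY while a second --remove a moment later succeeds:

  /* Sketch only, not the actual 2.6.18 source: the key point is that a
   * device just marked Faulty still has rdev->raid_disk >= 0 until md's
   * recovery thread detaches it from its slot, so an immediate hot-remove
   * is rejected. */
  static int hot_remove_disk(mddev_t *mddev, dev_t dev)
  {
          mdk_rdev_t *rdev = find_rdev(mddev, dev);

          if (!rdev)
                  return -ENXIO;

          if (rdev->raid_disk >= 0)       /* still attached to a slot */
                  return -EBUSY;          /* "Device or resource busy" */

          kick_rdev_from_array(rdev);     /* really drop it */
          md_update_sb(mddev);
          return 0;
  }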


// BUG 2
Now, upon adding or re-adding a drive that has been failed and removed, it is not used for resync. I had previously noticed that added drives weren't resynced until the existing array build was done, at which point they were picked up. This, however, is a clean/active array that is rejecting the drive.

I've performed this identically on both a clean/active array and a newly-created (resyncing) array, to the same effect. Even after rebuild or reboot, the removed drive isn't taken back; it remains listed as a "faulty spare", with dmesg indicating that it is "non-fresh".
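
As I understand the "non-fresh" message, when the array is assembled md compares the event count recorded in each member's superblock with the array's current event count and kicks anything that has fallen behind, roughly along these lines (a simplified sketch of my understanding, not the actual kernel code; the helper name is made up):

  /* A member whose superblock event count lags the array's is considered
   * stale: it missed updates while it was failed/removed, so assembly
   * prints "md: kicking non-fresh ... from array!" and drops it instead
   * of pulling it back in for a resync. */
  static int superblock_is_fresh(mddev_t *mddev, __u64 sb_events)
  {
          /* sb_events: event count read from this device's superblock.
           * mddev->events: event count of the array being assembled.   */
          return sb_events >= mddev->events;
  }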




// DMESG:

md: kicking non-fresh dm-0 from array!


// ARRAY status 'mdadm -D /dev/md0'

          State : active, degraded
 Active Devices : 13
Working Devices : 13
 Failed Devices : 1
  Spare Devices : 0

         Layout : near=1, offset=2
     Chunk Size : 512K

           Name : 0
           UUID : 05c2faf4:facfcad3:ba33b140:100f428a
         Events : 22

    Number   Major   Minor   RaidDevice State
       0     253        1        0      active sync   /dev/dm-1
       1     253        2        1      active sync   /dev/dm-2
       2     253        5        2      active sync   /dev/dm-5
       3     253        4        3      active sync   /dev/dm-4
       4     253        6        4      active sync   /dev/dm-6
       5     253        3        5      active sync   /dev/dm-3
       6     253       13        6      active sync   /dev/dm-13
       7       0        0        7      removed
       8     253        7        8      active sync   /dev/dm-7
       9     253        8        9      active sync   /dev/dm-8
      10     253        9       10      active sync   /dev/dm-9
      11     253       11       11      active sync   /dev/dm-11
      12     253       10       12      active sync   /dev/dm-10
      13     253       12       13      active sync   /dev/dm-12

       7     253        0        -      faulty spare   /dev/dm-0




Let me know what more I can do to help track this down. I'm reverting this patch for now, since things are behaving worse with it than before; I'll be happy to try other patches.

Attached are typescripts of the drive remove/add sessions and all output.


/eli


Neil Brown wrote:
On Friday October 6, estair@xxxxxxx wrote:
 >
 > This patch has resolved the immediate issue I was having on 2.6.18 with
 > RAID10.  Previous to this change, after removing a device from the array
 > (with mdadm --remove), physically pulling the device and
 > changing/re-inserting, the "Number" of the new device would be
 > incremented on top of the highest-present device in the array.  Now, it
 > resumes its previous place.
 >
 > Does this look to be 'correct' output for a 14-drive array, which dev 8
 > was failed/removed from then "add"'ed?  I'm trying to determine why the
 > device doesn't get pulled back into the active configuration and
 > re-synced.  Any comments?

Does this patch help?



Fix count of degraded drives in raid10.


Signed-off-by: Neil Brown <neilb@xxxxxxx>

### Diffstat output
 ./drivers/md/raid10.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff .prev/drivers/md/raid10.c ./drivers/md/raid10.c
--- .prev/drivers/md/raid10.c   2006-10-09 14:18:00.000000000 +1000
+++ ./drivers/md/raid10.c       2006-10-05 20:10:07.000000000 +1000
@@ -2079,7 +2079,7 @@ static int run(mddev_t *mddev)
                disk = conf->mirrors + i;

                if (!disk->rdev ||
-                   !test_bit(In_sync, &rdev->flags)) {
+                   !test_bit(In_sync, &disk->rdev->flags)) {
                        disk->head_position = 0;
                        mddev->degraded++;
                }


NeilBrown


Attachment: gzKY3Inxrxoy.gz
Description: GNU Zip compressed data

Attachment: gzHKRDSqUyeA.gz
Description: GNU Zip compressed data

