Re: weird issues with raid1

"Jon Nelson" <jnelson-linux-raid@xxxxxxxxxxx> · Mon, 15 Dec 2008 15:47:40 -0600

On Mon, Dec 15, 2008 at 3:33 PM, Neil Brown <neilb@xxxxxxx> wrote:
> On Monday December 15, jnelson-linux-raid@xxxxxxxxxxx wrote:
>>
>> However, it raises a question: bitmaps are about 'resync' not
>> 'recovery'?  How do they differ?
>
> With resync, the expectation is that most of the device is correct.
> The bitmap tells us which sectors aren't, and we just resync those.
>
> With recovery, the expectation is that the entire drive contains
> garbage and it has to be recovered from beginning to end.
>
> Each device has a flag to say whether the device is in sync with the
> array.  The bitmap records which sectors of "in-sync" devices may not
> actually be in-sync at the moment.
> 'resync' synchronises the 'in-sync' devices.
> 'recovery' synchronises a 'not-in-sync' device.
>
>
>>
>> >> Question 1:
>> >> I'm using a bitmap. Why does the rebuild start completely over?
>> >
>> > Because the bitmap isn't used to guide a rebuild, only a resync.
>> >
>> > The effect of --re-add is to make md do a resync rather than a rebuild
>> > if the device was previously a fully active member of the array.
>>
>> Aha!  This explains a question I raised in another email. What
>> happened there is a previously fully active member of the raid got
>> added, somehow, as a spare, via --incremental. That's when the entire
>> raid thought it needed to be rebuilt. How did that (the device being
>> treated as a spare instead of as a previously fully active member)
>> happen?
>
> It is hard to guess without details, and they might be hard to collect
> after the fact.
> Maybe if you have the kernel logs of when the server rebooted and the
> recovery started, that might contain some hints.
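
To check that I follow the resync/recovery distinction quoted above, here
is a toy model in Python. It is only a sketch of my understanding, not how
md is implemented: the function name, the chunk numbering, and the chunk
counts are all made up for illustration.

```python
# Toy model only: names and chunk counts are hypothetical, not taken from md.

def chunks_to_copy(in_sync, dirty_chunks, total_chunks):
    """Return the chunk indexes that would have to be rewritten on a device.

    in_sync      -- the per-device "in sync with the array" flag
    dirty_chunks -- chunk indexes whose bit is set in the write-intent bitmap
    total_chunks -- number of bitmap chunks covering the device
    """
    if in_sync:
        # resync: trust the device except where the bitmap says otherwise
        return sorted(dirty_chunks)
    # recovery: assume the device holds garbage, copy beginning to end
    return list(range(total_chunks))


# Same bitmap (one dirty chunk out of eight), different device flag.
resync = chunks_to_copy(True, {3}, 8)
recovery = chunks_to_copy(False, {3}, 8)
print(len(resync), len(recovery))
```

If that model is right, the bitmap only matters while the device is still
flagged in-sync, which would explain why a device that comes back as a
spare gets a full rebuild.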

I hope this helps.

Prior to the reboot:

Dec 15 15:19:39 turnip kernel: md: md11: recovery done.
Dec 15 15:19:39 turnip kernel: RAID1 conf printout:
Dec 15 15:19:39 turnip kernel:  --- wd:2 rd:2
Dec 15 15:19:39 turnip kernel:  disk 0, wo:0, o:1, dev:nbd0
Dec 15 15:19:39 turnip kernel:  disk 1, wo:0, o:1, dev:sda

During booting:

<6>raid1: raid set md11 active with 1 out of 2 mirrors
<6>md11: bitmap initialized from disk: read 1/1 pages, set 1 bits
<6>created bitmap (10 pages) for device md11

After boot:

Dec 15 15:34:38 turnip kernel: md: bind<nbd0>
Dec 15 15:34:38 turnip kernel: RAID1 conf printout:
Dec 15 15:34:38 turnip kernel:  --- wd:1 rd:2
Dec 15 15:34:38 turnip kernel:  disk 0, wo:1, o:1, dev:nbd0
Dec 15 15:34:38 turnip kernel:  disk 1, wo:0, o:1, dev:sda
Dec 15 15:34:38 turnip kernel: md: recovery of RAID array md11
Dec 15 15:34:38 turnip kernel: md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
Dec 15 15:34:38 turnip kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
Dec 15 15:34:38 turnip kernel: md: using 128k window, over a total of 78123988 blocks.
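
In case it is useful for comparing the two conf printouts, here is a small
Python sketch that pulls the fields out of those lines. The field meanings
are my assumption: I am reading "wo:1" as "the device's in-sync flag is
clear" and "o:1" as "the device is not faulty", with "wd"/"rd" as working
and total raid disks. The function and dictionary key names are my own.

```python
import re

# Assumed meanings (not confirmed in this thread): wo=1 -> not in sync,
# o=1 -> operational, wd/rd -> working disks / raid disks.
SUMMARY = re.compile(r"--- wd:(\d+) rd:(\d+)")
DISK = re.compile(r"disk (\d+), wo:(\d), o:(\d), dev:(\S+)")


def parse_conf(lines):
    """Parse one 'RAID1 conf printout' block into a summary dict."""
    conf = {"wd": None, "rd": None, "disks": []}
    for line in lines:
        m = SUMMARY.search(line)
        if m:
            conf["wd"], conf["rd"] = int(m.group(1)), int(m.group(2))
            continue
        m = DISK.search(line)
        if m:
            conf["disks"].append({
                "slot": int(m.group(1)),
                "not_in_sync": m.group(2) == "1",
                "operational": m.group(3) == "1",
                "dev": m.group(4),
            })
    return conf


# The "After boot" printout from the log above.
after_boot = parse_conf([
    "Dec 15 15:34:38 turnip kernel:  --- wd:1 rd:2",
    "Dec 15 15:34:38 turnip kernel:  disk 0, wo:1, o:1, dev:nbd0",
    "Dec 15 15:34:38 turnip kernel:  disk 1, wo:0, o:1, dev:sda",
])
needs_recovery = [d["dev"] for d in after_boot["disks"] if d["not_in_sync"]]
print(needs_recovery)
```

Under that reading, nbd0 went from wo:0 before the reboot to wo:1 after it,
which is the change I am trying to explain.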

/dev/nbd0 was added via --incremental (mdadm 3.0)

--detail:

/dev/md11:
        Version : 01.00.03
  Creation Time : Mon Dec 15 07:06:13 2008
     Raid Level : raid1
     Array Size : 78123988 (74.50 GiB 80.00 GB)
  Used Dev Size : 156247976 (149.01 GiB 160.00 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 11
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Mon Dec 15 15:35:17 2008
          State : active, degraded, recovering
 Active Devices : 1
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 1

 Rebuild Status : 9% complete

           Name : turnip:11  (local to host turnip)
           UUID : cf24d099:9e174a79:2a2f6797:dcff1420
         Events : 3914

    Number   Major   Minor   RaidDevice State
       2      43        0        0      spare rebuilding   /dev/nbd0
       3       8        0        1      active sync   /dev/sda

turnip:~ # mdadm --examine /dev/sda
/dev/sda:
          Magic : a92b4efc
        Version : 1.0
    Feature Map : 0x1
     Array UUID : cf24d099:9e174a79:2a2f6797:dcff1420
           Name : turnip:11  (local to host turnip)
  Creation Time : Mon Dec 15 07:06:13 2008
     Raid Level : raid1
   Raid Devices : 2

 Avail Dev Size : 160086384 (76.34 GiB 81.96 GB)
     Array Size : 156247976 (74.50 GiB 80.00 GB)
  Used Dev Size : 156247976 (74.50 GiB 80.00 GB)
   Super Offset : 160086512 sectors
          State : clean
    Device UUID : 0059434c:ecef51a0:2974482d:ba38f944

Internal Bitmap : 2 sectors from superblock
    Update Time : Mon Dec 15 15:45:21 2008
       Checksum : 21396863 - correct
         Events : 3916

    Array Slot : 3 (failed, failed, empty, 1)
   Array State : _U 2 failed
turnip:~ #

turnip:~ # mdadm --examine /dev/nbd0
/dev/nbd0:
          Magic : a92b4efc
        Version : 1.0
    Feature Map : 0x1
     Array UUID : cf24d099:9e174a79:2a2f6797:dcff1420
           Name : turnip:11  (local to host turnip)
  Creation Time : Mon Dec 15 07:06:13 2008
     Raid Level : raid1
   Raid Devices : 2

 Avail Dev Size : 160086384 (76.34 GiB 81.96 GB)
     Array Size : 156247976 (74.50 GiB 80.00 GB)
  Used Dev Size : 156247976 (74.50 GiB 80.00 GB)
   Super Offset : 160086512 sectors
          State : clean
    Device UUID : 01524a75:c309869c:6da972c9:084115c6

Internal Bitmap : 2 sectors from superblock
          Flags : write-mostly
    Update Time : Mon Dec 15 15:45:21 2008
       Checksum : 63bab8ce - correct
         Events : 3916

    Array Slot : 2 (failed, failed, empty, 1)
   Array State : _u 2 failed
turnip:~ #
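
For anyone who wants to diff the two superblocks quickly, here is a rough
Python sketch that turns the "Key : value" lines of --examine output into a
dict and compares a few fields. The parsing is an assumption based only on
the layout shown above, and the helper name is my own; the sample strings
are excerpts of the two outputs, not the full text.

```python
def parse_examine(text):
    """Turn 'Key : value' lines from mdadm --examine output into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" : ")
        if sep:  # skip lines without a ' : ' separator, e.g. '/dev/sda:'
            fields[key.strip()] = value.strip()
    return fields


# Excerpts copied from the two --examine outputs above.
SDA = """\
/dev/sda:
     Array UUID : cf24d099:9e174a79:2a2f6797:dcff1420
         Events : 3916
    Array Slot : 3 (failed, failed, empty, 1)
"""
NBD0 = """\
/dev/nbd0:
     Array UUID : cf24d099:9e174a79:2a2f6797:dcff1420
          Flags : write-mostly
         Events : 3916
    Array Slot : 2 (failed, failed, empty, 1)
"""

sda, nbd0 = parse_examine(SDA), parse_examine(NBD0)
same_array = sda["Array UUID"] == nbd0["Array UUID"]
same_events = int(sda["Events"]) == int(nbd0["Events"])
print(same_array, same_events, sda["Array Slot"], nbd0["Array Slot"])
```

Both devices report the same array UUID and event count, so the difference
seems to be in the slot/role information rather than in how current the
superblocks are.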

Thanks!!

-- 
Jon