Re: Reliability of bitmapped resync

Hi,

> I'll wait for these details before I start hunting further.

OK, here we are.
First, some context: the last disk to fail at boot was
/dev/sda, and the data below was collected after a "clean"
add of /dev/sda3 to the RAID.
That is, the superblock was zeroed and the device was
added again, so it should start clean.
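
For reference, the "clean" add was done roughly like this
(a sketch; the array node /dev/md0 is an assumption here,
the device name is as above):

mdadm --zero-superblock /dev/sda3
mdadm /dev/md0 --add /dev/sda3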

mdadm --examine /dev/sda3

/dev/sda3:
          Magic : a92b4efc
        Version : 1.1
    Feature Map : 0x1
     Array UUID : b601d547:b62e9563:2c68459c:22db163f
           Name : root
  Creation Time : Tue Feb 10 15:43:09 2009
     Raid Level : raid10
   Raid Devices : 2

 Avail Dev Size : 483941796 (230.76 GiB 247.78 GB)
     Array Size : 483941632 (230.76 GiB 247.78 GB)
  Used Dev Size : 483941632 (230.76 GiB 247.78 GB)
    Data Offset : 264 sectors
   Super Offset : 0 sectors
          State : active
    Device UUID : f3665458:d51d27f5:87724fb8:529f91f1

Internal Bitmap : 8 sectors from superblock
    Update Time : Tue Feb 24 09:03:46 2009
       Checksum : 68a2de81 - correct
         Events : 6541

         Layout : near=1, far=2
     Chunk Size : 64K

    Array Slot : 3 (failed, failed, 1, 0)
   Array State : Uu 2 failed


mdadm --examine-bitmap /dev/sda3

        Filename : /dev/sda3
           Magic : 6d746962
         Version : 4
            UUID : b601d547:b62e9563:2c68459c:22db163f
          Events : 6541
  Events Cleared : 6540
           State : OK
       Chunksize : 256 KB
          Daemon : 5s flush period
      Write Mode : Normal
       Sync Size : 241970816 (230.76 GiB 247.78 GB)
          Bitmap : 945199 bits (chunks), 524289 dirty (55.5%)

Now, there is one thing I do not understand, though maybe
it is fine anyway, and it is this last line:

          Bitmap : 945199 bits (chunks), 524289 dirty (55.5%)

The array status was fully recovered (in sync),
and /dev/sdb3 showed:

          Bitmap : 945199 bits (chunks), 1 dirty (0.0%)

This is more or less confirmed by /proc/mdstat.

How can it be 55.5% dirty? Is this expected?
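
Just to sanity-check the reported numbers against the
--examine-bitmap output above (plain arithmetic, nothing new):

# 241970816 KB sync size / 256 KB bitmap chunk, rounded up:
awk 'BEGIN { print int((241970816 + 255) / 256) }'      # -> 945199
# dirty fraction of those bits:
awk 'BEGIN { printf "%.1f\n", 524289 * 100 / 945199 }'  # -> 55.5

So the percentage is at least internally consistent with the
bit counts; the puzzle is why half of the bits are dirty at
all on an in-sync array.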

A further note:

On an identical PC, with a slightly different RAID
(metadata 1.0 instead of 1.1), I tested the following:

mdadm --fail /dev/md2 /dev/sdb3
# wait a little
mdadm --remove /dev/md2 /dev/sdb3
# do something to make the bitmap a bit dirty
mdadm --re-add /dev/md2 /dev/sdb3
# wait for the resync to finish:
watch cat /proc/mdstat
echo check > /sys/block/md2/md/sync_action
watch cat /proc/mdstat /sys/block/md2/md/mismatch_cnt
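
(For the "do something" step above, any write to the degraded
array should do; a hypothetical example, assuming the
filesystem on /dev/md2 is mounted at /mnt:

dd if=/dev/urandom of=/mnt/bitmap-dirty-test bs=1M count=32
sync

which should mark a handful of bitmap chunks dirty.)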

Now, almost immediately the mismatch count went to something
like 1152 (or thereabouts).
At around 25% of the check it was about 1440; at that point
I wrote "idle" to sync_action and re-added the disk cleanly.

This reproduces what I had already seen.

This is again a RAID-10 f2, with 1.0 metadata, a 64 KB chunk
and a bitmap chunk size of 16 MB (16384 KB).
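
For scale, with those numbers one bitmap bit covers quite a
lot of data:

awk 'BEGIN { print 16384 / 64 }'    # -> 256 data chunks per bitmap bit

so a single dirty bit should force a full 16 MB window to be
resynced.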

Somehow it seems, at least on this setup, that either
the bitmap does not track all writes, or the resync
does not process all of the bitmap chunks.

Thanks,

bye,

-- 

piergiorgio
