On Tuesday February 24, piergiorgio.sartor@xxxxxxxx wrote:
> Hi,
>
> > I'll wait for these details before I start hunting further.
>
> OK, here we are.
> Some forewords, the last disk to fail at boot was
> /dev/sda, this data was collected after a "clean"
> add of the /dev/sda3 to the RAID.
> This means the superblock was zeroed and the device
> added, so it should be clean.
> ....

Thanks a lot for that.

> Now, one thing I do not understand, but maybe it is
> anyway OK, and it is this last line:
>
> Bitmap : 945199 bits (chunks), 524289 dirty (55.5%)
>
> Because the array status was fully recovered (in sync)
> and /dev/sdb3 showed:
>
> Bitmap : 945199 bits (chunks), 1 dirty (0.0%)
>
> Confirmed somehow by /proc/mdstat
>
> How could it be 55.5% dirty? Is this expected?

This is a bug.  It is fixed by a patch that I have queued for 2.6.30.
As it doesn't cause a crash or data corruption, it doesn't get to jump
the queue.  It is very small though:

--- a/drivers/md/bitmap.c
+++ b/drivers/md/bitmap.c
@@ -266,7 +266,6 @@ static mdk_rdev_t *next_active_rdev(mdk_rdev_t *rdev, mddev_t *mddev)
 	list_for_each_continue_rcu(pos, &mddev->disks) {
 		rdev = list_entry(pos, mdk_rdev_t, same_set);
 		if (rdev->raid_disk >= 0 &&
-		    test_bit(In_sync, &rdev->flags) &&
 		    !test_bit(Faulty, &rdev->flags)) {
 			/* this is a usable devices */
 			atomic_inc(&rdev->nr_pending);

I'm fairly sure I have found the bug that caused the problem you first
noticed.  It was introduced in 2.6.25.

Below are two patches for raid10 which I have just submitted for
2.6.29 (as they can cause data corruption, they can jump the queue).
The first solves your problem.  The second solves a similar situation
when the bitmap chunk size is smaller.

If you are able to test and confirm, that would be great.

Thanks a lot for reporting the problem and following through!

NeilBrown


From: NeilBrown <neilb@xxxxxxx>
Subject: [PATCH 2/2] md/raid10: Don't call bitmap_cond_end_sync when we
	are doing recovery.
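To make the bitmap.c fix above concrete: the removed In_sync test meant that a device which was still being recovered (a member of the array, not faulty, but not yet in sync) was skipped when bitmap updates were written out, so its on-disk bitmap could stay largely dirty even after recovery completed. The following is a minimal user-space sketch of that filter; the struct and function names are illustrative inventions, standing in for the kernel's mdk_rdev_t and its In_sync/Faulty flag bits.

```c
#include <stdbool.h>

/* Hypothetical model of an md member device; the real kernel type
 * is mdk_rdev_t with In_sync/Faulty bits in rdev->flags. */
struct rdev_model {
	int  raid_disk;  /* >= 0 when the device is an array member */
	bool in_sync;    /* models the In_sync flag */
	bool faulty;     /* models the Faulty flag */
};

/* Before the patch: a recovering device (member, not faulty, not yet
 * In_sync) was excluded, so its on-disk bitmap was never updated. */
static bool gets_bitmap_updates_old(const struct rdev_model *r)
{
	return r->raid_disk >= 0 && r->in_sync && !r->faulty;
}

/* After the patch: every non-faulty member receives bitmap updates. */
static bool gets_bitmap_updates_new(const struct rdev_model *r)
{
	return r->raid_disk >= 0 && !r->faulty;
}
```

With this model, a freshly re-added device like /dev/sda3 during its rebuild is exactly the case the two predicates disagree on, which matches the 55.5%-dirty symptom reported above.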
Date: Wed, 25 Feb 2009 13:38:19 +1100
Message-ID: <20090225023819.28579.20247.stgit@xxxxxxxxxxxxxx>
In-Reply-To: <20090225023819.28579.71372.stgit@xxxxxxxxxxxxxx>
References: <20090225023819.28579.71372.stgit@xxxxxxxxxxxxxx>
User-Agent: StGIT/0.14.2
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit

For raid1/4/5/6, resync (fixing inconsistencies between devices) is
very similar to recovery (rebuilding a failed device onto a spare).
They both walk through the device addresses in order.

For raid10 it can be quite different.  Resync follows the 'array'
address and makes sure all copies are the same.  Recovery walks
through 'device' addresses and recreates each missing block.

The 'bitmap_cond_end_sync' function allows the write-intent bitmap
(when present) to be updated to reflect a partially completed resync.
It makes assumptions which mean that it does not work correctly for
raid10 recovery at all.  In particular, it can cause bitmap-directed
recovery of a raid10 to not recover some of the blocks that need to
be recovered.

So move the call to bitmap_cond_end_sync into the resync path, rather
than being in the common "resync or recovery" path.

Cc: stable@xxxxxxxxxx
Signed-off-by: NeilBrown <neilb@xxxxxxx>
---

 drivers/md/raid10.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 118f89e..e1feb87 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1749,8 +1749,6 @@ static sector_t sync_request(mddev_t *mddev, sector_t sector_nr, int *skipped, i
 	if (!go_faster && conf->nr_waiting)
 		msleep_interruptible(1000);
 
-	bitmap_cond_end_sync(mddev->bitmap, sector_nr);
-
 	/* Again, very different code for resync and recovery.
 	 * Both must result in an r10bio with a list of bios that
 	 * have bi_end_io, bi_sector, bi_bdev set,
@@ -1886,6 +1884,8 @@ static sector_t sync_request(mddev_t *mddev, sector_t sector_nr, int *skipped, i
 		/* resync. Schedule a read for every block at this
 		 * virt offset */
 		int count = 0;
 
+		bitmap_cond_end_sync(mddev->bitmap, sector_nr);
+
 		if (!bitmap_start_sync(mddev->bitmap, sector_nr,
 				       &sync_blocks, mddev->degraded) &&
 		    !conf->fullsync && !test_bit(MD_RECOVERY_REQUESTED,
 						 &mddev->recovery)) {


From: NeilBrown <neilb@xxxxxxx>
Subject: [PATCH 1/2] md/raid10: Don't skip more than 1 bitmap-chunk at a
	time during recovery.
Date: Wed, 25 Feb 2009 13:38:19 +1100
Message-ID: <20090225023819.28579.71372.stgit@xxxxxxxxxxxxxx>
User-Agent: StGIT/0.14.2
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit

When doing recovery on a raid10 with a write-intent bitmap, we only
need to recover chunks that are flagged in the bitmap.  However if we
choose to skip a chunk because it isn't flagged, the code currently
skips the whole raid10-chunk, so it might not recover some blocks
that need recovering.  This patch fixes it.

In case that is confusing, it might help to understand that there is
a 'raid10 chunk size' which guides how data is distributed across the
devices, and a 'bitmap chunk size' which says how much data
corresponds to a single bit in the bitmap.  This bug only affects
cases where the bitmap chunk size is smaller than the raid10 chunk
size.

Cc: stable@xxxxxxxxxx
Signed-off-by: NeilBrown <neilb@xxxxxxx>
---

 drivers/md/raid10.c |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 6736d6d..118f89e 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -2010,13 +2010,13 @@ static sector_t sync_request(mddev_t *mddev, sector_t sector_nr, int *skipped, i
 	/* There is nowhere to write, so all non-sync
 	 * drives must be failed, so try the next chunk...
 	 */
-	{
-	sector_t sec = max_sector - sector_nr;
-	sectors_skipped += sec;
+	if (sector_nr + max_sync < max_sector)
+		max_sector = sector_nr + max_sync;
+
+	sectors_skipped += (max_sector - sector_nr);
 	chunks_skipped ++;
 	sector_nr = max_sector;
 	goto skipped;
-	}
 }

static int run(mddev_t *mddev)
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
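The effect of the PATCH 1/2 hunk can be sketched in isolation. Before the patch, skipping always advanced sector_nr to max_sector (the end of the current raid10 chunk); after the patch the advance is capped at max_sync, one bitmap chunk. The function below is a user-space illustration only, not a kernel symbol; the parameter names follow the diff, and all values are in sectors.

```c
typedef unsigned long long sector_model_t;

/* Sketch of the patched skip logic from sync_request():
 * never skip more than one bitmap chunk (max_sync) at a time,
 * even when the current raid10 chunk extends much further. */
static sector_model_t sectors_to_skip(sector_model_t sector_nr,
				      sector_model_t max_sector,
				      sector_model_t max_sync)
{
	/* Old behaviour was simply max_sector - sector_nr. */
	if (sector_nr + max_sync < max_sector)
		max_sector = sector_nr + max_sync;
	return max_sector - sector_nr;
}
```

For example, with a hypothetical 512-sector raid10 chunk and 128-sector bitmap chunks, the old code at sector 0 would skip all 512 sectors, passing over any dirty bitmap chunks at sectors 128-511; the patched logic skips only 128 sectors, so each bitmap chunk gets its own flagged/not-flagged decision.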