Re: dmesg deluge: RAID1 conf printout

NeilBrown <neilb@xxxxxxx> · Mon, 23 Apr 2012 09:48:53 +1000

On Sun, 22 Apr 2012 19:21:11 +0200 Jan Ceuleers <jan.ceuleers@xxxxxxxxxxxx>
wrote:

> NeilBrown wrote:
> > Looks like:
> >
> >     commit 7bfec5f35c68121e7b1849f3f4166dd96c8da5b3
> >
> > is at fault.  It causes md to attempt to add spares into the array more often.
> > Would I be right in guessing that you have one spare in this array?
> > If you remove the spare, the messages should stop.
> 
> Hmmm. The commit message is as follows:
> 
> commit 7bfec5f35c68121e7b1849f3f4166dd96c8da5b3
> Author: NeilBrown <neilb@xxxxxxx>
> Date:   Fri Dec 23 10:17:53 2011 +1100
> 
>      md/raid5: If there is a spare and a want_replacement device, start replaceme
> 
>      When attempting to add a spare to a RAID[456] array, also consider
>      adding it as a replacement for a want_replacement device.
> 
>      This requires that common md code attempt hot_add even when the array
>      is not formally degraded.
> 
>      Reviewed-by: Dan Williams <dan.j.williams@xxxxxxxxx>
>      Signed-off-by: NeilBrown <neilb@xxxxxxx>
> 
> 
> Does this also apply to RAID1 (which is all I've got on this machine: no 
> RAID456)?

Yes, it does apply to RAID1.  Part of the patch was RAID5-specific, but part
of it changed common code that affects other levels.  That part was not
meant to be a big change, but it turned out to be a little bigger than I
expected.

The following should fix it.

Thanks again for the report,
NeilBrown


From 321f820a905993f694f7ba4347492e9273831813 Mon Sep 17 00:00:00 2001
From: NeilBrown <neilb@xxxxxxx>
Date: Mon, 23 Apr 2012 09:46:28 +1000
Subject: [PATCH] md: don't call ->add_disk unless there is good reason.

Commit 7bfec5f35c68121e7b18

   md/raid5: If there is a spare and a want_replacement device, start replacement.

caused md_check_recovery to call ->add_disk much more often.
Instead of only when the array is degraded, it is now called whenever
md_check_recovery finds anything useful to do, which includes
updating the metadata for a clean<->dirty transition.
This causes unnecessary work, and causes info messages from ->add_disk
to be reported much too often.

So refine md_check_recovery to only do any actual recovery checking
(including ->add_disk) if MD_RECOVERY_NEEDED is set.

This fix is suitable for 3.3.y:

Cc: stable@xxxxxxxxxxxxxxx
Reported-by: Jan Ceuleers <jan.ceuleers@xxxxxxxxxxxx>
Signed-off-by: NeilBrown <neilb@xxxxxxx>

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 9524192..47f1fdb6 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -7813,14 +7813,14 @@ void md_check_recovery(struct mddev *mddev)
 		 * any transients in the value of "sync_action".
 		 */
 		set_bit(MD_RECOVERY_RUNNING, &mddev->recovery);
-		clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
 		/* Clear some bits that don't mean anything, but
 		 * might be left set
 		 */
 		clear_bit(MD_RECOVERY_INTR, &mddev->recovery);
 		clear_bit(MD_RECOVERY_DONE, &mddev->recovery);
 
-		if (test_bit(MD_RECOVERY_FROZEN, &mddev->recovery))
+		if (!test_and_clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery) ||
+		    test_bit(MD_RECOVERY_FROZEN, &mddev->recovery))
 			goto unlock;
 		/* no recovery is running.
 		 * remove any failed drives, then
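
For anyone who wants to see the effect of the gating without building a
kernel, here is a rough userspace model of the hunk above.  It is only a
sketch: the bit helpers are non-atomic stand-ins for the kernel ones, the
flag numbers, struct model, the check_recovery_*() functions and
count_add_disk() are made up for illustration, and it assumes every pass
has already found something to do (such as a metadata update), so that
md_check_recovery gets as far as this test.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical flag numbers for this sketch; the kernel defines its own. */
enum { RECOVERY_NEEDED = 0, RECOVERY_FROZEN = 1 };

/* Non-atomic userspace stand-ins for the kernel bit helpers. */
static bool test_bit(int nr, const unsigned long *addr)
{
	assert(nr >= 0 && nr < (int)(sizeof(*addr) * CHAR_BIT));
	return (*addr >> nr) & 1UL;
}

static void set_bit(int nr, unsigned long *addr)
{
	assert(nr >= 0 && nr < (int)(sizeof(*addr) * CHAR_BIT));
	*addr |= 1UL << nr;
}

static void clear_bit(int nr, unsigned long *addr)
{
	assert(nr >= 0 && nr < (int)(sizeof(*addr) * CHAR_BIT));
	*addr &= ~(1UL << nr);
}

/* Returns the previous value of the bit, then clears it. */
static bool test_and_clear_bit(int nr, unsigned long *addr)
{
	bool old = test_bit(nr, addr);

	clear_bit(nr, addr);
	return old;
}

/* Made-up model of the state we care about. */
struct model {
	unsigned long recovery;	/* flag word */
	int add_disk_calls;	/* how often ->add_disk would have run */
};

/* Before the fix: NEEDED is cleared unconditionally, only FROZEN bails out. */
static void check_recovery_old(struct model *m)
{
	clear_bit(RECOVERY_NEEDED, &m->recovery);
	if (test_bit(RECOVERY_FROZEN, &m->recovery))
		return;
	m->add_disk_calls++;
}

/* After the fix: bail out unless NEEDED was set (and clear it as we test). */
static void check_recovery_new(struct model *m)
{
	if (!test_and_clear_bit(RECOVERY_NEEDED, &m->recovery) ||
	    test_bit(RECOVERY_FROZEN, &m->recovery))
		return;
	m->add_disk_calls++;
}

/* Request recovery once, then run 'passes' passes; return the call count. */
static int count_add_disk(void (*check)(struct model *), int passes)
{
	struct model m = { 0, 0 };

	set_bit(RECOVERY_NEEDED, &m.recovery);
	for (int i = 0; i < passes; i++)
		check(&m);
	return m.add_disk_calls;
}
```

Calling count_add_disk(check_recovery_old, 5) counts one ->add_disk attempt
per pass, while count_add_disk(check_recovery_new, 5) counts only the one
pass that followed the request.  The point of using test_and_clear_bit is
that the test and the clear happen in one step, so a request is consumed
exactly once.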