Re: [PATCH] drivers/md/md.c: ignore recovery_offset if bitmap exists

Nate Dailey <nate.dailey@xxxxxxxxxxx> · Fri, 30 Oct 2015 09:30:32 -0400

I first tested 4.3-rc6 that I already had laying around, and verified that the 
bug still happens.

Then I reverted 7eb418851f3278de67126ea0c427641ab4792c57, rebuilt & installed, 
and tested again. Reverting this patch did indeed fix the bug.

Thank you!

Nate



On 10/29/2015 10:51 PM, Neil Brown wrote:
On Sat, Aug 15 2015, Nate Dailey wrote:

I hate to nag... but looking for feedback on this change, which addresses what
seems to me to be a serious bug.
Being a nag is good.  I don't have the earlier emails in my inbox - I
wonder what happened to them.... and for some reason this one was marked
"read".
But it arrived about when I converted over to notmuch and just before I
went on 3 weeks leave...

Anyway, Jes just poked me so I'm looking now.

Thanks,
Nate




On 07/29/2015 04:46 PM, Joe Lawrence wrote:
On 07/28/2015 03:28 PM, Nate Dailey wrote:
If a bitmap recovery is interrupted and later restarted, then
sectors below the recovery offset, written between interruption
and resumption, will not be copied. This results in corruption.

See https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=777511
for a script that can be used to repro this.

Seems like ignoring the recovery_offset if a bitmap exists is
the way to go.
This doesn't feel like the right solution.
Why does the presence of a bitmap affect the validity of
->recovery_offset.

Surely recovery_offset should always be reliable and we should always
use it.  Maybe it isn't being updated correctly in some situation when a
bitmap is present.

Does it ever make sense to honour the recovery-offset when a device is
re-added?
I don't think it does....

Oh.  Look what I found.
commit 7eb418851f3278de67126ea0c427641ab4792c57
Author: NeilBrown <neilb@xxxxxxx>
Date:   Tue Jan 14 15:55:14 2014 +1100

     md: allow a partially recovered device to be hot-added to an array.

...
-               rdev->recovery_offset = 0;
+               if (rdev->saved_raid_disk < 0)
+                       rdev->recovery_offset = 0;


we used to clear recovery_offset for a re-add, but we don't any more.
I guess this patch introduced the bug.

I cannot find anything in my mail logs to suggest why I wrote that
patch.

Right now I cannot think of any real justification for that patch.
Could someone please test to see if reverting that patch fixes the
problem?

sorry for the delay in getting to this.

Thanks.
NeilBrown



Signed-off-by: Nate Dailey <nate.dailey@xxxxxxxxxxx>
---
   drivers/md/md.c | 24 +++++++++++++-----------
   1 file changed, 13 insertions(+), 11 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 0c2a4e8..79c6285 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -7738,16 +7738,18 @@ void md_do_sync(struct md_thread *thread)
   	else {
   		/* recovery follows the physical size of devices */
   		max_sectors = mddev->dev_sectors;
-		j = MaxSector;
-		rcu_read_lock();
-		rdev_for_each_rcu(rdev, mddev)
-			if (rdev->raid_disk >= 0 &&
-			    !test_bit(Faulty, &rdev->flags) &&
-			    !test_bit(In_sync, &rdev->flags) &&
-			    rdev->recovery_offset < j)
-				j = rdev->recovery_offset;
-		rcu_read_unlock();
-
+		/* we don't use the offset if there's a bitmap */
+		if (!mddev->bitmap) {
+			j = MaxSector;
+			rcu_read_lock();
+			rdev_for_each_rcu(rdev, mddev)
+				if (rdev->raid_disk >= 0 &&
+				    !test_bit(Faulty, &rdev->flags) &&
+				    !test_bit(In_sync, &rdev->flags) &&
+				    rdev->recovery_offset < j)
+					j = rdev->recovery_offset;
+			rcu_read_unlock();
+		}
   		/* If there is a bitmap, we need to make sure all
   		 * writes that started before we added a spare
   		 * complete before we start doing a recovery.
@@ -7756,7 +7758,7 @@ void md_do_sync(struct md_thread *thread)
   		 * recovery has checked that bit and skipped that
   		 * region.
   		 */
-		if (mddev->bitmap) {
+		else {
   			mddev->pers->quiesce(mddev, 1);
   			mddev->pers->quiesce(mddev, 0);
   		}

[+cc Ben & Cyril from the Debian bug report]

-- Joe

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html