Re: [RFC][PATCH] md: avoid fullsync if a faulty member missed a dirty transition

Neil Brown <neilb@xxxxxxx> · Fri, 16 May 2008 21:54:31 +1000

On Friday May 9, snitzer@xxxxxxxxx wrote:
> On Fri, May 9, 2008 at 2:01 AM, Neil Brown <neilb@xxxxxxx> wrote:
> >
> > On Friday May 9, snitzer@xxxxxxxxx wrote:
> 
> >  > Unfortunately my testing with this patch results in a full resync.
> >  >
> >  > Here is the state of the array after shutdown:
> >  > # mdadm -X /dev/nbd0 /dev/sdq
> >  >         Filename : /dev/nbd0
> >  >            Magic : 6d746962
> >  >          Version : 4
> >  >             UUID : 7140cc3c:8681416c:12c5668a:984ca55d
> >  >           Events : 896
> >  >   Events Cleared : 897
> >
> >  Events Cleared is *larger* than Events!!! Is that repeatable?  I can
> >  only see it happening if a very small race were lost.  You don't have
> >  any other patches in there do you?
> 
> Yes, it is repeatable with your previous patch.  But with your most
> recent patch I had the following after shutdown:
> 
> # mdadm -X /dev/nbd0 /dev/sdq
>         Filename : /dev/nbd0
>           Events : 1732
>   Events Cleared : 1732
>           Bitmap : 409600 bits (chunks), 1 dirty (0.0%)
> 
>         Filename : /dev/sdq
>           Events : 1736
>   Events Cleared : 1736
>           Bitmap : 409600 bits (chunks), 1 dirty (0.0%)
> 
> Unfortunately sdq's events_cleared appears to have been updated
> _after_ the array became degraded.
> As such a full resync occurred because 1732 < 1736.
> 
> >  This patch should close the race, though I still find it hard to
> >  believe that you lost the race.
> 
> Comments inlined below.
> 
> >  Signed-off-by: Neil Brown <neilb@xxxxxxx>
> >
> >  ### Diffstat output
> >   ./drivers/md/bitmap.c |   20 +++++++++++++++-----
> >   1 file changed, 15 insertions(+), 5 deletions(-)
> >
> >
> >  diff .prev/drivers/md/bitmap.c ./drivers/md/bitmap.c
> >  --- .prev/drivers/md/bitmap.c   2008-05-09 11:02:13.000000000 +1000
> >  +++ ./drivers/md/bitmap.c       2008-05-09 16:00:07.000000000 +1000
> >
> > @@ -465,8 +465,6 @@ void bitmap_update_sb(struct bitmap *bit
> >         spin_unlock_irqrestore(&bitmap->lock, flags);
> >         sb = (bitmap_super_t *)kmap_atomic(bitmap->sb_page, KM_USER0);
> >         sb->events = cpu_to_le64(bitmap->mddev->events);
> >  -       if (!bitmap->mddev->degraded)
> >  -               sb->events_cleared = cpu_to_le64(bitmap->mddev->events);
> 
> Before, events_cleared was _not_ updated if the array was degraded.
> Your patch doesn't appear to maintain that design.

It does, but it is well hidden.
Bits in the bitmap are only cleared when the array is not degraded.
The new code for updating events_cleared is only triggered when a bit
is about to be cleared.

> 
> I needed "bitmap->mddev->sb_wait" and "bitmap->mddev->flags" to get
> the code to compile.

Sorry about that...

I decided to bite the bullet and create a setup where I could test
this myself.  Using the faulty personality of md make it fairly
straight forward.  This script:

------------------------------------------------------------
mdadm -Ss 
mdadm -B /dev/md9 -l faulty -n 1 /dev/sdc
mdadm -CR /dev/md0 -l1 -n2 -d 1 --bitmap internal --assume-clean /dev/sdb /dev/md9
mkfs /dev/md0 3000000
mount /dev/md0 /mnt
echo hello > /mnt/afile
sync
sleep 4
echo before grow
mdadm -E /dev/md9 | grep -E '(State|Event).*:'
mdadm -X /dev/md9 | grep Bitmap
mdadm -G /dev/md9 -l faulty -p wa
umount /mnt
echo after umount
mdadm -X /dev/sdb | grep Event
mdadm -S /dev/md0
echo sdb
mdadm -E /dev/sdb | grep Event
mdadm -E /dev/sdb | grep State' :'
mdadm -X /dev/sdb | grep Event

echo mdp
mdadm -E /dev/md9 | grep Event
mdadm -E /dev/md9 | grep State' :'
mdadm -X /dev/md9 | grep Event

mdadm -G /dev/md9 -l faulty -p none
mdadm -A /dev/md0 /dev/sdb
mdadm /dev/md0 -a /dev/md9
sleep 1
cat /proc/mdstat
------------------------------------------------------------

reproduces exactly your problem (I think).

This helped me discover what was wrong with my patch.  It has to do
with the event counter going backwards sometimes.

This patch makes the above test work as expected, and should provide
happiness for you too.

NeilBrown


Signed-off-by: Neil Brown <neilb@xxxxxxx>

### Diffstat output
 ./drivers/md/bitmap.c |   26 +++++++++++++++++++++-----
 1 file changed, 21 insertions(+), 5 deletions(-)

diff .prev/drivers/md/bitmap.c ./drivers/md/bitmap.c

--- .prev/drivers/md/bitmap.c	2008-05-16 20:27:49.000000000 +1000
+++ ./drivers/md/bitmap.c	2008-05-16 21:49:20.000000000 +1000
@@ -454,8 +454,11 @@ void bitmap_update_sb(struct bitmap *bit
 	spin_unlock_irqrestore(&bitmap->lock, flags);
 	sb = (bitmap_super_t *)kmap_atomic(bitmap->sb_page, KM_USER0);
 	sb->events = cpu_to_le64(bitmap->mddev->events);
-	if (!bitmap->mddev->degraded)
-		sb->events_cleared = cpu_to_le64(bitmap->mddev->events);
+	if (bitmap->mddev->events < bitmap->events_cleared){
+		/* rocking back to read-only */
+		bitmap->events_cleared = bitmap->mddev->events;
+		sb->events_cleared = cpu_to_le64(bitmap->events_cleared);
+	}
 	kunmap_atomic(sb, KM_USER0);
 	write_page(bitmap, bitmap->sb_page, 1);
 }
@@ -1085,9 +1088,22 @@ void bitmap_daemon_work(struct bitmap *b
 			} else
 				spin_unlock_irqrestore(&bitmap->lock, flags);
 			lastpage = page;
-/*
-			printk("bitmap clean at page %lu\n", j);
-*/
+
+			/* We are possibly going to clear some bits, so make
+			 * sure that events_cleared is up-to-date.
+			 */
+			if (bitmap->events_cleared < bitmap->mddev->events) {
+				bitmap_super_t *sb;
+				bitmap->events_cleared = bitmap->mddev->events;
+				wait_event(bitmap->mddev->sb_wait,
+				    !test_bit(MD_CHANGE_CLEAN,
+					      &bitmap->mddev->flags));
+				sb = kmap_atomic(bitmap->sb_page, KM_USER0);
+				sb->events_cleared =
+					cpu_to_le64(bitmap->events_cleared);
+				kunmap_atomic(sb, KM_USER0);
+				write_page(bitmap, bitmap->sb_page, 1);
+			}
 			spin_lock_irqsave(&bitmap->lock, flags);
 			clear_page_attr(bitmap, page, BITMAP_PAGE_CLEAN);
 		}
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html