> -----Original Message-----
> From: linux-raid-owner@xxxxxxxxxxxxxxx [mailto:linux-raid-owner@xxxxxxxxxxxxxxx]
> On Behalf Of NeilBrown
> Sent: Tuesday, February 01, 2011 2:08 AM
> To: Hawrylewicz Czarnowski, Przemyslaw
> Cc: linux-raid@xxxxxxxxxxxxxxx; Neubauer, Wojciech; Williams, Dan J;
> Ciechanowski, Ed
> Subject: Re: [PATCH] md: do not write resync checkpoint, if max_sector has
> been reached.
>
> [cut]
>
> > > This is wrong. If curr_resync has reached some value, then the array
> > > *is* in-sync up to that point.
> > >
> > > If a device fails then that often makes the array fully in-sync,
> > > because there is no longer any room for inconsistency.
> > > This is particularly true for RAID1. If one drive in a 2-drive RAID1
> > > fails, then the array instantly becomes in-sync.
> > > For RAID5, we should arguably fail the array at that point rather than
> > > marking it in-sync, but that would probably cause more data loss than
> > > it avoids, so we don't.
> > > In any case - the array is now in-sync.
> > Yes, I agree. But that is not the point here, in this bug.
> > >
> > > If a spare is added by mdmon at this time, then the array is not 'out
> > > of sync', it is 'in need of recovery'. 'recovery' and 'resync' are
> > > different things.
> > I fully understand the difference between recovery and resync (and
> > reshape).
> > >
> > > md_check_recovery should run remove_and_add_spares at this point. That
> > And it does.
> > > should return a non-zero value (because it found the spare that mdmon
> > > added)
> > But the return value is wrong (it is correct according to the current
> > configuration). Please let me explain once again what's going on.
> >
> > The flow is as follows:
> > 0. resync is in progress
> > 1. one disk fails
> > 2. md_error() wakes up the raid thread
> > 3. md_do_sync() gets skipped=1 from mddev->pers->sync_request() and some
> >    amount of skipped sectors/stripes - usually all remaining to resync.
> >    mddev->recovery_cp is set to the last sector (max_sector in
> >    md_do_sync)
> > 3a. md_check_recovery() sees MD_RECOVERY_INTR (clears it) and
> >    unregisters the recovery thread (which actually does the resync)
> > 3b. mdmon unblocks the array member
> > 4. md_check_recovery() checks if some action is required.
> > 4a. reshape is not taken into account, as reshape_position==MaxSector
> > 4b. recovery is not taken into account, as mdmon has not added a spare
> >    yet
> > 4c. resync is started, as recovery_cp!=MaxSector (!)
> > 5. md_do_sync() exits normally (gets skipped=1 from
> >    mddev->pers->sync_request()) as the checkpoint pointed at the last
> >    sector; it clears mddev->recovery_cp.
> > 6. mdmon adds a disk (via slot_store())
> > 7. md_check_recovery() does cleanup after the finished resync
> > 7a. MD_RECOVERY_INTR is not set anymore, mddev->pers->spare_active() is
> >    run and ALL !In_sync devices available in the array are set In_sync,
> >    and the array's degradation is cleared (!).
> > 7b. remove_and_add_spares() does not see any spares available
> >    (mddev->degraded==0), so recovery does not start.
>
> Thank you for this excellent problem description.
>
> I think the mistake here is at step 6.
> mdmon should not be trying to add a disk until the resync has completed.
> In particular, mdmon shouldn't try, and md should not allow mdmon to
> succeed.
>
> So slot_store should return -EBUSY if MD_RECOVERY_RUNNING is set.
>
> mdmon needs to 'know' when a sync/etc is happening, and should avoid
> processing ->check_degraded if it is.
>
> I have added the md fix to my for-next branch.
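On the mdmon side of that handshake, the check could be as simple as
reading sync_action from sysfs before acting on ->check_degraded. A
minimal userspace sketch - the helper name and the way the manager would
call it are my assumptions, not actual mdmon code:

    #include <stdio.h>
    #include <string.h>

    /* Returns 1 if the named array is busy with a resync/recovery/
     * reshape, 0 if it is idle, -1 on error.  md exposes the current
     * action in /sys/block/<dev>/md/sync_action. */
    int array_sync_active(const char *devnm)    /* e.g. "md127" */
    {
        char path[64], action[32];
        FILE *f;

        snprintf(path, sizeof(path),
                 "/sys/block/%s/md/sync_action", devnm);
        f = fopen(path, "r");
        if (!f)
            return -1;
        if (!fgets(action, sizeof(action), f)) {
            fclose(f);
            return -1;
        }
        fclose(f);
        /* "idle" means nothing is running; anything else ("resync",
         * "recover", ...) means hold off. */
        return strncmp(action, "idle", 4) != 0;
    }

The manager would then defer the check_degraded processing while this
returns non-zero and retry once the array goes back to idle.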
I took a look at the change in your for-next branch. It does not
compile. It should be:

	if (test_bit(MD_RECOVERY_RUNNING, &rdev->mddev->recovery))
		return -EBUSY;

> I might do the mdadm fix later.
Now (without this kernel/mdmon handshaking) there is an infinite loop in
the manager (it is quite well reproducible for IMSM RAID5). I have
noticed that the state is still "resync": mdmon continuously sends
"-blocked", and the manager does not add a spare (because of the
resync). I haven't found the root cause of why array_state is still
"resync", but at first glance, after md_do_sync() finishes,
md_check_recovery() does not start. I will run some more tests
tomorrow...

Przemek
>
> Thanks,
> NeilBrown
>
> [cut]