Song Liu <songliubraving@xxxxxx> writes: >> -----Original Message----- >> From: Neil Brown [mailto:neilb@xxxxxxx] >> Sent: Tuesday, September 29, 2015 9:25 PM >> To: Shaohua Li; linux-raid@xxxxxxxxxxxxxxx >> Cc: Kernel Team; Song Liu; hch@xxxxxxxxxxxxx; dan.j.williams@xxxxxxxxx >> Subject: Re: [PATCH 7/8] md: skip resync for raid array with journal >> >> Shaohua Li <shli@xxxxxx> writes: >> >> > If a raid array has journal, the journal can guarantee the >> > consistency, we can skip resync after a unclean shutdown. The >> > exception is raid creation or user initiated resync, which we still do a raid >> resync. >> > >> > Signed-off-by: Shaohua Li <shli@xxxxxx> >> > --- >> > drivers/md/md.c | 4 ++++ >> > drivers/md/md.h | 1 + >> > 2 files changed, 5 insertions(+) >> > >> > diff --git a/drivers/md/md.c b/drivers/md/md.c index b3f9eed..95824fb >> > 100644 >> > --- a/drivers/md/md.c >> > +++ b/drivers/md/md.c >> > @@ -1669,6 +1669,8 @@ static int super_1_validate(struct mddev *mddev, >> struct md_rdev *rdev) >> > } >> > set_bit(Journal, &rdev->flags); >> > rdev->journal_tail = le64_to_cpu(sb->journal_tail); >> > + if (mddev->recovery_cp == MaxSector) >> > + set_bit(MD_JOURNAL_CLEAN, &mddev- >> >flags); >> > break; >> > default: >> > rdev->saved_raid_disk = role; >> > @@ -1711,6 +1713,8 @@ static void super_1_sync(struct mddev *mddev, >> struct md_rdev *rdev) >> > sb->events = cpu_to_le64(mddev->events); >> > if (mddev->in_sync) >> > sb->resync_offset = cpu_to_le64(mddev->recovery_cp); >> > + else if (test_bit(MD_JOURNAL_CLEAN, &mddev->flags)) >> > + sb->resync_offset = cpu_to_le64(MaxSector); >> > else >> > sb->resync_offset = cpu_to_le64(0); >> > >> > diff --git a/drivers/md/md.h b/drivers/md/md.h index 226f4ba..0288a0b >> > 100644 >> > --- a/drivers/md/md.h >> > +++ b/drivers/md/md.h >> > @@ -236,6 +236,7 @@ struct mddev { >> > #define MD_STILL_CLOSED 4 /* If set, then array has not been >> opened since >> > * md_ioctl checked on it. >> > */ >> > +#define MD_JOURNAL_CLEAN 5 /* A raid with journal is already clean >> */ >> > >> > int suspended; >> > atomic_t active_io; >> > -- >> > 1.8.1 >> >> This looks right as far as it goes, but I don't think it goes far enough. >> The particular scenario that bothers me is if the array is started without the >> journal being present. >> I cannot see anything to prevent that - is there? > > I have sent mdadm patches that prevent the array to assemble with missing > journal, for both --assemble and --incremental: > > http://marc.info/?l=linux-raid&m=144080445720123 > http://marc.info/?l=linux-raid&m=144080446120127 > That is probably good and useful, but the kernel cannot rely on mdadm to protect it. If the kernel is asked to start a journalled array when no journal is present, it must do something that is safe. That might be to refuse, it might be to force a resync, it might be to somehow "know" if the journal is empty and only force a resync in that case. This should be treated much like a "dirty/degraded" start. Currently the kernel refuses to do that unless a module parameter is set. When mdadm is asked to --force start a dirty/degraded array, it modifies the metadata so that it doesn't appear to be degraded. Something similar might be best for the missing-journal case. I *think* the current kernel code will allow the array to be started without a resync even if the journal was not empty, and that is not safe. > We can force start in both cases with --force or --run. > > Currently, mdadm doesn't clear MD_FEATURE_JOURNAL bit, so we get warning > every time. > > I did miss one step: to mark the missing journal as fault. > >> >> In that case we need to assume that the array is not in-sync, and we need to >> clear MD_FEATURE_JOURNAL so if it gets stopped and then assembled again >> with the stale journal doesn't get used. >> >> One unfortunate side effect of that is that you couldn't stop the array cleanly >> (leaving the journal effectively empty) and then restart with no journal and >> no resync. Is that a problem I wonder? >> >> I'm not sure what the best solution is here, but we need a clear >> understanding of what happens if you try to assemble an array without the >> journal where previously it had one, and I don't think the current code gets it >> right. > > I just talked with Shaohua. We think we can probably do the following: > > 1. When assemble with cache device missing, show the warning (with --force > option). Run resync if previous shutdown is not clean (this is the default > behavior. Probably sensible. But I think the current code never marks the array as "not clean" when there is a journal. > > 2. When we get I/O error on the journal device at run time, make the whole > array as read only. Yes, I think that is best. > > 3. Add new operation that adds new journal device to an array with missing > journal. We can limit this option to inactive array only. It would be great if we could add a journal to an active array... > > Would this follow cover all the cases? Quite possibly. I don't want to commit myself until I see the code :-) Thanks, NeilBrown
Attachment:
signature.asc
Description: PGP signature