On Mon, Dec 14, 2009 at 5:37 PM, Dan Williams <dan.j.williams@xxxxxxxxx> wrote:
> On Sun, 2009-12-13 at 21:07 -0700, Neil Brown wrote:
>> +static ssize_t recovery_start_store(mdk_rdev_t *rdev, const char *buf, size_t len)
>> +{
>> +	unsigned long long recovery_start;
>> +
>> +	if (cmd_match(buf, "none"))
>> +		recovery_start = MaxSector;
>> +	else if (strict_strtoull(buf, 10, &recovery_start))
>> +		return -EINVAL;
>> +
>> +	if (rdev->mddev->pers &&
>> +	    rdev->raid_disk >= 0)
>> +		return -EBUSY;
>
> Ok, I had a chance to test this out and have a question about how you
> envisioned mdmon handling this restriction, which is a bit tighter than
> what I had before.  The prior version allowed updates as long as the
> array was read-only.  This version forces recovery_start to be written
> at sysfs_add_disk() time (before 'slot' is written).  The conceptual
> problem I ran into was a race between ->activate_spare() determining the
> last valid checkpoint and the monitor thread starting up the array:
>
> ->activate_spare(): read recovery checkpoint
> ( array becomes read/write )
> ( array becomes dirty, checkpoint invalidated )
> sysfs_add_disk(): write invalid recovery checkpoint
> ( recovery starts from the wrong location )

On second thought, if we get to activate_spare() it's already too late.
Moving this to mdadm at assembly time (prior to setting readonly) is a
better approach.
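
For illustration only, here is a minimal userspace sketch of the assembly-time
ordering being proposed: write recovery_start for the partially-recovered
member while the array is still inactive, then place the device, and only then
bring the array up read-only.  The sysfs paths, device names, checkpoint value,
and the sysfs_write() helper below are assumptions for the example, not
mdadm's actual code; recovery_start and slot are the attributes discussed
above, and array_state is the usual md array control file.

/*
 * Illustrative sketch, not mdadm's implementation: set the resume
 * checkpoint before the array can go read/write, so the checkpoint
 * cannot be invalidated underneath us.
 */
#include <stdio.h>

/* Write a single sysfs attribute; returns 0 on success, -1 on error. */
static int sysfs_write(const char *path, const char *val)
{
	FILE *f = fopen(path, "w");

	if (!f)
		return -1;
	if (fputs(val, f) == EOF) {
		fclose(f);
		return -1;
	}
	return fclose(f) ? -1 : 0;
}

static int assemble_partial_member(void)
{
	/* hypothetical array/member names used for the example */
	const char *dev = "/sys/block/md0/md/dev-sda1";
	char path[256];
	int err = 0;

	/* 1. checkpoint first, while the array is still inactive */
	snprintf(path, sizeof(path), "%s/recovery_start", dev);
	err |= sysfs_write(path, "12345678");	/* sectors; example value */

	/* 2. give the device its slot */
	snprintf(path, sizeof(path), "%s/slot", dev);
	err |= sysfs_write(path, "2");

	/* 3. only now start the array read-only; mdmon takes over later */
	err |= sysfs_write("/sys/block/md0/md/array_state", "readonly");

	return err ? -1 : 0;
}

The point of this ordering is simply that the window described above (array
goes read/write and dirties the checkpoint before recovery_start is written)
no longer exists, because the checkpoint is committed before the array is
allowed to run at all.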