Re: Rewrite md raid1 member

Chris Murphy <lists@xxxxxxxxxxxxxxxxx> · Fri, 19 Aug 2016 10:10:23 -0600

On Fri, Aug 19, 2016 at 6:46 AM, Chris Dunlop <chris@xxxxxxxxxxxx> wrote:
> On Fri, Aug 19, 2016 at 12:52:21PM +0100, Wols Lists wrote:
>> On 18/08/16 05:01, Chris Dunlop wrote:
>>> I'm interested to see if there's a way of essentially doing the above on a
>>> live system, assuming there's appropriate care taken to not trash any
>>> existing data (including superblocks).
>>>
>>> I.e. is it *theoretically* possible to write the same data back to the whole
>>> disk safely. E.g. using 'dd' from/to the same disk is almost there, but, as
>>> described, there's a window of opportunity where you could get stale data on
>>> the disk and a raid repair could then copy that stale data to the good disk.
>>
>> There is something called "scrub". My superficial knowledge of raid
>> doesn't let me know what it is, but as far as I can make out it forces a
>> whole-disk-write or somesuch. Explicitly to flush out such problems. If
>> someone else can tell you how to scrub your disks, I'd try that.
>
> A scrub will read the RAID members to check that both sides match (raid 1,
> 10), or that the parity is correct (raid 4,5,6).
>
> To initiate a scrub of md0:
>
> echo repair > /sys/block/md0/md/sync_action
>
> You can watch it using /proc/mdstat, e.g.:
>
> watch cat /proc/mdstat
>
> It won't write anything if it doesn't detect any errors.
>
> In my case, I want it to write everything.
>
> If I do my 'dd' to write everything as previously described, with the window
> of opportunity for stale data to end up on the written disk, one option
> would be to run a scrub / repair to check the data is the same - but if I'm
> unlucky with my dd and the data isn't the same for some sector[s], I want to
> ensure the correct data is copied over the stale data and not the other way
> around, e.g. to specify "in the event of a mismatch, use the data from sda
> and overwrite the data on sdb".
>
> Unfortunately I don't know how that can be done.
>
> Does anyone know?
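
On the scrub side, here is a minimal sketch of a check that only
counts mismatches instead of repairing them. It assumes the md0 sysfs
directory from the commands quoted above; MD_DIR and the function
names are hypothetical helpers of mine, not part of md or mdadm. It
doesn't address which copy wins on a mismatch, it only tells you
whether any were found.

```shell
#!/bin/sh
# Hypothetical helpers around md's sysfs interface.
# MD_DIR is an assumption: override it for arrays other than md0.
MD_DIR=${MD_DIR:-/sys/block/md0/md}

# Start a scrub in "check" mode: mismatched blocks are counted,
# not rewritten (unlike "repair").
md_start_check() {
    echo check > "$MD_DIR/sync_action"
}

# Print the current sync action, e.g. "idle", "check" or "repair".
md_action() {
    cat "$MD_DIR/sync_action"
}

# Print the mismatch count recorded by the most recent check/repair.
md_mismatches() {
    cat "$MD_DIR/mismatch_cnt"
}

# Poll until the array is idle again; $1 is the poll interval in
# seconds (defaults to 10).
md_wait_idle() {
    while [ "$(md_action)" != "idle" ]; do
        sleep "${1:-10}"
    done
}
```

Typical use, as root, would be md_start_check, then md_wait_idle, then
md_mismatches; a non-zero count means the members disagree somewhere.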

Basically you want what Btrfs balance does, except simpler: rather
than relocating extents into new block groups, you just want to
read and rewrite everything as it is.

You definitely can't do this with dd on an md array that has a mounted
file system. The file system will inevitably make changes after dd has
read a block, and dd's subsequent write will then clobber those
modifications. That's data loss at a minimum, and if the clobbered
blocks are file system metadata it's worse, because the file system
becomes inconsistent. A further problem is that you'd be overwriting
good data in place, with no protection against a crash or power
failure mid-write. You'd really want this operation to be CoW, so that
the good data is first duplicated somewhere else, pointed to only once
that copy is on stable media, and the original then turned into free
space.
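
The read-then-write-back race can be reproduced safely on a scratch
file. This is only an illustration: the temp file stands in for the
disk, the "file system update" is a plain write that I've placed
between dd's read and its write-back, and the 512-byte sector size is
an assumption. Don't point it at a real md member.

```shell
#!/bin/sh
# Demonstration on scratch files only, NOT on a real block device.
set -eu

img=$(mktemp)   # stands in for the disk
buf=$(mktemp)   # stands in for the rewriter's read buffer
trap 'rm -f "$img" "$buf"' EXIT

# Create four zeroed 512-byte "sectors"; sector 1 starts out as "old".
dd if=/dev/zero of="$img" bs=512 count=4 2>/dev/null
printf 'old' | dd of="$img" bs=512 seek=1 conv=notrunc 2>/dev/null

# Step 1: the rewriter reads sector 1 into its buffer.
dd if="$img" of="$buf" bs=512 skip=1 count=1 2>/dev/null

# Step 2: the "file system" updates sector 1 in the meantime.
printf 'new' | dd of="$img" bs=512 seek=1 conv=notrunc 2>/dev/null

# Step 3: the rewriter writes its now-stale buffer back.
dd if="$buf" of="$img" bs=512 seek=1 conv=notrunc 2>/dev/null

# Read the first three bytes of sector 1 to see which version survived.
result=$(dd if="$img" bs=1 skip=512 count=3 2>/dev/null)
echo "sector 1 now holds: $result"
```

The final line prints "sector 1 now holds: old": the update made in
step 2 is silently lost, which is exactly the clobbering described
above.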

I don't really understand the use case for this. At a fundamental
level it sounds like you don't trust the devices the data resides on.
If that's true, then there are related concerns that aren't mitigated
by this rewrite feature alone.

-- 
Chris Murphy