Re: raid1 out of sync, but which files are affected?

"Nik.Brt." <nik.brt@xxxxxxxxxxxxx> · Wed, 13 Feb 2019 08:32:12 +0100

On 12/02/2019 20:11, Nix wrote:
On 10 Feb 2019, Harald Dunkel spake thusly:

On 1/27/19 12:21 AM, Nik.Brt. wrote:

These mismatches happen in raid1, but why they happen is not precisely known. There are a few ideas... and it is said that they are harmless in most cases (i.e. when they fall outside of files).
The phenomenon happens a lot less if you have LVM over the raid1, and why that is so is not exactly known either.

This is more than alarming. Do I put my data at risk using software RAID1?

No, because the only situation in which they are known to happen is when
you have a powerdown or crash or similar event when the data has hit one
spindle and not the other. In this case, *either* content is valid: if
you get one, you could have got the other if the machine powered down a
fraction of a second earlier or later. All that matters is that the data
remains consistent.

No, this is a wrong interpretation. RAID should protect against that.
The mismatches you mention should go away shortly after rebooting,
because the RAID logic addresses those.

After reboot, if there is no bitmap, one disk (raid1) is taken as good
and is fully replicated onto the other one, and the second disk is
not read until replication has completed. On raid 5-6 the data disks are
taken as good and the parities are recomputed.
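
To illustrate the parity case, here is a minimal Python sketch of "trust the data, recompute the parity" for a single RAID5-style stripe. It only covers the XOR parity (P); the second parity of raid6 is computed differently and is left out. The function names xor_parity and resync_stripe are made up for this example and are not part of md.

```python
from functools import reduce


def xor_parity(chunks):
    """Return the byte-wise XOR of equally sized data chunks (RAID5-style P)."""
    if len({len(c) for c in chunks}) != 1:
        raise ValueError("all chunks must have the same length")
    # zip(*chunks) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


def resync_stripe(data_chunks, stored_parity):
    """Trust the data chunks, recompute parity, report whether it was stale."""
    fresh = xor_parity(data_chunks)
    return fresh, fresh != stored_parity


# Hypothetical stripe: three data chunks and a parity chunk that was never
# updated because of an unclean shutdown.
data = [b"\x01\x02", b"\x04\x08", b"\x10\x20"]
new_parity, was_stale = resync_stripe(data, b"\x00\x00")
print(new_parity.hex(), was_stale)  # 152a True
```

The point of the sketch is that the data chunks are never modified during this kind of resync; only the parity is rewritten.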

If there is a bitmap, only the regions that are marked dirty are
resynced. By using flush, a region's bitmap bit is guaranteed to be
set dirty before the data is written and set clean only after the writes
have completed. That of course requires that the disks implement the
flush command correctly.
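
The ordering described above can be sketched with a toy model in Python. This is not how md is implemented; it is only an in-memory illustration, and the class ToyMirror, its methods and the crash_after_first flag are all invented for the example. It assumes one bitmap bit per region and that disk 0 is the one taken as the source after a restart.

```python
class ToyMirror:
    """Two in-memory 'disks' plus a write-intent bitmap, one bit per region."""

    def __init__(self, regions):
        self.disks = [[0] * regions, [0] * regions]
        self.dirty = set()  # region numbers currently marked dirty

    def write(self, region, value, crash_after_first=False):
        # 1. Mark the region dirty BEFORE touching the data
        #    (on real hardware this is where the flush matters).
        self.dirty.add(region)
        # 2. Write to each mirror in turn.
        self.disks[0][region] = value
        if crash_after_first:
            return  # simulated power loss: second mirror never written
        self.disks[1][region] = value
        # 3. Clear the bit only after both writes have landed.
        self.dirty.discard(region)

    def resync(self):
        """After a restart, copy disk 0 over disk 1 for dirty regions only."""
        copied = sorted(self.dirty)
        for region in copied:
            self.disks[1][region] = self.disks[0][region]
        self.dirty.clear()
        return copied

    def mismatches(self):
        return [r for r, (a, b) in enumerate(zip(*self.disks)) if a != b]


m = ToyMirror(8)
m.write(2, 11)                          # clean write, bit set then cleared
m.write(5, 22, crash_after_first=True)  # interrupted write, bit stays set
print(m.mismatches())  # [5]
print(m.resync())      # [5]
print(m.mismatches())  # []
```

Because the bit is set first and cleared last, any region where the mirrors can differ after a crash is still marked dirty, so resyncing only the dirty regions is enough.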

RAID is not meant to protect against data loss on sudden powerdown
(that's the job of UPSes and filesystem journals) nor really against
data loss on single-sector disk damage or intermittent connectivity
problems. It's meant to protect against data loss on whole-disk failure.
That's all. If it protects against other things, good, but other
scenarios are not within the design intent of RAID.

RAID would be very weak if it was so vulnerable.

RAID does protect against single-sector damage IF the disk reports a read
error. Cabling errors and other logic errors, which don't result in CRC
errors on the disk surface, are nasty in this sense. Cabling errors
should hopefully be reported as CRC errors on the disk side, visible in
dmesg and in SMART.
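
Corruption that does not produce a read error can only be noticed as a mismatch between the mirrors, which md counts in the sysfs file mismatch_cnt after a check has run. Here is a small Python sketch for reading that counter. The helper name read_mismatch_cnt and its sysfs_root parameter are made up for this example; the parameter exists only so the function can be pointed at a test directory instead of the real /sys/block.

```python
from pathlib import Path


def read_mismatch_cnt(md_name, sysfs_root="/sys/block"):
    """Return the mismatch counter md reports for an array such as 'md0'."""
    path = Path(sysfs_root) / md_name / "md" / "mismatch_cnt"
    return int(path.read_text().strip())
```

On a machine with an array named md0 (an assumed name), read_mismatch_cnt("md0") returns the count from the last check. A non-zero value says that the mirrors differ somewhere, but not in which files, which is the question this thread started with.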