Re: RAID levels not intuitive in anaconda GUI

Chris Murphy <lists@xxxxxxxxxxxxxxxxx> · Wed, 2 Jan 2013 12:52:01 -0700

On Dec 19, 2012, at 2:26 PM, Marian Ganisin <mganisin@xxxxxxxxxx> wrote:
> 
>  If the md driver detects a write error on a device in a RAID1, RAID4, RAID5,
>  RAID6, or RAID10 array, it immediately disables that device (marking it as
>  faulty) and continues operation on the remaining devices.
> 
> Marian is right, wiki as well as man md agree with him, all RAID levels except of RAID0 do error detection.

This md entry refers to the drive firmware itself reporting a (sector) read or write error. That error detection always occurs and is totally independent of md. The md RAID level merely dictates what happens after the drive has reported the error.

When a sector read error occurs, md will get the data from a mirrored copy (RAID 1), or rebuild it from parity (RAID 4, 5, 6). That recovered data is then also written to the LBA that previously had the read error, and if the firmware determines the sector is bad it will remap it to a reserve sector. So md isn't actually doing error detection at all in normal operation; the drive firmware does that. What md provides is a way to correct the error. In the case of a write error there is no way out: the whole device must be considered faulty.
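To make the "rebuild it from parity" step concrete, here is a minimal sketch of single-parity (RAID 4/5 style) reconstruction using XOR. It is only an illustration of the arithmetic, not md's actual code; the function names and the toy two-byte chunks are made up for the example.

```python
from functools import reduce

def xor_chunks(chunks):
    """XOR a list of equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))

def rebuild_missing(surviving_chunks):
    """Rebuild the one unreadable chunk of a stripe.

    With single parity, the missing chunk is the XOR of every surviving
    chunk in the stripe (remaining data chunks plus the parity chunk).
    """
    return xor_chunks(surviving_chunks)

# Toy stripe: three data chunks plus their parity (hypothetical values).
data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = xor_chunks(data)

# Pretend the drive holding data[1] reported a read error.
recovered = rebuild_missing([data[0], data[2], parity])
```

Note that the sketch only runs once something else has said "this chunk is unreadable". The XOR itself says nothing about which chunk is bad, which is why the detection part is left to the drive.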

> Therefore I say that usage of term 'error detection' is highly confusing as it actually counts wide range of RAID levels, RAID is error detecting redundant composition of disks by definition (once more, many professionals do not count RAID0 to RAID family).

"Error correction" is a more accurate term than "error detection". A scrub (check or repair) is the method for md-based error detection. But scrubs aren't configured by default, so in fact md error detection never occurs out of the box. Therefore I find the term "error detection" misleading.
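For reference, a scrub is requested through the md sysfs files described in md(4): writing "check" or "repair" to md/sync_action, and reading md/mismatch_cnt afterwards. The following is a rough sketch of doing that from Python. The helper names and the sysfs_root parameter are my own (the parameter exists only so the code can be pointed at a scratch directory), and on a real system it needs root and an existing array such as md0.

```python
from pathlib import Path

def request_scrub(md_name, action="check", sysfs_root="/sys/block"):
    """Ask md to start a scrub by writing to md/sync_action.

    Only "check" and "repair" are accepted by this helper.
    """
    if action not in ("check", "repair"):
        raise ValueError("action must be 'check' or 'repair'")
    path = Path(sysfs_root) / md_name / "md" / "sync_action"
    path.write_text(action + "\n")
    return path

def read_mismatch_count(md_name, sysfs_root="/sys/block"):
    """Return the value md reports in md/mismatch_cnt."""
    path = Path(sysfs_root) / md_name / "md" / "mismatch_cnt"
    return int(path.read_text().strip())
```

Usage would be along the lines of request_scrub("md0") followed, once the scrub has finished, by read_mismatch_count("md0").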

> As the anaconda implements creation of MD devices it would be really nice to have it aligned with MD terminology as expressed in relevant man pages.

What I'm finding is that it does do this; it's just not immediately discoverable. You must check a box and click Apply for the RAID level label to change.

> I asked my colleagues if RAID10 has parity, the answer was: If it is
> able to detect error, it has to have a parity.

Well, they're wrong. md using RAID 1 can also detect errors during a scrub.

In the case of a RAID 1 scrub, if data chunks don't match between two devices, md reports that an error has been detected. Of course, it's ambiguous which drive contains the valid chunk and which the invalid one.
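As a toy model of that RAID 1 check (not md's implementation; the function name and the 4-byte chunk size are arbitrary choices for the example), comparing two mirrors chunk by chunk looks roughly like this:

```python
def raid1_check(mirror_a, mirror_b, chunk_size=4):
    """Return the indexes of chunks that differ between two mirrors.

    No parity is involved: a mismatch is detected purely by comparison.
    The result says *where* the copies disagree, not which copy is right.
    """
    mismatches = []
    length = max(len(mirror_a), len(mirror_b))
    for offset in range(0, length, chunk_size):
        chunk_a = mirror_a[offset:offset + chunk_size]
        chunk_b = mirror_b[offset:offset + chunk_size]
        if chunk_a != chunk_b:
            mismatches.append(offset // chunk_size)
    return mismatches

# Two copies that should be identical; one byte differs in chunk 1.
copy_a = b"AAAABBBBCCCC"
copy_b = b"AAAABxBBCCCC"
bad_chunks = raid1_check(copy_a, copy_b)
```

The return value is just a list of disagreeing chunk numbers. Nothing in the comparison can say whether copy_a or copy_b holds the good data.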

In the case of a RAID 5 scrub, md reads all data and parity chunks, recomputes parity from the data, and compares the result to the stored parity chunks. If there's a mismatch, md reports that an error has been detected. But again it's ambiguous whether it's the data chunk or the parity chunk that's wrong. Yet a repair-type scrub for RAID 5/6 assumes the data chunks are correct and writes new parity chunks.
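A simplified single-stripe model of that check and repair behavior, assuming plain XOR parity (so RAID 5 only, not RAID 6's second syndrome), is sketched below. The function names and sample bytes are invented for the illustration and this is not md's code.

```python
from functools import reduce

def compute_parity(data_chunks):
    """XOR parity over equal-length data chunks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*data_chunks))

def check_stripe(data_chunks, stored_parity):
    """'check' scrub: True if recomputed parity matches the stored parity."""
    return compute_parity(data_chunks) == stored_parity

def repair_stripe(data_chunks, stored_parity):
    """'repair' scrub: trust the data and return freshly computed parity.

    The stored parity is ignored on purpose, mirroring the assumption that
    the data chunks are the correct ones.
    """
    return compute_parity(data_chunks)

good = [b"\x0f\xf0", b"\x33\xcc"]
parity = compute_parity(good)

# Silently corrupt one byte of a *data* chunk.
damaged = [b"\x0f\xf1", good[1]]
check_ok_before = check_stripe(damaged, parity)     # mismatch is detected
new_parity = repair_stripe(damaged, parity)
check_ok_after = check_stripe(damaged, new_parity)  # stripe now looks consistent
```

After the repair the stripe passes a check again, but the corrupted data chunk is still corrupted; only the parity was rewritten to agree with it.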

So parity is not required for detecting errors.

Chris Murphy

_______________________________________________
Anaconda-devel-list mailing list
Anaconda-devel-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/anaconda-devel-list