On Thu, Jun 23, 2022 at 11:05 PM ToddAndMargo via users
<users@xxxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> Hi All,
>
> Any of you guys know of a PCIe card that will do
> hardware RAID 1 with two NVMe drives?
>
> I have found some, but they are way to elaborate,
> and as such, way too expensive.

I'm really not certain how sophisticated or reliable either PCIe or NVMe is with respect to error reporting, or even whether it varies by make and model. My understanding is that internally it has to be good, because your data isn't really stored in any recognizable form on solid state drives; it's a "probabilistic representation of your data" and requires really sophisticated encoding/decoding to "almost certainly" return your data. But when that doesn't happen, it (anecdotally) seems curiously rare to get discrete read errors like we see with hard drives. Instead, it's common for the drive to return garbage or zeros in place of your data.

This is where Btrfs shines in general, but it really shines in the raid1 configuration. In the normal single-drive configuration, Btrfs will verbosely complain. It has a limited ability to correct when the metadata profile is dup (two copies of the metadata on one drive), which has been the mkfs default since btrfs-progs ~5.15. For various reasons, even dup might end up with two bad copies on a single SSD.

But in the raid1 configuration (two copies on different devices), Btrfs can unambiguously determine on every read whether data or metadata is wrong, grab the good copy from the other drive, and overwrite the bad copy. And this is all automatic. You'll still see the same scary verbose messages in dmesg, but you'll also see additional messages for the fixups. Fixups also happen during scrub, which is useful for areas that aren't regularly read (rough command sketches are at the end of this message).

Conversely, any hardware, mdadm, or LVM RAID depends on the hardware reporting a read error. If garbage or zeros are returned, the RAID can't do anything about it. [1]

Sounds great. So why not Btrfs raid1? Well, right now the code that handles degraded mdadm RAID is all in dracut (in the initramfs). The initramfs contains dracut scripts that try to assemble the RAID; if a drive is missing, the array won't assemble, so the scripts start a loop, wait about 3 minutes, and then attempt a degraded assembly. But dracut doesn't handle Btrfs in the same situation, and no one has done the work so far to make it possible. If a drive flat out dies, what happens at boot time is an indefinite wait for the device to appear, because of a udev rule that requires all Btrfs member devices to be present before the mount is attempted. That's good in the sense that we don't want to prematurely attempt a normal or degraded mount. Anyway, this area needs development work. So if your use case requires unattended boot when a drive has failed, this setup is not for you.

So those are the current trade-offs.

[1] There's experimental dm-integrity support via cryptsetup. It works rather differently than Btrfs, but it has the ability to detect such corruption and report it to the upper layer as a read error, where the normal RAID error correction can then work properly.
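For anyone who wants to try the raid1 configuration described above, a rough sketch follows. The device names (/dev/nvme0n1, /dev/nvme1n1) and the /mnt mount point are placeholders; adjust for your system.

  # two-device Btrfs with both data and metadata mirrored (raid1)
  mkfs.btrfs -m raid1 -d raid1 /dev/nvme0n1 /dev/nvme1n1

  # mounting either member device mounts the whole file system
  mount /dev/nvme0n1 /mnt

  # periodic scrub: verifies checksums and fixes up bad copies
  # from the good mirror
  btrfs scrub start /mnt
  btrfs scrub status /mnt

  # per-device counters for read/write/corruption errors
  btrfs device stats /mnt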
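And if a drive does flat out die, a degraded mount has to be requested explicitly; Btrfs won't do it on its own. Again a sketch with placeholder names, assuming /dev/nvme1n1 is the dead member and /dev/nvme2n1 is its replacement. For a Btrfs root file system the rough equivalent is adding rootflags=degraded to the kernel command line, though as noted above the missing-device wait at boot still isn't handled gracefully.

  # mount the surviving member with the degraded option
  mount -o degraded /dev/nvme0n1 /mnt

  # replace the missing device; devid 2 is an assumption here,
  # check 'btrfs filesystem show' for the actual missing devid
  btrfs replace start 2 /dev/nvme2n1 /mnt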
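Footnote [1] in practice: the standalone dm-integrity tool shipped with cryptsetup is integritysetup. The idea is to put a checksumming layer under each RAID member, then build the mdadm array on top of the /dev/mapper devices, so detected corruption surfaces as a read error the RAID can repair. Device and mapping names below are placeholders, and this is still experimental, so treat it as a sketch rather than a recommendation.

  # add a per-sector checksum layer to each member (wipes the device)
  integritysetup format /dev/sda1
  integritysetup format /dev/sdb1
  integritysetup open /dev/sda1 int-a
  integritysetup open /dev/sdb1 int-b

  # build the mirror on top of the integrity mappings
  mdadm --create /dev/md0 --level=1 --raid-devices=2 \
        /dev/mapper/int-a /dev/mapper/int-b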
-- 
Chris Murphy