On July 1, 2019 8:56:35 AM CDT, Blake Hudson <blake@xxxxxxxx> wrote:
>
> Warren Young wrote on 6/28/2019 6:53 PM:
>> On Jun 28, 2019, at 8:46 AM, Blake Hudson <blake@xxxxxxxx> wrote:
>>> Linux software RAID…has only decreased availability for me. This has been due to a combination of hardware and software issues that are generally handled well by HW RAID controllers, but are often handled poorly or unpredictably by desktop-oriented hardware and Linux software.
>> Would you care to be more specific? I have little experience with software RAID, other than ZFS, so I don’t know what these “issues” might be.
>
> I've never used ZFS, as its Linux support has been historically poor. My comments are limited to mdadm. I've experienced three faults when using Linux software RAID (mdadm) on RH/RHEL/CentOS, and I believe all of them resulted in more downtime than would have been experienced without the RAID:
>
> 1) A single drive failure in a RAID 4 or 5 array (desktop IDE) caused the entire system to stop responding. The result was a degraded (from the dead drive) and dirty (from the crash) array that could not be rebuilt (either of the former conditions alone would have been fine, but not both, due to buggy Linux software).
> 2) A single drive failure in a RAID 1 array (Supermicro SCSI) caused the system to be unbootable. We had to update the BIOS to boot from the working drive, and possibly grub had to be repaired or reinstalled, as I recall (it's been a long time).
> 3) A single drive failure in a RAID 4 or 5 array (desktop IDE) was not clearly identified and required a bit of troubleshooting to pinpoint which drive had failed.
>
> Unfortunately, I've never had an experience where a drive just failed cleanly, was marked bad by Linux software RAID, and could then be replaced without fanfare. This is in contrast to my HW RAID experiences, where a single drive failure is almost always handled in a reliable and predictable manner with zero downtime.
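As an aside on item 3 (pinpointing which member died): /proc/mdstat does flag a failed member with "(F)" and shows a slot map like [U_U], where "_" marks the missing slot, so a short script can point at the culprit. A minimal sketch — the array and device names in the sample text are made up for illustration:

```python
# Parse /proc/mdstat-style text and report failed members and degraded arrays.
# A failed device appears as e.g. "sdb1[1](F)"; the status map "[U_U]" on the
# following line marks the empty slot with "_".
import re

def degraded_arrays(mdstat_text):
    results = {}
    current = None
    for line in mdstat_text.splitlines():
        m = re.match(r'^(md\d+)\s*:', line)
        if m:
            # Array header line: note the array and any members flagged (F).
            current = m.group(1)
            failed = re.findall(r'(\w+)\[\d+\]\(F\)', line)
            results[current] = {'failed': failed, 'degraded': False}
        elif current and '[' in line:
            # Detail line: a status map containing "_" means a slot is empty.
            status = re.search(r'\[(U|_)+\]$', line.strip())
            if status and '_' in status.group(0):
                results[current]['degraded'] = True
    # Keep only arrays that have a failed member or a missing slot.
    return {md: info for md, info in results.items()
            if info['failed'] or info['degraded']}

sample = """\
md0 : active raid5 sdd1[3] sdc1[2] sdb1[1](F) sda1[0]
      3906766848 blocks level 5, 64k chunk, algorithm 2 [3/2] [U_U]
"""
print(degraded_arrays(sample))
# → {'md0': {'failed': ['sdb1'], 'degraded': True}}
```

On a live box you would feed it open('/proc/mdstat').read(); `mdadm --detail /dev/md0` gives the same answer with more context.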
> Your points about having to use a clunky BIOS setup or CLI tools may be true for some controllers, as are your points about needing to maintain a spare of your RAID controller, ongoing driver support, etc. I've found that LSI-brand cards have good Linux driver support, CLI tools, and an easy-to-navigate BIOS, and are backwards compatible with RAID sets taken from older cards, so I have no problem recommending them. LSI cards, by default, also regularly test all drives to predict failures (avoiding rebuild errors or double failures).

+1 in favor of hardware RAID. My usual argument is: in the case of hardware RAID, a dedicated piece of hardware runs a single task, the RAID function, which boils down to a simple, short, easy-to-debug program. In the case of software RAID there is no dedicated hardware, and if the kernel (a big, buggy body of code) panics, the in-flight RAID operation will never be finished, which leaves a mess behind. One does not need a computer science degree to follow this simple logic.

Valeri

> _______________________________________________
> CentOS mailing list
> CentOS@xxxxxxxxxx
> https://lists.centos.org/mailman/listinfo/centos

++++++++++++++++++++++++++++++++++++++++
Valeri Galtsev
Sr System Administrator
Department of Astronomy and Astrophysics
Kavli Institute for Cosmological Physics
University of Chicago
Phone: 773-702-4247
++++++++++++++++++++++++++++++++++++++++