> On 2019-07-01 10:01, Warren Young wrote:
>> On Jul 1, 2019, at 8:26 AM, Valeri Galtsev <galtsev@xxxxxxxxxxxxxxxxx>
>> wrote:
>>>
>>> RAID function, which boils down to a simple, short, easy-to-debug
>>> program.
>
> I didn't intend to start a software vs hardware RAID flame war when I
> joined somebody else's opinion.
>
> Now, commenting with all due respect to the famous person who Warren
> Young definitely is.
>
>>
>> RAID firmware will be harder to debug than Linux software RAID, if only
>> because of easier-to-use tools.
>
> I myself debug neither firmware (or "microcode", speaking the language
> as it was some 30 years ago), nor the Linux kernel. In both cases it is
> someone else who does the debugging.
>
> You are speaking as a person who routinely debugs Linux components. I
> still have to stress that in debugging RAID card firmware one has only
> the small program which this firmware is.
>
> In the case of debugging EVERYTHING that affects the reliability of
> software RAID, one has to debug the following:
>
> 1. The Linux kernel itself, which is huge;
>
> 2. _all_ the drivers that are loaded when the system runs. Some of the
> drivers on one's system may be binary only, like NVIDIA video card
> drivers. So, even for those who, like Warren, can debug all code, these
> still are not accessible.
>
> All of the above can potentially panic the kernel (as they all run in
> kernel context), so they all affect the reliability of software RAID,
> not only the chunk of software doing the software RAID function.
>
>>
>> Furthermore, MD RAID only had to be debugged once, rather than once per
>> company-and-product line as with hardware RAID.
>
> Alas, MD RAID itself is not the only thing that affects the reliability
> of software RAID. A panicking kernel has grave effects on software RAID,
> so anything that can panic the kernel also has to be debugged just as
> thoroughly. And it always has to be redone once changes to the kernel or
> drivers are introduced.
>
>>
>> I hope you’re not assuming that hardware RAID has no bugs. It’s
>> basically a dedicated CPU running dedicated software that’s difficult to
>> upgrade.
>
> That's true, it is a dedicated CPU running a dedicated program, and it
> keeps doing it even if the operating system crashed. Yes, hardware
> itself can be unreliable. But in the case of a RAID card it is only the
> card itself, the failure rate of which in my racks is much smaller than
> the overall failure rate of everything. In the case of a kernel panic,
> any piece of hardware inside the computer, in some mode of failure, can
> cause it.
>
> One more thing: apart from the hardware RAID "firmware" program being
> small and logically simple, there is one more factor: it usually runs on
> a RISC architecture CPU, and introducing bugs when programming for RISC
> architecture is IMHO more difficult than when programming for i386 and
> amd64 architectures. Just my humble opinion I carry since the time I was
> programming.
>
>>
>>> if kernel (big and buggy code) is panicked, current RAID operation will
>>> never be finished which leaves the mess.
>>
>> When was the last time you had a kernel panic? And of those times, when
>> was the last time it happened because of something other than a hardware
>> or driver fault? If it wasn’t for all this hardware doing strange
>> things, the kernel would be a lot more stable. :)
>
> Yes, I half expected that. When did we last have a kernel crash, and who
> of us is unable to choose reliable hardware, and unable to insist that
> our institution pays a mere 5-10% higher price for a reliable box than
> they would for junk hardware? Indeed, we all run reliable boxes, and I
> am retiring still reliably working machines of age 10-13 years...
>
> However, I would rather suggest comparing not absolute probabilities,
> which, exactly as you said, are infinitesimal, but relative
> probabilities. Comparing those, I still will go with hardware RAID.
>
>>
>> You seem to be saying that hardware RAID can’t lose data.
>> You’re ignoring the RAID 5 write hole:
>>
>> https://en.wikipedia.org/wiki/RAID#WRITE-HOLE
>
> None of our RAID cards runs without battery backup.
>
>>
>> If you then bring up battery backups, now you’re adding cost to the
>> system. And then some ~3-5 years later, downtime to swap the battery,
>> and more downtime. And all of that just to work around the RAID write
>> hole.
>
> You are absolutely right about a system with hardware RAID being more
> expensive than one with software RAID. I would say, for "small scale
> big storage" boxes (i.e. NOT distributed file systems), hardware RAID
> adds about 5-7% of cost in our case. Now, with hardware RAID all
> maintenance (what one needs to do in the routine case of replacing a
> single failed drive) takes about 1/10 of the time necessary to deal with
> a similar failure in the case of software RAID. I deal with both, as it
> historically happened, so this is my own observation. Maybe the software
> RAID boxes I have to deal with are too messy (imagine almost two dozen
> software RAIDs of 12-16 drives each on one machine; even the BIOS runs
> out of numbers in its attempt to enumerate all drives...) No, I am not
> taking the blame for building a box like that ;-)
>
> All in all, the simpler way of routinely dealing with hardware RAID
> saves the human time involved, and in the long run quite likely saves
> money (think of salaries, benefits etc.), though it looks more expensive
> at the moment of hardware purchase.

It can also be the other way around: if you are a Linux-only shop and
you have a large number of systems with a large number of different
controller brands and generations, you may just start to hate how they
all work differently, have their own issues, and can really give you
lots of gray hairs. Doing it all with MD RAID can make your life much
easier!

People should also be aware that the firmware of common desktop disks
is not optimal for handling errors in RAID configurations.
They need different firmware parameters for optimal use in RAID, be it
hardware or software.

Regards,
Simon

_______________________________________________
CentOS mailing list
CentOS@xxxxxxxxxx
https://lists.centos.org/mailman/listinfo/centos
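
For readers unfamiliar with the RAID 5 write hole that comes up in the
quoted exchange, here is a minimal sketch of the idea in Python. It is a
toy model, not how MD RAID or any controller firmware is implemented:
the four-byte "blocks", the three-data-disk stripe, and the helper names
xor_blocks and reconstruct are all assumptions made for illustration.
It shows that if a data block is rewritten but the crash or power loss
hits before the matching parity update, a later disk failure rebuilds
wrong data, even for a block that was never being written.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of equally sized byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def reconstruct(stripe, parity, lost_index):
    """Rebuild the data block at lost_index from the survivors plus parity."""
    survivors = [blk for i, blk in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors + [parity])


# A consistent stripe: three data blocks and their parity (hypothetical data).
stripe = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(stripe)

# Consistent case: losing disk 1 is harmless, parity recovers its block.
recovered_ok = reconstruct(stripe, parity, lost_index=1)

# Interrupted write: block 0 reaches the disk, but the crash or power loss
# happens before the parity block is updated to match.
stripe[0] = b"ZZZZ"
stale_parity = parity  # still describes the old stripe contents

# Later, disk 1 fails. Rebuilding it from stale parity yields wrong data.
recovered_bad = reconstruct(stripe, stale_parity, lost_index=1)

print(recovered_ok == b"BBBB")   # consistent stripe rebuilds correctly
print(recovered_bad == b"BBBB")  # stale parity does not
```

A battery-backed cache, as discussed above, addresses this by letting
the controller finish the pending data and parity writes after power
returns.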