Well, I can't formally speak for IBM, EMC, LSI, NetApp, and others when
I say you are wrong in just about everything you wrote. Their
architectures are "absolutely crazy", and their firmware doesn't meet
your personal criteria for being "well-architected". QLogic, Emulex and
LSI are also wrong, since they have vanity firmware/drivers for
specific RAID subsystems to increase interoperability between all of
the RAID hardware/software layers.

Now contrast zfs with md+filesystemofyourchoice. The performance,
reliability, security, data integrity, and self-healing capability of
zfs are as profoundly superior to md and your design philosophy as the
current md architecture is to MS-DOS/FAT. The empirical evidence speaks
for itself. The RAID hardware vendors and the architects of zfs spend
billions of dollars annually on R&D, have superior products, and do it
my way, not yours.

If you want to respond with a flame, then take it to a zfs group. I see
no need to respond further.

-----Original Message-----
From: linux-raid-owner@xxxxxxxxxxxxxxx
[mailto:linux-raid-owner@xxxxxxxxxxxxxxx] On Behalf Of Peter Grandi
Sent: Sunday, April 20, 2008 8:41 AM
To: Linux RAID
Subject: Re: Need some information and help on mdadm in order to support it on IBM z Systems

[ ... ]

> Well, I can name many RAID controllers that will automatically
> add a "faulty" drive back into an array. This is a very good
> thing to have, and is counter-intuitive to all but experienced
> RAID architects.

It is not counter-intuitive; it is absolutely crazy in the general
case, and in particular cases it leads to loss of focus and to
mingling of abstraction layers that should remain separate.

[ ... ]

> There are hundreds of published patents concerning data
> recovery and availability in various failure scenarios, so
> anybody who wants to learn more can simply search the
> USPTO.GOV database and read them for themselves.

Sure, but those are heuristics that are implemented on top of a RAID
subsystem, and that are part of ''computer assisted storage
administration'' logic, just as IBM many years ago developed expert
systems to automate or assist recovery from various faults (not just
storage faults) on 370/390-class mainframes.

Such recovery tools have nothing to do with RAID as such, even if they
are often packaged with RAID products. They belong in a totally
different abstraction layer, as this example makes starkly clear:

> A few real-world reasons you want this capability ...
> * Your RAID system consists of 2 external units, with RAID
>   controller(s) and disks in unit A, and unit B is an
>   expansion chassis, with interconnecting cables. You have a
>   LUN that is spread between 2 enclosures. Enclosure "B" goes
>   offline, either because of a power failure; a sysadmin who
>   doesn't know you should power the RAID head down first, then
>   the expansion; or he/she powers them up backwards .. bottom
>   line is that the drive "failed", but it only "failed" because
>   it was disconnected due to power issues.
> [ ... ]

This relies on case-based, expert-system-like fault analysis and
recovery using knowledge of non-RAID aspects of the storage subsystem.
Fine, but it has nothing to do with RAID, as it requires a kind of
''total system'' approach.

> Well architected firmware needs to be able to recognize this
> scenario and put the disk back.

Well architected *RAID* firmware should do nothing of the sort. RAID
has a (deceptively) simple operation model, and yet getting RAID
firmware right is hard enough. Well architected fault analysis and
recovery daemons might well recognize that scenario and put the disk
back, but that's a completely different story from RAID firmware doing
that.
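To make the separation concrete, here is a minimal sketch in Python of
what the md version of such a host-side daemon could look like. The
array name, the polling interval, and the naive ''device node
reappeared'' test are all assumptions standing in for real enclosure
awareness; an actual daemon would consult enclosure state, SMART data,
logs and so on before taking any chances:

#!/usr/bin/env python3
# Hypothetical host-side recovery daemon -- a sketch, not part of md or
# mdadm. Naive policy: a component marked faulty is re-added as soon as
# its device node reappears (e.g. an expansion enclosure got power back).
import os
import subprocess
import time

ARRAY = "/dev/md0"      # assumed array name
POLL_SECONDS = 10       # assumed polling interval

def faulty_components(array):
    """List component devices that `mdadm --detail` flags as faulty."""
    out = subprocess.run(["mdadm", "--detail", array],
                         capture_output=True, text=True, check=True).stdout
    return [line.split()[-1] for line in out.splitlines()
            if "faulty" in line and line.split()[-1].startswith("/dev/")]

def main():
    while True:
        for dev in faulty_components(ARRAY):
            if os.path.exists(dev):
                # The device node is back. A faulty slot has to be
                # removed before it can be re-added, and --re-add lets a
                # write-intent bitmap, if one is present, limit the
                # resync to what changed while the disk was unreachable.
                subprocess.run(["mdadm", ARRAY, "--remove", dev])
                subprocess.run(["mdadm", ARRAY, "--re-add", dev])
        time.sleep(POLL_SECONDS)

if __name__ == "__main__":
    main()

Note that everything interesting there is policy, and none of it needs
to live anywhere near the RAID core: the daemon just drives mdadm's
existing management interface.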
> To go back to Mario's argument that you *could* make things
> far worse .. absolutely.

Sure, because fault analysis and recovery heuristics take chances that
can go spectacularly wrong, as well as being pretty hard to code.
However, while I am not against optional fault analysis and recovery
layers on top of RAID, I really object to statements like this:

> The RAID architect needs to incorporate hot-adding md disks
> back into the array, as long as it is done properly.

Because the RAID architect should stay well clear of such issues, and
of polluting the base RAID firmware with additional complications; even
the base RAID logic is amazingly bug-infested in various products I
have had the misfortune to suffer. The role of the RAID architect is to
focus on the performance and correctness of the basic RAID logic, to
let the architects of fault analysis and recovery daemons worry about
other issues, and perhaps to provide suitable hooks to make their life
easier (mdadm's --monitor mode, which can run an external program on
every event it reports, is exactly such a hook).

> RAID recovery logic is perhaps 75% of the source code for
> top-of-the-line RAID controllers. Their firmware determines
> why a disk "failed", and does what it can to bring it back
> online and fix the damage.

There is a rationale for bundling a storage fault analysis and recovery
daemon into a RAID host adapter, but I don't like that, because there
are often two downsides:

* Fault analysis and recovery are usually best done at the highest
  possible abstraction level, that is as software daemons running on
  the host, as these have more information available than a daemon
  running inside the host adapter.

* Mingling fault analysis and recovery software with the base RAID
  logic (as the temptation then becomes hard to resist) tends to
  distract from the overridingly important task of getting the latter
  to perform reliably and to report errors clearly and usefully.

In previous discussions on this list there were crazy proposals to make
detection (not too bad) and recovery (a chancy horror) of (ambiguous)
unreported errors part of the Linux RAID logic, and the reason I am
objecting so strenuously here is to help quash calls for something
insane like that. (The sane ''detection'' half already exists outside
the core logic; see the sketch at the end of this message.) Separation
of concerns and of abstraction layers, and keeping fundamental firmware
logic simple, are rather important goals in mission-critical
subsystems.

> A $50 SATA RAID controller

Except perhaps for the IT8212 chips, these don't exist :-).

> has perhaps 10% of the logic dedicated to failover/failback.

That 10% is 10% too much. It is already difficult to find simple,
reliable RAID host adapters, never mind ones that also try to be too
clever.
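PS: to make the ''detection'' point concrete, here is a minimal sketch
of the sane half as it already exists with md, again kept well outside
the core RAID logic. The array name "md0" and the 30-second poll are
assumptions, and what to do about a non-zero mismatch count is
deliberately left to a human or to a separate policy daemon:

#!/usr/bin/env python3
# Sketch: ask md to scrub an array and report the mismatch count.
# "check" is a read-and-compare pass; it counts inconsistencies in
# mismatch_cnt but, unlike "repair", rewrites nothing.
import time

SYSFS = "/sys/block/md0/md"     # assumed array

def scrub_and_report():
    with open(f"{SYSFS}/sync_action", "w") as f:
        f.write("check")
    while True:
        with open(f"{SYSFS}/sync_action") as f:
            if f.read().strip() == "idle":      # scrub has finished
                break
        time.sleep(30)
    with open(f"{SYSFS}/mismatch_cnt") as f:
        print("mismatched sectors:", f.read().strip())

if __name__ == "__main__":
    scrub_and_report()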