>>>>> "Nicolas" == Nicolas Noble <nicolas@xxxxxxxxxxxxxx> writes:

>> I think in your case you're better off stopping an array that has
>> less than parity drives than it should, either using a udev rule or
>> using mdadm --monitor.

Nicolas> I actually have been unsuccessful in these attempts so far. What
Nicolas> happens is that you very quickly get processes that get indefinitely
Nicolas> stuck (indefinitely as in 'waiting on a very very long kernel
Nicolas> timeout') trying to write something, so that the ext4fs layer becomes
Nicolas> unresponsive on these threads, or take a very long time. Killing the
Nicolas> processes takes a very long time because they are stuck in a kernel
Nicolas> operation. And if potentially more processes can spawn back up, the
Nicolas> automated script starts an interesting game of whack-a-mole in order
Nicolas> to unmount the filesystem.

Nicolas> And you can't stop the underlying arrays without first
Nicolas> stopping the whole chain (umount, stop the lvm volume,
Nicolas> etc...), otherwise you simply get "device is busy" errors,
Nicolas> hence the whack-a-mole process killing. The only working
Nicolas> method I've managed to successfully implement is to
Nicolas> programatically loop over the list of all the drives involved
Nicolas> in the filesystem, on all the raids involved, and flag all of
Nicolas> them as failed drives. This way, you get to really put
Nicolas> "emergency brakes" on. I find that to be a very, very scary
Nicolas> method however.

I think this is the wrong idea. You do want MD to retry errors on
underlying devices, because some drives will return an error, and if MD
has long enough timeouts, it can recover and try to rewrite the bad
sector(s) on the drive, which early on will let the bad block be mapped
out and a new block put in place.

But you're looking for a solution for when one device in a striped RAID0
goes away, and for what happens to the filesystem then. And in that case
you're shit out of luck.
No filesystem is designed to cope with that type of failure. There might
be ext4, xfs, or jfs options which will help you in this case, but it's
not a simple thing to program around, especially once the volume gets
really big.

John
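
P.S. If you want to keep experimenting with the monitoring side of this,
here is a rough, untested-on-your-setup sketch of the check a udev or
mdadm --monitor hook might start from: parse the text of /proc/mdstat and
report arrays whose active member count has dropped below the expected
count. The helper name degraded_arrays and the SAMPLE text are made up
for illustration. As far as I know, RAID0 and linear arrays print no
[n/m] counter in mdstat, so this only helps for the redundant levels and
does nothing for the striped case above.

```python
import re

# Matches the "[expected/active]" counter on an array's status line,
# e.g. "[4/3]" for a four-member array with three working members.
_COUNTS = re.compile(r"\[(\d+)/(\d+)\]")
# Matches the first line of an array entry, e.g. "md0 : active raid5 ...".
_HEADER = re.compile(r"^(md\S*)\s*:")


def degraded_arrays(mdstat_text):
    """Return {array_name: (expected, active)} for arrays missing members.

    Hypothetical helper. Arrays that print no [n/m] counter are simply
    not reported, so an empty result does not prove everything is healthy.
    """
    degraded = {}
    current = None
    for line in mdstat_text.splitlines():
        header = _HEADER.match(line)
        if header:
            current = header.group(1)
            continue
        counts = _COUNTS.search(line)
        if current and counts:
            expected, active = int(counts.group(1)), int(counts.group(2))
            if active < expected:
                degraded[current] = (expected, active)
            # Only the first counter after a header belongs to that array.
            current = None
    return degraded


# Illustrative input only, not output captured from a real machine.
SAMPLE = """\
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdb1[1] sdc1[2] sdd1[3]
      2929890816 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [_UUU]

md1 : active raid1 sde1[0] sdf1[1]
      976630336 blocks super 1.2 [2/2] [UU]

unused devices: <none>
"""

if __name__ == "__main__":
    print(degraded_arrays(SAMPLE))  # {'md0': (4, 3)}
```

On a live box you would feed it open("/proc/mdstat").read() instead of
SAMPLE. What to do once an array shows up in the result is the hard part,
and this sketch deliberately leaves that decision out.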