On 04/27/2010 12:20 PM, Phillip Susi wrote:
> RAID is an acronym that just happens to also spell an English word. Removed and Failed are not, and your cat example is a complete non sequitur. My point is that when naming technical things you should do so sanely. You wouldn't label the state a disk goes into when it keeps failing IO requests as "Iceland", would you? Of course not. The state is named failed because the disk has failed.

And the state "removed" is labeled as such because the device has been removed from the list of slave devices that the kernel keeps. Nothing more. You are reading into it things that weren't intended. What you are reading into it might even be a reasonable interpretation, but it's not the actual interpretation.

> I didn't say never allow it to be added back, I said don't go doing it automatically. An explicit add should, of course, work as it does now, but it should not be added just because udev decided it has appeared and called mdadm --incremental on it.

This is, in fact, completely contrary to where we are heading with things. We *do* want udev-invoked incremental rules to re-add the device after it has been removed. The entire hotunplug/hotplug support I'm working on does *exactly* that: on device removal it does both a fail and a remove action, and on device insertion it does a re-add or add as needed. So, as I said, you are reading more into "removed" than we intend. We *will* be automatically removing devices when they go away, so it's entirely appropriate that, since we remove them automatically, we don't treat "removed" as a manual-intervention-only state; it is a valid automatic state, and recovery from it should be equally automatic.

>> No, it's not. The udev rules that add the drive don't race with manually removing it because they don't act on change events, only add events.
>
> And who is to say that you won't get one of those? A power failure after --remove and when the system comes back up, voila, the disk gets put back into the array. Or maybe your hotplug environment has a loose cable that slips out and you put it back. This clearly violates the principle of least surprise.

No, it doesn't. This is exactly what people expect in a hotplug environment. A device shows up, you use it. If you don't want the device to be used, then remove the superblock.

This whole argument centers on the fact that, to you, --remove means "don't use this device again". That's a very reasonable thing to think, but it's not actually what it means. It simply means "remove this device from the slaves held by this array". Only under certain circumstances will it get re-added to the array automatically (you reboot the machine, a power failure, a cable unplug/plug, etc.). This is because of the interaction between hotplug discovery and the fact that we merely removed the drive from the array's list of slaves; we did not mark the drive as "not to be used". That's what zero-superblock is for.

And this whole argument that the drive being re-added is a big deal is bogus too. You can always just re-remove the device if it got added. If you want to preserve the data on the drive (say you are splitting a raid1 array and want it to remain as it was for possible revert capability), then you could issue:

mdadm /dev/md0 -f /dev/sdc1 -r /dev/sdc1; mdadm --zero-superblock /dev/sdc1

and that should be sufficient to satisfy your needs.
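Spelled out step by step, that one-liner amounts to something like the following (the array and device names are just examples; substitute your own):

# Mark the member faulty first; md will refuse to remove a device that is
# still active in the array.
mdadm /dev/md0 --fail /dev/sdc1

# Drop it from the array's list of slaves.  The superblock is still intact,
# so hotplug/incremental assembly could pick the device back up later.
mdadm /dev/md0 --remove /dev/sdc1

# Erase the md superblock so the device is no longer recognized as a member
# of any array.  This is the "never use this again" step.
mdadm --zero-superblock /dev/sdc1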
If we race between the remove and the zero-superblock with something like a power failure, then obviously so little will have changed that you can simply repeat the procedure until you successfully complete it without a power failure.

>> Not going to happen. Doing what you request would undo a number of very useful features in the raid stack. So you might as well save your breath; we aren't going to make a remove event equivalent to a zero-superblock event, because then the entire --re-add option would be rendered useless.
>
> I didn't say that. I said that a remove event 1) should actually bother recording the removed state on the disk being removed (right now it only records it on the other disks),

This is intentional. A remove event merely triggers a kernel error cycle on the target device. We don't differentiate between a user-initiated remove and one that's the result of catastrophic disc failure. However, trying to access a dead disc causes all sorts of bad behavior on a real running system with a real disc failure, so once we know a disc is bad and we are kicking it from the array, we only try to write that data to the good discs so we aren't hosing the system.

> and 2) the fact that the disk is in the removed state should prevent --incremental from automatically re-adding it.

We are specifically going in the opposite direction here. We *want* to automatically re-add removed devices because we are implementing automatic removal on hot unplug, which means we want automatic addition on hot plug.

>> Because there are both transient and permanent failures. Experience caused us to switch from treating all failures as permanent to treating failures as transient and picking up where we left off if at all possible, because too many people were having a single transient failure render their array degraded, only to have a real issue come up sometime later that then meant the array was no longer just degraded, but entirely dead. The job of the raid stack is to survive as much failure as possible before dying itself. We can't do that if we allow a single, transient event to cause us to stop using something entirely.
>
> That's a good thing, and is why it is fine for --incremental to activate a disk in the failed state if it appears to have returned to being operational and it is safe to do so (meaning the array hasn't also been activated degraded). It should not do this for the removed state, however.

Again, we are back to the fact that you are interpreting "removed" to be something it isn't. We can argue about this all day long, but that option has had a specific meaning for long enough, and has been around long enough, that it can't be changed now without breaking all sorts of backward compatibility.

>> Besides, what you seem to be forgetting is that those events that make us genuinely not want to use a device also make it so that at the next reboot the device generally isn't available or seen by the OS (controller failure, massive failure of the platter, etc.). Simply failing and removing a device using mdadm mimics a transient failure. If you fail, remove, then zero-superblock, then you mimic a permanent failure. There you go, you have a choice.
>
> Failed and removed are two different states; they should have different behaviors. Failed = temporary, removed = more permanent.

There is *no* such distinction between failed and removed. Only *you* are inferring that distinction.
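To make the two states concrete, here is roughly what they look like on a live array (the device names are made up, and the sysfs paths are from memory of the md layout, so treat this as a sketch rather than gospel):

mdadm /dev/md0 --fail /dev/sdc1       # sdc1 stops taking I/O, but is still a slave of md0
cat /sys/block/md0/md/dev-sdc1/state  # should report "faulty"

mdadm /dev/md0 --remove /dev/sdc1     # sdc1 is no longer a slave of md0 at all
ls /sys/block/md0/md/                 # the dev-sdc1 entry is gone now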
The real distinction is: failed == no longer allowed to process read/write requests from the block layer, but still present as a slave to the array; removed == no longer present as a slave to the array.

> zero-superblock is completely permanent. Removed should be a good middle ground where you still CAN re-add the device, but it should not be done automatically.

A semantic change such as this would cause huge amounts of pain in terms of fixing up scripts to do as you expect. It would be far easier on the entire mdadm-using world to add a new option that implements what you want instead of changing the existing behavior.

--
Doug Ledford <dledford@xxxxxxxxxx>
GPG KeyID: CFBFF194
http://people.redhat.com/dledford

Infiniband specific RPMs available at
http://people.redhat.com/dledford/Infiniband