Neil,
thanks for the review, and for the detailed answers to my questions.

> When we mark a device 'failed' it should stay marked as 'failed'.  When the
> array is optimal again it is safe to convert all 'failed' slots to
> 'spare/missing' but not before.

I did not understand all of that reasoning. When you say "slot", you mean
an index in the dev_roles[] array, correct? If yes, I don't see what
importance the index has, compared to the value of the entry itself (which
is the "role" in your terminology). Currently, 0xFFFE means both "failed"
and "missing", and that makes perfect sense to me. Basically, it means that
this entry of dev_roles[] is unused. When a device fails, it is kicked out
of the array, so its entry in dev_roles[] becomes available. (You once
mentioned that for older arrays, their dev_roles[] index was also their
role; perhaps you are concerned about those too.)

In any case, I will be watching for changes in this area, if you decide to
make them (although I think this might break backwards compatibility,
unless a new version of the superblock is used).

> If you have a working array and you initiate a write of a data block and the
> parity block, and if one of those writes fails, then you no longer have a
> working array.  Some data blocks in that stripe cannot be recovered.
> So we need to make sure that admin knows the array is dead and doesn't just
> re-assemble and think everything is OK.

I see your point. I don't know which is better: to know the "last known
good" configuration, or to know that the array has failed. I guess I am
just used to the former.

> I think to resolve this issue we need 2 thing.
>
> 1/ when assembling an array if any device thinks that the 'chosen' device has
>    failed, then don't trust that devices.

I think that if any device thinks that "chosen" has failed, then either it
has a more recent superblock, and then this device should be "chosen" and
not the other.
Or, the "chosen" device's superblock is the one that counts, and then it
doesn't matter what the current device thinks, because the array will be
assembled according to the "chosen" superblock.

> 2/ Don't erase 'failed' status from dev_roles[] until the array is
>    optimal.

Neil, I think neither of these points resolves the following simple
scenario: RAID1 with drives A and B. Drive A fails, and the array continues
to operate on drive B. After reboot, only drive A is accessible. If we go
ahead with the assemble, we will see stale data. If, however, after reboot
we see only drive B, then (since A is "faulty" in B's superblock) we can go
ahead and assemble. The change I suggested will abort in the first case,
but will assemble in the second case.

But obviously, you know better what MD users expect and want.

Thanks again for taking the time to review the proposal! And yes, next
time I will put everything in the email.

Alex.
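
To make the RAID1 scenario above concrete, here is a minimal sketch in
Python. It is a toy model, not md or mdadm code: the superblock() helper,
the "events" counter field, and the can_assemble() rule are all assumptions
made for illustration. The rule shown is just one that yields the outcomes
described above (abort when only the stale drive is visible, assemble when
only the up-to-date drive is visible); it is not claimed to be exactly the
proposed change.

```python
# Toy model of the RAID1 scenario; this is not md or mdadm code.
FAILED = 0xFFFE  # dev_roles[] value meaning "failed"/"missing", as discussed above


def superblock(events, roles):
    """Hypothetical, simplified superblock: an event count plus per-device roles."""
    return {"events": events, "roles": dict(roles)}


def can_assemble(visible):
    """Assumed rule: trust the freshest visible superblock ("chosen") and
    refuse to assemble if it lists as active a device that is not visible."""
    chosen = max(visible.values(), key=lambda sb: sb["events"])
    expected = {dev for dev, role in chosen["roles"].items() if role != FAILED}
    return expected <= set(visible)


# Drive A fails; its superblock is never updated and still shows both drives active.
sb_a = superblock(events=10, roles={"A": 0, "B": 1})
# Drive B keeps running; its superblock records A as failed and moves ahead.
sb_b = superblock(events=12, roles={"A": FAILED, "B": 1})

print(can_assemble({"A": sb_a}))  # only the stale drive A is visible -> False
print(can_assemble({"B": sb_b}))  # only the up-to-date drive B is visible -> True
```

Note that a rule like this would also refuse to start an array when a
healthy member is simply absent at boot, which is the kind of trade-off in
user expectations mentioned above.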