On Tue, 18 May 2010 11:30:16 +1000 Neil Brown <neilb@xxxxxxx> wrote:

> On Mon, 17 May 2010 20:10:36 +0200
> Pierre Vignéras <pierre@xxxxxxxxxxxxx> wrote:
>
> > Did I miss something, or is there something really strange happening there?
>
> Something strange...
> I cannot explain the 'SpareActive' messages.

Actually, I think I can explain that.

When a device fails it gets marked as faulty, then as soon as there is no
more pending IO it gets moved out of the array. "mdadm -D" will show it
with a larger 'Number' and a 'RaidDevice' of '-'.
Normally these two steps happen almost as a single operation, though a lot
of pending IO can slow the second one down.

"mdadm --monitor" identifies devices based on 'Number', so it would normally
see a working device disappear - which is reported as a failure - and a
'faulty/spare' device appear, which it ignores.

However, if --monitor gets to check the array between the above two events,
it will first see that the working drive is now faulty, so it reports a
failure, and then on the next check see that the faulty device isn't faulty
any more and in fact isn't even there. The "isn't even there" bit doesn't
register, and it treats the change as 'SpareActive'.

I should fix that.

So I'm quite sure now that your devices didn't really become spares until
you removed and added them, which is exactly the way to turn failed devices
into spares.

NeilBrown
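
To make the race above concrete, here is a small Python model of it. This is only a sketch and not mdadm's code: the state names, `classify_naive`, `classify_checked` and `replay` are all made up for illustration, and each list stands for what successive --monitor checks might see in one 'Number' slot.

```python
from typing import Callable, List, Optional

# Hypothetical model: the state seen in one 'Number' slot at each poll.
# "active" = working member, "faulty" = marked failed but not yet moved out,
# None = slot is empty (the device has been moved out of the array).
State = Optional[str]


def classify_naive(old: State, new: State) -> Optional[str]:
    """Event reported for one slot, without checking whether it is now empty."""
    if old == "active" and new != "active":
        return "Fail"
    if old == "faulty" and new != "faulty":
        # Assumes "no longer faulty" means "became active", which is the
        # mistake described above: the slot may simply be empty.
        return "SpareActive"
    return None


def classify_checked(old: State, new: State) -> Optional[str]:
    """Same rules, but a faulty device that vanished is not an activation."""
    if old == "faulty" and new is None:
        return None
    return classify_naive(old, new)


def replay(polls: List[State],
           classify: Callable[[State, State], Optional[str]]) -> List[str]:
    """Feed consecutive polls of one slot to a classifier, collect events."""
    events = []
    for old, new in zip(polls, polls[1:]):
        event = classify(old, new)
        if event is not None:
            events.append(event)
    return events


# Usual case: both steps happen between two polls, so the slot goes
# straight from active to empty.
normal = ["active", None]

# Race: a poll lands between "marked faulty" and "moved out".
raced = ["active", "faulty", None]

print(replay(normal, classify_naive))    # ['Fail']
print(replay(raced, classify_naive))     # ['Fail', 'SpareActive']
print(replay(raced, classify_checked))   # ['Fail']
```

In this model the spurious 'SpareActive' only shows up when a poll falls between the two steps, which would fit it being seen under heavy pending IO. The extra check in `classify_checked` is one possible shape for a fix, not necessarily what mdadm ends up doing.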