Raz Ben-Jehuda (caro) wrote:
> I have managed to make the kernel remove a disk from my raid even if
> this raid is "/". I did it by adding a line in ata_scsi_error that
> removes the ata disk from the raid array. This means that the disk is
> removed from the array when the first error occurs on it. Well, this
> is not the best thing to do...
> Question is: when does a disk become faulty?

When trying to read sectors from a disk and the disk fails the read:

1.) Read the data from the other disks in the RAID, and
2.) Overwrite the sectors where the read error occurred.

If this write also fails, then the disk has used up its spare-sector
area. The RAID array is now by definition in a degraded state, since
that sector no longer exists in a redundant (or at least readable) way.
The disk should therefore be kicked, in order to notify the user that
it should be replaced immediately.

> Is it when you have N errors in T time?

Nah, it's when you run out of spare sectors, and your data redundancy
is thereby lost, that you have to fault the disk to prevent future data
loss.

Don't try to second-guess when the disk is going to become faulty based
on how many errors have occurred. If you want to do something like
that, read out the SMART data from the disk. The manufacturer's data
about the disk's health should be your data source.

> New ideas would be welcomed.

HTH.
-
To unsubscribe from this list: send the line "unsubscribe linux-raid"
in the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
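[Editor's note: the two-step recovery policy described in the post can be sketched as a small simulation. This is a hypothetical illustration, not the Linux md driver code; the function and callback names (`handle_read_error`, `redundant_read`, `rewrite`) are made up for the example.]

```python
FAULTY, OK = "faulty", "ok"

def handle_read_error(disk, sector, redundant_read, rewrite):
    """Sketch of the policy from the post: on a failed read,
    reconstruct the data from the other disks in the array, then
    rewrite the bad sector so the drive can remap it from its
    spare-sector pool. Only if the rewrite also fails (spare sectors
    exhausted, so redundancy for that sector is lost) is the disk
    marked faulty and kicked from the array."""
    data = redundant_read(sector)        # step 1: rebuild from mirror/parity
    if rewrite(disk, sector, data):      # step 2: rewrite; drive remaps sector
        return OK                        # remap succeeded: keep the disk
    return FAULTY                        # spare sectors gone: kick the disk

# A disk whose rewrite succeeds stays in the array:
print(handle_read_error("sda", 100,
                        lambda s: b"rebuilt",
                        lambda d, s, v: True))   # -> ok

# A disk whose rewrite also fails is faulted:
print(handle_read_error("sda", 100,
                        lambda s: b"rebuilt",
                        lambda d, s, v: False))  # -> faulty
```

The point of the sketch is that a single read error never faults the disk by itself; only the failed rewrite, which signals an exhausted spare-sector pool, does.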