On 08/08/2013 01:04 AM, NeilBrown wrote:
> On Wed, 07 Aug 2013 23:03:47 +0200 Martin Wilck <mwilck@xxxxxxxx> wrote:
>
>> Hi Neil, everyone,
>>
>> I'd like to get comments on the following concept for handling missing
>> disks. Currently we handle "Missing" and "Failed" almost equally; that's
>> not optimal.
>>
>> 1. Try to detect missing disks and mark them as such in the metadata.
>> ddf_open_new() could do this. At the latest, it must be done before
>> writing the metadata (write_super or sync_metadata).
>>
>> 2. Detection of a missing disk should not force an immediate metadata
>> write, because the disk may be added back soon. It is sufficient to mark
>> a disk as missing when we write the metadata for other reasons.
>>
>> 3. Try to be smart when loading headers. A disk that was missing in a
>> previous run must have a lower sequence number and time stamp in its
>> metadata than the other disks, and must be marked "Missing" there (but
>> not "Failed"). The metadata on the previously missing disk itself should
>> mark it as "active/Online". In this case, use the newer metadata and try
>> to re-integrate the previously missing disk (i.e. treat it as a
>> preferred global spare).
>>
>> 4. It's possible that an array wasn't written to while a disk was
>> missing. In that case the disk could be re-added to the array without
>> recovery. The question is whether we can detect this situation. At first
>> I thought the "Not Consistent" bit might be usable for that, but I doubt
>> it now. The spec isn't clear about whether "consistent" state means
>> consistency over all configured disks or only over the present ones.
>
> Hi Martin.
> Thanks for looking into this.
>
> Can you say why exactly treating 'Missing' like 'Failed' is not
> optimal?  I find them to be very similar concepts.

The DDF spec says that "Failed" means a disk on which (a certain number of) IO errors were observed.
"Missing" may be a disk that simply wasn't present during boot (or wasn't included during container assembly, for that matter). When a missing but otherwise healthy disk reappears, it makes sense to use it as a spare; that is obviously not right for a failed disk. Thus it is important to distinguish "Failed|Missing" from "Missing" alone.

There is a large grey zone: a disk may be missing because of IO errors during device detection. A real-life cause of "missing" disks is fabric problems, e.g. with iSCSI or FC targets. More often than not, these targets work just fine after the network problem has been fixed. Admittedly, if the fabric problems occur during RAID operation, MD has no way to tell fabric problems from disk problems and must mark the disk as "Failed".

> Your idea for marking a missing device that reappears as a preferred spare is
> probably a good one - is that incompatible with treating it as failed while
> it isn't present?
>
> As soon as we mark an array as active (aka Not Consistent) we need to mark
> any missing devices as 'failed' in some way to ensure that the data is never
> seen as valid in the array.  Before that we can certainly be lazy about
> translating 'missing' to 'failed'...

Unfortunately that won't work: "Failed" indicates observed IO errors. Interpreting "Missing|Failed" as "missing in an active array" would break these semantics.

> Maybe if you could give some more detail on the problem scenario??

The most likely scenario would be with networked storage. IMSM seems to have some very specific semantics for disks missing at container creation time that I don't fully understand (see tests/09imsm-assemble).

Martin

> Thanks,
> NeilBrown
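PS: the decision logic from point 3 above could be sketched roughly like this. The flag names and values are only illustrative here (the real definitions live in mdadm's super-ddf.c and the DDF spec); the point is that "Missing" without "Failed", combined with an older sequence number, is what identifies a candidate for re-integration:

```c
#include <stdint.h>

/* Illustrative state bits, modeled loosely on the DDF physical disk
 * state field -- actual names/values may differ in super-ddf.c. */
#define S_ONLINE   0x01
#define S_FAILED   0x02
#define S_MISSING  0x40

enum disk_action {
	DISK_KEEP,            /* healthy, keep in array as-is */
	DISK_PREFERRED_SPARE, /* was merely missing, now back: re-integrate */
	DISK_REJECT           /* genuinely failed: never reuse automatically */
};

/* Decide how to treat a reappearing disk by comparing the state that the
 * surviving disks recorded about it (their metadata has the newer seq
 * number) with the state it recorded about itself.  A disk that was only
 * missing is marked Missing (not Failed) by the others, while its own,
 * older metadata still shows it active/Online. */
static enum disk_action classify_reappearing_disk(uint16_t state_in_others,
						  uint16_t own_state,
						  uint32_t others_seq,
						  uint32_t own_seq)
{
	if (state_in_others & S_FAILED)
		return DISK_REJECT;	/* observed IO errors: stays failed */
	if ((state_in_others & S_MISSING) &&
	    (own_state & S_ONLINE) &&
	    own_seq < others_seq)
		return DISK_PREFERRED_SPARE;
	return DISK_KEEP;
}
```

Note how this keeps "Failed|Missing" and plain "Missing" on separate paths, which is exactly the distinction I'd like to preserve in the metadata.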
--
Dr. Martin Wilck
PRIMERGY System Software Engineer
x86 Server Engineering
FUJITSU
Fujitsu Technology Solutions GmbH
Heinz-Nixdorf-Ring 1, 33106 Paderborn, Germany
Phone: ++49 5251 525 2796
Fax: ++49 5251 525 2820
Email: martin.wilck@xxxxxxxxxxxxxx
Internet: http://ts.fujitsu.com
Company Details: http://ts.fujitsu.com/imprint