On 14 November 2016 at 01:03, Phil Turmel <philip@xxxxxxxxxx> wrote:
> Hi Bruce,
>
> On 11/13/2016 04:06 PM, Wols Lists wrote:
>> On 13/11/16 20:51, Bruce Merry wrote:
>>> On 13 November 2016 at 22:18, Anthony Youngman <antlists@xxxxxxxxxxxxxxx> wrote:
>>>> Quick first response ...
>
>>>> https://raid.wiki.kernel.org/index.php/Timeout_Mismatch
>>>
>>> smartctl reports "SCT Error Recovery Control command not supported".
>>> Does that mean I should be worried? Is there any way to tell whether a
>>> given drive I can buy online supports it?
>
> You should be worried. It is vital for proper MD raid operation that
> drive timeouts be shorter than the kernel timeout for that device. If
> you can't make the drive timeout short, you *must* make the kernel
> timeout long.

Okay, I'll give that script a go to increase my kernel timeout. If I
understand correctly, it's not the end of the world if the drive
doesn't support SCTERC, provided I have a long kernel timeout (and
when things go wrong it might take much longer to recover, but I can
live with that). Is that correct?

>>> Yes, that sounds like what I need. Thanks to Google I found
>>> /usr/share/mdadm/checkarray to trigger this. It still has a few hours
>>> to go, but now the bad drive has pending sectors == 65535 (which is
>>> suspiciously power-of-two and I assume means it's actually higher and
>>> is being clamped), and /sys/block/md0/md/mismatch_cnt is currently at
>>> 1408. If scrubbing is supposed to rewrite on failed reads I would have
>>> expected pending sectors to go down rather than up, so I'm not sure
>>> what's happening.
>>>
>> Ummm....
>>
>> Sounds like that drive could need replacing. I'd get a new drive and do
>> that as soon as possible - use the --replace option of mdadm - don't
>> fail the old drive and add the new. Dunno where you're based, but 5mins
>> on the internet ordering a new drive is probably time well spent.

Oh don't worry, I wasted no time in ordering new drives already.
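To check my understanding of the timeout rule quoted earlier in this
message, here is a minimal Python sketch of the per-drive decision: if the
drive supports SCTERC, cap the drive's own error recovery; otherwise raise
the kernel timeout. The function names, the 180-second kernel value and
the 7.0-second drive value are my assumptions for illustration (the kernel
default is 30 seconds), not something taken from the wiki script verbatim.

```python
import os

KERNEL_TIMEOUT_SECS = 180  # assumed "long" kernel timeout; the default is 30
DRIVE_ERC_DECISECS = 70    # assumed 7.0 s drive-side limit, in units of 0.1 s


def timeout_path(dev, sysfs_root="/sys/block"):
    """Return the sysfs file holding the kernel command timeout for dev (e.g. 'sda')."""
    return os.path.join(sysfs_root, dev, "device", "timeout")


def plan(dev, scterc_supported):
    """Describe the action for one drive without touching the system."""
    if scterc_supported:
        # The drive can cap its own error recovery: keep it below the kernel timeout.
        erc = f"scterc,{DRIVE_ERC_DECISECS},{DRIVE_ERC_DECISECS}"
        return ("smartctl", ["smartctl", "-l", erc, f"/dev/{dev}"])
    # No SCTERC: make the kernel wait longer than the drive's internal retries.
    return ("sysfs", [timeout_path(dev), str(KERNEL_TIMEOUT_SECS)])


def set_kernel_timeout(dev, seconds=KERNEL_TIMEOUT_SECS, sysfs_root="/sys/block"):
    """Write the kernel timeout for dev; needs root on a real system."""
    with open(timeout_path(dev, sysfs_root), "w") as f:
        f.write(f"{seconds}\n")
```

For my drive, plan("sda", False) would point at
/sys/block/sda/device/timeout with a value of 180. Neither setting
survives a reboot, so it would have to be reapplied at boot.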

> You have two other possibilities:
>
> 1) Swap volumes in the raid. These are known to drop unneeded writes
> when the data isn't needed, even if it made it to one of the mirrors.
> That makes harmless mismatches.

It won't be that - I keep separate non-RAIDed partitions for swap.

> 2) Trim. Well-behaved drive firmware guarantees zeros for trimmed
> sectors, but many drives return random data instead. Zing, mismatches.
> It's often unhelpful with encrypted volumes, as even well-behaved
> firmware can't deliver zeroed sectors *inside* the encryption.

Weee, sounds like fun. I hope it's that. Is there any way to tell
which blocks mismatch, so that I can tell which files are in trouble
(assuming I can figure out how to map through LVM, LUKS and debugfs)?

Bruce
--
Dr Bruce Merry
bmerry <@> gmail <.> com
http://www.brucemerry.org.za/
http://blog.brucemerry.org.za/