On 09/27/2016 11:45 AM, Wols Lists wrote:
>
> I'm thinking about all this. The second section is all about recovering
> a failing/ed array, and is new. The first section is the original,
> that's being updated. It just feels totally wrong to me now, as it's
> becoming a jumbled mess of old and new.
>
> What I'm probably going to do, is create a new first section about
> setting up a raid system. That means that a section on monitoring will
> actually make sense and fit between setting it up, and fixing problems.
>
> (And all the old stuff will end up in the "software archaeology"
> section, so people who are still running ancient systems can find it :-)
>

That would be awesome.

There was already a shell script out there for Munin; I modified it a
little to add thresholds that raise flags. I might change it some more
to handle different thresholds for different devices, or to monitor
only the RAIDs that matter.

I have smartctl running on all my drives, but that doesn't help me at
the mdadm level.

While you're in the docs adding stuff about mismatch_cnt, is there
anything that can help someone backtrace which block caused the count
to go up? That would help us mere mortals go back and inspect a block
or a file or something, to make sure it's not corrupted.

-Ben

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
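A minimal sketch of the "per-device thresholds" and "only the RAIDs that matter" ideas, written as a Munin plugin in Python rather than shell. This is not Ben's script. It assumes the md sysfs attribute /sys/block/mdX/md/mismatch_cnt, which counts sectors that were rewritten (or, for a "check", would have been rewritten). The DEFAULT_THRESHOLDS, THRESHOLDS and WATCH_ONLY values are hypothetical placeholders. It follows the usual Munin plugin convention: with the argument "config" it prints the graph setup, including per-field warning/critical limits; otherwise it prints one "field.value N" line per array. classify() applies the same limits for use outside Munin, e.g. from cron.

```python
#!/usr/bin/env python3
"""Sketch of a Munin plugin reporting md mismatch_cnt per array.

Hypothetical: the threshold values and WATCH_ONLY set; adjust locally.
"""
import glob
import os
import re
import sys

# (warning, critical) limits in sectors; hypothetical values.
DEFAULT_THRESHOLDS = (1, 1000)
THRESHOLDS = {"md0": (0, 100)}  # stricter limits for one array
WATCH_ONLY = set()  # e.g. {"md0", "md2"}; empty means report every array


def field_name(array):
    """Munin field names may only contain letters, digits and underscores."""
    return re.sub(r"[^A-Za-z0-9_]", "_", array)


def read_mismatch_counts(sysfs_root="/sys/block"):
    """Return {array: mismatch_cnt} for each md array exposing the file."""
    counts = {}
    pattern = os.path.join(sysfs_root, "md*", "md", "mismatch_cnt")
    for path in sorted(glob.glob(pattern)):
        array = path.split(os.sep)[-3]  # .../<array>/md/mismatch_cnt
        if WATCH_ONLY and array not in WATCH_ONLY:
            continue
        with open(path) as fh:
            counts[array] = int(fh.read().strip())
    return counts


def classify(array, value):
    """Map a count to ok/warning/critical using per-array thresholds."""
    warn, crit = THRESHOLDS.get(array, DEFAULT_THRESHOLDS)
    if value > crit:
        return "critical"
    if value > warn:
        return "warning"
    return "ok"


def munin_config(counts):
    """Graph setup plus per-field warning/critical limits for Munin."""
    lines = ["graph_title md mismatch_cnt", "graph_vlabel sectors",
             "graph_category disk"]
    for array in counts:
        warn, crit = THRESHOLDS.get(array, DEFAULT_THRESHOLDS)
        fld = field_name(array)
        lines += [f"{fld}.label {array}", f"{fld}.warning {warn}",
                  f"{fld}.critical {crit}"]
    return "\n".join(lines)


def munin_fetch(counts):
    """One 'field.value N' line per array."""
    return "\n".join(f"{field_name(a)}.value {v}" for a, v in counts.items())


if __name__ == "__main__":
    counts = read_mismatch_counts()
    if len(sys.argv) > 1 and sys.argv[1] == "config":
        print(munin_config(counts))
    else:
        print(munin_fetch(counts))
```

Note that this only reports the count. mismatch_cnt by itself does not say which blocks mismatched, so the backtracing question above remains open.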