Re: RAID scrubbing

Justin Maggard wrote:
On Wed, Apr 14, 2010 at 6:22 PM, Neil Brown <neilb@xxxxxxx> wrote:
On Wed, 14 Apr 2010 17:51:11 -0700
Justin Maggard <jmaggard10@xxxxxxxxx> wrote:

On Fri, Apr 9, 2010 at 7:01 PM, Michael Evans <mjevans1983@xxxxxxxxx> wrote:
On Fri, Apr 9, 2010 at 6:46 PM, Justin Maggard <jmaggard10@xxxxxxxxx> wrote:
On Fri, Apr 9, 2010 at 6:41 PM, Michael Evans <mjevans1983@xxxxxxxxx> wrote:
On Fri, Apr 9, 2010 at 6:28 PM, Justin Maggard <jmaggard10@xxxxxxxxx> wrote:
Hi all,

I've got a system using two RAID5 arrays that share some physical
devices, combined using LVM.  Oddly, when I run "echo repair >
/sys/block/md0/md/sync_action", once that repair finishes, a repair
automatically starts on md1 as well, even though I haven't requested
it.  Also, if I try to stop it with "echo idle >
/sys/block/md0/md/sync_action", a repair starts on md1 within a few
seconds.  If I stop that md1 repair immediately, it sometimes
respawns and starts repairing md1 again.  What should I be expecting
here?  If I start a repair on one array, is it supposed to go through
and do it automatically on all arrays sharing that personality?

Thanks!
-Justin

Is md1 degraded with an active spare?  md might be delaying the
resync on it until the devices it shares with md0 are idle.
No, both arrays are redundant.  I'm just trying to do scrubbing
(repair) on md0; no resync is going on anywhere.

-Justin

First: Reply to all.

Second: if you insist that things are not as I suspect, post the output of:

cat /proc/mdstat

mdadm -Dvvs

mdadm -Evvs
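
(Briefly: /proc/mdstat shows array state and any sync in progress;
"mdadm -Dvvs" prints detail for every assembled array and
"mdadm -Evvs" examines the superblocks on the member devices; -s
scans for arrays, and -vv raises verbosity.)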

I insist it's something different. :)  Just ran into it again on
another system.  Here's the requested output:
Thanks.  Very thorough!


Apr 14 17:32:23 JMAGGARD kernel: md: requested-resync of RAID array md2
Apr 14 17:32:23 JMAGGARD kernel: md: minimum _guaranteed_  speed: 1000
KB/sec/disk.
Apr 14 17:32:23 JMAGGARD kernel: md: using maximum available idle IO
bandwidth (but not more than 200000 KB/sec) for requested-resync.
Apr 14 17:32:23 JMAGGARD kernel: md: using 128k window, over a total
of 972041296 blocks.
Apr 14 17:32:51 JMAGGARD kernel: md: md_do_sync() got signal ... exiting
Apr 14 17:33:35 JMAGGARD kernel: md: requested-resync of RAID array md3
So we see the requested-resync (repair) of md2 started as you requested,
then finished at 17:32:51 when you wrote 'idle' to 'sync_action'.

Then 44 seconds later a similar repair started on md3.
44 seconds is too long for it to be a direct consequence of the md2 repair
stopping.  Something *must* have written to md3/md/sync_action.   But what?

Maybe you have "mdadm --monitor" running, it notices when the repair on one
array finishes, and it has been told to run a script (--program, or PROGRAM in
mdadm.conf) which then starts a repair on the next array?

Seems a bit far-fetched, but I'm quite confident that some program must be
writing to md3/md/sync_action while you're not watching.
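
(A quick way to check for that, assuming the usual locations; the
config may live at /etc/mdadm.conf or /etc/mdadm/mdadm.conf depending
on the distro:)

    # is a monitor daemon running, and with what options?
    ps ax | grep "[m]dadm --monitor"
    # does the config name a handler program?
    grep -i "^PROGRAM" /etc/mdadm.conf /etc/mdadm/mdadm.conf 2>/dev/null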

NeilBrown
Well, this is embarrassing.  You're exactly right. :)  Looks like it
was a bug in the script run by mdadm --monitor.  Thanks for the
insight!

-Justin
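
(For anyone who finds this thread later: mdadm --monitor invokes its
program with the event name, the md device, and sometimes a component
device as arguments.  A minimal sketch of a handler that reacts only
to the events it means to handle, instead of unconditionally kicking
off repairs; the event names are those reported by mdadm --monitor:)

    #!/bin/sh
    # called by mdadm --monitor as: <program> <event> <md-device> [<component-device>]
    event=$1
    md=$2
    case "$event" in
        RebuildFinished)
            # just log it; starting another repair here is how one
            # finished scrub ends up chaining into the next array
            logger "mdmonitor: $event on $md"
            ;;
        Fail|DegradedArray)
            logger -p daemon.err "mdmonitor: $event on $md"
            ;;
    esac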

This, I think, is a nice (and polite) ending.  Best wishes to all players.
b-

