On Wed, Apr 14, 2010 at 6:22 PM, Neil Brown <neilb@xxxxxxx> wrote:
> On Wed, 14 Apr 2010 17:51:11 -0700
> Justin Maggard <jmaggard10@xxxxxxxxx> wrote:
>
>> On Fri, Apr 9, 2010 at 7:01 PM, Michael Evans <mjevans1983@xxxxxxxxx> wrote:
>> > On Fri, Apr 9, 2010 at 6:46 PM, Justin Maggard <jmaggard10@xxxxxxxxx> wrote:
>> >> On Fri, Apr 9, 2010 at 6:41 PM, Michael Evans <mjevans1983@xxxxxxxxx> wrote:
>> >>> On Fri, Apr 9, 2010 at 6:28 PM, Justin Maggard <jmaggard10@xxxxxxxxx> wrote:
>> >>>> Hi all,
>> >>>>
>> >>>> I've got a system with two RAID5 arrays that share some physical
>> >>>> devices, combined using LVM.  Oddly, when I "echo repair >
>> >>>> /sys/block/md0/md/sync_action", once it finishes, it automatically
>> >>>> starts a repair on md1 as well, even though I haven't requested it.
>> >>>> Also, if I try to stop it using "echo idle >
>> >>>> /sys/block/md0/md/sync_action", a repair starts on md1 within a few
>> >>>> seconds.  If I stop that md1 repair immediately, sometimes it will
>> >>>> respawn and start the repair on md1 again.  What should I be
>> >>>> expecting here?  If I start a repair on one array, is it supposed to
>> >>>> automatically go through and do it on all arrays sharing that
>> >>>> personality?
>> >>>>
>> >>>> Thanks!
>> >>>> -Justin
>> >>>>
>> >>>
>> >>> Is md1 degraded with an active spare?  It might be delaying resync on
>> >>> it until the other devices are idle.
>> >>
>> >> No, both arrays are redundant.  I'm just trying to do scrubbing
>> >> (repair) on md0; no resync is going on anywhere.
>> >>
>> >> -Justin
>> >>
>> >
>> > First: reply to all.
>> >
>> > Second, if you insist that things are not as I suspect, please post
>> > the output of:
>> >
>> > cat /proc/mdstat
>> >
>> > mdadm -Dvvs
>> >
>> > mdadm -Evvs
>> >
>>
>> I insist it's something different. :)  Just ran into it again on
>> another system.  Here's the requested output:
>
> Thanks.  Very thorough!
>
>> Apr 14 17:32:23 JMAGGARD kernel: md: requested-resync of RAID array md2
>> Apr 14 17:32:23 JMAGGARD kernel: md: minimum _guaranteed_ speed: 1000
>> KB/sec/disk.
>> Apr 14 17:32:23 JMAGGARD kernel: md: using maximum available idle IO
>> bandwidth (but not more than 200000 KB/sec) for requested-resync.
>> Apr 14 17:32:23 JMAGGARD kernel: md: using 128k window, over a total
>> of 972041296 blocks.
>> Apr 14 17:32:51 JMAGGARD kernel: md: md_do_sync() got signal ... exiting
>> Apr 14 17:33:35 JMAGGARD kernel: md: requested-resync of RAID array md3
>
> So we see the requested-resync (repair) of md2 started as you requested,
> then finished at 17:32:51 when you wrote 'idle' to 'sync_action'.
>
> Then 44 seconds later a similar repair started on md3.
> 44 seconds is too long for it to be a direct consequence of the md2 repair
> stopping.  Something *must* have written to md3/md/sync_action.  But what?
>
> Maybe you have "mdadm --monitor" running, and it notices when the repair
> on one array finishes and has been told to run a script (--program, or
> PROGRAM in mdadm.conf) which then starts a repair on the next array???
>
> Seems a bit far-fetched, but I'm quite confident that some program must be
> writing to md3/md/sync_action while you're not watching.
>
> NeilBrown

Well, this is embarrassing.  You're exactly right. :)  It turns out it
was a bug in the script run by mdadm --monitor.  Thanks for the insight!

-Justin
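
For illustration, here is a minimal sketch of the kind of setup Neil
describes: mdadm --monitor pointed at an event-handler script via the
PROGRAM keyword in mdadm.conf, where a buggy handler writes to
sync_action on another array.  The handler path, the RebuildFinished
case, and the hard-coded md3 target are hypothetical (this is not
Justin's actual script); mdadm invokes the program with the event name
and the affected md device as its arguments.

  # /etc/mdadm.conf excerpt -- run a handler for every monitor event
  PROGRAM /usr/local/sbin/md-event-handler

  # /usr/local/sbin/md-event-handler (hypothetical handler)
  #!/bin/sh
  # mdadm --monitor calls this as: <event> <md-device> [<component-device>]
  event="$1"
  array="$2"

  case "$event" in
      RebuildFinished)
          # Chain a scrub onto another array once this one goes idle.
          # A bug here (say, an unconditional, hard-coded target) would
          # write to md3/md/sync_action with nobody watching, exactly
          # the symptom seen in the kernel log above.
          echo repair > /sys/block/md3/md/sync_action
          ;;
  esac

A handler like this runs on every monitor event, so anything it does to
sync_action will look spontaneous from the admin's point of view.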