Thank you very much for taking the time to look into this.
On 07/25/2012 06:00 PM, Phil Turmel wrote:
Piles of small reads scattered across multiple drives, and a
concentration of queued writes to /dev/sda. What's on /dev/sda?
It's not a member of the raid, so it must be some other system task
involved.
/dev/sda1 is the root filesystem. The writes were most likely by MySQL,
but I would have to run iotop to be sure.
[ The output of "lsdrv" [1] might be useful here, along with
"mdadm -D /dev/md0" and "mdadm -E /dev/[b-j]" ]
Here you go: http://pastebin.ca/2174740
MythTV is trying to flush recorded video to disk, I presume. Sync is
known to cause stalls--a great deal of work is on-going to improve
this. How old is this kernel?
After rebooting, MythTV is currently recording two shows, and the resync
is running at full speed.
# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sdh1[0] sdd1[9] sde1[10] sdb1[6] sdi1[7] sdc1[4] sdf1[3] sdg1[8] sdj1[1]
      6837311488 blocks super 1.2 level 6, 512k chunk, algorithm 2 [9/9] [UUUUUUUUU]
      [=>...................]  resync =  9.3% (91363840/976758784) finish=1434.3min speed=10287K/sec

unused devices: <none>
atop shows the avio of every drive at less than 1 ms, whereas before the
reboot it was much higher. The system runs fine under load for a couple
of days, and then it comes to a halt.
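
To put the two resync lines in this thread side by side, here is a rough
sketch of the arithmetic behind the "finish=" estimate. It assumes the
(done/total) counters are in 1 KiB blocks and that "speed=" is in KiB/s;
eta_minutes is a made-up helper name, not part of mdadm or the kernel.

```shell
#!/bin/sh
# eta_minutes DONE TOTAL SPEED   (hypothetical helper, for illustration only)
#   DONE, TOTAL: the "(done/total)" counters from /proc/mdstat, assumed 1 KiB blocks
#   SPEED:       the "speed=" figure from the same line, assumed KiB/s
# Prints the remaining time in whole minutes (fraction truncated).
eta_minutes() {
    awk -v d="$1" -v t="$2" -v s="$3" \
        'BEGIN { printf "%d\n", (t - d) / s / 60 }'
}
```

With the healthy numbers, "eta_minutes 91363840 976758784 10287" prints
1434, about one day. With the stalled numbers, "eta_minutes 501954432
976758784 275" prints 28776, about 20 days. Both are close to what mdstat
itself reported, so the estimate is simply remaining blocks divided by the
current speed.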
It's a 3.2.21 kernel. I'm running Debian Testing, and the exact Debian
package version is:
ii  linux-image-3.2.0-3-686-pae  3.2.21-3  Linux 3.2 for modern PCs
[51000.672258] [<c12c409f>] ? sysenter_do_call+0x12/0x28
[51000.672261] [<c12b0000>] ? quirk_usb_early_handoff+0x4a9/0x522
Here is some other possibly relevant info:
# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sdh1[0] sdd1[9] sde1[10] sdb1[6] sdi1[7] sdc1[4] sdf1[3] sdg1[8] sdj1[1]
      6837311488 blocks super 1.2 level 6, 512k chunk, algorithm 2 [9/9] [UUUUUUUUU]
      [==========>..........]  resync = 51.3% (501954432/976758784) finish=28755.6min speed=275K/sec
Is this resync a weekly check, or did something else trigger it?
This is not a scheduled check. I believe it was triggered by an unclean
shutdown. I don't think an unclean shutdown used to trigger a resync,
but I could be remembering wrong.
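
One way to tell the two cases apart while the array is busy is the md
sync_action attribute in sysfs. The sketch below assumes the array is md0
and that a scheduled scrub (such as the one Debian's checkarray cron job
starts, as far as I know) reports "check" there. sync_kind is a made-up
helper name.

```shell
#!/bin/sh
# sync_kind [FILE]   (hypothetical helper, for illustration only)
#   FILE defaults to md0's sync_action attribute; pass another path to
#   inspect a different array.
sync_kind() {
    f="${1:-/sys/block/md0/md/sync_action}"
    case "$(cat "$f")" in
        check|repair) echo "requested scrub" ;;
        resync)       echo "resync" ;;
        recover)      echo "rebuild onto a replaced or added device" ;;
        idle)         echo "idle" ;;
        *)            echo "unknown" ;;
    esac
}
```

On this machine I would expect it to print "resync", which would match the
mdstat output above rather than a scheduled check.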
unused devices: <none>
# cat /proc/sys/dev/raid/speed_limit_min
10000
MD is unable to reach its minimum rebuild rate while other system
activity is ongoing. You might want to lower this number to see if that
gets you out of the stalls.
Or temporarily shut down mythtv.
I will try lowering those numbers next time this happens, which will
probably be within the next day or two. That's about how often this
happens.
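
For the record, lowering the floor could look roughly like the sketch
below. The procfs path is the one shown in this thread; the value is in
KiB/s per device. set_resync_min is a made-up helper name, and the optional
second argument exists only so the function can be tried against a scratch
file instead of the real knob.

```shell
#!/bin/sh
# set_resync_min RATE [FILE]   (hypothetical helper, for illustration only)
#   RATE: new minimum resync rate in KiB/s per device
#   FILE: defaults to the procfs knob; writing there needs root
set_resync_min() {
    rate="$1"
    f="${2:-/proc/sys/dev/raid/speed_limit_min}"
    # Refuse anything that is not a plain non-negative integer.
    case "$rate" in
        ''|*[!0-9]*) echo "rate must be a non-negative integer" >&2; return 1 ;;
    esac
    old="$(cat "$f")"
    printf '%s\n' "$rate" > "$f" || return 1
    echo "speed_limit_min: $old -> $rate"
}
```

Run as root, "set_resync_min 1000" would drop the floor from the 10000 set
here to 1000, which I believe is the kernel's built-in default. The change
does not survive a reboot unless it is also put in sysctl.conf.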
# cat /proc/sys/dev/raid/speed_limit_max
200000
Thanks in advance!
-- Kevin
HTH,
Phil
[1] http://github.com/pturmel/lsdrv
Thanks!
-- Kevin