On Wed, 27 Oct 2010 12:48:13 +0200
Martin Hamrle <martin.hamrle@xxxxxxxx> wrote:

> On 27.10.2010 10:01, Neil Brown wrote:
> > On Wed, 27 Oct 2010 09:35:17 +0200
> > Martin Hamrle <martin.hamrle@xxxxxxxx> wrote:
> >
> >> Hi,
> >>
> >> I'm having this issue on several boxes with several configurations.
> >> One of them is a box with 8 drives attached to an ARC-1160 in
> >> pass-through mode, with a sw raid5 built from these drives. There is
> >> also one drive for the OS.
> >>
> >> During resync or check under heavy IO load, the tscpd process (tscpd
> >> is the IO load generator) hangs. The machine is still alive, but there
> >> are many blocked processes.
> >> After tscpd hangs, IO load is generated only by the resync. In the
> >> traceback you can see blocked processes (ps, htop, cat) accessing the
> >> tscpd cmdline in proc. Some tscpd threads are blocked while writing
> >> files to the fs on the raid5. Reading these files also blocks; reading
> >> other files in the filesystem is as fast as usual. This state lasted
> >> 110 minutes. After that all blocked processes continued their work.
> >>
> >> I am not sure what ended this weird state. I think it was caused by
> >> starting to copy the kernel source onto the array.
> >>
> >> Note that this is the first time the hung processes woke up; I never
> >> waited so long before.
> >>
> >> I think that it is related to sw raid because I do not see this issue
> >> on hw raid or on sw raid without resync.
> >>
> >> kern.log contains the initial "INFO: task collectd:2577 blocked for
> >> more than 120 seconds"
> >> and two dumps from
> >> echo w > /proc/sysrq-trigger
> >>
> >> The log is located at http://files.nangu.tv/kernel/kern.log
> >> Let me know if you need more info.
> >>
> > When I try to access your kern.log I get
> >
> > 403 - Forbidden
> Sorry about that, it is fixed now

Thanks.

Unfortunately it doesn't really show anything interesting. Just lots of
threads waiting on locks and such, nothing that even points to a problem
with md.

However some of the back traces are missing.
Notice the lines:

  Oct 19 13:15:01 osn02 kernel: [72048.851702] md: using 128k window, over a total of 244198464 blocks.
  Oct 19 13:38:54 osn02 kernel: 009] [<ffffffff810c7c32>] ? congestion_wait+0x66/0x80

Between those there should be quite a lot of other stack trace info, but
the kernel log buffer wasn't big enough to hold everything, so some got
lost.

If you boot with
   log_buf_len=1M
it will make the log buffer larger so you won't lose anything. That
*might* be more helpful, but I cannot promise anything.

NeilBrown

> >
> > Just include it in-line in the email.
> >
> > NeilBrown
> >
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
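For anyone checking their own kern.log for the kind of loss described above, here is a minimal sketch that flags kernel lines whose printk timestamp is missing or cut off, as in the "009]" line. It assumes printk timestamps are enabled and that the syslog prefix looks like "Oct 19 13:15:01 osn02 kernel: "; the function name find_truncated and both patterns are illustrative, not taken from any existing tool.

```python
import re

# Assumed syslog prefix: "Oct 19 13:15:01 osn02 kernel: " followed by the message.
PREFIX = re.compile(r"^\w{3}\s+\d+\s+\d\d:\d\d:\d\d\s+\S+\s+kernel:\s(.*)$")
# A well-formed printk timestamp looks like "[72048.851702]".
STAMP = re.compile(r"^\[\s*\d+\.\d{6}\]")


def find_truncated(lines):
    """Return (line_number, line) pairs for kernel lines lacking a full timestamp."""
    hits = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        match = PREFIX.match(text)
        # Only kernel lines are considered; a message that does not start
        # with a complete timestamp suggests the start of it was lost.
        if match and not STAMP.match(match.group(1)):
            hits.append((number, text))
    return hits


sample = [
    "Oct 19 13:15:01 osn02 kernel: [72048.851702] md: using 128k window, "
    "over a total of 244198464 blocks.",
    "Oct 19 13:38:54 osn02 kernel: 009] [<ffffffff810c7c32>] ? "
    "congestion_wait+0x66/0x80",
]

for number, text in find_truncated(sample):
    print(number, text)
```

A flagged line only hints that output was lost around that point; it cannot say how much is missing.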