On Fri, Sep 09, 2016 at 06:30:16AM -0400, Jan Stancek wrote:
 > Hi,
 > 
 > I'm running v1.6-643-gecea2b06d5f3 on RHEL7.3 and I'm seeing an issue
 > where all child processes stall and none of them is getting killed.
 > They are usually in syscalls like read, recv, nanosleep, etc.
 > 
 > I suspect this commit introduced the problem, because any syscall
 > that has started but not completed is now considered to "make progress":
 > 
 >   commit ecf6dfd83d4c886d78d4605163cb8c3f1728db62
 >   Author: Dave Jones <davej@xxxxxxxxxxxxxxxxx>
 >   Date:   Fri Aug 12 15:05:01 2016 -0400
 > 
 >     if we haven't done a syscall yet, treat child as "making progress".
 >     
 >     Chances are that we haven't been scheduled because some other
 >     children are hogging the cpu.
 > 
 > I'm seeing rather the opposite of what the commit above describes. Most CPUs
 > are idle, because N-1 children are stuck in recv/read/...
 > and the last child manages to keep going. Then by chance it also hits
 > a syscall that doesn't complete, and the system stays idle
 > (after about an hour I gave up waiting).

Need to think some more on this, but as a quick guess...
try replacing the <= BEFORE with < BEFORE
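
To illustrate why that one-character change could matter, here is a minimal
sketch. It is not the actual trinity code: the enum, its ordering, and the
function names are assumptions made up for illustration. It assumes a child
records a BEFORE-like state just before entering a syscall and an AFTER-like
state once the syscall returns.

```c
#include <stdbool.h>

/* Hypothetical per-child syscall states, in the order a child is
 * assumed to pass through them. Not copied from trinity. */
enum syscall_state {
	STATE_UNKNOWN,	/* child has not picked a syscall yet */
	STATE_PREP,	/* arguments are being generated */
	STATE_BEFORE,	/* about to enter (or blocked inside) the syscall */
	STATE_AFTER	/* syscall has returned */
};

/* Inclusive check: a child sitting in STATE_BEFORE, i.e. possibly
 * blocked in read/recv/nanosleep, still counts as making progress,
 * so a watchdog using this check would never flag it as stalled. */
static bool progress_inclusive(enum syscall_state s)
{
	return s <= STATE_BEFORE;
}

/* Strict check: only children that have not reached the syscall yet
 * get the benefit of the doubt; a child in STATE_BEFORE falls through
 * to whatever timeout logic follows. */
static bool progress_strict(enum syscall_state s)
{
	return s < STATE_BEFORE;
}
```

Under these assumptions, the two checks differ only for a child in the
BEFORE-like state, which is where the stalled children in the report would be.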

I'll try to find some time to look into this soon. I'm surprised I
haven't seen it happen myself, though. How many CPUs and how many child
processes?

	Dave
