----- Original Message -----
> From: "Dave Jones" <davej@xxxxxxxxxxxxxxxxx>
> To: "Jan Stancek" <jstancek@xxxxxxxxxx>
> Cc: trinity@xxxxxxxxxxxxxxx
> Sent: Saturday, 10 September, 2016 3:46:30 AM
> Subject: Re: [bug] child processes stall forever and don't get killed
> 
> On Fri, Sep 09, 2016 at 10:16:17AM -0400, Jan Stancek wrote:
>  
>  > >  > I'm seeing more or less the opposite of what the commit above says.
>  > >  > Most CPUs are idle, because N-1 children are stuck in recv/read/...
>  > >  > and the last child manages to keep going. Then by chance it also hits
>  > >  > a syscall that doesn't complete and the system stays idle
>  > >  > (after about an hour I gave up waiting).
>  > > 
>  > > Need to think some more on this, but as a quick guess...
>  > > try replacing the <= BEFORE with < BEFORE
>  > 
>  > I've started a new test with the patch above reverted and that looks good
>  > so far. No stalls after 1 hour. Previously it stalled after ~20-30
>  > minutes, which I noticed when the syscall stat messages (those which show
>  > the number of iterations) stopped appearing.
> 
> Ok, I committed that, but with a minor change to slightly widen how long we
> spend in the BEFORE state. I doubt that part will have a negative effect,
> but holler if it does.

I applied this patch and haven't seen any stalls in an overnight test.

Thanks,
Jan

> 
>  > > I'll try to find some time to look into this soon. I'm surprised I
>  > > haven't also seen it happen though. How many CPUs and how many child
>  > > processes?
>  > 
>  > Anywhere from 2-8 CPUs, 8-32 children on x86_64, ppc64le and s390x
>  > systems (RHEL7.3 Beta). It usually happened within 20-30 minutes.
> 
> Weird. I'm doing 24/7 runs on one quad core and didn't hit it.
> But I wonder if I was just fortunate enough that I had some children
> always making progress even if N-1 were stuck.
> 
> 	Dave
> 
> 