Re: ps lockups, cgroup memory reclaim

On Tue, Sep 17, 2013 at 04:50:42PM +0100, Mark Hills wrote:
> I'm investigating intermittent kernel lockups in an HPC environment, with 
> the RedHat kernel.
> 
> The symptoms are seen as lockups of multiple ps commands, with one 
> consuming full CPU:
> 
>   # ps aux | grep ps
>   root     19557 68.9  0.0 108100   908 ?        D    Sep16 1045:37 ps --ppid 1 -o args=
>   root     19871  0.0  0.0 108100   908 ?        D    Sep16   0:00 ps --ppid 1 -o args=
> 
> SIGKILL on the busy one causes the other ps processes to run to completion 
> (TERM has no effect).

Any chance you can get the kernel stacks of the non-busy blocked tasks?

It would be /proc/19871/stack in this case.
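
If there are several blocked tasks, something along these lines may save some typing. It is only a rough sketch: the function name and its optional directory argument are made up for illustration (the argument defaults to /proc and exists only so the loop can be tried against a scratch directory), and it assumes the kernel exposes /proc/<pid>/stack as mentioned above.

```shell
#!/bin/sh
# Print the kernel stack of every task in uninterruptible sleep (state D).
# dump_blocked_stacks and its optional argument are hypothetical helpers,
# not part of any existing tool; the argument defaults to /proc.
dump_blocked_stacks() {
    proc_root="${1:-/proc}"
    for dir in "$proc_root"/[0-9]*; do
        [ -r "$dir/status" ] || continue
        # The State: line of /proc/<pid>/status looks like "State:  D (disk sleep)".
        state=$(awk '/^State:/ { print $2; exit }' "$dir/status" 2>/dev/null)
        [ "$state" = "D" ] || continue
        echo "== pid ${dir##*/} =="
        # Reading /proc/<pid>/stack usually requires root.
        cat "$dir/stack" 2>/dev/null || echo "(stack not readable)"
    done
}
```

Run as root with no argument (dump_blocked_stacks) while the lockup is happening; it should list the spinning ps as well as the ones waiting behind it.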

> In this case I was able to run my own ps to see the process list, but not 
> always.
> 
> perf shows the locality of the spinning, roughly:
> 
>   proc_pid_cmdline
>   get_user_pages
>   handle_mm_fault
>   mem_cgroup_try_charge_swapin
>   mem_cgroup_reclaim
> 
> There are two entry points; the code paths taken are better shown by the 
> attached profile of CPU time.

Looks like it's spinning like crazy in shrink_mem_cgroup_zone().
Maybe an LRU counter underflow, maybe endlessly looping on the
should_continue_reclaim() compaction condition.  But I don't see an
obvious connection to why killing the busy task wakes up the blocked
one.

So yeah, it would be helpful to know what that task is waiting for.

> We've had this behaviour since switching to Scientific Linux 6 (based on 
> RHEL6, like CentOS) at kernel 2.6.32-279.9.1.el6.x86_64.
> 
> The example above is kernel 2.6.32-358.el6.x86_64.

Can you test with the debug build?  That should trap LRU counter
underflows at least.  If you have the possibility to recompile the
distribution kernel I can provide you with debug patches.
