On Tue, May 27, 2014 at 11:53 PM, Minchan Kim <minchan@xxxxxxxxxx> wrote:
>
> So, my stupid idea is just let's expand stack size and keep an eye
> toward stack consumption on each kernel functions via stacktrace of ftrace.

We probably have to do this at some point, but that point is not -rc7.

And quite frankly, from the backtrace, I can only say: there is some
bad shit there. The current VM stands out as a bloated pig:

> [ 1065.604404] kworker/-5766 0d..2 1071625991us : stack_trace_call: 0) 7696 16 lookup_address+0x28/0x30
> [ 1065.604404] kworker/-5766 0d..2 1071625991us : stack_trace_call: 1) 7680 16 _lookup_address_cpa.isra.3+0x3b/0x40
> [ 1065.604404] kworker/-5766 0d..2 1071625991us : stack_trace_call: 2) 7664 24 __change_page_attr_set_clr+0xe0/0xb50
> [ 1065.604404] kworker/-5766 0d..2 1071625991us : stack_trace_call: 3) 7640 392 kernel_map_pages+0x6c/0x120
> [ 1065.604404] kworker/-5766 0d..2 1071625992us : stack_trace_call: 4) 7248 256 get_page_from_freelist+0x489/0x920
> [ 1065.604404] kworker/-5766 0d..2 1071625992us : stack_trace_call: 5) 6992 352 __alloc_pages_nodemask+0x5e1/0xb20
> [ 1065.604404] kworker/-5766 0d..2 1071625995us : stack_trace_call: 23) 4672 160 __swap_writepage+0x150/0x230
> [ 1065.604404] kworker/-5766 0d..2 1071625996us : stack_trace_call: 24) 4512 32 swap_writepage+0x42/0x90
> [ 1065.604404] kworker/-5766 0d..2 1071625996us : stack_trace_call: 25) 4480 320 shrink_page_list+0x676/0xa80
> [ 1065.604404] kworker/-5766 0d..2 1071625996us : stack_trace_call: 26) 4160 208 shrink_inactive_list+0x262/0x4e0
> [ 1065.604404] kworker/-5766 0d..2 1071625996us : stack_trace_call: 27) 3952 304 shrink_lruvec+0x3e1/0x6a0
> [ 1065.604404] kworker/-5766 0d..2 1071625996us : stack_trace_call: 28) 3648 80 shrink_zone+0x3f/0x110
> [ 1065.604404] kworker/-5766 0d..2 1071625997us : stack_trace_call: 29) 3568 128 do_try_to_free_pages+0x156/0x4c0
> [ 1065.604404] kworker/-5766 0d..2 1071625997us : stack_trace_call: 30) 3440 208 try_to_free_pages+0xf7/0x1e0
> [ 1065.604404] kworker/-5766 0d..2 1071625997us : stack_trace_call: 31) 3232 352 __alloc_pages_nodemask+0x783/0xb20
> [ 1065.604404] kworker/-5766 0d..2 1071625997us : stack_trace_call: 32) 2880 8 alloc_pages_current+0x10f/0x1f0
> [ 1065.604404] kworker/-5766 0d..2 1071625997us : stack_trace_call: 33) 2872 200 __page_cache_alloc+0x13f/0x160

That __alloc_pages_nodemask() thing in particular looks bad. It
actually seems not to be the usual "let's just allocate some
structures on the stack" disease; it looks more like "lots of
inlining, horrible calling conventions, and lots of random stupid
variables".

From a quick glance at the frame usage, some of it seems to be gcc
being rather bad at stack allocation, but lots of it is just nasty
spilling around the disgusting call-sites with tons of arguments. A
_lot_ of the stack slots are marked as "%sfp" (which is gcc'ese for
"spill frame pointer", afaik).

Avoiding some inlining, and using a single flag value rather than the
collection of "bool"s, would probably help (rough sketch at the end
of this mail). But nothing really trivially obvious stands out.

But what *does* stand out (once again) is that we probably shouldn't
do swap-out in direct reclaim. This came up the last time we had
stack issues (XFS) too. I really do suspect that direct reclaim
should only do the kind of reclaim that does not need any IO at all.

I think we _do_ generally avoid IO in direct reclaim, but swap is
special. And not for a good reason, afaik.

DaveC, remind me: I think you said something about the swap case the
last time this came up..
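Just to make the flag-word point concrete, this is the kind of
calling-convention change I mean. It's a rough sketch with made-up
names, not the actual vmscan interfaces:

	#include <stdbool.h>

	struct zone;	/* opaque stand-in for the real mm structure */

	/*
	 * Roughly what the reclaim path does today (hypothetical
	 * signature, not the real mm/vmscan.c one): each bool is its
	 * own argument, so each one eats an argument register or a
	 * stack slot, and gcc ends up spilling the whole pile around
	 * every call site.
	 */
	int shrink_something_bools(struct zone *zone, bool may_writepage,
				   bool may_unmap, bool may_swap);

	/*
	 * One flags word instead: all of that state travels in a
	 * single register, and testing a bit is a single instruction.
	 */
	#define RECLAIM_MAY_WRITEPAGE	0x01u
	#define RECLAIM_MAY_UNMAP	0x02u
	#define RECLAIM_MAY_SWAP	0x04u

	int shrink_something_flags(struct zone *zone, unsigned int flags);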
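And for the direct-reclaim side, the rule I'm arguing for would look
something like this. Again just a sketch: the helpers are made up,
except that current_is_kswapd() really exists (it tests PF_KSWAPD):

	#include <stdbool.h>

	struct page;

	/* Stand-in declarations, illustrative only. */
	bool current_is_kswapd(void);		/* real helper: tests PF_KSWAPD */
	bool is_dirty_anon(struct page *page);	/* made-up predicate */
	int start_swap_writeback(struct page *page);	/* made-up writeout hook */

	/*
	 * A direct reclaimer is already deep in some random allocation
	 * path, so it never starts swap IO itself: it skips dirty anon
	 * pages and keeps scanning for pages it can free without any
	 * IO. Only kswapd, which runs on its own shallow stack, does
	 * the writeout.
	 */
	static int try_to_reclaim_page(struct page *page)
	{
		if (is_dirty_anon(page)) {
			if (!current_is_kswapd())
				return -1;	/* leave the IO to kswapd */
			return start_swap_writeback(page);
		}
		return 0;	/* clean page: reclaimable without IO */
	}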
		Linus