Hello Tetsuo,

On (03/02/16 21:01), Tetsuo Handa wrote:
> I'm trying to dump information of all threads which might be relevant
> to stalling inside the memory allocator. But it seems to me that since this
> patch changed to allow calling cond_resched() from printk() if it is
> safe to do so, it is now possible that the thread which invoked the OOM
> killer can sleep for minutes with the oom_lock mutex held when my dump is
> in progress. I want to release the oom_lock mutex as soon as possible so
> that other threads can call out_of_memory() to get TIF_MEMDIE and exit
> their allocations.

correct, otherwise chances are that you will softlockup or RCU-stall
the system -- console_unlock() does not stop until the printk logbuf
is drained, and the logbuf can be appended to concurrently from other
CPUs (there is a paraphrased sketch of that loop at the bottom of this
mail). this fact was the motivation for adding cond_resched() to
console_unlock() in the first place, and for the later patch that
permits some of the printk() callers to do the same.

> So, how can I prevent printk() triggered by out_of_memory() from sleeping
> for minutes with the oom_lock mutex held? Guard it with preempt_disable() /
> preempt_enable()? Guard it with rcu_read_lock() / rcu_read_unlock()?

correct, both preempt_disable() and rcu_read_lock() will certainly
forbid cond_resched() in printk()->console_unlock() (see the second
sketch at the bottom of this mail), but they will also increase the
softlockup and/or RCU stall probability, depending on how many threads
you have and how many stack traces you need to print.

can we do dirty things to the frequency of cond_resched()? for example,
do cond_resched() not after every printk(), but only after every
LOG_LINE_MAX printed chars. something like this:

 kernel/printk/printk.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 9917f69..5467b1e 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2209,6 +2209,7 @@ void console_unlock(void)
 	unsigned long flags;
 	bool wake_klogd = false;
 	bool do_cond_resched, retry;
+	int num_printed;
 
 	if (console_suspended) {
 		up_console_sem();
@@ -2229,6 +2230,7 @@ void console_unlock(void)
 	console_may_schedule = 0;
 
 again:
+	num_printed = 0;
 	/*
 	 * We released the console_sem lock, so we need to recheck if
 	 * cpu is online and (if not) is there at least one CON_ANYTIME
@@ -2310,8 +2312,11 @@ skip:
 		start_critical_timings();
 		local_irq_restore(flags);
 
-		if (do_cond_resched)
+		num_printed += len + ext_len;
+		if (do_cond_resched && num_printed > LOG_LINE_MAX) {
 			cond_resched();
+			num_printed = 0;
+		}
 	}
 
 	console_locked = 0;

---

several questions: do you use some sort of oom-kill reproducer? ...
the problem with printk is that you never know which one of the CPUs
will do the printing job at the end; any guess there is highly
unreliable. but, assuming that you have executed that oom-kill
reproducer many times before the patch in question, do you have any
rough numbers to compare: how many seconds did it use to take to
dump_stack() all of the tasks?

hm, a bit puzzled, how much does OOM add to the cond_resched() latency
here? is this because at least some of the tasks we switch_to can do
direct reclaim, and thus we have bigger "pauses" between
call_console_drivers() iterations? in that case, even after offloading
the printk job to system_wq we will still suffer, in some sense. the
oom mutex is going to be released sooner, that's true, but
console_unlock() will still cond_resched() after every
call_console_drivers(), so the logs can potentially be printed with a
huge delay. we probably cond_resched() in console_unlock() too often.
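fwiw, here is roughly what that console_unlock() loop does. a heavily
paraphrased sketch for illustration only, not the actual code (the real
thing lives in kernel/printk/printk.c; pending_records() and
next_record() are made-up helpers standing in for the console_seq /
log_next_seq bookkeeping):

	void console_unlock(void)
	{
	again:
		for (;;) {
			if (!pending_records())		/* made-up helper: is
							 * console_seq behind
							 * log_next_seq? */
				break;			/* drained... for now */

			msg = next_record();		/* made-up helper: fetch
							 * the record, advance
							 * console_seq */
			call_console_drivers(msg);	/* the slow part: push
							 * text to serial/VT */
			if (do_cond_resched)
				cond_resched();		/* after *every* record */
		}

		up_console_sem();

		/*
		 * other CPUs may have appended new records to the logbuf
		 * while we were busy printing; if so, take the console
		 * semaphore back and start over.
		 */
		if (pending_records() && console_trylock())
			goto again;
	}

so the CPU that happens to own the console semaphore keeps printing for
as long as anyone anywhere keeps calling printk().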
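and to make the preempt_disable() / rcu_read_lock() answer concrete:
iirc the printk path samples preemptible() && !rcu_preempt_depth() when
it takes the console semaphore, so either guard below keeps
console_unlock() from rescheduling. an untested sketch; the
dump_header() call site and its arguments are illustrative only, taken
from your trace, not from real oom_kill.c code:

	mutex_lock(&oom_lock);

	preempt_disable();		/* or rcu_read_lock() */
	dump_header(oc, victim);	/* arguments illustrative; every
					 * printk() in here now prints
					 * without cond_resched() */
	preempt_enable();		/* or rcu_read_unlock() */

	mutex_unlock(&oom_lock);

but, again, an atomic section that prints thousands of lines to a slow
console is exactly how you get the softlockup back.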
I haven't noticed anything like that during my tests.

	-ss

> ----------
> [ 460.893958] tgid=11161 invoked oom-killer: gfp_mask=0x24201ca(GFP_HIGHUSER_MOVABLE|__GFP_COLD), order=0, oom_score_adj=1000
> [ 463.892897] tgid=11161 cpuset=/ mems_allowed=0
> [ 463.894724] CPU: 1 PID: 12346 Comm: tgid=11161 Not tainted 4.5.0-rc6-next-20160302+ #318
> [ 463.897026] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
> [ 463.899841] 0000000000000286 00000000733ef955 ffff8800778a79b0 ffffffff813a2ded
> [ 463.902164] 0000000000000000 ffff8800778a7be0 ffff8800778a7a50 ffffffff811c24d0
> [ 463.904474] 0000000000000206 ffffffff81810c30 ffff8800778a79f0 ffffffff810bf839
> [ 463.906801] Call Trace:
> [ 463.908101] [<ffffffff813a2ded>] dump_stack+0x85/0xc8
> [ 463.909921] [<ffffffff811c24d0>] dump_header+0x5b/0x3b0
> [ 463.911759] [<ffffffff810bf839>] ? trace_hardirqs_on_caller+0xf9/0x1c0
> [ 463.913833] [<ffffffff810bf90d>] ? trace_hardirqs_on+0xd/0x10
> [ 463.915751] [<ffffffff81146f2d>] oom_kill_process+0x37d/0x570
> [ 463.918024] [<ffffffff81147366>] out_of_memory+0x1f6/0x5a0
> [ 463.919890] [<ffffffff81147424>] ? out_of_memory+0x2b4/0x5a0
> [ 463.921784] [<ffffffff8114d041>] __alloc_pages_nodemask+0xc91/0xeb0
> [ 463.923788] [<ffffffff81196726>] alloc_pages_current+0x96/0x1b0
> [ 463.925729] [<ffffffff8114188d>] __page_cache_alloc+0x12d/0x160
> [ 463.927682] [<ffffffff8114537a>] filemap_fault+0x48a/0x6a0
> [ 463.929547] [<ffffffff81145247>] ? filemap_fault+0x357/0x6a0
> [ 463.931409] [<ffffffff812b9e09>] xfs_filemap_fault+0x39/0x60
> [ 463.933255] [<ffffffff8116f19d>] __do_fault+0x6d/0x150
> [ 463.934974] [<ffffffff81175b1d>] handle_mm_fault+0xecd/0x1800
> [ 463.936791] [<ffffffff81174ca3>] ? handle_mm_fault+0x53/0x1800
> [ 463.938617] [<ffffffff8105a796>] __do_page_fault+0x1e6/0x520
> [ 463.940380] [<ffffffff8105ab00>] do_page_fault+0x30/0x80
> [ 463.942074] [<ffffffff817113e8>] page_fault+0x28/0x30
> [ 463.943750] Mem-Info:
> (...snipped...)
> [ 554.754959] MemAlloc: tgid=11161(12346) flags=0x400040 switches=865 seq=169 gfp=0x24201ca(GFP_HIGHUSER_MOVABLE|__GFP_COLD) order=0 delay=81602
> [ 554.754962] tgid=11161 R running task 0 12346 11056 0x00000080
> [ 554.754965] ffff8800778a7838 ffff880034510000 ffff8800778b0000 ffff8800778a8000
> [ 554.754966] 0000000000000091 ffffffff82a86fbc 0000000000000004 0000000000000000
> [ 554.754967] ffff8800778a7850 ffffffff8170a5dd 0000000000000296 ffff8800778a7860
> [ 554.754968] Call Trace:
> [ 554.754974] [<ffffffff8170a5dd>] preempt_schedule_common+0x1f/0x42
> [ 554.754975] [<ffffffff8170a617>] _cond_resched+0x17/0x20
> [ 554.754978] [<ffffffff810d17b9>] console_unlock+0x509/0x5c0
> [ 554.754979] [<ffffffff810d1b93>] vprintk_emit+0x323/0x540
> [ 554.754981] [<ffffffff810d1f0a>] vprintk_default+0x1a/0x20
> [ 554.754983] [<ffffffff8114069e>] printk+0x58/0x6f
> [ 554.754986] [<ffffffff813aaede>] show_mem+0x1e/0xe0
> [ 554.754988] [<ffffffff811c24ec>] dump_header+0x77/0x3b0
> [ 554.754990] [<ffffffff810bf839>] ? trace_hardirqs_on_caller+0xf9/0x1c0
> [ 554.754991] [<ffffffff810bf90d>] ? trace_hardirqs_on+0xd/0x10
> [ 554.754993] [<ffffffff81146f2d>] oom_kill_process+0x37d/0x570
> [ 554.754995] [<ffffffff81147366>] out_of_memory+0x1f6/0x5a0
> [ 554.754996] [<ffffffff81147424>] ? out_of_memory+0x2b4/0x5a0
> [ 554.754997] [<ffffffff8114d041>] __alloc_pages_nodemask+0xc91/0xeb0
> [ 554.755000] [<ffffffff81196726>] alloc_pages_current+0x96/0x1b0
> [ 554.755001] [<ffffffff8114188d>] __page_cache_alloc+0x12d/0x160
> [ 554.755003] [<ffffffff8114537a>] filemap_fault+0x48a/0x6a0
> [ 554.755004] [<ffffffff81145247>] ? filemap_fault+0x357/0x6a0
> [ 554.755006] [<ffffffff812b9e09>] xfs_filemap_fault+0x39/0x60
> [ 554.755007] [<ffffffff8116f19d>] __do_fault+0x6d/0x150
> [ 554.755009] [<ffffffff81175b1d>] handle_mm_fault+0xecd/0x1800
> [ 554.755010] [<ffffffff81174ca3>] ? handle_mm_fault+0x53/0x1800
> [ 554.755012] [<ffffffff8105a796>] __do_page_fault+0x1e6/0x520
> [ 554.755013] [<ffffffff8105ab00>] do_page_fault+0x30/0x80
> [ 554.755015] [<ffffffff817113e8>] page_fault+0x28/0x30
> ----------
>
> CONFIG_PREEMPT_NONE=y
> # CONFIG_PREEMPT_VOLUNTARY is not set
> # CONFIG_PREEMPT is not set
> CONFIG_PREEMPT_COUNT=y