Re: [BUG] oom hangs the system, NMI backtrace shows most CPUs in shrink_slab

Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx> · Sat, 23 Jan 2016 15:30:31 +0900

Jan Stancek wrote:
> On 01/19/2016 11:29 AM, Tetsuo Handa wrote:
> > although I
> > couldn't find evidence that mlock() and madvise() are related to this hang,
> 
> I simplified the reproducer by having only a single thread allocating
> memory when OOM triggers:
>   http://jan.stancek.eu/tmp/oom_hangs/console.log.3-v4.4-8606-with-memalloc.txt
> 
> In this instance it was mmap + mlock, as you can see from oom call trace.
> It made it to do_exit(), but couldn't complete it:

Thank you for retesting.

Comparing console.log.2-v4.4-8606-with-memalloc_wc.txt.bz2 and
console.log.3-v4.4-8606-with-memalloc.txt :

  Differences

    Free swap  = 0kB for the former
    Free swap  = 7556632kB for the latter

  Common points

    All stalling allocations are order 0.
    Swap cache stats: stopped increasing.
    Node 0 Normal free: remained below min.
    A kworker got stuck inside a 0x2400000 (GFP_NOIO) allocation within 1 second
    after other allocations (0x24280ca (GFP_HIGHUSER_MOVABLE | __GFP_ZERO) or
    0x24201ca (GFP_HIGHUSER_MOVABLE | __GFP_COLD)) got stuck.
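
For reference, the gfp= values can be decoded by hand. Below is a minimal
userspace sketch, not kernel code. The bit values are an assumption: they are
the ones I believe include/linux/gfp.h used around v4.4, and they change
between releases, so check them against the tree being debugged. The helper
name gfp_decode() is made up for this example.

```c
#include <stdio.h>
#include <string.h>

/* One gfp bit and its name. */
struct gfp_bit {
	unsigned int bit;
	const char *name;
};

/*
 * ASSUMPTION: bit values as in include/linux/gfp.h around v4.4.
 * Only the bits that appear in the logs of this thread are listed.
 */
static const struct gfp_bit gfp_bits[] = {
	{ 0x02u,      "__GFP_HIGHMEM" },
	{ 0x08u,      "__GFP_MOVABLE" },
	{ 0x40u,      "__GFP_IO" },
	{ 0x80u,      "__GFP_FS" },
	{ 0x100u,     "__GFP_COLD" },
	{ 0x8000u,    "__GFP_ZERO" },
	{ 0x20000u,   "__GFP_HARDWALL" },
	{ 0x400000u,  "__GFP_DIRECT_RECLAIM" },
	{ 0x2000000u, "__GFP_KSWAPD_RECLAIM" },
};

/*
 * Write a '|'-separated list of flag names for mask into buf.
 * Bits that are not in the table are appended as one hex remainder.
 */
void gfp_decode(unsigned int mask, char *buf, size_t len)
{
	size_t i, used = 0;
	int n;

	if (len == 0)
		return;
	buf[0] = '\0';
	for (i = 0; i < sizeof(gfp_bits) / sizeof(gfp_bits[0]); i++) {
		if (!(mask & gfp_bits[i].bit))
			continue;
		n = snprintf(buf + used, len - used, "%s%s",
			     used ? "|" : "", gfp_bits[i].name);
		if (n < 0 || (size_t)n >= len - used)
			return;	/* output truncated, stop here */
		used += (size_t)n;
		mask &= ~gfp_bits[i].bit;
	}
	if (mask)
		snprintf(buf + used, len - used, "%s0x%x",
			 used ? "|" : "", mask);
}
```

With these values, 0x2400000 decodes to the two reclaim bits only (which is
GFP_NOIO), 0x24200ca is plain GFP_HIGHUSER_MOVABLE, 0x24280ca additionally has
__GFP_ZERO set, and 0x24201ca additionally has __GFP_COLD set.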

----------
[ 6904.555880] MemAlloc-Info: 2 stalling task, 0 dying task, 0 victim task.
[ 6904.563387] MemAlloc: oom01(22011) seq=5135 gfp=0x24280ca order=0 delay=10001
[ 6904.571353] MemAlloc: oom01(22013) seq=5101 gfp=0x24280ca order=0 delay=10001
[ 6915.195869] MemAlloc-Info: 16 stalling task, 0 dying task, 0 victim task.
[ 6915.203458] MemAlloc: systemd-journal(592) seq=33409 gfp=0x24201ca order=0 delay=20495
[ 6915.212300] MemAlloc: NetworkManager(807) seq=42042 gfp=0x24200ca order=0 delay=12030
[ 6915.221042] MemAlloc: gssproxy(815) seq=1551 gfp=0x24201ca order=0 delay=19414
[ 6915.229104] MemAlloc: irqbalance(825) seq=6763 gfp=0x24201ca order=0 delay=11234
[ 6915.237363] MemAlloc: tuned(1339) seq=74664 gfp=0x24201ca order=0 delay=20354
[ 6915.245329] MemAlloc: top(10485) seq=486624 gfp=0x24201ca order=0 delay=20124
[ 6915.253288] MemAlloc: kworker/1:1(20708) seq=48 gfp=0x2400000 order=0 delay=20248
[ 6915.261640] MemAlloc: sendmail(21855) seq=207 gfp=0x24201ca order=0 delay=19977
[ 6915.269800] MemAlloc: oom01(22007) seq=2 gfp=0x24201ca order=0 delay=20269
[ 6915.277466] MemAlloc: oom01(22008) seq=5659 gfp=0x24280ca order=0 delay=20502
[ 6915.285432] MemAlloc: oom01(22009) seq=5189 gfp=0x24280ca order=0 delay=20502
[ 6915.293389] MemAlloc: oom01(22010) seq=4795 gfp=0x24280ca order=0 delay=20502
[ 6915.301353] MemAlloc: oom01(22011) seq=5135 gfp=0x24280ca order=0 delay=20641
[ 6915.309316] MemAlloc: oom01(22012) seq=3828 gfp=0x24280ca order=0 delay=20502
[ 6915.317280] MemAlloc: oom01(22013) seq=5101 gfp=0x24280ca order=0 delay=20641
[ 6915.325244] MemAlloc: oom01(22014) seq=3633 gfp=0x24280ca order=0 delay=20502
----------
[19394.048063] MemAlloc-Info: 1 stalling task, 0 dying task, 0 victim task.
[19394.055562] MemAlloc: systemd-journal(22961) seq=151917 gfp=0x24201ca order=0 delay=10001
[19404.625516] MemAlloc-Info: 10 stalling task, 0 dying task, 0 victim task.
[19404.633107] MemAlloc: auditd(783) seq=615 gfp=0x24201ca order=0 delay=15101
[19404.640877] MemAlloc: irqbalance(806) seq=8107 gfp=0x24201ca order=0 delay=18440
[19404.649135] MemAlloc: NetworkManager(820) seq=10854 gfp=0x24200ca order=0 delay=19527
[19404.657874] MemAlloc: gssproxy(826) seq=586 gfp=0x24201ca order=0 delay=18487
[19404.665841] MemAlloc: tuned(1337) seq=40098 gfp=0x24201ca order=0 delay=19900
[19404.673805] MemAlloc: crond(2242) seq=5612 gfp=0x24201ca order=0 delay=15329
[19404.681674] MemAlloc: systemd-journal(22961) seq=151917 gfp=0x24201ca order=0 delay=20579
[19404.690796] MemAlloc: sendmail(31908) seq=7256 gfp=0x24200ca order=0 delay=17633
[19404.699051] MemAlloc: kworker/2:2(32161) seq=9 gfp=0x2400000 order=0 delay=19889
[19404.707306] MemAlloc: oom01(32704) seq=6391 gfp=0x24200ca order=0 delay=19164 exiting
----------
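
In case somebody wants to repeat the comparison on other console logs, here is
a minimal userspace sketch that parses the per-task "MemAlloc:" lines shown
above. The struct and function names are made up for this example, and the
line format is assumed to be exactly the one in these excerpts.

```c
#include <stdio.h>
#include <string.h>

/* Fields of one per-task "MemAlloc:" record. */
struct memalloc_rec {
	char comm[32];
	int pid;
	unsigned long seq;
	unsigned int gfp;
	unsigned int order;
	unsigned long delay;
};

/*
 * Parse a line of the form
 *   "[ts] MemAlloc: comm(pid) seq=N gfp=0xX order=N delay=N"
 * Returns 1 on success and 0 for any other line, including the
 * "MemAlloc-Info:" summary lines. Trailing text such as " exiting"
 * is ignored.
 */
int memalloc_parse(const char *line, struct memalloc_rec *r)
{
	static const char tag[] = "MemAlloc: ";
	const char *p = strstr(line, tag);

	if (!p)
		return 0;
	p += strlen(tag);
	return sscanf(p, "%31[^(](%d) seq=%lu gfp=%x order=%u delay=%lu",
		      r->comm, &r->pid, &r->seq, &r->gfp,
		      &r->order, &r->delay) == 6;
}
```

Feeding every log line through memalloc_parse() and checking r.order and r.gfp
is enough to confirm that all stalling allocations are order 0 and to see
which gfp masks are involved.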

Does anybody know whether GFP_HIGHUSER_MOVABLE allocations depend on workqueue status?

   * GFP_HIGHUSER_MOVABLE is for userspace allocations that the kernel does not
   *   need direct access to but can use kmap() when access is required. They
   *   are expected to be movable via page reclaim or page migration. Typically,
   *   pages on the LRU would also be allocated with GFP_HIGHUSER_MOVABLE.

I don't have a reproducer environment. But if this problem involves workqueues,
running the kernel module below, which requests GFP_NOIO allocations more
frequently than disk_check_events() does, might help reproduce this problem.

---------- test/wq_test.c ----------
#include <linux/module.h>
#include <linux/slab.h>
#include <linux/kthread.h>
#include <linux/workqueue.h>
#include <linux/delay.h>
#include <linux/err.h>

static void wq_test_fn(struct work_struct *work);
static struct task_struct *task;
static bool pending;
static DECLARE_WORK(wq_test_work, wq_test_fn);

static void wq_test_fn(struct work_struct *unused)
{
	kfree(kmalloc(PAGE_SIZE, GFP_NOIO));
	pending = false;
}

static int wq_test_thread(void *unused)
{
	while (!kthread_should_stop()) {
		msleep(100);	/* msleep() takes milliseconds, not jiffies */
		pending = true;
		queue_work(system_freezable_power_efficient_wq, &wq_test_work);
		while (pending)
			msleep(1);
	}
	return 0;
}

static int __init wq_test_init(void)
{
	task = kthread_run(wq_test_thread, NULL, "wq_test");
	return PTR_ERR_OR_ZERO(task);	/* propagate the real error code */
}

static void __exit wq_test_exit(void)
{
	kthread_stop(task);
	/* Wait for wq_test_fn() to return before the module text goes away. */
	flush_work(&wq_test_work);
}

module_init(wq_test_init);
module_exit(wq_test_exit);
MODULE_LICENSE("GPL");
---------- test/wq_test.c ----------
---------- test/Makefile ----------
obj-m += wq_test.o
---------- test/Makefile ----------

$ make M=$PWD/test
# insmod test/wq_test.ko
