Re: Can't we use timeout based OOM warning/killing?

Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx> · Tue, 6 Oct 2015 23:51:49 +0900

Tetsuo Handa wrote:
> Sorry. This was my misunderstanding. But I still think that we need to be
> prepared for cases where zapping OOM victim's mm approach fails.
> ( http://lkml.kernel.org/r/201509242050.EHE95837.FVFOOtMQHLJOFS@xxxxxxxxxxxxxxxxxxx )

I tested whether it is easy/difficult to make zapping OOM victim's mm
approach fail. The result seems that not difficult to make it fail.

---------- Reproducer start ----------
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sched.h>
#include <sys/mman.h>

static int reader(void *unused)
{
	char c;
	int fd = open("/proc/self/cmdline", O_RDONLY);
	while (pread(fd, &c, 1, 0) == 1);
	return 0;
}

static int writer(void *unused)
{
	const int fd = open("/proc/self/exe", O_RDONLY);
	static void *ptr[10000];
	int i;
	sleep(2);
	while (1) {
		for (i = 0; i < 10000; i++)
			ptr[i] = mmap(NULL, 4096, PROT_READ, MAP_PRIVATE, fd,
				      0);
		for (i = 0; i < 10000; i++)
			munmap(ptr[i], 4096);
	}
	return 0;
}

int main(int argc, char *argv[])
{
	int zero_fd = open("/dev/zero", O_RDONLY);
	char *buf = NULL;
	unsigned long size = 0;
	int i;
	for (size = 1048576; size < 512UL * (1 << 30); size <<= 1) {
		char *cp = realloc(buf, size);
		if (!cp) {
			size >>= 1;
			break;
		}
		buf = cp;
	}
	for (i = 0; i < 100; i++) {
		clone(reader, malloc(1024) + 1024, CLONE_THREAD | CLONE_SIGHAND | CLONE_VM,
		      NULL);
	}
	clone(writer, malloc(1024) + 1024, CLONE_THREAD | CLONE_SIGHAND | CLONE_VM, NULL);
	read(zero_fd, buf, size); /* Will cause OOM due to overcommit */
	return * (char *) NULL; /* Kill all threads. */
}
---------- Reproducer end ----------

(I wrote this program for trying to mimic a trouble that a customer's system
 hung up with a lot of ps processes blocked at reading /proc/pid/ entries
 due to unkillable down_read(&mm->mmap_sem) in __access_remote_vm(). Though
 I couldn't identify what function was holding the mmap_sem for writing...)

Uptime > 429 of http://I-love.SAKURA.ne.jp/tmp/serial-20151006.txt.xz showed
a OOM livelock that

  (1) thread group leader is blocked at down_read(&mm->mmap_sem) in exit_mm()
      called from do_exit().

  (2) writer thread is blocked at down_write(&mm->mmap_sem) in vm_mmap_pgoff()
      called from SyS_mmap_pgoff() called from SyS_mmap().

  (3) many reader threads are blocking the writer thread because of
      down_read(&mm->mmap_sem) called from proc_pid_cmdline_read().

  (4) while the thread group leader is blocked at down_read(&mm->mmap_sem),
      some of the reader threads are trying to allocate memory via page fault.

So, zapping the first OOM victim's mm might fail by chance.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>