Bug Report: BUG: Bad rss-counter state mm:ffff88101705f800 idx:1 val:512 / application segfaults / thp

Hi,

I'm seeing random application crashes (SIGSEGV), and a few minutes later
this appears in the log files:

[133933.729199] /build/linux-lts-wily-4x6IId/linux-lts-wily-4.2.0/mm/pgtable-generic.c:33: bad pmd ffff880fd06d6200(000000018da009e2)
[133933.763015] BUG: Bad rss-counter state mm:ffff88101705f800 idx:1 val:512
[133933.763039] BUG: non-zero nr_ptes on freeing mm: 1
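
To find affected machines in a larger cluster, the three messages above can be
searched for in the kernel log. This is only a minimal sketch: the function
name thp_bug_lines is made up for illustration, and the patterns are copied
from the log excerpt in this report.

```shell
#!/bin/sh
# Filter kernel log lines on stdin for the three signatures from this report.
# thp_bug_lines is a hypothetical helper name, not an existing tool.
thp_bug_lines() {
    grep -E 'bad pmd|Bad rss-counter state|non-zero nr_ptes on freeing mm'
}

# Example use on one machine:
#   dmesg | thp_bug_lines
```

grep exits non-zero when nothing matches, so the exit status can serve as a
simple "affected or not" check per host.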

I'm quite certain that it's not a hardware error. The problem appears
regularly on random machines in a cluster of 100+ Dell PowerEdge R720
servers, each with 2x Xeon E5 (NUMA) and 64 GB of ECC memory.

The workload is mostly Hadoop YARN with MapReduce and Spark; the JVMs
(mostly the DataNodes) crash randomly under load with SIGSEGV.

The problem appears with kernels 4.3.0 and 4.2.7 from the Ubuntu Kernel
Mainline PPA[1] and with the current 4.2 Ubuntu Wily kernel. All of
these kernels already include a related patch[2].

However I'm still seeing the problem. The bug disappears when I
disable transparent hugepages and reboot the machines!

Before disabling transparent hugepages completely I ran this config:

   echo always > /sys/kernel/mm/transparent_hugepage/enabled
   echo never > /sys/kernel/mm/transparent_hugepage/defrag
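
The active setting can be read back from the same sysfs files, where the
kernel marks the current value with square brackets (for example
"always madvise [never]"). A minimal sketch, assuming the standard sysfs
location; the helper name thp_active and the THP_DIR variable are
hypothetical and only used here for illustration:

```shell
#!/bin/sh
# Print the active (bracketed) value from a THP sysfs file,
# e.g. "always madvise [never]" -> "never".
thp_active() {
    sed -n 's/.*\[\(.*\)\].*/\1/p' "$1"
}

# Assumed default location; can be overridden for testing.
THP_DIR="${THP_DIR:-/sys/kernel/mm/transparent_hugepage}"

# Report the active value for each file that is readable on this machine.
for f in enabled defrag; do
    if [ -r "$THP_DIR/$f" ]; then
        printf '%s: %s\n' "$f" "$(thp_active "$THP_DIR/$f")"
    fi
done
```

Writing "never" to the enabled file, in the same way as the echo commands
above, turns transparent hugepages off at runtime.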

Unfortunately I can't provide any more data at the moment. I may be
able to compile a kernel with debug options turned on over the
holidays. If you have any hints on how I can help pin this down,
please tell me. On IRC, CONFIG_DEBUG_VM was recommended.
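
Whether a given kernel was built with that option can be checked against its
config file. A minimal sketch, assuming the config is installed as
/boot/config-$(uname -r) as on Ubuntu; the function name debug_vm_enabled is
made up for illustration:

```shell
#!/bin/sh
# Succeed if the given kernel config file has CONFIG_DEBUG_VM=y.
# debug_vm_enabled is a hypothetical helper name.
debug_vm_enabled() {
    grep -q '^CONFIG_DEBUG_VM=y' "$1"
}

# Example use (path is an assumption about the distribution layout):
#   debug_vm_enabled "/boot/config-$(uname -r)" && echo "CONFIG_DEBUG_VM is on"
```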

regards and thanks
Martin

1: http://kernel.ubuntu.com/~kernel-ppa/mainline/?C=M;O=D
2: https://kernel.googlesource.com/pub/scm/linux/kernel/git/stable/linux-stable.git/+/47aee4d8e314384807e98b67ade07f6da476aa75

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .