Hi Patrick,
We are using the CentOS 7.4 kernel (3.10.0-693.5.2.el7.x86_64). The
nodes in question have a fairly large amount of RAM (512GB), and I don't
see any evidence in the logs that they ran out of memory: the OOM killer
never fired, and the small amount of swap we keep to catch memory
pressure is completely unused. I do sometimes see the ceph-fuse
processes grow to 20-30GB of RSS (due to the memory bug that has a fix
on the way), but even then the nodes are far from out of memory.
I'll set up closer memory monitoring before the next crash so I can be
certain about it.
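For the monitoring, a minimal sketch of what I have in mind is below. It is not tested on our nodes yet. It assumes the client process name shows up as "ceph-fuse" in /proc/<pid>/comm and reads VmRSS from /proc/<pid>/status; the function names and the 60-second interval are just placeholders I picked for illustration.

```python
#!/usr/bin/env python3
"""Sample the resident set size (RSS) of ceph-fuse processes from /proc."""
import os
import time


def parse_vmrss_kb(status_text):
    """Return VmRSS in kB from /proc/<pid>/status text, or None if absent."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # Line looks like: "VmRSS:     123456 kB"
            return int(line.split()[1])
    return None


def find_pids(name, proc_root="/proc"):
    """Return PIDs whose /proc/<pid>/comm matches `name` exactly."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                if f.read().strip() == name:
                    pids.append(int(entry))
        except OSError:
            continue  # process exited while we were scanning
    return pids


def sample(name="ceph-fuse", proc_root="/proc"):
    """Return {pid: rss_kb} for every running process called `name`."""
    result = {}
    for pid in find_pids(name, proc_root):
        try:
            with open(os.path.join(proc_root, str(pid), "status")) as f:
                rss_kb = parse_vmrss_kb(f.read())
        except OSError:
            continue
        if rss_kb is not None:
            result[pid] = rss_kb
    return result


def monitor(interval_s=60, name="ceph-fuse"):
    """Print one timestamped line per process every `interval_s` seconds."""
    while True:
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        for pid, rss_kb in sorted(sample(name).items()):
            print(f"{stamp} pid={pid} rss_kb={rss_kb}", flush=True)
        time.sleep(interval_s)
```

Running monitor() with its output redirected to a file on local disk (not on the CephFS mount) should leave a record of the last RSS values seen before a crash.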
Andras
On 11/27/2017 06:06 PM, Patrick Donnelly wrote:
Hello Andras,
On Mon, Nov 27, 2017 at 2:31 PM, Andras Pataki
<apataki@xxxxxxxxxxxxxxxxxxxxx> wrote:
After upgrading to the Luminous 12.2.1 ceph-fuse client, we've seen clients
on various nodes randomly crash at the assert
FAILED assert(0 == "failed to remount for kernel dentry trimming")
with the stack:
ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) [0x55555584ad80]
2: (C_Client_Remount::finish(int)+0xcf) [0x5555557e7fff]
3: (Context::complete(int)+0x9) [0x5555557e3dc9]
4: (Finisher::finisher_thread_entry()+0x198) [0x555555849d18]
5: (()+0x7e25) [0x7ffff60a3e25]
6: (clone()+0x6d) [0x7ffff4f8234d]
What kernel version are you using? We have seen instances of this
error recently. It may be related to [1]. Are you running out of
memory on these machines?
[1] http://tracker.ceph.com/issues/17517