Hi all,
I have a cluster with 28 nodes (all physical, 4 cores, 32 GB RAM each); each node has 4 OSDs, for a total of 112 OSDs. Each OSD has 106 PGs (counted including replication). There are 3 MONs on this cluster.
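For reference, the numbers above work out roughly as follows (a back-of-the-envelope sketch; the replication size of 3 is my assumption, it isn't stated here):

```python
# Back-of-the-envelope check of the cluster's PG numbers.
nodes = 28
osds_per_node = 4
pgs_per_osd = 106        # as counted above, including replication
replication_size = 3     # ASSUMPTION: pool size=3 is not stated in the post

total_osds = nodes * osds_per_node
pg_replicas = total_osds * pgs_per_osd          # total PG copies cluster-wide
approx_pg_count = pg_replicas // replication_size

print(total_osds)      # 112
print(pg_replicas)     # 11872
print(approx_pg_count) # 3957
```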
I'm running on Ubuntu trusty with kernel 3.13.0-52-generic, with Hammer (0.94.2).
This cluster was installed with Hammer (0.94.1) and has only been upgraded to the latest available version.
Of the three MONs, one is mostly idle, one is using ~170% CPU, and one is using ~270% CPU. Which MON is busy changes as I restart the processes (usually the idle one is the one with the lowest uptime).
Running perf top against the ceph-mon PID on the non-idle boxes yields something like this:
4.62% libpthread-2.19.so [.] pthread_mutex_unlock
3.95% libpthread-2.19.so [.] pthread_mutex_lock
3.91% libsoftokn3.so [.] 0x000000000001db26
2.38% [kernel] [k] _raw_spin_lock
2.09% libtcmalloc.so.4.1.2 [.] operator new(unsigned long)
1.79% ceph-mon [.] DispatchQueue::enqueue(Message*, int, unsigned long)
1.62% ceph-mon [.] RefCountedObject::get()
1.58% libpthread-2.19.so [.] pthread_mutex_trylock
1.32% libtcmalloc.so.4.1.2 [.] operator delete(void*)
1.24% libc-2.19.so [.] 0x0000000000097fd0
1.20% ceph-mon [.] ceph::buffer::ptr::release()
1.18% ceph-mon [.] RefCountedObject::put()
1.15% libfreebl3.so [.] 0x00000000000542a8
1.05% [kernel] [k] update_cfs_shares
1.00% [kernel] [k] tcp_sendmsg
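In case it helps anyone reproduce this, the sampling above was done roughly like this (a sketch; PID discovery via pidof is my assumption, any method works):

```shell
# Find the monitor daemon's PID (assumes one ceph-mon process per box;
# substitute the PID by hand otherwise).
MON_PID="$(pidof ceph-mon || true)"
echo "sampling PID: ${MON_PID:-<none found>}"
# perf top -p "$MON_PID"    # interactive; Ctrl-C to stop sampling
```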
The cluster is mostly idle, and it's healthy. The store is 69 MB, and the MONs are consuming around 700 MB of RAM each.
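Those two figures came from something like the following (a sketch; the mon data path is the Ubuntu packaging default and an assumption, adjust for your deployment):

```shell
# On-disk size of the monitor store (ASSUMPTION: default Ubuntu path).
MON_STORE="${MON_STORE:-/var/lib/ceph/mon}"
du -sh "$MON_STORE" 2>/dev/null || echo "store path not found: $MON_STORE"
# Resident memory (RSS, in KB) of the running monitor process:
ps -o rss=,comm= -C ceph-mon 2>/dev/null || echo "ceph-mon not running here"
```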
Any ideas on this situation? Is it safe to ignore?
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com