Hi Greg,

The low-/high-CPU behavior is absolutely persistent while a host is up; there is no oscillation. But rebooting a node can make it switch between low- and high-CPU, as we saw this morning after checking that the BIOS settings (especially NUMA) were the same on two hosts. The hosts are identical, puppetized, and dedicated to their OSD-node role.

I don't know if it's a possibility, but there is a third way: the tools collect/deliver wrong information and don't show all the CPU cycles involved.

Frederic

Gregory Farnum <greg@xxxxxxxxxxx> wrote on 23/03/15 15:04:
> On Mon, Mar 23, 2015 at 4:31 AM, fred@xxxxxxxxxx <fred@xxxxxxxxxx> wrote:
>> Hi Somnath,
>>
>> Thank you, please find my answers below.
>>
>> Somnath Roy <Somnath.Roy@xxxxxxxxxxx> wrote on 22/03/15 18:16:
>>> Hi Frederick,
>>> Need some information here.
>>> 1. Just to clarify, you are saying it is happening in 0.87.1 and not in Firefly?
>>
>> That's a possibility; others running similar hardware (and possibly the same OS, I can ask) confirm they don't see such visible behavior on Firefly. I'd need to install Firefly on our hosts to be sure. We run on RHEL.
>>
>>> 2. Is it happening after some hours of run or right away?
>>
>> It's happening on freshly installed hosts and goes on.
>>
>>> 3. Please provide 'perf top' output of all the OSD nodes.
>>
>> Here they are:
>> http://www.4shared.com/photo/S9tvbNKEce/UnevenLoad3-perf.html
>> http://www.4shared.com/photo/OHfiAtXKba/UnevenLoad3-top.html
>>
>> The left-hand 'high-cpu' nodes have tcmalloc calls that can explain the CPU difference. We don't see them on the 'low-cpu' nodes:
>> 12,15%  libtcmalloc.so.4.1.2  [.] tcmalloc::CentralFreeList::FetchFromSpans
>
> Huh. The tcmalloc (memory allocator) workload should be roughly the same
> across all nodes, especially if they have equivalent distributions of PGs
> and primariness as you describe. Are you sure this is a persistent CPU
> imbalance, or are they oscillating? Are there other processes on some of
> the nodes which could be requiring memory from the system?
>
> Either you've found a new bug in our memory allocator or something else is
> going on in the system to make it behave differently across your nodes.
> -Greg
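
As a side note on the NUMA check mentioned above: one way to confirm that two hosts expose the same NUMA layout after a reboot is to dump the kernel's view from sysfs and diff it between nodes. A minimal sketch (Python, reading the standard /sys/devices/system/node entries; the output format is my own, not a Ceph tool):

    #!/usr/bin/env python
    # Sketch: print each NUMA node's CPU list and memory size so the output
    # can be diffed between hosts (e.g. after a reboot / BIOS change).
    import glob
    import os

    for node in sorted(glob.glob("/sys/devices/system/node/node[0-9]*")):
        name = os.path.basename(node)
        with open(os.path.join(node, "cpulist")) as f:
            cpus = f.read().strip()
        meminfo = {}
        with open(os.path.join(node, "meminfo")) as f:
            for line in f:
                parts = line.split()
                # lines look like: "Node 0 MemTotal:  131911384 kB"
                meminfo[parts[2].rstrip(":")] = parts[3]
        print("%s cpus=%s memtotal_kb=%s" % (name, cpus, meminfo.get("MemTotal", "?")))

Running this on each host and diffing the output would quickly show whether a reboot changed the NUMA presentation (for example, node interleaving toggled in the BIOS).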
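
And for the "third way" (tools not showing all the CPU cycles): the scheduler's own accounting in /proc is independent of top's sampling, so comparing the two is a cheap sanity check. A minimal sketch, assuming a single ceph-osd PID is passed on the command line (the 10-second interval is arbitrary):

    #!/usr/bin/env python
    # Sketch: measure a process's CPU usage straight from /proc/<pid>/stat
    # (utime + stime over an interval) to cross-check what top/perf report.
    import os
    import sys
    import time

    def cpu_ticks(pid):
        with open("/proc/%d/stat" % pid) as f:
            fields = f.read().split()
        # fields 14 and 15 (1-based) are utime and stime, in clock ticks,
        # summed over all threads of the process
        return int(fields[13]) + int(fields[14])

    pid = int(sys.argv[1])                 # PID of one ceph-osd daemon
    hz = os.sysconf("SC_CLK_TCK")          # usually 100 on RHEL
    interval = 10.0

    before = cpu_ticks(pid)
    time.sleep(interval)
    after = cpu_ticks(pid)

    print("%.1f%% CPU over %.0fs" % (100.0 * (after - before) / hz / interval, interval))

If the figure here disagrees noticeably with what top shows for the same PID, that would support the "tools report wrong information" hypothesis; if they agree, the extra cycles are real and perf's attribution to tcmalloc is the lead to follow.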