Re: Uneven CPU usage on OSD nodes

"fred@xxxxxxxxxx" <fred@xxxxxxxxxx> · Mon, 23 Mar 2015 15:35:34 +0100

Hi Greg,

The low-/high-CPU behavior is completely persistent while a host is
up; there is no oscillation.

But rebooting a node can make its behavior switch between low- and
high-CPU, as seen this morning after checking that the BIOS settings
(especially NUMA) were the same on 2 hosts.

Hosts are identical, puppetized and dedicated to their OSD-node role.

I don't know if that's a possibility, but there is a third option: the
tools collect or report wrong information and don't show all the CPU
cycles involved.
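
One way to cross-check what the tools report would be to read the
kernel's own counters directly. This is only a minimal sketch, assuming
Linux hosts where /proc/stat is readable; the names parse_cpu_line,
busy_fraction and sample_host are illustrative and not part of any Ceph
or monitoring tool. It computes the fraction of non-idle time between
two samples of the aggregate "cpu" line:

```python
import time


def parse_cpu_line(stat_text):
    """Return the aggregate 'cpu' counters from /proc/stat text as ints."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(v) for v in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def busy_fraction(before_text, after_text):
    """Fraction of CPU time that was neither idle nor iowait between two samples."""
    before = parse_cpu_line(before_text)
    after = parse_cpu_line(after_text)
    deltas = [a - b for a, b in zip(after, before)]
    # Field order: user nice system idle iowait irq softirq steal guest guest_nice.
    # Only the first 8 are summed, on the assumption that guest time is
    # already accounted for inside user/nice.
    total = sum(deltas[:8])
    idle = deltas[3] + (deltas[4] if len(deltas) > 4 else 0)
    if total <= 0:
        return 0.0
    return (total - idle) / total


def sample_host(interval=1.0, path="/proc/stat"):
    """Take two samples 'interval' seconds apart and return the busy fraction."""
    with open(path) as f:
        first = f.read()
    time.sleep(interval)
    with open(path) as f:
        second = f.read()
    return busy_fraction(first, second)
```

Running sample_host() on a 'low-cpu' and a 'high-cpu' node at the same
time and comparing the result with what top shows would indicate whether
the tools are hiding cycles.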

Frederic

Gregory Farnum <greg@xxxxxxxxxxx> wrote on 23/03/15 15:04:

  On Mon, Mar 23, 2015 at 4:31 AM, fred@xxxxxxxxxx <fred@xxxxxxxxxx> wrote:

    Hi Somnath,

Thank you, please find my answers below.

Somnath Roy <Somnath.Roy@xxxxxxxxxxx> wrote on 22/03/15 18:16:

Hi Frederick,

Need some information here.

1. Just to clarify, you are saying it is happening in 0.87.1 and not in
Firefly?

That's a possibility; others running similar hardware (and possibly the
same OS, I can ask) confirm they don't see such visible behavior on Firefly.
I'd need to install Firefly on our hosts to be sure.
We run on RHEL.

2. Does it happen after some hours of running, or right away?

It happens on freshly installed hosts and persists.

3. Please provide ‘perf top’ output of all the OSD nodes.

Here they are:
http://www.4shared.com/photo/S9tvbNKEce/UnevenLoad3-perf.html
http://www.4shared.com/photo/OHfiAtXKba/UnevenLoad3-top.html

The left-hand 'high-cpu' nodes show tcmalloc calls that could explain the
CPU difference. We don't see them on the 'low-cpu' nodes:

12,15%  libtcmalloc.so.4.1.2      [.] tcmalloc::CentralFreeList::FetchFromSpans
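
To compare nodes on numbers rather than screenshots, the perf output
could be saved as text and the tcmalloc share summed per node. This is
only a rough sketch, assuming each entry sits on a single line in the
form "percent  object  [.] symbol" and that the percentage may use a
decimal comma; parse_perf_lines and share_by_object are hypothetical
helper names, not perf or Ceph functions:

```python
import re

# One entry per line: "<percent>%  <shared object>  [.] <symbol>"
PERF_LINE = re.compile(r"^\s*(\d+(?:[.,]\d+)?)%\s+(\S+)\s+\[(.)\]\s+(.+?)\s*$")


def parse_perf_lines(text):
    """Return (percent, shared_object, symbol) tuples for every matching line."""
    rows = []
    for line in text.splitlines():
        m = PERF_LINE.match(line)
        if m:
            # Accept both "12,15" and "12.15" as the percentage.
            pct = float(m.group(1).replace(",", "."))
            rows.append((pct, m.group(2), m.group(4)))
    return rows


def share_by_object(text, needle="tcmalloc"):
    """Sum the percentages of all entries whose shared object contains 'needle'."""
    return sum(pct for pct, obj, _sym in parse_perf_lines(text) if needle in obj)
```

Calling share_by_object() on the saved output of each node gives one
figure per host, which makes the low-/high-CPU split easy to tabulate.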

  Huh. The tcmalloc (memory allocator) workload should be roughly the
same across all nodes, especially if they have equivalent
distributions of PGs and primariness as you describe. Are you sure
this is a persistent CPU imbalance or are they oscillating? Are there
other processes on some of the nodes which could be requiring memory
from the system?

Either you've found a new bug in our memory allocator or something
else is going on in the system to make it behave differently across
your nodes.
-Greg
