Uneven CPU usage on OSD nodes

"fred@xxxxxxxxxx" <fred@xxxxxxxxxx> · Wed, 18 Mar 2015 15:18:51 +0100

Hi ceph-users list!

We're setting up a new Ceph infrastructure:
- 1 MDS admin node
- 4 OSD storage nodes (60 OSDs)
  each of them running a monitor
- 1 client

Each OSD node (32GB RAM, 16 cores) hosts 15 x 4TB SAS OSDs (XFS) and 1 
SSD holding 5GB journal partitions, all attached as JBOD.
Every node has 2x10Gb links aggregated with LACP.
The OSD nodes were freshly installed with Puppet, then set up from the admin node.
OSD weights in the OSD tree are left at their defaults.
There is 1 test pool with 4096 PGs.

During the setup phase, we're trying to characterize the performance 
of the cluster.
Rados benchmarks are run from the client with these commands:
rados -p pool -b 4194304 bench 60 write -t 32 --no-cleanup
rados -p pool -b 4194304 bench 60 seq -t 32 --no-cleanup

Each time, we observed a recurring phenomenon: 2 of the 4 OSD nodes have 
twice the CPU load of the other two:
http://www.4shared.com/photo/Ua0umPVbba/UnevenLoad.html
(Look at the real-time %CPU and the cumulative CPU time of each 
ceph-osd process.)
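To compare nodes with numbers rather than screenshots, the cumulative CPU time of all ceph-osd processes on a node can be summed from /proc. The following is a minimal, Linux-only sketch, assuming the standard /proc/<pid>/stat layout (utime and stime are fields 14 and 15, counted in clock ticks). The helper names `parse_stat` and `osd_cpu_seconds` are hypothetical, not part of any Ceph tool.

```python
import os
from typing import Tuple


def parse_stat(line: str) -> Tuple[str, int]:
    """Return (command name, utime + stime in clock ticks) from a /proc/<pid>/stat line."""
    # The command name sits in parentheses and may itself contain spaces or
    # parentheses, so split around the first '(' and the last ')'.
    lpar = line.index("(")
    rpar = line.rindex(")")
    comm = line[lpar + 1:rpar]
    fields = line[rpar + 2:].split()  # fields[0] is field 3 (process state)
    utime, stime = int(fields[11]), int(fields[12])  # fields 14 and 15
    return comm, utime + stime


def osd_cpu_seconds(proc: str = "/proc", name: str = "ceph-osd") -> float:
    """Sum user + system CPU seconds of every process whose name equals `name`."""
    ticks_per_sec = os.sysconf("SC_CLK_TCK")
    total_ticks = 0
    for entry in os.listdir(proc):
        if not entry.isdigit():  # skip non-PID entries such as "self" or "cpuinfo"
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                comm, ticks = parse_stat(f.read())
        except (FileNotFoundError, ProcessLookupError):
            continue  # the process exited while we were scanning
        if comm == name:
            total_ticks += ticks
    return total_ticks / ticks_per_sec
```

Calling `osd_cpu_seconds()` on each OSD node right before and right after a `rados bench` run and comparing the deltas would show whether the doubled load is spread over all ceph-osd daemons of a node or concentrated in a few of them (a per-process variant only needs to keep the PID alongside the ticks).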

After a complete fresh reinstall, to be sure, the doubled CPU load is 
still observed, though not on the same 2 nodes:
http://www.4shared.com/photo/2AJfd1B_ba/UnevenLoad-v2.html

Nothing obvious in the installation seems to explain this.

The CRUSH distribution shows no more than 4.5% imbalance between the 4 
OSD nodes for the primary OSDs of the objects, and less than 3% between 
hosts if we consider the whole acting sets of the objects used during 
the benchmark. These differences are nowhere near proportional to the 
CPU load differences, so the cause must lie elsewhere.
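The per-host imbalance figures above can be computed along these lines. This is a minimal sketch, not necessarily the procedure behind the quoted numbers: it assumes each benchmark object's acting set has already been resolved to host names, primary first (for example from `ceph osd map <pool> <object>` output combined with the OSD tree; that parsing is not shown), and it assumes "imbalance" means (max - min) / mean across hosts. The function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Mapping, Sequence, Tuple


def imbalance(counts: Mapping[str, int], hosts: Sequence[str]) -> float:
    """Spread between the busiest and least busy host, relative to the mean.

    Hosts that never appear in `counts` count as 0. The (max - min) / mean
    definition is an assumption; `hosts` must be non-empty.
    """
    values = [counts.get(h, 0) for h in hosts]
    mean = sum(values) / len(values)
    return (max(values) - min(values)) / mean


def crush_balance(acting_sets: Dict[str, List[str]],
                  hosts: Sequence[str]) -> Tuple[float, float]:
    """Return (primary imbalance, whole-acting-set imbalance) across hosts.

    acting_sets maps object name -> hosts of its acting set, primary first
    (hypothetical input format).
    """
    primaries = Counter(hs[0] for hs in acting_sets.values())
    members = Counter(h for hs in acting_sets.values() for h in hs)
    return imbalance(primaries, hosts), imbalance(members, hosts)
```

Counting whole acting sets as well as primaries matters here because with replication, each OSD daemon also handles replica writes from other primaries, so both figures are relevant when comparing against per-node CPU load.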

I cannot rule out an impact on performance. Even though we have plenty 
of spare CPU cores, it seems logical that this must have some effect on 
latency and therefore on overall performance.

Does anyone have an idea, or could someone reproduce the test on their 
setup to see whether this is common behavior?

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com