High OSD CPU usage

Hello Guys 

We have a fresh Luminous cluster: 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc), installed using ceph-ansible.

We have 6 nodes (Intel server board S2600WTTR), each with 64G of memory and an Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz (32 cores).
Each server has 16 * 1.6TB Dell SSD drives (SSDSC2BB016T7R), for a total of 96 OSDs and 3 mons.

The main usage is RBDs for our OpenStack environment (Ocata).

We're at the beginning of our production tests, and it looks like the OSDs are too busy even though we generate hardly any IOPS at this stage (almost nothing).
All the ceph-osd processes are using around 50% CPU and I can't figure out why they are so busy:

top - 07:41:55 up 49 days,  2:54,  2 users,  load average: 6.85, 6.40, 6.37

Tasks: 518 total,   1 running, 517 sleeping,   0 stopped,   0 zombie
%Cpu(s): 14.8 us,  4.3 sy,  0.0 ni, 80.3 id,  0.0 wa,  0.0 hi,  0.6 si,  0.0 st
KiB Mem : 65853584 total, 23953788 free, 40342680 used,  1557116 buff/cache
KiB Swap:  3997692 total,  3997692 free,        0 used. 18020584 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
  36713 ceph      20   0 3869588 2.826g  28896 S  47.2  4.5   6079:20 ceph-osd
  53981 ceph      20   0 3998732 2.666g  28628 S  45.8  4.2   5939:28 ceph-osd
  55879 ceph      20   0 3707004 2.286g  28844 S  44.2  3.6   5854:29 ceph-osd
  46026 ceph      20   0 3631136 1.930g  29100 S  43.2  3.1   6008:50 ceph-osd
  39021 ceph      20   0 4091452 2.698g  28936 S  42.9  4.3   5687:39 ceph-osd
  47210 ceph      20   0 3598572 1.871g  29092 S  42.9  3.0   5759:19 ceph-osd
  52763 ceph      20   0 3843216 2.410g  28896 S  42.2  3.8   5540:11 ceph-osd
  49317 ceph      20   0 3794760 2.142g  28932 S  41.5  3.4   5872:24 ceph-osd
  42653 ceph      20   0 3915476 2.489g  28840 S  41.2  4.0   5605:13 ceph-osd
  41560 ceph      20   0 3460900 1.801g  28660 S  38.5  2.9   5128:01 ceph-osd
  50675 ceph      20   0 3590288 1.827g  28840 S  37.9  2.9   5196:58 ceph-osd
  37897 ceph      20   0 4034180 2.814g  29000 S  34.9  4.5   4789:10 ceph-osd
  50237 ceph      20   0 3379780 1.930g  28892 S  34.6  3.1   4846:36 ceph-osd
  48608 ceph      20   0 3893684 2.721g  28880 S  33.9  4.3   4752:43 ceph-osd
  40323 ceph      20   0 4227864 2.959g  28800 S  33.6  4.7   4712:36 ceph-osd
  44638 ceph      20   0 3656780 2.437g  28896 S  33.2  3.9   4793:58 ceph-osd
  61639 ceph      20   0  527512 114300  20988 S   2.7  0.2   2722:03 ceph-mgr
  31586 ceph      20   0  765672 304140  21816 S   0.7  0.5 409:06.09 ceph-mon
     68 root      20   0       0      0      0 S   0.3  0.0   3:09.69 ksoftirqd/12
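
In case it helps narrow things down, this is the kind of per-thread view I can grab from one of the busy OSDs (PID 36713 from the top output above, just as an example) to see which OSD threads are actually burning the CPU:

root@ecprdbcph10-opens:~# top -H -p 36713                                        # per-thread CPU view
root@ecprdbcph10-opens:~# ps -T -p 36713 -o spid,comm,%cpu --sort=-%cpu | head -20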

strace doesn't show anything suspicious:

root@ecprdbcph10-opens:~# strace -p 36713
strace: Process 36713 attached
futex(0x563343c56764, FUTEX_WAIT_PRIVATE, 1, NUL
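
If it would help, I can also pull counters from the OSD admin socket on one of the nodes, for example:

root@ecprdbcph10-opens:~# ceph daemon osd.1 perf dump            # osd.1 is just an example ID
root@ecprdbcph10-opens:~# ceph daemon osd.1 dump_historic_ops    # slowest recent ops on that OSD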

Ceph logs don't reveal anything either.
Is this "normal" behavior in Luminous?
Looking through older threads, I could only find one about time gaps, which is not our case.

Thanks,
Alon
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
