Ceph OSD performance issue

Hi Ceph-users,

 

I am having some trouble finding the bottleneck in my CephFS Infernalis setup.

 

I am running 5 OSD servers with 6 OSDs each (so 30 OSDs in total). Each OSD is a physical disk (non-SSD) and each OSD has its journal stored on the first partition of its own disk. I have 3 mon servers and 2 MDS servers which are set up in active/passive mode. All servers have a redundant 10G NIC configuration.
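
For reference, this is roughly how I check which OSDs sit on which host and how the data/journal partitions are laid out (just a sketch of the commands, assuming the standard ceph-disk tooling that ships with Infernalis):

[root@XXXX ~]# ceph osd tree        # which OSDs live on which host, and their weights
[root@XXXX ~]# ceph-disk list       # data and journal partitions per physical disk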

 

I am monitoring all resources on each server (CPU/memory/network/disk usage) and I would expect my first bottleneck to be OSD disk speed, but looking at my graphs that is not the case. I have plenty of CPU, memory, network and disk throughput left, yet I am still not able to get better performance. The Ceph cluster reports that it is healthy. I have all settings at their defaults except for osd_op_threads, which I have raised from the default of 2 to 20.
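
For completeness, this is roughly what that change looks like on my side (a sketch; injectargs is the usual way to push it to running daemons, but some options only take effect after an OSD restart):

In /etc/ceph/ceph.conf on the OSD servers:

[osd]
osd_op_threads = 20

[root@XXXX ~]# ceph tell osd.* injectargs '--osd_op_threads 20'    # push to the running OSDs
[root@XXXX ~]# ceph daemon osd.5 config get osd_op_threads         # verify, run on the host that carries osd.5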

 

Looking at the processes on my OSD servers, you can see the expected ceph-osd processes running:

 

[root@XXXX ~]# ps ajxf | grep ceph-osd

 2497 25505 25504  2476 pts/0    25504 S+       0   0:00                      \_ grep --color=auto ceph-osd

    1 10051 10051 10051 ?           -1 Ssl    167 15584:14 /usr/bin/ceph-osd -f --cluster ceph --id 3 --setuser ceph --setgroup ceph

    1 11587 11587 11587 ?           -1 Ssl    167 14991:09 /usr/bin/ceph-osd -f --cluster ceph --id 4 --setuser ceph --setgroup ceph

    1 12551 12551 12551 ?           -1 Ssl    167 14687:16 /usr/bin/ceph-osd -f --cluster ceph --id 5 --setuser ceph --setgroup ceph

    1 18895 18895 18895 ?           -1 Ssl    167 3052:43 /usr/bin/ceph-osd -f --cluster ceph --id 22 --setuser ceph --setgroup ceph

    1 20788 20788 20788 ?           -1 Ssl    167 3314:31 /usr/bin/ceph-osd -f --cluster ceph --id 23 --setuser ceph --setgroup ceph

    1 27220 27220 27220 ?           -1 Ssl    167 2240:37 /usr/bin/ceph-osd -f --cluster ceph --id 26 --setuser ceph --setgroup ceph

 

 

Looking at the number of threads used by, for instance, the ceph-osd process for OSD id 5, I see this:

 

[root@XXXX ~]# ps huH p 12551 | wc -l

349
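
To get a rough idea of what all these threads are, I look at them per thread (a sketch; the thread names are only descriptive if the build sets them, otherwise they all just show up as ceph-osd):

[root@XXXX ~]# top -H -p 12551                               # per-thread CPU usage for that OSD
[root@XXXX ~]# cat /proc/12551/task/*/comm | sort | uniq -c  # count threads grouped by name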

 

I would expect this number to vary depending on the load on the cluster. When I increase osd_op_threads to 25, I see 354 threads for that OSD id, so the increase is reflected correctly, but what are all the other threads? Is there an easy way for me to see whether the configured maximum number of op threads is currently being reached? Or is there another bottleneck that I am overlooking?
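
For context, the closest tools I know of are the OSD admin socket and the cluster-wide latency view below (again only a sketch, run on the host that carries the OSD), but I am not sure which of these actually tells me that the op threads are saturated:

[root@XXXX ~]# ceph daemon osd.5 dump_ops_in_flight    # ops currently queued or in progress
[root@XXXX ~]# ceph daemon osd.5 dump_historic_ops     # recent slow ops with per-stage timings
[root@XXXX ~]# ceph daemon osd.5 perf dump             # all internal perf counters for that OSD
[root@XXXX ~]# ceph osd perf                           # commit/apply latency per OSD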

 

Any insight into this would be appreciated.

 

Kind regards,

 

Davie De Smet

 

 

Davie De Smet

Director Technical Operations and Customer Services, Nomadesk

davie.desmet@xxxxxxxxxxxx

+32 9 240 10 31 (Office)

 

Join Nomadesk:  Facebook | Twitter

 

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
