Hi Ceph-users,

I am having some trouble finding the bottleneck in my CephFS Infernalis setup. I am running 5 OSD servers which each have 6 OSDs (so 30 OSDs in total). Each OSD is a physical disk (non-SSD) and each OSD has its journal stored on the first partition of its own disk. I have 3 mon servers and 2 MDS servers which are set up in active/passive mode. All servers have a redundant 10G NIC configuration.

I am monitoring all resources of each server (CPU/memory/network/disk usage) and I would expect my first bottleneck to be OSD disk speed, but looking at my graphs, that is not the case. I have plenty of CPU/memory/network/disk speed left, yet I am still not able to get better performance. The cluster reports that it is healthy. I have all settings at their defaults except for osd_op_threads, which I have raised from the default of 2 to 20.
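For reference, this is how I verify the value that a running daemon actually uses (a rough example using osd.5; the admin socket path assumes the default /var/run/ceph layout):

[root@XXXX ~]# ceph daemon osd.5 config get osd_op_threads
[root@XXXX ~]# ceph --admin-daemon /var/run/ceph/ceph-osd.5.asok config show | grep osd_op_threads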
When looking at the processes on my OSD servers, you can see the expected processes running:

[root@XXXX ~]# ps ajxf | grep ceph-osd
 2497 25505 25504  2476 pts/0    25504 S+       0   0:00  \_ grep --color=auto ceph-osd
    1 10051 10051 10051 ?           -1 Ssl    167 15584:14 /usr/bin/ceph-osd -f --cluster ceph --id 3 --setuser ceph --setgroup ceph
    1 11587 11587 11587 ?           -1 Ssl    167 14991:09 /usr/bin/ceph-osd -f --cluster ceph --id 4 --setuser ceph --setgroup ceph
    1 12551 12551 12551 ?           -1 Ssl    167 14687:16 /usr/bin/ceph-osd -f --cluster ceph --id 5 --setuser ceph --setgroup ceph
    1 18895 18895 18895 ?           -1 Ssl    167  3052:43 /usr/bin/ceph-osd -f --cluster ceph --id 22 --setuser ceph --setgroup ceph
    1 20788 20788 20788 ?           -1 Ssl    167  3314:31 /usr/bin/ceph-osd -f --cluster ceph --id 23 --setuser ceph --setgroup ceph
    1 27220 27220 27220 ?           -1 Ssl    167  2240:37 /usr/bin/ceph-osd -f --cluster ceph --id 26 --setuser ceph --setgroup ceph

When looking at the number of threads in use for ceph osd id 5, for instance, you can see this:

[root@XXXX ~]# ps huH p 12551 | wc -l
349

I would expect this number to vary depending on the load on the cluster. When increasing osd_op_threads to 25, I am seeing 354 threads for that OSD ID.
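To get a feel for where that count comes from, the threads can also be grouped by name (a rough sketch; 12551 is the PID of osd.5 from the output above, and the thread names differ between releases):

[root@XXXX ~]# ps -L -p 12551 -o comm= | sort | uniq -c | sort -rn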
So the increase does take effect, but what are all the other threads? Is there an easy way for me to see whether the configured maximum number of op threads is currently being reached? Or is there another bottleneck that I am overlooking? Any clear view on this would be appreciated.
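For completeness, this is what I have pulled from the admin socket so far, without being able to tell from it whether the op thread pool is saturated (a sketch; the exact counter names differ between releases):

[root@XXXX ~]# ceph daemon osd.5 perf dump
[root@XXXX ~]# ceph daemon osd.5 dump_ops_in_flight
[root@XXXX ~]# ceph daemon osd.5 dump_historic_ops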
Kind regards,

Davie De Smet
Director Technical Operations and Customer Services, Nomadesk
+32 9 240 10 31 (Office)