Re: High CPU usage by ceph-mgr in 14.2.6

I can report similar results, although it's probably not just due to cluster size.

Our cluster has 1248 OSDs at the moment and we have three active MDSs to spread the metadata operations evenly. However, I noticed that the load isn't spread evenly at all. Usually it's just one MDS (in our case mds.1) that handles most of the load, slowing down the others as a result. What we see is a significantly higher latency curve for this one MDS than for the other two. All MDSs operate at 100-150% CPU utilisation when multiple clients (we have up to 320) are actively reading or writing data (note: we have quite an uneven data distribution, so directory pinning isn't really an option).
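For context, pinning would look something like this if our directory layout allowed it (the paths are made up; assumes the attr package and a CephFS mount at /mnt/cephfs):

# pin one subtree to MDS rank 0 and another to rank 1
setfattr -n ceph.dir.pin -v 0 /mnt/cephfs/projectA
setfattr -n ceph.dir.pin -v 1 /mnt/cephfs/projectB
# setting -v -1 reverts a directory to the default, unpinned behaviour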

In the end, it turned out that some clients were running updatedb processes which tried to index the CephFS. After fixing that, the constant request load went down and with it the CPU load on the MDSs, but of course the underlying problem isn't solved; we just don't have any clients constantly operating on some of our largest directories anymore.
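In case it helps others, the client-side fix was along these lines (a sketch; the exact PRUNEFS entry depends on whether the clients use the kernel mount or ceph-fuse, and /mnt/cephfs is just an example mount point):

# /etc/updatedb.conf -- stop updatedb/mlocate from walking the CephFS mount
# "ceph" covers kernel mounts, "fuse.ceph-fuse" covers ceph-fuse clients
PRUNEFS = "NFS nfs nfs4 ceph fuse.ceph-fuse"
# alternatively (or additionally), prune the mount point itself
PRUNEPATHS = "/tmp /var/spool /mnt/cephfs"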


On 29/01/2020 20:28, Neha Ojha wrote:
Hi Joe,

Can you grab a wallclock profiler dump from the mgr process and share
it with us? This was useful for us to get to the root cause of the
issue in 14.2.5.

Quoting Mark's suggestion from "High CPU usage by
ceph-mgr in 14.2.5" below.

If you can get a wallclock profiler on the mgr process we might be able
to figure out the specifics of what's taking so much time (i.e. processing
pg_summary or something else). Assuming you have gdb with the python
bindings and the ceph debug packages installed, if you (or anyone)
could try gdbpmp on the 100% mgr process that would be fantastic.


https://github.com/markhpc/gdbpmp


gdbpmp.py -p `pidof ceph-mgr` -n 1000 -o mgr.gdbpmp


If you want to view the results:


gdbpmp.py -i mgr.gdbpmp -t 1
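
To sanity-check the python-bindings prerequisite first (just a quick check of your environment, not part of gdbpmp itself), this should print instead of erroring out:

gdb --batch -ex 'python print("gdb python bindings available")'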

Thanks,
Neha



On Wed, Jan 29, 2020 at 7:35 AM <jbardgett@xxxxxxxxxxx> wrote:
Modules that are normally enabled:

ceph mgr module ls | jq -r '.enabled_modules'
[
   "dashboard",
   "prometheus",
   "restful"
]

We did test with all modules disabled and restarted the mgrs, but saw no difference.
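For the record, the disable/restart step was roughly this (module names as listed above; the exact systemd unit depends on your deployment):

ceph mgr module disable dashboard
ceph mgr module disable prometheus
ceph mgr module disable restful
systemctl restart ceph-mgr.target    # on each mgr host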

Joe
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


