Re: High CPU usage by ceph-mgr in 14.2.6

I can report similar results, although it's probably not just due to cluster size.

Our cluster has 1248 OSDs at the moment and we have three active MDSs to spread the metadata operations evenly. However, I noticed that the load isn't spread evenly at all. Usually it's just one MDS (in our case mds.1) that handles most of the load, slowing down the others as a result. What we see is a significantly higher latency curve for this one MDS than for the other two. All MDSs operate at 100-150% CPU utilisation when multiple clients (we have up to 320) are actively reading or writing data (note: we have quite an uneven data distribution, so directory pinning isn't really an option).
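For context, pinning would look something like this if our directory layout allowed it (the paths are made up; assumes the attr package and a CephFS mount at /mnt/cephfs):

# pin one subtree to MDS rank 0 and another to rank 1
setfattr -n ceph.dir.pin -v 0 /mnt/cephfs/projectA
setfattr -n ceph.dir.pin -v 1 /mnt/cephfs/projectB
# setting -v -1 reverts a directory to the default, unpinned behaviour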

In the end, it turned out that some clients were running updatedb processes which tried to index the CephFS. After fixing that, the constant request load went down and with it the CPU load on the MDSs, but of course the underlying problem isn't solved; we just don't have any clients constantly operating on some of our largest directories anymore.
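In case it helps others, the client-side fix was along these lines (a sketch; the exact PRUNEFS entry depends on whether the clients use the kernel mount or ceph-fuse, and /mnt/cephfs is just an example mount point):

# /etc/updatedb.conf -- stop updatedb/mlocate from walking the CephFS mount
# "ceph" covers kernel mounts, "fuse.ceph-fuse" covers ceph-fuse clients
PRUNEFS = "NFS nfs nfs4 ceph fuse.ceph-fuse"
# alternatively (or additionally), prune the mount point itself
PRUNEPATHS = "/tmp /var/spool /mnt/cephfs"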


On 29/01/2020 20:28, Neha Ojha wrote:
Hi Joe,

Can you grab a wallclock profiler dump from the mgr process and share
it with us? This was useful for us to get to the root cause of the
issue in 14.2.5.

Quoting Mark's suggestion from "High CPU usage by
ceph-mgr in 14.2.5" below.

If you can get a wallclock profiler on the mgr process we might be able
to figure out the specifics of what's taking so much time (i.e. processing
pg_summary or something else). Assuming you have gdb with the python
bindings and the ceph debug packages installed, if you (or anyone)
could try gdbpmp on the 100% mgr process that would be fantastic.


https://github.com/markhpc/gdbpmp


gdbpmp.py -p `pidof ceph-mgr` -n 1000 -o mgr.gdbpmp


If you want to view the results:


gdbpmp.py -i mgr.gdbpmp -t 1
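
To sanity-check the python-bindings prerequisite first (just a quick check of your environment, not part of gdbpmp itself), this should print instead of erroring out:

gdb --batch -ex 'python print("gdb python bindings available")'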

Thanks,
Neha



On Wed, Jan 29, 2020 at 7:35 AM <jbardgett@xxxxxxxxxxx> wrote:
Modules that are normally enabled:

ceph mgr module ls | jq -r '.enabled_modules'
[
   "dashboard",
   "prometheus",
   "restful"
]

We did test with all modules disabled and restarted the mgrs, but saw no difference.
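For the record, the disable/restart step was roughly this (module names as listed above; the exact systemd unit depends on your deployment):

ceph mgr module disable dashboard
ceph mgr module disable prometheus
ceph mgr module disable restful
systemctl restart ceph-mgr.target    # on each mgr host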

Joe
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


