ms_dispatcher of ceph-mgr 100% cpu on pacific 16.2.7

Wout van Heeswijk <wout@xxxxxxxx> · Fri, 16 Sep 2022 01:33:36 +0000

Hi Everyone,

We have a cluster whose manager is not working well. The mgrs are all very slow to respond, which initially caused them to fail over continuously.

We've disabled most of the modules. 

We’ve set the following, which seemed to improve the situation a little, but the problem came back:

ms_async_op_threads = 10
ms_async_max_op_threads = 16
mgr_stats_period = 10

However, the ms_dispatch thread is at 99.9% CPU all the time. If we fail the manager, the ms_dispatch thread on the new active mgr goes to 99.9% as well. We have restarted all mon and mgr daemons.

The perf dump shows an extremely high get_or_fail_fail count for the mgr_mon_messsages throttle:

"throttle-mgr_mon_messsages": {
        "val": 128,
        "max": 128,
        "get_started": 0,
        "get": 1191,
        "get_sum": 1191,
        "get_or_fail_fail": 188691955,
        "get_or_fail_success": 1191,
        "take": 0,
        "take_sum": 0,
        "put": 1191,
        "put_sum": 1191,
        "wait": {
            "avgcount": 0,
            "sum": 0.000000000,
            "avgtime": 0.000000000
        }
}
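
To put those counters in perspective, here is a minimal sketch in Python that summarizes the throttle sections of a perf dump. The helper name throttle_summary and the two derived metrics are my own assumptions for illustration, not anything Ceph provides. The embedded JSON only copies the fields from the excerpt above, wrapped in braces so it parses.

```python
import json

# Fields copied from the perf dump excerpt above; the outer braces are
# added here only so the fragment parses as a JSON document.
PERF_DUMP = """
{
  "throttle-mgr_mon_messsages": {
    "val": 128,
    "max": 128,
    "get_or_fail_fail": 188691955,
    "get_or_fail_success": 1191
  }
}
"""


def throttle_summary(perf):
    """Hypothetical helper: map each throttle-* section to
    (fill_fraction, failed_attempts_per_success)."""
    summary = {}
    for name, counters in perf.items():
        if not name.startswith("throttle-"):
            continue
        limit = counters.get("max", 0)
        # How full the throttle currently is (val relative to max).
        fill = counters.get("val", 0) / limit if limit else 0.0
        ok = counters.get("get_or_fail_success", 0)
        fails = counters.get("get_or_fail_fail", 0)
        if ok:
            ratio = fails / ok
        elif fails:
            ratio = float("inf")
        else:
            ratio = 0.0
        summary[name] = (fill, ratio)
    return summary


for name, (fill, ratio) in throttle_summary(json.loads(PERF_DUMP)).items():
    print(f"{name}: {fill:.0%} full, {ratio:,.0f} failed attempts per success")
```

With the numbers above, val equals max, so the throttle is completely full, and there are well over a hundred thousand failed get_or_fail attempts for every successful one.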

Thanks,
Wout