Hi everyone,

We have a cluster whose manager is not behaving well. The mgrs are all very slow to respond, which initially caused them to fail over continuously. We've disabled most of the modules.

We set the following, which seemed to improve the situation a little, but the problem came back:

    ms_async_op_threads = 10
    ms_async_max_op_threads = 16
    mgr_stats_period = 10

However, the ms_dispatch thread is at 99.9% CPU all the time. If we fail the manager, ms_dispatch goes to 99.9% on the new mgr as well. We have restarted all mon and mgr daemons.

The perf dump shows an extremely high get_or_fail_fail count:

    "throttle-mgr_mon_messsages": {
        "val": 128,
        "max": 128,
        "get_started": 0,
        "get": 1191,
        "get_sum": 1191,
        "get_or_fail_fail": 188691955,
        "get_or_fail_success": 1191,
        "take": 0,
        "take_sum": 0,
        "put": 1191,
        "put_sum": 1191,
        "wait": {
            "avgcount": 0,
            "sum": 0.000000000,
            "avgtime": 0.000000000
        }
    }

Thanks,
Wout
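For anyone who wants to check their own daemons for the same pattern, here is a minimal sketch of how the counters above could be evaluated. It assumes the perf dump is already available as JSON, however it was obtained. The function name `saturated_throttles` and the `min_fail_ratio` threshold are hypothetical choices for illustration, not part of Ceph. It flags any throttle section that sits at its limit (`val` equal to `max`) while most `get_or_fail` attempts are rejected.

```python
import json

# Sample data copied from the perf dump excerpt in this message.
SAMPLE = json.loads("""
{
  "throttle-mgr_mon_messsages": {
    "val": 128, "max": 128,
    "get_started": 0, "get": 1191, "get_sum": 1191,
    "get_or_fail_fail": 188691955, "get_or_fail_success": 1191,
    "take": 0, "take_sum": 0, "put": 1191, "put_sum": 1191,
    "wait": {"avgcount": 0, "sum": 0.0, "avgtime": 0.0}
  }
}
""")


def saturated_throttles(perf_dump, min_fail_ratio=0.5):
    """Return (name, fail_ratio, fails) for throttles that are full and mostly rejecting.

    fail_ratio = get_or_fail_fail / (get_or_fail_fail + get_or_fail_success).
    The 0.5 default is an arbitrary illustration value (assumption).
    """
    results = []
    for name, counters in perf_dump.items():
        # Only look at throttle sections; other perf sections have different keys.
        if not name.startswith("throttle-") or not isinstance(counters, dict):
            continue
        fails = counters.get("get_or_fail_fail", 0)
        successes = counters.get("get_or_fail_success", 0)
        attempts = fails + successes
        if attempts == 0:
            continue
        ratio = fails / attempts
        limit = counters.get("max", 0)
        at_limit = limit > 0 and counters.get("val", 0) >= limit
        if at_limit and ratio >= min_fail_ratio:
            results.append((name, ratio, fails))
    return results


if __name__ == "__main__":
    for name, ratio, fails in saturated_throttles(SAMPLE):
        print(f"{name}: {fails} failed gets, fail ratio {ratio:.6f}")
```

With the numbers above, the throttle is at 128 of 128 and nearly every `get_or_fail` attempt is rejected, so this section would be flagged.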