Hi, happy new year to you!

I'm running a multi-node cluster with 3 MGR nodes. The issue I'm facing is that "ceph balancer <argument>" runs for minutes or, in the worst case, hangs.

I have documented the runtimes of the following executions:

root@ld3955:~# date && time ceph balancer status
Mon Dec 23 10:06:12 CET 2019
{
    "active": true,
    "plans": [],
    "mode": "upmap"
}

real    1m45,045s
user    0m0,315s
sys     0m0,026s

root@ld3955:~# date && time ceph balancer status
Tue Jan 7 08:11:24 CET 2020
^CInterrupted
Traceback (most recent call last):
  File "/usr/bin/ceph", line 1263, in <module>
    retval = main()
  File "/usr/bin/ceph", line 1194, in main
    verbose)
  File "/usr/bin/ceph", line 619, in new_style_command
    ret, outbuf, outs = do_command(parsed_args, target, cmdargs, sigdict, inbuf, verbose)
  File "/usr/bin/ceph", line 593, in do_command
    return ret, '', ''
UnboundLocalError: local variable 'ret' referenced before assignment

real    102m44,084s
user    0m2,404s
sys     0m1,065s

root@ld3955:~# date && time ceph balancer off
Tue Jan 7 09:57:36 CET 2020

real    1m45,371s
user    0m0,358s
sys     0m0,013s

root@ld3955:~# date && time ceph balancer on
Tue Jan 7 14:57:03 CET 2020

real    0m0,452s
user    0m0,284s
sys     0m0,020s

root@ld3955:~# date && time ceph balancer status
Tue Jan 7 14:57:11 CET 2020
{
    "active": true,
    "plans": [],
    "mode": "upmap"
}

real    1m52,902s
user    0m0,301s
sys     0m0,042s

root@ld3955:~# date && time ceph balancer off
Wed Jan 8 08:49:26 CET 2020
^CInterrupted
Traceback (most recent call last):
  File "/usr/bin/ceph", line 1263, in <module>
    retval = main()
  File "/usr/bin/ceph", line 1194, in main
    verbose)
  File "/usr/bin/ceph", line 619, in new_style_command
    ret, outbuf, outs = do_command(parsed_args, target, cmdargs, sigdict, inbuf, verbose)
  File "/usr/bin/ceph", line 593, in do_command
    return ret, '', ''
UnboundLocalError: local variable 'ret' referenced before assignment

real    14m29,097s
user    0m0,579s
sys     0m0,157s

In correlation with this finding I have identified that the active MGR daemon is using more than 100% CPU (108-120%, to be precise).

To work around this issue I must stop the MGR service on the active node and wait until another node becomes active (the exact commands I use are in the P.S. below).

What's the issue with the MGR service here?
Should I open a bug report?

Regards
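
P.S. In case it is useful, this is roughly the workaround procedure. It is only a sketch: it assumes the currently active MGR happens to run on ld3955 and that the daemon uses the usual systemd unit naming; adjust the names for your deployment.

    # either ask the cluster to fail the active MGR over to a standby ...
    ceph mgr fail ld3955

    # ... or stop the daemon directly on that node (unit name may differ per deployment)
    systemctl stop ceph-mgr@ld3955.service

    # verify that a standby MGR has taken over, then re-check the balancer
    ceph -s | grep mgr
    time ceph balancer status

Once a standby has taken over, the balancer commands respond quickly again, at least for a while.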