On multiple clusters we are seeing the mgr hang frequently when the
balancer is enabled. It seems that the balancer is getting caught in some
kind of infinite loop that chews up all of the mgr's CPU, which in turn
causes problems for other modules like prometheus (we don't have the
devicehealth module enabled yet).

I've been able to reproduce the issue doing an offline balance with
osdmaptool as well:

  osdmaptool --debug-osd 10 osd.map --upmap balance-upmaps.sh --upmap-pool default.rgw.buckets.data --upmap-max 100

It seems to loop over the same group of ~7,000 PGs over and over again
like this, without ever finding any new upmaps that can be added:

2019-11-19 16:39:11.131518 7f85a156f300 10 trying 24.d91
2019-11-19 16:39:11.138035 7f85a156f300 10 trying 24.2e3c
2019-11-19 16:39:11.144162 7f85a156f300 10 trying 24.176b
2019-11-19 16:39:11.149671 7f85a156f300 10 trying 24.ac6
2019-11-19 16:39:11.155115 7f85a156f300 10 trying 24.2cb2
2019-11-19 16:39:11.160508 7f85a156f300 10 trying 24.129c
2019-11-19 16:39:11.166287 7f85a156f300 10 trying 24.181f
2019-11-19 16:39:11.171737 7f85a156f300 10 trying 24.3cb1
2019-11-19 16:39:11.177260 7f85a156f300 10 24.2177 already has pg_upmap_items [368,271]
2019-11-19 16:39:11.177268 7f85a156f300 10 trying 24.2177
2019-11-19 16:39:11.182590 7f85a156f300 10 trying 24.a4
2019-11-19 16:39:11.188053 7f85a156f300 10 trying 24.2583
2019-11-19 16:39:11.193545 7f85a156f300 10 24.93e already has pg_upmap_items [80,27]
2019-11-19 16:39:11.193553 7f85a156f300 10 trying 24.93e
2019-11-19 16:39:11.198858 7f85a156f300 10 trying 24.e67
2019-11-19 16:39:11.204224 7f85a156f300 10 trying 24.16d9
2019-11-19 16:39:11.209844 7f85a156f300 10 trying 24.11dc
2019-11-19 16:39:11.215303 7f85a156f300 10 trying 24.1f3d
2019-11-19 16:39:11.221074 7f85a156f300 10 trying 24.2a57

While this cluster is running Luminous (12.2.12), I've also reproduced the
loop with the same osdmap on Nautilus (14.2.4).

Is there somewhere I can privately upload the osdmap for someone to
troubleshoot the problem?

Thanks,
Bryan
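
P.S. For completeness, the rough sequence I use for the offline
reproduction looks like the following (the filenames and pool name here
are from our setup, and the getmap step is just the usual way of exporting
the map, so adjust as needed):

  # Export the current osdmap from the live cluster (run on an admin node)
  ceph osd getmap -o osd.map

  # Sanity check: confirm which epoch of the map we exported
  osdmaptool --print osd.map | head -5

  # Offline upmap calculation with debug logging -- this is the command
  # that spins forever over the same ~7,000 PGs for us
  osdmaptool --debug-osd 10 osd.map --upmap balance-upmaps.sh \
      --upmap-pool default.rgw.buckets.data --upmap-max 100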