Hi,

On 2019-11-20 15:55, thoralf schulze wrote:
hi, we were able to track this down to the auto balancer: disabling the auto balancer and cleaning out old (and probably not very meaningful) upmap-entries via ceph osd rm-pg-upmap-items brought back stable mgr daemons and a usable dashboard.
I can confirm that. In our case, I see this on a 14.2.4 cluster (which started its life with an earlier Nautilus version and developed this issue over the past weeks), and running:

  ceph balancer off

has been sufficient to make the mgrs stable again (i.e. I left the upmap-items in place).

Interestingly, we did not see this with Luminous or Mimic on different clusters (which, however, have a more stable number of OSDs).

@devs: If any more info is needed to track this down, please let us know.

Cheers,
Oliver
the not-so-sensible upmap-entries might or might not have been caused by us updating from mimic to nautilus - it's too late to debug this now. this seems to be consistent with bryan stillwell's findings ("mgr hangs with upmap balancer").

thank you very much & with kind regards,
thoralf.
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
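For anyone wanting to retrace the workaround described in this thread, here is a minimal dry-run sketch. It assumes that "ceph osd dump" prints upmap entries as lines of the form "pg_upmap_items <pgid> [from,to,...]"; the helper name upmap_rm_commands is hypothetical and not part of Ceph. The helper only prints the removal commands for review and changes nothing on the cluster. Note that in Oliver's case "ceph balancer off" alone was enough, so removing the entries may not be necessary.

```shell
#!/bin/sh
# Dry-run sketch: nothing in this file modifies the cluster by itself.

# Read "ceph osd dump" text on stdin and print one
# "ceph osd rm-pg-upmap-items <pgid>" command per pg_upmap_items line.
# Assumed input line format (an assumption, check your own dump output):
#   pg_upmap_items 1.2f [3,7]
upmap_rm_commands() {
    awk '$1 == "pg_upmap_items" { print "ceph osd rm-pg-upmap-items " $2 }'
}

# Possible use on a live cluster (left commented out on purpose):
#   ceph balancer status
#   ceph balancer off
#   ceph osd dump | upmap_rm_commands > rm-upmaps.sh   # review before running
```

Writing the commands to a file first, rather than piping them straight into a shell, leaves a chance to check which entries would be dropped.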