Hi Thoralf,

there have been several reports about Ceph mgr modules (not just the
dashboard) experiencing hangs and freezes recently. The thread "mgr
daemons becoming unresponsive" might give you some additional insight.

Is the "device health metrics" module enabled on your cluster? Could you
try disabling it to see if that fixes the issue?

Lenz

On 11/13/19 4:01 PM, thoralf schulze wrote:
> the dashboard of our moderately used cluster with 3 mon/mgr-nodes gets
> stuck about 30 seconds after a mgr becomes active. the dashboard is not
> usable anymore (i.e., the mgr daemon does not respond to http requests
> anymore), although it comes back from the dead occasionally for a few
> seconds. the same happens to the prometheus module: grafana only shows a
> few data points here and there.
>
> other mgr-related stuff (e.g., ceph pg dump) continues to work just fine.
> forcing a switchover to another mgr or enabling / disabling mgr modules
> helps for a short while, until the whole thing gets stuck again.
>
> a mgr log with debugging enabled for both mgr and mgrc at level 20 can
> be found at
> http://www.user.tu-berlin.de/thoralf.schulze/ceph-mgr-2019113.log.xz -
> in this case, the hang occurred shortly before 14:55.
>
> any hints would be greatly appreciated …
>
> thank you very much & with kind regards,

-- 
SUSE Software Solutions Germany GmbH - Maxfeldstr. 5 - 90409 Nuernberg
GF: Felix Imendörffer, HRB 36809 (AG Nürnberg)
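
P.S. A rough sketch of how the device health check suggested above might be carried out. It assumes a release where device health monitoring is toggled with `ceph device monitoring off` (on releases where the devicehealth module is always-on, `ceph mgr module disable` may be refused). The `run` wrapper and the `RUN` variable are hypothetical helpers added here so the script only prints the commands by default; they are not part of Ceph.

```shell
#!/bin/sh
# Dry-run wrapper: prints each command unless RUN=1 is set in the environment.
# (The wrapper and the RUN variable are illustrative, not part of Ceph.)
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

# Show which mgr modules are enabled or always-on; look for "devicehealth".
run ceph mgr module ls

# Switch off device health monitoring (scraping of device health metrics).
run ceph device monitoring off

# Afterwards, watch whether the dashboard and prometheus endpoints stay
# responsive; monitoring can be re-enabled with "ceph device monitoring on".
```

Run it once as-is to review the commands, then with `RUN=1` on a node that has an admin keyring to apply them.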
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com