Hi,
last week we successfully upgraded from Nautilus to Pacific, and since
today I'm seeing MGR daemons fail. The pods are still running but have
stopped logging. The standby MGRs take over in turn until all MGRs are
unresponsive; we currently have three MGRs. I'm not sure if [1] is
exactly what I'm facing here, but it looks like a deadlock to me. I
commented on the tracker issue, but since it's been marked as resolved
I'm not sure if anybody will read my comment. I noticed the same
behavior (also today) in a customer cluster that was upgraded from
Octopus to Pacific (16.2.9) about two months ago. The only thing I did
in those clusters today was to browse the dashboard to compare log
settings.
I read somewhere that the prometheus module could play a role in this,
but it's not enabled in our cluster (while it is running in the
customer cluster).
Please let me know if you need more information on this.
Thanks,
Eugen
Our current versions are:
ceph01:~ # ceph versions
{
    "mon": {
        "ceph version 16.2.10 (45fa1a083152e41a408d15505f594ec5f1b4fe17) pacific (stable)": 3
    },
    "mgr": {
        "ceph version 16.2.10 (45fa1a083152e41a408d15505f594ec5f1b4fe17) pacific (stable)": 2
    },
    "osd": {
        "ceph version 16.2.10 (45fa1a083152e41a408d15505f594ec5f1b4fe17) pacific (stable)": 35
    },
    "mds": {
        "ceph version 16.2.10 (45fa1a083152e41a408d15505f594ec5f1b4fe17) pacific (stable)": 3
    },
    "rgw": {
        "ceph version 16.2.10 (45fa1a083152e41a408d15505f594ec5f1b4fe17) pacific (stable)": 1
    },
    "overall": {
        "ceph version 16.2.10 (45fa1a083152e41a408d15505f594ec5f1b4fe17) pacific (stable)": 44
    }
}
[1] https://tracker.ceph.com/issues/55687
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx