I found a solution here:
https://www.reddit.com/r/ceph/comments/ut9lag/recover_from_module_progress_has_failed/

Turns out, you can just fail over the MGR, and it will reset the progress module:

`ceph mgr fail`

Now the cluster is healthy again and can be upgraded to a version where this issue is fixed.

On 5/5/2022 12:49 PM, Kuhring, Mathias wrote:
> Dear Ceph community,
>
> We are having an issue with the MGR progress module:
>
> Module 'progress' has failed: ('e7fb29e3-9caf-4b20-b930-cee8474526bb',)
>
> We are currently on ceph version 16.2.7
> (dd0603118f56ab514f133c8d2e3adfc983942503) pacific (stable).
>
> I'm aware that there are already issues and a fix, which I assume
> corresponds to our problem:
> https://tracker.ceph.com/issues/54267
> https://tracker.ceph.com/issues/53803
> https://github.com/ceph/ceph/pull/44672
>
> But I'm unsure how and when to apply this fix.
>
> First, there hasn't been a pacific release since version 16.2.7 (which
> we are already on).
> So would I need to update using a git branch based image like this?
>
> ceph orch upgrade start --image quay.io/ceph-ci/ceph:pacific
>
> Second, do I actually need to upgrade right away?
> Or can I somehow first reset the progress module or the events dictionary,
> which leads to the race condition (missing event) mentioned in the fix?
>
> In particular, since it is highly recommended to upgrade in a healthy state,
> I would be hoping to fix/reset this issue temporarily before applying
> the upgrade.
>
> I tried resetting progress as follows without success:
>
> 0|0[root@osd-1 ~]# ceph progress
> Error EIO: Module 'progress' has experienced an error and cannot
> handle commands: ('e7fb29e3-9caf-4b20-b930-cee8474526bb',)
> 0|5[root@osd-1 ~]# ceph progress clear
> Error EIO: Module 'progress' has experienced an error and cannot
> handle commands: ('e7fb29e3-9caf-4b20-b930-cee8474526bb',)
> 0|5[root@osd-1 ~]# ceph progress off
> Error EIO: Module 'progress' has experienced an error and cannot
> handle commands: ('e7fb29e3-9caf-4b20-b930-cee8474526bb',)
> 0|5[root@osd-1 ~]# ceph mgr module disable progress
> Error EINVAL: module 'progress' cannot be disabled (always-on)
>
> Is there any other way to get rid of this event?
>
> Thank you very much for your input.
>
> Best, Mathias
>
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx

--
Mathias Kuhring

Dr. rer. nat.
Bioinformatician
HPC & Core Unit Bioinformatics
Berlin Institute of Health at Charité (BIH)

E-Mail: mathias.kuhring@xxxxxxxxxxxxxx
Mobile: +49 172 3475576
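The MGR failover workaround from the reply at the top of this thread can be scripted so that it only fires when the progress module has actually crashed. This is a minimal Python sketch, not official Ceph tooling. It assumes the `ceph` CLI is on the PATH with admin credentials, and that `ceph health detail` contains the same "Module 'progress' has failed" message reported in this thread. `run_ceph` and `reset_progress_module` are hypothetical helper names.

```python
import subprocess
from typing import Callable, List

# Hypothetical runner type: takes ceph CLI arguments, returns stdout.
Runner = Callable[[List[str]], str]

# Failure message as reported in this thread (assumed to appear in health output).
PROGRESS_FAILURE = "Module 'progress' has failed"


def run_ceph(args: List[str]) -> str:
    """Run a ceph CLI command and return its stdout (raises on non-zero exit)."""
    result = subprocess.run(["ceph", *args], check=True,
                            capture_output=True, text=True)
    return result.stdout


def reset_progress_module(run: Runner = run_ceph) -> bool:
    """Fail over the active MGR if the progress module has crashed.

    Returns True if a failover was triggered, False otherwise.
    """
    health = run(["health", "detail"])
    if PROGRESS_FAILURE not in health:
        return False  # progress module looks fine; leave the active MGR alone
    # Fail the active MGR; per the thread, this resets the progress module.
    run(["mgr", "fail"])
    return True
```

On an admin node, calling `reset_progress_module()` and then re-checking `ceph health detail` before starting the upgrade mirrors the manual steps. The runner is passed in as a parameter so the decision logic can be exercised without a live cluster.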