Re: [ext] Recover from "Module 'progress' has failed"

I found a solution here:
https://www.reddit.com/r/ceph/comments/ut9lag/recover_from_module_progress_has_failed/

It turns out that failing over the active MGR resets the progress 
module: `ceph mgr fail`

Now the cluster is healthy and can be upgraded to a version where this 
issue is fixed.
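
For anyone who wants to script this, here is a minimal sketch of the same 
workaround. It only runs `ceph mgr fail` if `ceph health detail` still 
shows the "Module 'progress' has failed" message quoted below. The 
function names and the `CEPH` override variable are my own additions for 
illustration, not part of Ceph, and I have only reasoned about this 
against our 16.2.7 cluster's output, so treat it as a starting point.

```shell
#!/usr/bin/env bash
# Sketch: fail over the active MGR only if the 'progress' module has failed.
# CEPH is a hypothetical override (not a Ceph feature) so the logic can be
# exercised against a stub instead of a live cluster.
CEPH="${CEPH:-ceph}"

# Returns 0 if the health output mentions the failed 'progress' module.
progress_module_failed() {
    "$CEPH" health detail 2>/dev/null | grep -q "Module 'progress' has failed"
}

# Runs `ceph mgr fail` when the module is failed; otherwise does nothing.
recover_progress_module() {
    if progress_module_failed; then
        echo "progress module failed; failing over the active MGR"
        "$CEPH" mgr fail
    else
        echo "progress module looks healthy; nothing to do"
    fi
}
```

After sourcing the file, call `recover_progress_module` and then check 
`ceph health detail` again once a standby MGR has taken over.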


On 5/5/2022 12:49 PM, Kuhring, Mathias wrote:
> Dear Ceph community,
>
> We are having an issue with the MGR progress module:
>
>       Module 'progress' has failed: ('e7fb29e3-9caf-4b20-b930-cee8474526bb',)
>
> We are currently on ceph version 16.2.7
> (dd0603118f56ab514f133c8d2e3adfc983942503) pacific (stable).
>
> I'm aware that there are already tracker issues and a fix, which I assume
> correspond to our problem:
>       https://tracker.ceph.com/issues/54267
>       https://tracker.ceph.com/issues/53803
>       https://github.com/ceph/ceph/pull/44672
>
> But I'm unsure how and when to apply this fix.
>
> First, there hasn't been a pacific release since version 16.2.7 (which
> we are already on).
> So would I need to update using a git branch based image like this?
>
>       ceph orch upgrade start --image quay.io/ceph-ci/ceph:pacific
>
> Second, do I actually need to upgrade right away?
> Or can I somehow first reset the progress module or the events dictionary,
> which leads to the race condition (missing event) mentioned in the fix?
>
> In particular, since it is highly recommended to upgrade in a healthy state,
> I would like to fix or reset this issue temporarily before applying
> the upgrade.
>
> I tried resetting progress as follows without success:
>
>       0|0[root@osd-1 ~]# ceph progress
>       Error EIO: Module 'progress' has experienced an error and cannot handle commands: ('e7fb29e3-9caf-4b20-b930-cee8474526bb',)
>       0|5[root@osd-1 ~]# ceph progress clear
>       Error EIO: Module 'progress' has experienced an error and cannot handle commands: ('e7fb29e3-9caf-4b20-b930-cee8474526bb',)
>       0|5[root@osd-1 ~]# ceph progress off
>       Error EIO: Module 'progress' has experienced an error and cannot handle commands: ('e7fb29e3-9caf-4b20-b930-cee8474526bb',)
>       0|5[root@osd-1 ~]# ceph mgr module disable progress
>       Error EINVAL: module 'progress' cannot be disabled (always-on)
>
> Is there any other way to get rid of this event?
>
> Thank you very much for your input.
>
> Best, Mathias
>
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx

-- 
Mathias Kuhring

Dr. rer. nat.
Bioinformatician
HPC & Core Unit Bioinformatics
Berlin Institute of Health at Charité (BIH)

E-Mail:  mathias.kuhring@xxxxxxxxxxxxxx
Mobile: +49 172 3475576
