Re: MDS not becoming active after migrating to cephadm

By "upgrade" I mean the upgrade from the non-dockerized 16.2.5 to the cephadm version 16.2.6. So I think you need to disable standby-replay and reduce the number of ranks to 1, then stop all the non-dockerized MDS daemons and deploy new ones with cephadm. Only scale back up after you finish the migration. Did you also try that?
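The sequence described above can be sketched as a small dry-run script. This is only an illustration, not a tested migration tool: the file system name "cephfs", the placement string, and the helper names are assumptions, and the systemctl step assumes package-based daemons managed through ceph-mds.target. Nothing is executed unless dry_run is set to False.

```python
import shlex
import subprocess
from typing import List, Tuple

Step = Tuple[str, List[str]]  # (description, argv)


def scale_down_and_redeploy(fs_name: str, placement: str) -> List[Step]:
    """Steps from the thread, in order: shrink CephFS, stop old MDS, deploy new MDS."""
    return [
        ("disable standby-replay",
         ["ceph", "fs", "set", fs_name, "allow_standby_replay", "false"]),
        ("reduce the file system to a single rank",
         ["ceph", "fs", "set", fs_name, "max_mds", "1"]),
        # Must be run on every host that still has a package-based MDS.
        ("on each legacy MDS host: stop the non-dockerized daemons",
         ["systemctl", "stop", "ceph-mds.target"]),
        ("deploy containerized MDS daemons through the orchestrator",
         ["ceph", "orch", "apply", "mds", fs_name, f"--placement={placement}"]),
    ]


def scale_back_up(fs_name: str, max_mds: int, standby_replay: bool) -> List[Step]:
    """Steps to restore the original layout once the migration is finished."""
    steps: List[Step] = [
        ("restore the rank count",
         ["ceph", "fs", "set", fs_name, "max_mds", str(max_mds)]),
    ]
    if standby_replay:
        steps.append(("re-enable standby-replay",
                      ["ceph", "fs", "set", fs_name, "allow_standby_replay", "true"]))
    return steps


def run(steps: List[Step], dry_run: bool = True) -> None:
    """Print each step; execute it only when dry_run is False."""
    for description, argv in steps:
        print(f"# {description}")
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # "cephfs" and "2 host1 host2" are placeholder values.
    run(scale_down_and_redeploy("cephfs", "2 host1 host2"))
    run(scale_back_up("cephfs", max_mds=2, standby_replay=True))
```

In practice you would check `ceph status` between steps and wait until only rank 0 remains active before stopping the old daemons.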

In fact, similar issues have been reported several times on this list when upgrading MDS to 16.2.6, e.g. [1]. I have run into it too, so I’m pretty confident that you are facing the same issue.

[1]: https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/KQ5A5OWRIUEOJBC7VILBGDIKPQGJQIWN/

On 4 Oct 2021, at 19:00, Petr Belyaev <p.belyaev@xxxxxxxxx> wrote:

 Hi Weiwen,

Yes, we did that during the upgrade. In fact, we did it multiple times even after the upgrade to see if it would resolve the issue (disabling hot standby, scaling everything down to a single MDS, swapping it with the new one, scaling back up).

The upgrade itself went fine; the problems started during the migration to cephadm (which was done after migrating everything to Pacific).
It only occurs with dockerized MDS. On non-dockerized MDS nodes, also running Pacific, everything works fine.

Petr

On 4 Oct 2021, at 12:43, 胡 玮文 <huww98@xxxxxxxxxxx> wrote:

Hi Petr,

Please read https://docs.ceph.com/en/latest/cephfs/upgrading/ for the MDS upgrade procedure.

In short, when upgrading to 16.2.6, you need to disable standby-replay and reduce the number of ranks to 1.

Weiwen Hu


From: Petr Belyaev <p.belyaev@xxxxxxxxx>
Sent: 4 Oct 2021, 18:00
To: ceph-users@xxxxxxx
Subject: MDS not becoming active after migrating to cephadm

Hi,

We’ve recently upgraded from Nautilus to Pacific and tried moving our services to cephadm/ceph orch.
For some reason, MDS nodes deployed through orch never become active (or even standby-replay). Non-dockerized MDS nodes can still be deployed and work fine. The non-dockerized MDS version is 16.2.6; the docker image version is 16.2.5-387-g7282d81d (it came as the default).
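Since the two kinds of daemon run different builds, a quick way to confirm what the cluster sees is to look at the "mds" section of the JSON printed by `ceph versions`. The sketch below only parses such a report; the sample input is illustrative (commit hashes elided as "..."), not output captured from this cluster, and the helper name is made up.

```python
import json
from typing import Set


def mds_versions(versions_json: str) -> Set[str]:
    """Return the distinct version strings listed under "mds" in a `ceph versions` report."""
    report = json.loads(versions_json)
    return set(report.get("mds", {}))


# Illustrative input shaped like `ceph versions` output (version string -> daemon count).
sample = json.dumps({
    "mds": {
        "ceph version 16.2.6 (...) pacific (stable)": 2,
        "ceph version 16.2.5-387-g7282d81d (...) pacific (stable)": 1,
    }
})

found = mds_versions(sample)
if len(found) > 1:
    print("MDS daemons are running mixed versions:")
    for version in sorted(found):
        print("  " + version)
```

On a live cluster the input could be obtained with something like `subprocess.run(["ceph", "versions"], capture_output=True, text=True).stdout`.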

In the MDS log, the only related message is the monitors assigning the MDS as standby. Increasing the log level does not help much; it only adds beacon messages.
The monitor log also shows no differences compared to a non-dockerized MDS startup.
The mds metadata command output is identical to that of a non-dockerized MDS.

The only difference I can see in the log is the value in curly braces after the node name, e.g. mds.storage{0:1234ff}. For dockerized MDS, the first value is ffffffff, for non-dockerized it’s zero. Compat flags are identical.
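To compare these tokens across many log lines, a small parser can help. The sketch below assumes both fields in the braces are hexadecimal and that the first one is a 32-bit value that may be signed; under that assumption ffffffff decodes to -1, which would be consistent with a daemon that holds no rank (a plain standby), while 0 would be rank 0. That reading is an interpretation, not something confirmed in this thread, and the function name is hypothetical.

```python
import re
from typing import Tuple

# Matches tokens such as "mds.storage{0:1234ff}".
_TOKEN = re.compile(
    r"^mds\.(?P<name>[^{]+)\{(?P<first>[0-9a-fA-F]+):(?P<second>[0-9a-fA-F]+)\}$"
)


def parse_mds_token(token: str) -> Tuple[str, int, int]:
    """Split a token into (name, first field as signed 32-bit int, second field as int)."""
    match = _TOKEN.match(token)
    if match is None:
        raise ValueError(f"unrecognized MDS token: {token!r}")
    first = int(match["first"], 16)
    if first >= 0x80000000:
        # Assumption: reinterpret the unsigned 32-bit value as signed.
        first -= 0x100000000
    return match["name"], first, int(match["second"], 16)


for token in ("mds.storage{0:1234ff}", "mds.storage{ffffffff:1234ff}"):
    name, first, second = parse_mds_token(token)
    print(f"{name}: first={first} second={second:#x}")
```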

Could someone please advise why the dockerized MDS is stuck as a standby? Maybe some config values are missing, or something like that?

Best regards,
Petr
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
