Re: OSD crashes during upgrade mimic->octopus

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 10/6/22 17:22, Frank Schilder wrote:


Just for the general audience. In the past we did cluster maintenance by setting "ceph fs set FS down true" (freezing all client IO in D-state), waited for all MDSes becoming standby and doing the job. After that, we set "ceph fs set FS down false", the MDSes started again, all clients connected more or less instantaneously and continued exactly at the point where they were frozen.

This time, a huge number of clients just crashed instead of freezing and of the few ones that remained up only a small number reconnected. This is in our experience very unusual behaviour. Was there a change or are we looking at a potential bug here?

There is a strict MDS maintenance dance you have to perform [1]. In order to avoid MDS committing suicide for example. We would just have the last remaining "up:active" MDS restart. And as soon as it became up:active again all clients would reconnect virtually instantly. Even if rejoining had taken 3.5 minutes. Especially for MDSes I would not deviate from best practices.

Gr. Stefan

[1]: https://docs.ceph.com/en/octopus/cephfs/upgrading/
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



[Index of Archives]     [Information on CEPH]     [Linux Filesystem Development]     [Ceph Development]     [Ceph Large]     [Ceph Dev]     [Linux USB Development]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [xfs]


  Powered by Linux