Hi Dan and Etienne,

that's interesting. Moving this to dev@.

On 22.09.21 at 12:26, Dan van der Ster wrote:
Yeah, you don't want to deal with backfilling while the cluster is upgrading. At best it can delay the upgrade; at worst, mixed-version backfilling has (rarely) caused issues in the past. We additionally set `noin` and disable the balancer: `ceph balancer off`. The former prevents broken OSDs from re-entering the cluster, and both of these similarly prevent backfilling from starting mid-upgrade.
From my point of view, I'd rather keep the safety net (in this case the balancer) enabled in order to avoid having to deal with unbalanced OSDs, in case the upgrade takes longer, even if that means some data movement during an upgrade.
Especially for the vast majority of clusters out there, I prefer to avoid having full OSDs. Is that reasonable?
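
For reference, the settings discussed in this thread can be summed up in a small sketch that lists the commands to run before and after an upgrade window. The helper name `upgrade_flag_commands` and its `keep_balancer` parameter are hypothetical and only for illustration; the `ceph` commands themselves are the standard ones for these flags. The function only builds command strings and does not run anything.

```python
def upgrade_flag_commands(keep_balancer=False):
    """Return (before, after) lists of ceph CLI commands for an upgrade window.

    Hypothetical helper for illustration only: it builds strings and never
    executes them. keep_balancer=True reflects the preference to leave the
    balancer enabled during the upgrade.
    """
    # Flags set before the upgrade: do not mark down OSDs out, and do not
    # let OSDs be marked in automatically.
    before = ["ceph osd set noout", "ceph osd set noin"]
    # Flags cleared afterwards, in reverse order.
    after = ["ceph osd unset noin", "ceph osd unset noout"]
    if not keep_balancer:
        before.append("ceph balancer off")
        after.append("ceph balancer on")
    return before, after


if __name__ == "__main__":
    before, after = upgrade_flag_commands(keep_balancer=True)
    print("\n".join(before + ["# ... upgrade ..."] + after))
```

Whether to pass `keep_balancer=True` is exactly the trade-off being asked about: less data movement during the upgrade versus the risk of unbalanced or full OSDs if the upgrade takes a long time.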
.. Dan

On Wed, 22 Sep 2021, 12:18 Etienne Menguy, <etienne.menguy@xxxxxxxx> wrote:

Hello,

From my experience, I see three reasons:

- You don’t want to recover data if you already have it on a down OSD; rebalancing can have a big impact on performance.
- If the upgrade/maintenance goes wrong, you will want to focus on that issue and not have to deal with things done by Ceph in the meantime.
- During an upgrade you have an ‘unusual’ cluster with different versions. It’s supposed to work, but you probably want to keep it ‘boring’.

-
Etienne Menguy
etienne.menguy@xxxxxxxx

On 22 Sep 2021, at 11:51, Francois Legrand <fleg@xxxxxxxxxxxxxx> wrote:

Hello everybody,

I have a "stupid" question. Why is it recommended in the docs to set the OSD flag to noout during an upgrade/maintenance (and especially during an OSD upgrade/maintenance)?

In my understanding, if an OSD goes down, after a while (600s by default) it's marked out and the cluster will start to rebuild its content elsewhere in the cluster to maintain the redundancy of the data. This generates some transfer and load on other OSDs, but that's not a big deal!

As soon as the OSD is back, it's marked in again and Ceph is able to determine which data is back and stop the recovery to reuse the unchanged data that is back. Generally, the recovery is as fast as with the noout flag (because with noout, the data modified during the down period still has to be copied to the returning OSD).

Thus, is there another reason apart from limiting the transfer and the load on other OSDs during the downtime?

F

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
_______________________________________________ Dev mailing list -- dev@xxxxxxx To unsubscribe send an email to dev-leave@xxxxxxx