Dear CephFS devs,

We noticed a new warning about multi-MDS upgrades linked at the top of https://docs.ceph.com/en/latest/releases/pacific/#upgrading-from-octopus-or-nautilus (the relevant tracker and PR are at https://tracker.ceph.com/issues/53597 and https://github.com/ceph/ceph/pull/44335).

Motivated by that issue, and by our operational experience, I'd like to propose standardizing a multi-MDS upgrade procedure that does not require reducing max_mds to 1.

1. First, I'm aware that the current upgrade procedure exists because MDS-to-MDS communication is not versioned, so all MDSs need to run the same version. I therefore realize we cannot simply restart each MDS one by one.

2. In our operational experience, clusters whose workload requires several active MDSs can't easily shrink to a single MDS -- one MDS can't handle the full metadata load, not to mention the extra export/import work needed to go from many ranks down to one. (Our largest CephFS has 4 MDSs, each with a 100 GB active cache, and all of them are busy -- decreasing to one MDS would be highly disruptive to our users.)

3. With Patrick's agreement, when we upgraded that cluster from Nautilus to Octopus several months ago we *did not* decrease to a single active MDS: we upgraded the rpms on all MDS hosts, then stopped all actives and standbys, then started them all. The "downtime" was roughly equivalent to restarting a single MDS. This should be easy to orchestrate via cephadm. (A rough command sketch is in the P.S. below.)

What do you think about validating, testing, and documenting (3)? IMHO this would make large CephFS cluster upgrades much less scary!

Cheers, Dan
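P.S. For context, "reducing max_mds to 1" refers to the currently documented procedure, which shrinks the file system to a single active rank before any restarts. A sketch only ("cephfs" is a placeholder file system name; check "ceph fs status" between steps):

    # Currently documented approach: shrink to a single active MDS first.
    ceph fs set cephfs max_mds 1

    # Wait until rank 0 is the only active rank...
    ceph fs status

    # ...then stop the standbys, upgrade and restart the lone active MDS,
    # and finally restore max_mds to its previous value, e.g.:
    ceph fs set cephfs max_mds 4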
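And here, roughly, is what (3) looked like for us. This is a sketch under assumptions, not a tested recipe: it assumes an rpm-based deployment with systemd-managed MDS daemons, and the hostnames mds1..mds4 are illustrative.

    # 0. Confirm the cluster is healthy and note the current MDS layout.
    ceph -s
    ceph fs status

    # 1. Upgrade the Ceph packages on every MDS host (no daemon restarts yet).
    for h in mds1 mds2 mds3 mds4; do ssh "$h" "dnf -y upgrade 'ceph*'"; done

    # 2. Stop ALL MDS daemons (actives and standbys) so that old and new
    #    versions never talk to each other.
    for h in mds1 mds2 mds3 mds4; do ssh "$h" systemctl stop ceph-mds.target; done

    # 3. Start them all again on the new version. The "downtime" is roughly
    #    the time of a single MDS restart plus rank replay.
    for h in mds1 mds2 mds3 mds4; do ssh "$h" systemctl start ceph-mds.target; done

    # 4. Verify that all ranks come back and every MDS reports the new version.
    ceph fs status
    ceph versions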