Slightly different, yes. It should be configurable (and not the default).

On Thu, May 19, 2022 at 1:13 PM Sage Weil <sage@xxxxxxxxxxxx> wrote:
>
> This would mean implementing a different MDS upgrade/restart sequence
> in cephadm, right?
>
> On Thu, May 19, 2022, 9:47 AM Patrick Donnelly <pdonnell@xxxxxxxxxx> wrote:
>>
>> Hi Dan,
>>
>> On Wed, May 18, 2022 at 10:44 AM Dan van der Ster <dvanders@xxxxxxxxx> wrote:
>> >
>> > Dear CephFS devs,
>> >
>> > We noticed a new warning about multi-MDS upgrades linked at the top of
>> > https://docs.ceph.com/en/latest/releases/pacific/#upgrading-from-octopus-or-nautilus
>> > (the relevant tracker and PR are at
>> > https://tracker.ceph.com/issues/53597 and
>> > https://github.com/ceph/ceph/pull/44335).
>> >
>> > Motivated by that issue, and by our operational experience, I'd like
>> > to propose standardizing a multi-MDS upgrade procedure that does not
>> > require decreasing max_mds to 1.
>> >
>> > 1. First, I'm aware that the current upgrade procedure exists because
>> > MDS-to-MDS communication is not versioned, so all MDSs need to run
>> > the same version. I realize this means we cannot restart the MDSs
>> > one by one.
>> >
>> > 2. Based on our operational experience, clusters with a workload
>> > requiring several active MDSs can't easily decrease to one MDS: a
>> > single MDS can't handle the full metadata load, not to mention the
>> > extra export/import work needed to shrink from many ranks to one.
>> > (Our largest CephFS has 4 MDSs, each with 100 GB of active cache and
>> > all of them busy -- decreasing to 1 MDS would be highly disruptive
>> > to our users.)
>> >
>> > 3. With Patrick's agreement, when we upgraded that cluster from
>> > Nautilus to Octopus several months ago we *did not* decrease to a
>> > single active MDS: we upgraded the rpms on all MDS hosts, then
>> > stopped all actives and standbys, then started them all. The
>> > "downtime" was roughly equivalent to restarting a single MDS. This
>> > should be easily orchestratable via cephadm.
>> >
>> > What do you think about validating, testing, and documenting (3)?
>> > IMHO this would make large CephFS cluster upgrades much less scary!
>>
>> My only concern is that this upgrade procedure is not really tested at
>> all (beyond your experience). We should of course make sure this is
>> well tested in teuthology before it can become any kind of default
>> upgrade behavior. I've created a ticket here:
>> https://tracker.ceph.com/issues/55715

--
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D

_______________________________________________
Dev mailing list -- dev@xxxxxxx
To unsubscribe send an email to dev-leave@xxxxxxx
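
For concreteness, the documented procedure that Dan's points (1) and (2)
argue is painful for busy clusters boils down to shrinking the filesystem
to a single active rank before upgrading. A rough sketch using standard
Ceph CLI commands -- "cephfs" is a placeholder filesystem name, and
max_mds 4 matches Dan's example cluster:

    # reduce to a single active MDS; rank 0 must absorb the other
    # ranks' metadata load and their exported subtrees
    ceph fs set cephfs max_mds 1
    # wait until only rank 0 remains active before upgrading
    ceph fs status cephfs
    # ... upgrade and restart the lone active MDS and the standbys ...
    # then restore the original number of active ranks
    ceph fs set cephfs max_mds 4

The export/import work Dan mentions happens during the first step: every
rank above 0 must hand its subtrees back to rank 0 before it can stop.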
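The "stop everything, then start everything" sequence described in (3)
could look roughly like the following on a package-based (non-cephadm)
deployment; the systemd target name and package manager are assumptions
that vary by distro and release:

    # on every MDS host: install the new packages; the running
    # daemons keep the old binaries until restarted
    dnf update -y ceph-mds
    # then, on every MDS host in quick succession: stop all MDS daemons
    systemctl stop ceph-mds.target
    # once every active and standby MDS is confirmed down,
    # start them all again on the new version
    systemctl start ceph-mds.target

The key property is that no two MDS versions ever run concurrently: the
whole set goes down together and comes back together, so the unversioned
MDS-to-MDS messaging from point (1) is never exercised across versions.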
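Making this configurable in cephadm, per the reply at the top of the
thread, might eventually look like a single opt-in flag consulted by the
upgrade sequence. The flag name and wiring below are purely illustrative
assumptions, not an existing interface at the time of this thread:

    # hypothetical opt-in: tell the orchestrator to take the whole MDS
    # set down and upgrade it at once instead of reducing max_mds to 1
    ceph config set mgr mgr/orchestrator/fail_fs true
    ceph orch upgrade start --ceph-version <target-version>

Keeping it off by default preserves the conservative, well-tested
max_mds=1 path until the alternative is validated in teuthology, as
Patrick's ticket (https://tracker.ceph.com/issues/55715) proposes.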