Dear CephFS devs,

We noticed a new warning about multi-MDS upgrades linked at the top of https://docs.ceph.com/en/latest/releases/pacific/#upgrading-from-octopus-or-nautilus (the relevant tracker and PR are at https://tracker.ceph.com/issues/53597 and https://github.com/ceph/ceph/pull/44335).

Motivated by that issue, and by our operational experience, I'd like to propose standardizing a multi-MDS upgrade procedure that does not require reducing max_mds to 1.

1. First, I'm aware that the current upgrade procedure exists because MDS-to-MDS communication is not versioned, so all MDSs need to run the same version. I therefore realize we cannot simply restart each MDS one by one.

2. In our operational experience, clusters whose workload requires several active MDSs can't easily shrink to a single MDS -- one MDS can't handle the full metadata load, not to mention the extra export/import work needed to go from many ranks down to one. (Our largest CephFS has 4 MDSs, each with a 100 GB active cache, and all of them are busy -- decreasing to one MDS would be highly disruptive to our users.)

3. With Patrick's agreement, when we upgraded that cluster from Nautilus to Octopus several months ago we *did not* decrease to a single active MDS: we upgraded the rpms on all MDS hosts, then stopped all actives and standbys, then started them all. The "downtime" was roughly equivalent to restarting a single MDS. This should be easy to orchestrate via cephadm. (A rough command sketch is in the P.S. below.)

What do you think about validating, testing, and documenting (3)? IMHO this would make large CephFS cluster upgrades much less scary!

Cheers, Dan
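P.S. For context, "reducing max_mds to 1" refers to the currently documented procedure, which shrinks the file system to a single active rank before any restarts. A sketch only ("cephfs" is a placeholder file system name; check "ceph fs status" between steps):

    # Currently documented approach: shrink to a single active MDS first.
    ceph fs set cephfs max_mds 1

    # Wait until rank 0 is the only active rank...
    ceph fs status

    # ...then stop the standbys, upgrade and restart the lone active MDS,
    # and finally restore max_mds to its previous value, e.g.:
    ceph fs set cephfs max_mds 4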
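And here, roughly, is what (3) looked like for us. This is a sketch under assumptions, not a tested recipe: it assumes an rpm-based deployment with systemd-managed MDS daemons, and the hostnames mds1..mds4 are illustrative.

    # 0. Confirm the cluster is healthy and note the current MDS layout.
    ceph -s
    ceph fs status

    # 1. Upgrade the Ceph packages on every MDS host (no daemon restarts yet).
    for h in mds1 mds2 mds3 mds4; do ssh "$h" "dnf -y upgrade 'ceph*'"; done

    # 2. Stop ALL MDS daemons (actives and standbys) so that old and new
    #    versions never talk to each other.
    for h in mds1 mds2 mds3 mds4; do ssh "$h" systemctl stop ceph-mds.target; done

    # 3. Start them all again on the new version. The "downtime" is roughly
    #    the time of a single MDS restart plus rank replay.
    for h in mds1 mds2 mds3 mds4; do ssh "$h" systemctl start ceph-mds.target; done

    # 4. Verify that all ranks come back and every MDS reports the new version.
    ceph fs status
    ceph versions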