Re: Alternate Multi-MDS Upgrade Procedure

This would mean implementing a different MDS upgrade/restart sequence in cephadm, right?

On Thu, May 19, 2022, 9:47 AM Patrick Donnelly <pdonnell@xxxxxxxxxx> wrote:
Hi Dan,

On Wed, May 18, 2022 at 10:44 AM Dan van der Ster <dvanders@xxxxxxxxx> wrote:
>
> Dear CephFS devs,
>
> We noticed a new warning about multi-MDS upgrades linked at the top of
> https://docs.ceph.com/en/latest/releases/pacific/#upgrading-from-octopus-or-nautilus
> (the relevant tracker and PR are at
> https://tracker.ceph.com/issues/53597 and
> https://github.com/ceph/ceph/pull/44335)
>
> Motivated by that issue, and by our operational experience, I'd like
> to propose standardizing a multi-MDS upgrade procedure which does not
> require decreasing to max_mds 1.
>
> 1. First, note that I'm aware that the current upgrade procedure is
> there because mds-to-mds comms are not versioned, so all MDSs need to
> run the same version. So I realize we cannot restart each MDS one by
> one.
>
> 2. Based on our operational experience, clusters whose workload
> requires several active MDSs can't easily decrease to one MDS -- the
> single MDS can't handle the full metadata load, not to mention the
> extra export/import work needed to go from many MDSs down to one.
> (Our largest CephFS has 4 MDSs, each with a 100 GB active cache and
> all busy -- decreasing to one MDS would be highly disruptive to our
> users.)
>
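For context, a minimal sketch of the reduction step in the currently documented upgrade procedure that point (2) refers to; "cephfs" is a placeholder file system name:

    # Cap the file system at a single active MDS; extra ranks are stopped
    ceph fs set cephfs max_mds 1
    # Wait until only rank 0 remains active before restarting any MDS
    ceph fs status cephfs
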
> 3. With Patrick's agreement, when we upgraded that cluster from
> Nautilus to Octopus several months ago we *did not* decrease to a
> single active MDS: We upgraded the rpms on all MDSs, then stopped all
> actives and standbys, then started them all. The "downtime" was
> roughly equivalent to restarting a single MDS. This should be easily
> orchestratable via cephadm.
>
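A minimal sketch of the procedure described in (3), assuming a package-based (non-cephadm) deployment whose daemons are managed with systemd:

    # On every MDS host: upgrade the Ceph packages first (e.g. "yum update ceph-mds"
    # on RPM-based hosts) without restarting the daemons yet.
    # Then stop all active and standby MDS daemons, on each MDS host:
    systemctl stop ceph-mds.target
    # Once every MDS daemon is down, start them all again, on each MDS host:
    systemctl start ceph-mds.target
    # Verify that all ranks come back active and report the new version:
    ceph fs status
    ceph versions
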
> What do you think about validating, testing, and documenting (3)?
> IMHO this would make large CephFS cluster upgrades much less scary!

My only concern is that this upgrade procedure is not really tested at
all (beyond your experience). We should of course make sure this is
well tested in teuthology before it can be any kind of default upgrade
behavior. I've created a ticket here:
https://tracker.ceph.com/issues/55715

--
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D

_______________________________________________
Dev mailing list -- dev@xxxxxxx
To unsubscribe send an email to dev-leave@xxxxxxx