On Wed, 17 Oct 2018, Gregory Farnum wrote: > I was given a cluster recently where they'd managed to enable upmap > despite clients not supporting it, and then on removing the upmap > features the clients were triggering monitor crashes. > > While trying to establish what had happened, I discovered that most of > the controls in this area have become pretty stale or broken! > > So far as I can tell: > > * Admins can invoke "osd set -require-min-compat-client", and unless > you override it tries check against already-connected clients Right. The check against connected clients is imperfect, but better than nothing. The FeatureMap tracks entity_type -> features -> count on each mon, and those are shared by the lease ack messages to teh leader mon (who is the only one with a full view of all connected clients). > * but the check makes use of ceph_release_features() (which I've > recently noted to be a bit weak) and also only checks against the > entities which are connected to the local monitor making the change. IIRC the issue there is that there aren't actually any new client/cluster (i.e., features reflected in the kernel client) features that we've introduced in mimic, so it can't tell the two apart. We will have the msgr2 feature bit for nautilus, though, so I think mimic will be the only oddity here. > * As best I can tell, setting a min_compat_client does *not* actually > require newly-connected entities to have the required feature set! Correct. It's a guard against making cluster changes that will break the stated min_compat_client target, but doesn't actually lock out old clients in and of itself. We could change that, I suppose.. I can't rembmer if we had a broader discussion about it leading up to luminous? > * The OSDMonitor DOES require any connecting entity satisfy the > features demanded by OSDMap::get_features() (via > update_msgr_features(), called from OSDMonitor::update_from_paxos()) > * but this function has not been reliably updated? It adds > CEPH_FEATUREMASK_SERVER_KRAKEN for OSDs (but not others!) and is > configured to add CEPH_FEATUREMASK_CEPHX_V2 when appropriate, but has > nothing from luminous (like, say...UPMAP). I think there just weren't obvious OSDMap-specific features in luminous or mimic to add. There *are* crushmap features (the weight-sets), covered by if (crush->has_incompat_choose_args()) { features |= CEPH_FEATUREMASK_CRUSH_CHOOSE_ARGS; } at the top of the function. And in master there is the CEPHX_v2 requirement for osds if require_osd_release >= nautilus (doesn't affect clients). > Am I misunderstanding something here? Am I correct in thinking that we > just need to wire up require_min_compat_client to > OSDMap::get_features(), or is there another enforcement mechanism > we're looking for? It depends on the goal... if we want to gate actual client connections based on require_min_compat_client, then yeah, I think that would do it. Maybe that's a good idea based on the principle of least surprise (because a change other than changing min_compat_client won't affect clients' ability to connect). Which sounds pretty reasonable, when I think about it now! sage