On Wed, Oct 17, 2018 at 10:47 PM Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
>
> I was given a cluster recently where they'd managed to enable upmap
> despite clients not supporting it, and then on removing the upmap
> features the clients were triggering monitor crashes.
>
> While trying to establish what had happened, I discovered that most of
> the controls in this area have become pretty stale or broken!
>
> So far as I can tell:
>
> * Admins can invoke "osd set-require-min-compat-client", and unless
>   you override it tries to check against already-connected clients
>   * but the check makes use of ceph_release_features() (which I've
>     recently noted to be a bit weak) and also only checks against the
>     entities which are connected to the local monitor making the change.
>
> * As best I can tell, setting a min_compat_client does *not* actually
>   require newly-connected entities to have the required feature set!
> * The OSDMonitor DOES require any connecting entity satisfy the
>   features demanded by OSDMap::get_features() (via
>   update_msgr_features(), called from OSDMonitor::update_from_paxos())
>   * but this function has not been reliably updated? It adds
>     CEPH_FEATUREMASK_SERVER_KRAKEN for OSDs (but not others!) and is
>     configured to add CEPH_FEATUREMASK_CEPHX_V2 when appropriate, but has
>     nothing from luminous (like, say...UPMAP).
>
> Am I misunderstanding something here? Am I correct in thinking that we
> just need to wire up require_min_compat_client to
> OSDMap::get_features(), or is there another enforcement mechanism
> we're looking for?
> -Greg

I think it depends on whether you view min-compat-client as something
that just takes away some of the power to shoot yourself in the foot by
disallowing certain commands or an actual enforcement mechanism. I've
always understood it to be the former.

If you just set it to luminous, older clients can still connect because
you haven't actually enabled any incompatible features. Once you do (e.g.
install an upmap exception), older clients are denied handshake.

However, if you run "osd set-require-min-compat-client" with no clients
connected (or when all connected clients happen to meet the new feature
mask), there is nothing stopping you from connecting a bunch of older
clients later and then accidentally enabling upmap on them. Because
existing connections are never cut, we end up in a state where a
now-incompatible client can continue talking to the cluster, but can't
open any new connections.

This is a long-standing problem and we've run into it before luminous.
AFAIK min-compat-client wasn't intended to be the solution; it was more
of a last-minute usability improvement.

I don't think making "osd set-require-min-compat-client REL" require all
feature bits of REL from clients is the right move. While not so in the
luminous case, we've been relying on being able to cherry-pick individual
feature bits in the kernel client for a long time.

Thanks,

                Ilya
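
The sequence described above can be illustrated with a small toy model. This is a sketch in Python, not Ceph code: the `Cluster` class, its method names, and the two feature bit values are all hypothetical. It assumes only what the message states: min-compat-client gates commands, the handshake checks only features that are actually in use, and existing connections are never cut.

```python
from typing import Dict, List

# Hypothetical bit values, for illustration only (not Ceph's real feature bits).
FEATURE_BASE = 1 << 0
FEATURE_UPMAP = 1 << 1


def satisfies(have: int, required: int) -> bool:
    """True if every bit in `required` is also set in `have`."""
    return (have & required) == required


class Cluster:
    """Toy model: min_compat gates commands, required gates handshakes."""

    def __init__(self) -> None:
        self.min_compat = FEATURE_BASE  # what the admin has declared
        self.required = FEATURE_BASE    # features actually in use
        self.connected: Dict[str, int] = {}

    def connect(self, name: str, features: int) -> bool:
        # The handshake looks only at features in use, not at min_compat.
        if not satisfies(features, self.required):
            return False
        self.connected[name] = features
        return True

    def set_min_compat(self, features: int) -> bool:
        # Checked only against clients connected right now.
        if any(not satisfies(f, features) for f in self.connected.values()):
            return False
        self.min_compat = features
        return True

    def install_upmap(self) -> bool:
        # The command is refused unless min_compat already allows upmap.
        if not satisfies(self.min_compat, FEATURE_UPMAP):
            return False
        # New handshakes now need the bit; existing sessions are left alone.
        self.required |= FEATURE_UPMAP
        return True

    def stale_clients(self) -> List[str]:
        # Connected clients that would fail a fresh handshake.
        return sorted(n for n, f in self.connected.items()
                      if not satisfies(f, self.required))
```

In this model, calling `set_min_compat(FEATURE_BASE | FEATURE_UPMAP)` on an empty cluster succeeds, a client offering only `FEATURE_BASE` can still connect afterwards, and a later `install_upmap()` leaves that client listed by `stale_clients()` while further connections with the same feature set are refused.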