Re: client feature requirements appear very broken

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, 17 Oct 2018, Gregory Farnum wrote:
> I was given a cluster recently where they'd managed to enable upmap
> despite clients not supporting it, and then on removing the upmap
> features the clients were triggering monitor crashes.
> 
> While trying to establish what had happened, I discovered that most of
> the controls in this area have become pretty stale or broken!
> 
> So far as I can tell:
> 
> * Admins can invoke "osd set -require-min-compat-client", and unless
> you override it tries check against already-connected clients

Right. The check against connected clients is imperfect, but better than 
nothing.  The FeatureMap tracks entity_type -> features -> count on each 
mon, and those are shared by the lease ack messages to teh leader 
mon (who is the only one with a full view of all connected clients).

>   * but the check makes use of ceph_release_features() (which I've
> recently noted to be a bit weak) and also only checks against the
> entities which are connected to the local monitor making the change.

IIRC the issue there is that there aren't actually any new client/cluster 
(i.e., features reflected in the kernel client) features that we've 
introduced in mimic, so it can't tell the two apart.  We will have the 
msgr2 feature bit for nautilus, though, so I think mimic will be the only 
oddity here.

> * As best I can tell, setting a min_compat_client does *not* actually
> require newly-connected entities to have the required feature set!

Correct.  It's a guard against making cluster changes that will break the 
stated min_compat_client target, but doesn't actually lock out old clients 
in and of itself.  We could change that, I suppose.. I can't rembmer if we 
had a broader discussion about it leading up to luminous?

> * The OSDMonitor DOES require any connecting entity satisfy the
> features demanded by OSDMap::get_features() (via
> update_msgr_features(), called from OSDMonitor::update_from_paxos())
>   * but this function has not been reliably updated? It adds
> CEPH_FEATUREMASK_SERVER_KRAKEN for OSDs (but not others!) and is
> configured to add CEPH_FEATUREMASK_CEPHX_V2 when appropriate, but has
> nothing from luminous (like, say...UPMAP).

I think there just weren't obvious OSDMap-specific features in luminous or 
mimic to add.  There *are* crushmap features (the weight-sets), covered by

  if (crush->has_incompat_choose_args()) {
    features |= CEPH_FEATUREMASK_CRUSH_CHOOSE_ARGS;
  }

at the top of the function.  And in master there is the CEPHX_v2 
requirement for osds if require_osd_release >= nautilus (doesn't affect 
clients).


> Am I misunderstanding something here? Am I correct in thinking that we
> just need to wire up require_min_compat_client to
> OSDMap::get_features(), or is there another enforcement mechanism
> we're looking for?

It depends on the goal... if we want to gate actual client connections 
based on require_min_compat_client, then yeah, I think that would do it.  
Maybe that's a good idea based on the principle of least surprise (because 
a change other than changing min_compat_client won't affect clients' 
ability to connect).

Which sounds pretty reasonable, when I think about it now!

sage



[Index of Archives]     [CEPH Users]     [Ceph Large]     [Information on CEPH]     [Linux BTRFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux