Re: client feature requirements appear very broken

On Wed, Oct 17, 2018 at 10:47 PM Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
>
> I was given a cluster recently where they'd managed to enable upmap
> despite clients not supporting it, and then on removing the upmap
> features the clients were triggering monitor crashes.
>
> While trying to establish what had happened, I discovered that most of
> the controls in this area have become pretty stale or broken!
>
> So far as I can tell:
>
> * Admins can invoke "osd set-require-min-compat-client", and unless
> you override it, it tries to check against already-connected clients
>   * but the check makes use of ceph_release_features() (which I've
> recently noted to be a bit weak) and also only checks against the
> entities which are connected to the local monitor making the change.
>
> * As best I can tell, setting a min_compat_client does *not* actually
> require newly-connected entities to have the required feature set!
> * The OSDMonitor DOES require any connecting entity satisfy the
> features demanded by OSDMap::get_features() (via
> update_msgr_features(), called from OSDMonitor::update_from_paxos())
>   * but this function has not been reliably updated? It adds
> CEPH_FEATUREMASK_SERVER_KRAKEN for OSDs (but not others!) and is
> configured to add CEPH_FEATUREMASK_CEPHX_V2 when appropriate, but has
> nothing from luminous (like, say...UPMAP).
>
> Am I misunderstanding something here? Am I correct in thinking that we
> just need to wire up require_min_compat_client to
> OSDMap::get_features(), or is there another enforcement mechanism
> we're looking for?
> -Greg

I think it depends on whether you view min-compat-client as a guard that
just takes away some of the power to shoot yourself in the foot by
disallowing certain commands, or as an actual enforcement mechanism.

I've always understood it to be the former.  If you just set it to
luminous, older clients can still connect because you haven't actually
enabled any incompatible features.  Once you do (e.g. install an upmap
exception), older clients are denied at handshake.
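
To make the distinction concrete, here is a toy Python model of that
behaviour.  It is a sketch only, not Ceph code: the class, the method
names and the feature bit values are all made up for illustration.  The
point it shows is that min_compat_client only gates the command, while
the handshake check follows what is actually in the map.

```python
from dataclasses import dataclass, field

# Hypothetical feature bits; the values are invented for this sketch.
FEATURE_BASE = 1 << 0
FEATURE_UPMAP = 1 << 1

# Release ordering used only to compare min_compat_client settings.
RELEASES = ["jewel", "kraken", "luminous"]


@dataclass
class ToyCluster:
    min_compat_client: str = "jewel"
    upmap_entries: dict = field(default_factory=dict)

    def required_features(self) -> int:
        """Features demanded at handshake, derived from map contents only."""
        required = FEATURE_BASE
        if self.upmap_entries:
            required |= FEATURE_UPMAP
        return required

    def add_upmap(self, pg: str, osds: list) -> None:
        """The guard: refuse the command unless min_compat_client allows it."""
        if RELEASES.index(self.min_compat_client) < RELEASES.index("luminous"):
            raise PermissionError("min_compat_client must be at least luminous")
        self.upmap_entries[pg] = list(osds)

    def handshake(self, client_features: int) -> bool:
        """A new connection succeeds only if the client has every required bit."""
        required = self.required_features()
        return (client_features & required) == required
```

In this model, setting min_compat_client to luminous changes nothing for
a client that only has FEATURE_BASE; it is the first add_upmap() call
that makes handshake() start returning False for it.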

However, if you run "osd set-require-min-compat-client" with no clients
connected (or when all connected clients happen to meet the new feature
mask), there is nothing stopping you from connecting a bunch of older
clients later and then accidentally enabling upmap on them.  Because
existing connections are never cut, we end up in a state where a
now-incompatible client can continue talking to the cluster but can't
open any new connections.
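
A rough illustration of that state, again as a toy Python model rather
than anything taken from the monitor code (ToyMonitor, its methods and
the bit values are hypothetical):

```python
# Hypothetical feature bits; the values are invented for this sketch.
FEATURE_BASE = 1 << 0
FEATURE_UPMAP = 1 << 1


class ToyMonitor:
    def __init__(self):
        self.required = FEATURE_BASE
        self.sessions = {}  # client name -> feature bits seen at connect time

    def connect(self, name, features):
        """Features are checked once, when the connection is opened."""
        if (features & self.required) != self.required:
            return False
        self.sessions[name] = features
        return True

    def enable_upmap(self):
        """Raise the requirement; existing sessions are deliberately not revisited."""
        self.required |= FEATURE_UPMAP

    def stale_sessions(self):
        """Sessions that are still open but would fail a fresh handshake."""
        return sorted(
            name
            for name, features in self.sessions.items()
            if (features & self.required) != self.required
        )
```

With mon = ToyMonitor(), a client that connects with only FEATURE_BASE
before mon.enable_upmap() stays in mon.sessions and shows up in
mon.stale_sessions(), while the same client calling connect() afterwards
is refused.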

This is a long-standing problem, and we ran into it before luminous.
AFAIK min-compat-client wasn't intended to be the solution; it was more
of a last-minute usability improvement.

I don't think making "osd set-require-min-compat-client REL" require
all feature bits of REL from clients is the right move.  While not so
in the luminous case, we've been relying on being able to cherry-pick
individual feature bits in the kernel client for a long time.
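
As a sketch of why the two checks differ (hypothetical bits and helper
names, not the real feature definitions): a client that has
cherry-picked some of a release's bits fails an "all bits of REL" test
even when it supports everything the cluster actually uses.

```python
# Hypothetical feature bits standing in for "the bits of release REL".
FEATURE_A = 1 << 0
FEATURE_B = 1 << 1
FEATURE_C = 1 << 2
RELEASE_MASK = FEATURE_A | FEATURE_B | FEATURE_C


def satisfies_release(client_features: int) -> bool:
    """Strict check: the client must carry every bit of the release."""
    return (client_features & RELEASE_MASK) == RELEASE_MASK


def satisfies_in_use(client_features: int, in_use: int) -> bool:
    """Per-feature check: the client must carry only the bits actually in use."""
    return (client_features & in_use) == in_use


# A client that implements A and C but never picked up B.
cherry_picked_client = FEATURE_A | FEATURE_C
```

Here satisfies_release(cherry_picked_client) is False, yet
satisfies_in_use(cherry_picked_client, FEATURE_A | FEATURE_C) is True,
so requiring the whole release mask would lock out a client that works
fine with the features enabled on the cluster.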

Thanks,

                Ilya


