Re: configuring mgr modules while mgr is still initializing

Sage Weil <sweil@xxxxxxxxxx> · Wed, 27 Feb 2019 16:47:04 +0000 (UTC)

On Wed, 27 Feb 2019, Jeff Layton wrote:
> On Thu, 2019-02-21 at 16:19 +1100, Tim Serong wrote:
> > On 02/21/2019 01:37 AM, Sage Weil wrote:
> > > Hi Tim,
> > > 
> > > On Wed, 20 Feb 2019, Tim Serong wrote:
> > > > Hi Sebastian, Juan,
> > > > 
> > > > To follow up from the orchestrator call today, I've got DeepSea
> > > > configuring the deepsea orchestrator module during cluster deployment,
> > > > immediately after the mgr daemons themselves are started.  The problem I
> > > > had was that mgr doesn't know about the available modules until a
> > > > few seconds after it starts up for the first time.
> > > > 
> > > > DeepSea is (effectively) doing approximately this during deployment:
> > > > 
> > > > # systemctl start ceph-mgr@$(hostname)
> > > > # ceph mgr module enable orchestrator_cli
> > > > # ceph mgr module enable deepsea
> > > > # ceph orchestrator set backend deepsea
> > > > # ceph deepsea config-set salt_api_url $URL
> > > > # ceph deepsea config-set salt_api_username $USERNAME
> > > > # ceph deepsea config-set salt_api_password $PASSWORD
> > > > 
> > > > The `ceph mgr module enable` invocations would immediately fail with
> > > > "all mgr daemons do not support module [...], pass --force to force
> > > > enablement", and the subsequent module commands would fail because the
> > > > modules weren't enabled.
> > > > 
> > > > We can try using --force, e.g.:
> > > > 
> > > > # ceph mgr module enable orchestrator_cli --force
> > > > 
> > > > But then the module still isn't loaded quickly enough at this point
> > > > (remember, it's only a fraction of a second after ceph-mgr starts for
> > > > the first time), so the subsequent "ceph orchestrator set backend
> > > > deepsea" and "ceph deepsea config-set" commands will still fail with "no
> > > > valid command".
> > > > 
> > > > I've got two ways around this.  One is to wait until mgr is reported as
> > > > being available:
> > > > 
> > > > # systemctl start ceph-mgr@$(hostname)
> > > > # while [ "$(ceph mgr dump | jq '.available')" != "true" ] ; \
> > > >         do echo sleeping 1>&2 ; sleep 1 ; done
> > > > # ceph mgr module enable orchestrator_cli
> > > > # ceph mgr module enable deepsea
> > > > # ceph orchestrator set backend deepsea
> > > > # ceph deepsea config-set salt_api_url $URL
> > > > # ceph deepsea config-set salt_api_username $USERNAME
> > > > # ceph deepsea config-set salt_api_password $PASSWORD
> > > > 
> > > > This works fine (it takes about 4 seconds in my dev/test environment for
> > > > mgr to become available), but of course really needs to be tweaked to
> > > > break out of that loop if mgr never becomes available in a reasonable time.
> > > > 
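One way to bound that loop, as a rough sketch rather than anything DeepSea actually ships: the helper names wait_until and mgr_available are made up for illustration, and the 60 second budget in the usage comment is an arbitrary assumption. The availability check itself is the same `ceph mgr dump | jq '.available'` test as in the loop above.

```shell
# wait_until TIMEOUT_SECONDS COMMAND [ARGS...]
# Hypothetical helper: re-run COMMAND about once per second until it
# succeeds, and give up with a non-zero status after TIMEOUT_SECONDS.
wait_until() {
    remaining=$1
    shift
    until "$@"; do
        if [ "$remaining" -le 0 ]; then
            echo "timed out waiting for: $*" >&2
            return 1
        fi
        sleep 1
        remaining=$((remaining - 1))
    done
}

# Same check as the unbounded loop, wrapped in a function so that it can
# be passed to wait_until as a command.
mgr_available() {
    [ "$(ceph mgr dump | jq '.available')" = "true" ]
}

# Usage (the timeout value is an assumption, tune it for the deployment):
#   wait_until 60 mgr_available || exit 1
#   ceph mgr module enable orchestrator_cli
```

The deployment can then stop with an error instead of hanging forever when the mgr never comes up.
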
> > > > Another option is to cheat a bit and force the modules to load, then
> > > > write config keys directly:
> > > > 
> > > > # ceph mgr module enable orchestrator_cli --force
> > > > # ceph mgr module enable deepsea --force
> > > > # ceph config-key set \
> > > >       config/mgr/mgr/orchestrator_cli/orchestrator deepsea
> > > > # ceph config-key set config/mgr/mgr/deepsea/salt_api_url $URL
> > > > # ceph config-key set config/mgr/mgr/deepsea/salt_api_username $USERNAME
> > > > # ceph config-key set config/mgr/mgr/deepsea/salt_api_password $PASSWORD
> > > > 
> > > > This works too, no irritating loop is necessary, but then DeepSea, an
> > > > external deployment tool, suddenly knows internal details of both the
> > > > orchestrator CLI module and deepsea module (i.e. the config keys), so
> > > > those pieces are no longer black boxes, and if they change in
> > > > Ceph upstream, DeepSea's deployment breaks.
> > > > 
> > > > What's preferable here?  (I'm leaning towards the loop option)  Am I
> > > > missing any other options?
> > > 
> > > Jeff Layton ran into the same thing a few weeks back.  We went with 
> > > the loop for the time being.
> > 
> > Thanks Sage, I'll run with that too immediately.
> > 
> > > I think the other two options are:
> > > 
> > > 1. Make 'ceph config set ...' block if the mgr is not available and it is 
> > > a mgr option.  This seems less than ideal (wouldn't expect that command to 
> > > block) and is probably also racy, since the mgr restarts itself after the 
> > > module enable command but the mon doesn't realize that right away.
> > 
> > Yeah, that's possibly more trouble than it's worth.
> > 
> > > 2. Make a built-in wait loop (ceph mgr wait-until-available [timeout]) 
> > > command that's coded into the CLI, so that every user doesn't have to do 
> > > this.
> > 
> > That sounds good to me.
> > 
> 
> For the record, the problem I hit was slightly different.
> 
> We were enabling some of the orchestrator modules that register new cli
> commands, and immediately trying to feed them the expected commands
> after the "mgr module enable" command returned. It takes a few seconds
> for the module to get plugged in fully, so those often fail to be
> recognized immediately.

The config options and CLI commands are both propagated to the mon from 
the just-started mgr module at the same time (and then included in the 
mgrmap), so the same solution applies at least.

> A better solution than looping would sure be nice, but I'm not sure
> "ceph mgr wait-until-available" is sufficient. The mgr would probably be
> "available" even if not all of its modules had finished initializing,
> right?
> 
> We would want that command to not just tell us whether the mgr is
> "available" but that it has completed initializing all of the modules
> that are enabled too.

Good point... it's also racy in that you might restart, and then run 
the wait-until-available command before we realize the old daemon is gone 
and dead.  Maybe what we really want is

 ceph mgr wait-for-module <module>

which loops until that module is available according to the mgrmap.  That 
will work after enabling a new module or at cluster create time when 
we're just waiting for the first mgr to start with the initial modules.
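
Until something like that exists in the CLI, it can be approximated from the outside. The sketch below is only one reading of "available according to the mgrmap": it assumes jq is installed, and that the JSON from `ceph mgr dump` has a boolean "available" field plus an "available_modules" array whose entries are either plain strings or objects with a "name" field. The function names and the 60 second default are made up for illustration. It does not close the restart race described above, since it only sees whatever the mon currently believes.

```shell
# Hypothetical stand-in for the proposed "ceph mgr wait-for-module";
# no such command is assumed to exist.

# module_in_mgrmap MODULE
# Reads mgrmap JSON on stdin. Succeeds if the mgr is reported as available
# and MODULE appears in available_modules (assumed layout, see above).
module_in_mgrmap() {
    jq -e --arg m "$1" \
        '.available and any(.available_modules[]?; (.name? // .) == $m)' \
        >/dev/null
}

# wait_for_module MODULE [TIMEOUT_SECONDS]
# Polls "ceph mgr dump" about once per second until MODULE shows up,
# or returns non-zero once the timeout (default 60s, arbitrary) expires.
wait_for_module() {
    module=$1
    remaining=${2:-60}
    until ceph mgr dump | module_in_mgrmap "$module"; do
        if [ "$remaining" -le 0 ]; then
            echo "timed out waiting for mgr module $module" >&2
            return 1
        fi
        sleep 1
        remaining=$((remaining - 1))
    done
}

# Usage:
#   wait_for_module deepsea 30 && ceph mgr module enable deepsea
```
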

s