On Tue, 2019-01-15 at 02:53 +0000, Sage Weil wrote:
> On Mon, 14 Jan 2019, Jeff Layton wrote:
> > We've hit a problem in rook recently:
> > 
> > -------------------8<--------------------
> > 2019-01-14 20:39:00.564293 I | exec: Running command: ceph auth get-or-create-key mgr.a mon allow profile mgr mds allow * osd allow * --cluster=rook-ceph --conf=/var/lib/rook/rook-ceph/rook-ceph.config --keyring=/var/lib/rook/rook-ceph/client.admin.keyring --format json --out-file /tmp/836429485
> > 2019-01-14 20:39:01.238342 I | exec: Running command: ceph mgr module enable orchestrator_cli --force --cluster=rook-ceph --conf=/var/lib/rook/rook-ceph/rook-ceph.config --keyring=/var/lib/rook/rook-ceph/client.admin.keyring --format json --out-file /tmp/203752488
> > 2019-01-14 20:39:02.068475 I | exec: Running command: ceph mgr module enable rook --force --cluster=rook-ceph --conf=/var/lib/rook/rook-ceph/rook-ceph.config --keyring=/var/lib/rook/rook-ceph/client.admin.keyring --format json --out-file /tmp/700464999
> > 2019-01-14 20:39:05.448569 I | exec: Running command: ceph orchestrator set backend rook --cluster=rook-ceph --conf=/var/lib/rook/rook-ceph/rook-ceph.config --keyring=/var/lib/rook/rook-ceph/client.admin.keyring --format json --out-file /tmp/837955994
> > 2019-01-14 20:39:06.177648 I | exec: no valid command found; 10 closest matches:
> > osd ls {<int[0-]>}
> > osd getmap {<int[0-]>}
> > osd tree {<int[0-]>} {up|down|in|out|destroyed [up|down|in|out|destroyed...]}
> > osd tree-from {<int[0-]>} <bucket> {up|down|in|out|destroyed [up|down|in|out|destroyed...]}
> > osd stat
> > osd dump {<int[0-]>}
> > mon feature set <feature_name> {--yes-i-really-mean-it}
> > mon set-rank <name> <int>
> > mon remove <name>
> > mon feature ls {--with-value}
> > Error EINVAL: invalid command
> > 2019-01-14 20:39:06.177959 E | op-mgr: failed to enable orchestrator modules. failed to set rook as the orchestrator backend. exit status 22
> > 2019-01-14 20:39:06.178104 I | exec: Running command: ceph mgr module enable prometheus --force --cluster=rook-ceph --conf=/var/lib/rook/rook-ceph/rook-ceph.config --keyring=/var/lib/rook/rook-ceph/client.admin.keyring --format json --out-file /tmp/330007089
> > 2019-01-14 20:39:07.111348 I | exec: Running command: ceph mgr module enable dashboard --force --cluster=rook-ceph --conf=/var/lib/rook/rook-ceph/rook-ceph.config --keyring=/var/lib/rook/rook-ceph/client.admin.keyring --format json --out-file /tmp/595433948
> > 2019-01-14 20:39:14.528265 I | exec: Running command: ceph dashboard create-self-signed-cert --cluster=rook-ceph --conf=/var/lib/rook/rook-ceph/rook-ceph.config --keyring=/var/lib/rook/rook-ceph/client.admin.keyring --format json --out-file /tmp/463039371
> > 2019-01-14 20:39:14.928036 I | exec: no valid command found; 10 closest matches:
> > osd ls {<int[0-]>}
> > osd getmap {<int[0-]>}
> > osd tree {<int[0-]>} {up|down|in|out|destroyed [up|down|in|out|destroyed...]}
> > osd tree-from {<int[0-]>} <bucket> {up|down|in|out|destroyed [up|down|in|out|destroyed...]}
> > osd stat
> > osd dump {<int[0-]>}
> > mon feature set <feature_name> {--yes-i-really-mean-it}
> > mon set-rank <name> <int>
> > mon remove <name>
> > mon feature ls {--with-value}
> > Error EINVAL: invalid command
> > -------------------8<--------------------
> > 
> > I came back after the fact and these modules were activated. I think the
> > "mgr module enable" command is returning too quickly, before the module
> > gets fully plugged in.
> > 
> > We can work around this to some degree by expecting this and retrying,
> > but it would be nice if there were some way to have those enable
> > commands not return until the job is done.
> > 
> > Is that possible?
> 
> It's possible, but I'm not sure making the CLI command block is the best
> idea, since it could wait indefinitely for the mgr daemon to restart (or
> for one to start at all).
> 
> Instead, I think the caller in this case can do a loop like
> 
>   while ! ceph mgr dump | jq '.available_modules' | grep rook ; do sleep 5 ; done
> 
> (or something more elegant with jq).
> 

Thanks Sage. That's sort of nasty, but I guess we can live with it.

-- 
Jeff Layton <jlayton@xxxxxxxxxx>
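
For reference, a fleshed-out version of that wait, with jq doing the JSON
matching instead of grep, might look like the sketch below. It is untested
and assumes the mgr map's "available_modules" entries are either plain
strings (older releases) or objects carrying a "name" field (newer
releases); the timeout and the --cluster/--conf/--keyring flags rook
normally passes are left to the caller.

-------------------8<--------------------
#!/bin/sh
# Untested sketch: block until the "rook" mgr module appears in the mgr map,
# giving up after ~5 minutes (60 tries x 5s). Add the same --cluster/--conf/
# --keyring flags rook uses if the defaults don't apply.
module=rook
tries=0
until ceph mgr dump | \
      jq -e --arg m "$module" \
         '[.available_modules[] | (.name? // .)] | index($m)' >/dev/null; do
    tries=$((tries + 1))
    if [ "$tries" -ge 60 ]; then
        echo "timed out waiting for mgr module $module" >&2
        exit 1
    fi
    sleep 5
done
-------------------8<--------------------

The '(.name? // .)' bit is only there to cope with both list formats; on a
known release a plain index of the module names would do.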