Re: Cannot Create MGR

I am still trying to figure out what the problem is here...

Initially, the cluster was updated OK...

# ceph health detail
HEALTH_WARN noout flag(s) set; all OSDs are running luminous or later but require_osd_release < luminous; no active mgr
noout flag(s) set
all OSDs are running luminous or later but require_osd_release < luminous


While I removed the "noout" flag and ran "ceph osd require-osd-release luminous", I had another window open in which I was following the cluster status.
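For reference, what I ran in the first window was along these lines (the exact command for clearing the flag is from memory):

# ceph osd unset noout
# ceph osd require-osd-release luminous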



# ceph -w
  cluster:
    id:     d357a551-5b7a-4501-8d8f-009c63b2c972
    health: HEALTH_WARN
all OSDs are running luminous or later but require_osd_release < luminous
            no active mgr

  services:
    mon: 1 daemons, quorum nefelus-controller
    mgr: no daemons active
    osd: 2 osds: 2 up, 2 in

  data:
    pools:   11 pools, 152 pgs
    objects: 9754 objects, 33754 MB
    usage:   67495 MB used, 3648 GB / 3714 GB avail
    pgs:     152 active+clean


2018-02-28 19:03:20.105027 mon.nefelus-controller [INF] pgmap 152 pgs: 152 active+clean; 33754 MB data, 67495 MB used, 3648 GB / 3714 GB avail
2018-02-28 19:03:24.101868 mon.nefelus-controller [INF] pgmap 152 pgs: 152 active+clean; 33754 MB data, 67495 MB used, 3648 GB / 3714 GB avail
2018-02-28 19:03:25.103605 mon.nefelus-controller [INF] pgmap 152 pgs: 152 active+clean; 33754 MB data, 67495 MB used, 3648 GB / 3714 GB avail
2018-02-28 19:03:29.815572 mon.nefelus-controller [INF] pgmap 152 pgs: 152 active+clean; 33754 MB data, 67495 MB used, 3648 GB / 3714 GB avail; 2671 B/s rd, 89 op/s
2018-02-28 19:03:34.105263 mon.nefelus-controller [INF] pgmap 152 pgs: 152 active+clean; 33754 MB data, 67495 MB used, 3648 GB / 3714 GB avail; 4472 B/s rd, 240 op/s
2018-02-28 19:03:35.108174 mon.nefelus-controller [INF] pgmap 152 pgs: 152 active+clean; 33754 MB data, 67495 MB used, 3648 GB / 3714 GB avail; 9020 B/s rd, 538 op/s
2018-02-28 19:03:39.104781 mon.nefelus-controller [INF] pgmap 152 pgs: 152 active+clean; 33754 MB data, 67495 MB used, 3648 GB / 3714 GB avail; 7598 B/s rd, 453 op/s
2018-02-28 19:03:40.108741 mon.nefelus-controller [INF] pgmap 152 pgs: 152 active+clean; 33754 MB data, 67495 MB used, 3648 GB / 3714 GB avail; 9020 B/s rd, 538 op/s
2018-02-28 19:03:44.105574 mon.nefelus-controller [INF] pgmap 152 pgs: 152 active+clean; 33754 MB data, 67495 MB used, 3648 GB / 3714 GB avail; 7598 B/s rd, 453 op/s
2018-02-28 19:03:45.107522 mon.nefelus-controller [INF] pgmap 152 pgs: 152 active+clean; 33754 MB data, 67495 MB used, 3648 GB / 3714 GB avail; 6696 B/s rd, 471 op/s
2018-02-28 19:03:49.106530 mon.nefelus-controller [INF] pgmap 152 pgs: 152 active+clean; 33754 MB data, 67495 MB used, 3648 GB / 3714 GB avail; 3958 B/s rd, 269 op/s
2018-02-28 19:03:50.110731 mon.nefelus-controller [INF] pgmap 152 pgs: 152 active+clean; 33754 MB data, 67495 MB used, 3648 GB / 3714 GB avail
2018-02-28 19:03:54.107816 mon.nefelus-controller [INF] pgmap 152 pgs: 152 active+clean; 33754 MB data, 67495 MB used, 3648 GB / 3714 GB avail
2018-02-28 19:03:55.109359 mon.nefelus-controller [INF] pgmap 152 pgs: 152 active+clean; 33754 MB data, 67495 MB used, 3648 GB / 3714 GB avail
2018-02-28 19:03:59.108575 mon.nefelus-controller [INF] pgmap 152 pgs: 152 active+clean; 33754 MB data, 67495 MB used, 3648 GB / 3714 GB avail
2018-02-28 19:04:00.110692 mon.nefelus-controller [INF] pgmap 152 pgs: 152 active+clean; 33754 MB data, 67495 MB used, 3648 GB / 3714 GB avail
2018-02-28 19:04:04.109099 mon.nefelus-controller [INF] pgmap 152 pgs: 152 active+clean; 33754 MB data, 67495 MB used, 3648 GB / 3714 GB avail
2018-02-28 19:04:05.111035 mon.nefelus-controller [INF] pgmap 152 pgs: 152 active+clean; 33754 MB data, 67495 MB used, 3648 GB / 3714 GB avail
2018-02-28 19:04:09.110238 mon.nefelus-controller [INF] pgmap 152 pgs: 152 active+clean; 33754 MB data, 67495 MB used, 3648 GB / 3714 GB avail
2018-02-28 19:04:10.112094 mon.nefelus-controller [INF] pgmap 152 pgs: 152 active+clean; 33754 MB data, 67495 MB used, 3648 GB / 3714 GB avail
2018-02-28 19:04:14.111468 mon.nefelus-controller [INF] pgmap 152 pgs: 152 active+clean; 33754 MB data, 67495 MB used, 3648 GB / 3714 GB avail
2018-02-28 19:04:15.113370 mon.nefelus-controller [INF] pgmap 152 pgs: 152 active+clean; 33754 MB data, 67495 MB used, 3648 GB / 3714 GB avail
2018-02-28 19:04:19.112223 mon.nefelus-controller [INF] pgmap 152 pgs: 152 active+clean; 33754 MB data, 67495 MB used, 3648 GB / 3714 GB avail
2018-02-28 19:04:20.116135 mon.nefelus-controller [INF] pgmap 152 pgs: 152 active+clean; 33754 MB data, 67495 MB used, 3648 GB / 3714 GB avail
2018-02-28 19:04:24.113174 mon.nefelus-controller [INF] pgmap 152 pgs: 152 active+clean; 33754 MB data, 67495 MB used, 3648 GB / 3714 GB avail
2018-02-28 19:04:25.114808 mon.nefelus-controller [INF] pgmap 152 pgs: 152 active+clean; 33754 MB data, 67495 MB used, 3648 GB / 3714 GB avail
2018-02-28 19:04:28.172510 mon.nefelus-controller [INF] setting require_min_compat_client to currently required firefly
2018-02-28 19:04:33.243221 osd.0 [INF] 4.6 scrub updated num_legacy_snapsets from 14 -> 0
^C#

After some time, since I wasn't getting any further output, I stopped it with CTRL-C. I am very curious about the last lines above mentioning "firefly" and "num_legacy_snapsets", and whether they mean something bad.


After that, any further attempt showed that my pools, PGs, data, etc. were gone...

# ceph -w
  cluster:
    id:     d357a551-5b7a-4501-8d8f-009c63b2c972
    health: HEALTH_WARN
            no active mgr

  services:
    mon: 1 daemons, quorum nefelus-controller
    mgr: no daemons active
    osd: 2 osds: 2 up, 2 in

  data:
    pools:   0 pools, 0 pgs
    objects: 0 objects, 0 bytes
    usage:   0 kB used, 0 kB / 0 kB avail
    pgs:


I really need some help here to put everything back online...
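(To double-check whether the data is really gone, or whether it is only the status output that is empty, I suppose the pools can also be listed directly with something like:

# ceph osd lspools
# ceph df

but I am not sure how to interpret whatever those report while there is no active mgr.)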

Thanks,

G.



OK...now this is getting crazy...


  data:
    pools:   0 pools, 0 pgs
    objects: 0 objects, 0 bytes
    usage:   0 kB used, 0 kB / 0 kB avail
    pgs:



Where has everything gone??

What's happening here?


G.

Indeed John,

you are right! I have updated "ceph-deploy" (which was installed via "pip", which is why it was not updated along with the rest of the Ceph packages), but now it complains that keys are missing:

$ ceph-deploy mgr create controller
[ceph_deploy.conf][DEBUG ] found configuration file at: /home/user/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (2.0.0): /usr/bin/ceph-deploy mgr create controller
[ceph_deploy.cli][INFO  ] ceph-deploy options:
[ceph_deploy.cli][INFO  ]  username                      : None
[ceph_deploy.cli][INFO  ]  verbose                       : False
[ceph_deploy.cli][INFO  ]  mgr                           : [('controller', 'controller')]
[ceph_deploy.cli][INFO  ]  overwrite_conf                : False
[ceph_deploy.cli][INFO  ]  subcommand                    : create
[ceph_deploy.cli][INFO  ]  quiet                         : False
[ceph_deploy.cli][INFO  ]  cd_conf                       : <ceph_deploy.conf.cephdeploy.Conf instance at 0x1d42bd8>
[ceph_deploy.cli][INFO  ]  cluster                       : ceph
[ceph_deploy.cli][INFO  ]  func                          : <function mgr at 0x1cce500>
[ceph_deploy.cli][INFO  ]  ceph_conf                     : None
[ceph_deploy.cli][INFO  ]  default_release               : False
[ceph_deploy.mgr][DEBUG ] Deploying mgr, cluster ceph hosts controller:controller
[ceph_deploy][ERROR ] RuntimeError: bootstrap-mgr keyring not found; run 'gatherkeys'


and I cannot get the keys...



$ ceph-deploy gatherkeys controller
[ceph_deploy.conf][DEBUG ] found configuration file at: /home/user/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (2.0.0): /usr/bin/ceph-deploy gatherkeys controller
[ceph_deploy.cli][INFO  ] ceph-deploy options:
[ceph_deploy.cli][INFO  ]  username                      : None
[ceph_deploy.cli][INFO  ]  verbose                       : False
[ceph_deploy.cli][INFO  ]  overwrite_conf                : False
[ceph_deploy.cli][INFO  ]  quiet                         : False
[ceph_deploy.cli][INFO  ]  cd_conf                       : <ceph_deploy.conf.cephdeploy.Conf instance at 0x199f290>
[ceph_deploy.cli][INFO  ]  cluster                       : ceph
[ceph_deploy.cli][INFO  ]  mon                           : ['controller']
[ceph_deploy.cli][INFO  ]  func                          : <function gatherkeys at 0x198b2a8>
[ceph_deploy.cli][INFO  ]  ceph_conf                     : None
[ceph_deploy.cli][INFO  ]  default_release               : False
[ceph_deploy.gatherkeys][INFO  ] Storing keys in temp directory /tmp/tmpPQ895t
[controller][DEBUG ] connection detected need for sudo
[controller][DEBUG ] connected to host: controller
[controller][DEBUG ] detect platform information from remote host
[controller][DEBUG ] detect machine type
[controller][DEBUG ] get remote short hostname
[controller][DEBUG ] fetch remote file
[ceph_deploy.gatherkeys][WARNIN] No mon key found in host: controller
[ceph_deploy.gatherkeys][ERROR ] Failed to connect to host:controller
[ceph_deploy.gatherkeys][INFO  ] Destroy temp directory /tmp/tmpPQ895t
[ceph_deploy][ERROR ] RuntimeError: Failed to connect any mon
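(If ceph-deploy keeps failing here, I suppose I could try creating the mgr by hand instead; my understanding of the manual procedure, assuming the default cluster name "ceph" and "controller" as the mgr id, is roughly:

# mkdir -p /var/lib/ceph/mgr/ceph-controller
# ceph auth get-or-create mgr.controller mon 'allow profile mgr' osd 'allow *' mds 'allow *' -o /var/lib/ceph/mgr/ceph-controller/keyring
# chown -R ceph:ceph /var/lib/ceph/mgr/ceph-controller
# systemctl enable --now ceph-mgr@controller

but I would prefer to understand why gatherkeys fails first.)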




On Wed, Feb 28, 2018 at 5:21 PM, Georgios Dimitrakakis
<giorgis@xxxxxxxxxxxx> wrote:
All,

I have updated my test Ceph cluster from Jewel (10.2.10) to Luminous (12.2.4) using CentOS packages.

I have updated all packages and restarted all services in the proper order, but I get a warning that the manager daemon doesn't exist.
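By "proper order" I mean the monitor first and then the OSDs, i.e. on each node roughly:

# systemctl restart ceph-mon.target
# systemctl restart ceph-osd.target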

Here is the output:

# ceph -s
  cluster:
    id:     d357a551-5b7a-4501-8d8f-009c63b2c972
    health: HEALTH_WARN
            no active mgr

  services:
    mon: 1 daemons, quorum controller
    mgr: no daemons active
    osd: 2 osds: 2 up, 2 in

  data:
    pools:   0 pools, 0 pgs
    objects: 0 objects, 0 bytes
    usage:   0 kB used, 0 kB / 0 kB avail
    pgs:


At the same time, the systemd target is up and running:

# systemctl status ceph-mgr.target
● ceph-mgr.target - ceph target allowing to start/stop all ceph-mgr@.service instances at once
   Loaded: loaded (/usr/lib/systemd/system/ceph-mgr.target; enabled; vendor preset: enabled)
   Active: active since Wed 2018-02-28 18:57:13 EET; 12min ago
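(As far as I understand, the target being active does not necessarily mean that an actual ceph-mgr@<id> instance exists; whether one is configured at all should show up with something like:

# systemctl list-units 'ceph-mgr@*'
)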


I understand that I have to add a new MGR, but when I try to do so via "ceph-deploy" it fails with the following error:


# ceph-deploy mgr create controller
usage: ceph-deploy [-h] [-v | -q] [--version] [--username USERNAME] [--overwrite-conf] [--cluster NAME] [--ceph-conf CEPH_CONF]
                   COMMAND ...
ceph-deploy: error: argument COMMAND: invalid choice: 'mgr' (choose from 'new', 'install', 'rgw', 'mon', 'mds', 'gatherkeys', 'disk', 'osd', 'admin', 'repo', 'config', 'uninstall', 'purge', 'purgedata', 'calamari', 'forgetkeys', 'pkg')

You probably have an older version of ceph-deploy, from before it knew
how to create mgr daemons.

John
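For instance, a recent ceph-deploy (2.0.0 is the version that shows up later in this thread) does know the "mgr create" subcommand. If yours was installed with pip, checking and upgrading it is roughly:

$ ceph-deploy --version
$ sudo pip install --upgrade ceph-deploy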



where "controller" is the node where ceph monitor is already running.


Any ideas why I cannot do it via "ceph-deploy", and what I have to do to get the cluster back into a healthy state?


I am running CentOS 7.4.1708 (Core).

Best,

G.

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



