Hi Team,
We have a 5-node Ceph cluster that we upgraded from Luminous to Nautilus. Everything was going well until yesterday, when we noticed that the OSDs are marked down and are not recognized as running by the monitors, even though the OSD processes themselves are running.
We also noticed that the admin keyring and mon keyring were missing on the nodes, so we recreated them with the commands below:
ceph-authtool --create-keyring /etc/ceph/ceph.client.admin.keyring --gen-key -n client.admin --cap mon 'allow *' --cap osd 'allow *' --cap mds 'allow *'
ceph-authtool --create-keyring /etc/ceph/ceph.mon.keyring --gen-key -n mon. --cap mon 'allow *'
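Since `--gen-key` generates a brand-new key, we have been trying to confirm that the key in the recreated keyring actually matches the one the monitors have on record. A sketch of that check (assuming `ceph auth` still responds through the surviving quorum):

```shell
# Key the monitors currently have registered for client.admin:
ceph auth get client.admin

# Key we just generated into the local keyring file; if the two
# differ, authentication with this keyring will be rejected:
ceph-authtool /etc/ceph/ceph.client.admin.keyring --print-key -n client.admin
```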
In the monitor logs we see the following lines:
2019-11-08 09:01:50.525 7ff61722b700 0 log_channel(audit) log [DBG] : from='client.? 10.50.11.44:0/2398064782' entity='client.admin' cmd=[{"prefix": "df", "format": "json"}]: dispatch
2019-11-08 09:02:37.686 7ff61722b700 0 log_channel(cluster) log [INF] : mon.cn1 calling monitor election
2019-11-08 09:02:37.686 7ff61722b700 1 mon.cn1@0(electing).elector(31157) init, last seen epoch 31157, mid-election, bumping
2019-11-08 09:02:37.688 7ff61722b700 -1 mon.cn1@0(electing) e3 failed to get devid for : udev_device_new_from_subsystem_sysname failed on ''
2019-11-08 09:02:37.770 7ff61722b700 0 log_channel(cluster) log [INF] : mon.cn1 is new leader, mons cn1,cn2,cn3,cn4,cn5 in quorum (ranks 0,1,2,3,4)
2019-11-08 09:02:37.857 7ff613a24700 0 log_channel(cluster) log [DBG] : monmap e3: 5 mons at {cn1=[v2:10.50.11.41:3300/0,v1:10.50.11.41:6789/0],cn2=[v2:10.50.11.42:3300/0,v1:10.50.11.42:6789/0],cn3=[v2:10.50.11.43:3300/0,v1:10.50.11.43:6789/0],cn4=[v2:10.50.11.44:3300/0,v1:10.50.11.44:6789/0],cn5=[v2:10.50.11.45:3300/0,v1:10.50.11.45:6789/0]}
# ceph mon dump
dumped monmap epoch 3
epoch 3
fsid 9dbf207a-561c-48ba-892d-3e79b86be12f
last_changed 2019-09-03 07:53:39.031174
created 2019-08-23 18:30:55.970279
min_mon_release 14 (nautilus)
0: [v2:10.50.11.41:3300/0,v1:10.50.11.41:6789/0] mon.cn1
1: [v2:10.50.11.42:3300/0,v1:10.50.11.42:6789/0] mon.cn2
2: [v2:10.50.11.43:3300/0,v1:10.50.11.43:6789/0] mon.cn3
3: [v2:10.50.11.44:3300/0,v1:10.50.11.44:6789/0] mon.cn4
4: [v2:10.50.11.45:3300/0,v1:10.50.11.45:6789/0] mon.cn5
# ceph -s
  cluster:
    id:     9dbf207a-561c-48ba-892d-3e79b86be12f
    health: HEALTH_WARN
            85 osds down
            3 hosts (72 osds) down
            1 nearfull osd(s)
            1 pool(s) nearfull
            Reduced data availability: 2048 pgs inactive
            too few PGs per OSD (17 < min 30)
            1/5 mons down, quorum cn2,cn3,cn4,cn5

  services:
    mon: 5 daemons, quorum cn2,cn3,cn4,cn5 (age 57s), out of quorum: cn1
    mgr: cn1(active, since 73m), standbys: cn2, cn3, cn4, cn5
    osd: 120 osds: 35 up, 120 in; 909 remapped pgs

  data:
    pools:   1 pools, 2048 pgs
    objects: 0 objects, 0 B
    usage:   176 TiB used, 260 TiB / 437 TiB avail
    pgs:     100.000% pgs unknown
             2048 unknown
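To narrow down why the monitors mark the OSDs down while the processes keep running, these are the checks we have been running (standard Nautilus CLI; `osd.0` below is just one example daemon):

```shell
ceph health detail      # per-OSD detail behind the HEALTH_WARN summary
ceph osd tree down      # which OSDs/hosts the monitors consider down
ceph auth get osd.0     # confirm the OSD's cephx key is still registered

# On the OSD node itself, ask the daemon what state it thinks it is in
# via its admin socket:
ceph daemon osd.0 status
```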
The OSD logs show the following:
2019-11-08 09:05:33.332 7fd1a36eed80 0 _get_class not permitted to load kvs
2019-11-08 09:05:33.332 7fd1a36eed80 0 _get_class not permitted to load lua
2019-11-08 09:05:33.337 7fd1a36eed80 0 _get_class not permitted to load sdk
2019-11-08 09:05:33.337 7fd1a36eed80 0 osd.0 1795 crush map has features 432629308056666112, adjusting msgr requires for clients
2019-11-08 09:05:33.337 7fd1a36eed80 0 osd.0 1795 crush map has features 432629308056666112 was 8705, adjusting msgr requires for mons
2019-11-08 09:05:33.337 7fd1a36eed80 0 osd.0 1795 crush map has features 1009090060360105984, adjusting msgr requires for osds
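We are not sure whether the `_get_class not permitted to load kvs/lua/sdk` lines are related; as far as we understand, they only mean those RADOS object classes are not on the OSD's whitelist, which is controlled by `osd class load list` in ceph.conf (a sketch; the default list differs between releases):

```
[osd]
# Allow the OSD to load any object class, or list the permitted
# classes explicitly instead of using the wildcard:
osd class load list = *
```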
Please let us know what the issue might be. There appear to be no network issues on any of the servers' public or private interfaces.
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com