Re: Fwd: OSD's not coming up in Nautilus

huang jun <hjwsm1989@xxxxxxxxx> · Fri, 8 Nov 2019 19:18:38 +0800

Is osd.0 still in the down state after the restart? If so, the problem
may be in the mon. Can you set debug_mon=20 on the leader mon, restart
one of the down OSDs, and then attach the mon log file?
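
For reference, here is a rough sketch of those steps as a shell snippet. It assumes a node where the ceph CLI can still authenticate against the mons and that the mon log is in its default location. The helper names leader_from_quorum_status and debug_osd_boot are made up for illustration and are not part of Ceph.

```shell
#!/bin/sh
# Sketch only: the helper names below are hypothetical, not part of Ceph.

# Read `ceph quorum_status --format json` output on stdin and print the
# value of its "quorum_leader_name" field (simple sed match, not a JSON parser).
leader_from_quorum_status() {
    sed -n 's/.*"quorum_leader_name":[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Raise debug_mon on the leader mon, restart one down OSD, and print the
# mon log path to attach. Assumes the default log location.
debug_osd_boot() {
    osd_id="$1"
    leader="$(ceph quorum_status --format json | leader_from_quorum_status)"
    [ -n "$leader" ] || { echo "could not determine leader mon" >&2; return 1; }

    # If `ceph tell` cannot reach the mon, the admin socket on the mon host
    # is an alternative: ceph daemon "mon.${leader}" config set debug_mon 20
    ceph tell "mon.${leader}" injectargs '--debug_mon 20' || return 1

    systemctl restart "ceph-osd@${osd_id}.service" || return 1
    echo "/var/log/ceph/ceph-mon.${leader}.log"
}
```

Calling `debug_osd_boot 0` would then cover the osd.0 case. Level 20 logging is verbose, so it is worth lowering debug_mon again once the log has been collected.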

nokia ceph <nokiacephusers@xxxxxxxxx> wrote on Fri, Nov 8, 2019 at 6:38 PM:
>
> Hi,
>
> Below is the status of the OSD after restart.
>
> # systemctl status ceph-osd@0.service
> ● ceph-osd@0.service - Ceph object storage daemon osd.0
>    Loaded: loaded (/usr/lib/systemd/system/ceph-osd@.service; enabled-runtime; vendor preset: disabled)
>   Drop-In: /etc/systemd/system/ceph-osd@.service.d
>            └─90-ExecStart_NUMA.conf
>    Active: active (running) since Fri 2019-11-08 10:32:51 UTC; 1min 1s ago
>   Process: 219213 ExecStartPre=/usr/lib/ceph/ceph-osd-prestart.sh --cluster ${CLUSTER} --id %i (code=exited, status=0/SUCCESS)
>  Main PID: 219218 (ceph-osd)
>    CGroup: /system.slice/system-ceph\x2dosd.slice/ceph-osd@0.service
>            └─219218 /usr/bin/ceph-osd -f --cluster ceph --id 0 --setuser ceph --setgroup ceph
>
> Nov 08 10:32:51 cn1.chn8be1c1.cdn systemd[1]: Starting Ceph object storage daemon osd.0...
> Nov 08 10:32:51 cn1.chn8be1c1.cdn systemd[1]: Started Ceph object storage daemon osd.0.
> Nov 08 10:33:03 cn1.chn8be1c1.cdn numactl[219218]: 2019-11-08 10:33:03.785 7f9adeed4d80 -1 osd.0 1795 log_to_monitors {default=true}
> Nov 08 10:33:05 cn1.chn8be1c1.cdn numactl[219218]: 2019-11-08 10:33:05.474 7f9ad14df700 -1 osd.0 1795 set_numa_affinity unable to identify public interface 'dss-client' numa n...r directory
> Hint: Some lines were ellipsized, use -l to show in full.
>
> I have attached to this mail a file with the logs from the time this restart was initiated.
>
> On Fri, Nov 8, 2019 at 3:59 PM huang jun <hjwsm1989@xxxxxxxxx> wrote:
>>
>> Try restarting some of the down OSDs listed in 'ceph osd tree' and
>> see what happens.
>>
>> nokia ceph <nokiacephusers@xxxxxxxxx> wrote on Fri, Nov 8, 2019 at 6:24 PM:
>> >
>> > Adding my official mail id
>> >
>> > ---------- Forwarded message ---------
>> > From: nokia ceph <nokiacephusers@xxxxxxxxx>
>> > Date: Fri, Nov 8, 2019 at 3:57 PM
>> > Subject: OSD's not coming up in Nautilus
>> > To: Ceph Users <ceph-users@xxxxxxxxxxxxxx>
>> >
>> >
>> > Hi Team,
>> >
>> > We have a 5-node Ceph cluster that we upgraded from Luminous to Nautilus. Everything was going well until yesterday, when we noticed that the OSDs are marked down and are not recognized by the monitors as running, even though the OSD processes are running.
>> >
>> > We noticed that the admin.keyring and the mon.keyring were missing on the nodes, so we recreated them with the commands below.
>> >
>> > ceph-authtool --create-keyring /etc/ceph/ceph.client.admin.keyring --gen-key -n client.admin --cap mon 'allow *' --cap osd 'allow *' --cap mds allow
>> >
>> > ceph-authtool --create_keyring /etc/ceph/ceph.mon.keyring --gen-key -n mon. --cap mon 'allow *'
>> >
>> > In the logs we find the following lines.
>> >
>> > 2019-11-08 09:01:50.525 7ff61722b700  0 log_channel(audit) log [DBG] : from='client.? 10.50.11.44:0/2398064782' entity='client.admin' cmd=[{"prefix": "df", "format": "json"}]: dispatch
>> > 2019-11-08 09:02:37.686 7ff61722b700  0 log_channel(cluster) log [INF] : mon.cn1 calling monitor election
>> > 2019-11-08 09:02:37.686 7ff61722b700  1 mon.cn1@0(electing).elector(31157) init, last seen epoch 31157, mid-election, bumping
>> > 2019-11-08 09:02:37.688 7ff61722b700 -1 mon.cn1@0(electing) e3 failed to get devid for : udev_device_new_from_subsystem_sysname failed on ''
>> > 2019-11-08 09:02:37.770 7ff61722b700  0 log_channel(cluster) log [INF] : mon.cn1 is new leader, mons cn1,cn2,cn3,cn4,cn5 in quorum (ranks 0,1,2,3,4)
>> > 2019-11-08 09:02:37.857 7ff613a24700  0 log_channel(cluster) log [DBG] : monmap e3: 5 mons at {cn1=[v2:10.50.11.41:3300/0,v1:10.50.11.41:6789/0],cn2=[v2:10.50.11.42:3300/0,v1:10.50.11.42:6789/0],cn3=[v2:10.50.11.43:3300/0,v1:10.50.11.43:6789/0],cn4=[v2:10.50.11.44:3300/0,v1:10.50.11.44:6789/0],cn5=[v2:10.50.11.45:3300/0,v1:10.50.11.45:6789/0]}
>> >
>> >
>> >
>> > # ceph mon dump
>> > dumped monmap epoch 3
>> > epoch 3
>> > fsid 9dbf207a-561c-48ba-892d-3e79b86be12f
>> > last_changed 2019-09-03 07:53:39.031174
>> > created 2019-08-23 18:30:55.970279
>> > min_mon_release 14 (nautilus)
>> > 0: [v2:10.50.11.41:3300/0,v1:10.50.11.41:6789/0] mon.cn1
>> > 1: [v2:10.50.11.42:3300/0,v1:10.50.11.42:6789/0] mon.cn2
>> > 2: [v2:10.50.11.43:3300/0,v1:10.50.11.43:6789/0] mon.cn3
>> > 3: [v2:10.50.11.44:3300/0,v1:10.50.11.44:6789/0] mon.cn4
>> > 4: [v2:10.50.11.45:3300/0,v1:10.50.11.45:6789/0] mon.cn5
>> >
>> >
>> > # ceph -s
>> >   cluster:
>> >     id:     9dbf207a-561c-48ba-892d-3e79b86be12f
>> >     health: HEALTH_WARN
>> >             85 osds down
>> >             3 hosts (72 osds) down
>> >             1 nearfull osd(s)
>> >             1 pool(s) nearfull
>> >             Reduced data availability: 2048 pgs inactive
>> >             too few PGs per OSD (17 < min 30)
>> >             1/5 mons down, quorum cn2,cn3,cn4,cn5
>> >
>> >   services:
>> >     mon: 5 daemons, quorum cn2,cn3,cn4,cn5 (age 57s), out of quorum: cn1
>> >     mgr: cn1(active, since 73m), standbys: cn2, cn3, cn4, cn5
>> >     osd: 120 osds: 35 up, 120 in; 909 remapped pgs
>> >
>> >   data:
>> >     pools:   1 pools, 2048 pgs
>> >     objects: 0 objects, 0 B
>> >     usage:   176 TiB used, 260 TiB / 437 TiB avail
>> >     pgs:     100.000% pgs unknown
>> >              2048 unknown
>> >
>> >
>> > The OSD logs show the following lines.
>> >
>> > 2019-11-08 09:05:33.332 7fd1a36eed80  0 _get_class not permitted to load kvs
>> > 2019-11-08 09:05:33.332 7fd1a36eed80  0 _get_class not permitted to load lua
>> > 2019-11-08 09:05:33.337 7fd1a36eed80  0 _get_class not permitted to load sdk
>> > 2019-11-08 09:05:33.337 7fd1a36eed80  0 osd.0 1795 crush map has features 432629308056666112, adjusting msgr requires for clients
>> > 2019-11-08 09:05:33.337 7fd1a36eed80  0 osd.0 1795 crush map has features 432629308056666112 was 8705, adjusting msgr requires for mons
>> > 2019-11-08 09:05:33.337 7fd1a36eed80  0 osd.0 1795 crush map has features 1009090060360105984, adjusting msgr requires for osds
>> >
>> > Please let us know what might be the issue. There seem to be no network issues on any of the servers' public or private interfaces.
>> >
>> > _______________________________________________
>> > ceph-users mailing list
>> > ceph-users@xxxxxxxxxxxxxx
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com