Re: OSD fail to authenticate after node outage

Hi,

I believe this question has already been answered in [1]. The failing OSDs had an old monmap and were able to start after their config was corrected.

[1] https://stackoverflow.com/questions/75366436/ceph-osd-authenticate-timed-out-after-node-restart
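
For the archive, a rough sketch of that kind of fix, assuming a cephadm-deployed cluster; the config path, `<fsid>` and `<id>` below are placeholders based on the typical cephadm layout, so adjust them for your deployment:

```
# On a node with a working ceph CLI: check the current mon map
# (epoch and v2/v1 addresses the OSDs should be contacting).
ceph mon dump

# Print a minimal client config containing the current mon_host line.
ceph config generate-minimal-conf

# On the affected OSD host (typical cephadm layout; <fsid> and <id> are
# placeholders): check which monitor addresses the daemon was started with
# and update mon_host if it still lists old monitors.
cat /var/lib/ceph/<fsid>/osd.<id>/config

# After correcting mon_host, restart the OSD so it picks up the new list.
systemctl restart ceph-<fsid>@osd.<id>.service
```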

Quoting tsmgeek@xxxxxxxxx:

Release: 16.2.7 (pacific)
Infra: 4 x nodes (4 x HDD OSDs each), 3 x nodes (mon/mds + 1 x NVMe OSD each)

We recently had a couple of nodes go offline unexpectedly, which triggered a rebalance that is still ongoing. The OSDs on the restarted nodes are marked down and keep logging `authenticate timed out`; after a period of time they get marked `autoout`. We set `noout` on the cluster, which has stopped them from being marked out, but they still never authenticate. We can run all the Ceph tooling from those nodes, which indicates they can connect to the mons.
The node keyrings and clocks are both in sync.
We are at a loss as to why we cannot get the OSDs to authenticate.

Any help would be appreciated.

```
  cluster:
    id:     d5126e5a-882e-11ec-954e-90e2baec3d2c
    health: HEALTH_WARN
            7 failed cephadm daemon(s)
            2 stray daemon(s) not managed by cephadm
            insufficient standby MDS daemons available
            nodown,noout flag(s) set
            8 osds down
            2 hosts (8 osds) down
            Degraded data redundancy: 195930251/392039621 objects degraded (49.977%), 160 pgs degraded, 160 pgs undersized
            2 pgs not deep-scrubbed in time

  services:
    mon: 3 daemons, quorum ceph5,ceph7,ceph6 (age 38h)
    mgr: ceph2.tofizp(active, since 9M), standbys: ceph1.vnkagp
    mds: 3/3 daemons up
    osd: 19 osds: 11 up (since 38h), 19 in (since 45h); 5 remapped pgs
         flags nodown,noout

  data:
    volumes: 1/1 healthy
    pools:   6 pools, 257 pgs
    objects: 102.94M objects, 67 TiB
    usage:   68 TiB used, 50 TiB / 118 TiB avail
    pgs:     195930251/392039621 objects degraded (49.977%)
             3205811/392039621 objects misplaced (0.818%)
             155 active+undersized+degraded
             97  active+clean
             3   active+undersized+degraded+remapped+backfill_wait
             2   active+undersized+degraded+remapped+backfilling

  io:
    client:   511 B/s rd, 102 KiB/s wr, 0 op/s rd, 2 op/s wr
    recovery: 13 MiB/s, 16 objects/s
```
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


