Hi,

I am using Ceph version 15.2.7 on a 4-node cluster. My OSDs keep stopping, and even when I start them again they stop after some time. I couldn't find anything useful in the logs. I have set the norecover and nobackfill flags; as soon as I unset norecover, the OSDs start to fail again.

  cluster:
    id:     b6437922-3edf-11eb-adc2-0cc47a5ec98a
    health: HEALTH_ERR
            1/6307061 objects unfound (0.000%)
            noout,nobackfill,norebalance,norecover,noscrub,nodeep-scrub flag(s) set
            19 osds down
            62477 scrub errors
            Reduced data availability: 75 pgs inactive, 12 pgs down, 57 pgs peering, 90 pgs stale
            Possible data damage: 1 pg recovery_unfound, 7 pgs inconsistent
            Degraded data redundancy: 3090660/12617416 objects degraded (24.495%), 394 pgs degraded, 399 pgs undersized
            5 pgs not deep-scrubbed in time
            127 daemons have recently crashed

  data:
    pools:   4 pools, 833 pgs
    objects: 6.31M objects, 23 TiB
    usage:   47 TiB used, 244 TiB / 291 TiB avail
    pgs:     9.004% pgs not active
             3090660/12617416 objects degraded (24.495%)
             315034/12617416 objects misplaced (2.497%)
             1/6307061 objects unfound (0.000%)
             368 active+undersized+degraded
             299 active+clean
             56  stale+peering
             24  stale+active+clean
             15  active+recovery_wait
             12  active+undersized+remapped
             11  active+undersized+degraded+remapped+backfill_wait
             11  down
             7   active+recovery_wait+degraded
             7   active+clean+remapped
             5   active+clean+remapped+inconsistent
             5   stale+activating+undersized
             4   active+recovering+degraded
             2   stale+active+recovery_wait+degraded
             1   active+recovery_unfound+undersized+degraded+remapped
             1   stale+remapped+peering
             1   stale+activating
             1   stale+down
             1   active+remapped+backfill_wait
             1   active+undersized+remapped+inconsistent
             1   active+undersized+degraded+remapped+inconsistent+backfill_wait

What needs to be done to recover this?
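For reference, these are roughly the commands involved. The exact invocations are from memory, and osd.N / <crash-id> are placeholders, so please treat this as approximate rather than a verbatim transcript:

    # pause recovery and backfill (the flags visible in the health output)
    ceph osd set norecover
    ceph osd set nobackfill

    # as soon as this is run, the OSDs begin going down again
    ceph osd unset norecover

    # restarting a stopped OSD; it dies again after a while
    systemctl restart ceph-osd@N    # package install; on a cephadm deployment the unit is ceph-<fsid>@osd.N

    # roughly where I have been looking for clues so far
    ceph health detail
    ceph crash ls
    ceph crash info <crash-id>
    journalctl -u ceph-osd@N --since "1 hour ago"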