Re: ceph mons and osds are down

Michel Niyoyita <micou12@xxxxxxxxx> · Tue, 22 Feb 2022 16:40:23 +0200

Actually, one of my colleagues rebooted all nodes without preparing them
first (setting noout, norecover, etc.). Once all nodes were back up, the
cluster was no longer accessible, and the status in my first message
(quoted below) is what we are getting. I did not remove any OSDs; some are
only marked down.
Below is my ceph.conf:

mon initial members = ceph-mon1,ceph-mon2,ceph-mon3
mon_allow_pool_delete = True
mon_clock_drift_allowed = 0.5
mon_max_pg_per_osd = 400
mon_osd_allow_primary_affinity = 1
mon_pg_warn_max_object_skew = 0
mon_pg_warn_max_per_osd = 0
mon_pg_warn_min_per_osd = 0
osd pool default crush rule = -1
osd_pool_default_min_size = 1
osd_pool_default_size = 2
public network = 0.0.0.0/0
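For a planned reboot of every node, the flags mentioned above are normally set with `ceph osd set <flag>` beforehand and cleared with `ceph osd unset <flag>` once all daemons are back. Below is a minimal Python sketch that builds those calls. The choice of four flags is an assumption, not a complete maintenance procedure, and by default it only prints the commands:

```python
import shlex
import subprocess

# Flags commonly set before rebooting all nodes at once. This particular
# selection is an assumption; adjust it to your own maintenance procedure.
MAINTENANCE_FLAGS = ("noout", "norecover", "norebalance", "nobackfill")


def flag_commands(action: str, flags=MAINTENANCE_FLAGS) -> list[list[str]]:
    """Return `ceph osd set|unset <flag>` argument lists, one per flag."""
    if action not in ("set", "unset"):
        raise ValueError("action must be 'set' or 'unset'")
    return [["ceph", "osd", action, flag] for flag in flags]


def run(commands, dry_run: bool = True) -> None:
    """Print each command; only execute it when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(flag_commands("set"))    # before shutting the nodes down
    # ... reboot the nodes, wait until all mons and OSDs are back up ...
    run(flag_commands("unset"))  # after the cluster has recovered
```

Passing `dry_run=False` on an admin node would execute the commands instead of only printing them.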

On Tue, Feb 22, 2022 at 4:32 PM <ashley@xxxxxxxxxxxxxx> wrote:

> You have 1 OSD offline. Has this disk failed, or are you aware of what
> caused it to go offline?
> It shows you have 10 OSDs but only 7 in. Have you removed the other 3? Was
> the data fully drained off them first?
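The counts Ashley refers to come from the `osd: 10 osds: 6 up (since 7h), 7 in (since 9h)` line of the status quoted further down. A quick worked breakdown, assuming every OSD that is up is also in (the status line alone does not guarantee this):

```python
def osd_breakdown(total: int, up: int, in_: int) -> dict:
    """Split the OSD counts from `ceph -s`, assuming every up OSD is also in."""
    if not (0 <= up <= in_ <= total):
        raise ValueError("expected 0 <= up <= in <= total")
    return {
        "down": total - up,       # daemons not running or not reporting
        "out": total - in_,       # excluded from data placement
        "down_but_in": in_ - up,  # still mapped to PGs but unavailable
    }


print(osd_breakdown(total=10, up=6, in_=7))
# {'down': 4, 'out': 3, 'down_but_in': 1}
```

Under that assumption, the one down-but-in OSD would be consistent with the single `1 osds down` health warning, and the 3 out OSDs are the ones asked about here.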
>
> I see you have 11 pools. What are these set up as (type and min/max size)?
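The pool type, size and min_size can be listed with `ceph osd pool ls detail`. A minimal Python sketch that summarizes its JSON output; the field names (`pool_name`, `type`, `size`, `min_size`) and the type codes (1 = replicated, 3 = erasure) are what the JSON output is expected to contain, but treat them as assumptions and check them against your release:

```python
import json
import subprocess

# Numeric pool type codes as expected in Ceph's JSON output (assumption);
# unknown codes are shown as-is.
POOL_TYPES = {1: "replicated", 3: "erasure"}


def summarize_pools(pools_json: str) -> list[tuple[str, str, int, int]]:
    """Return (name, type, size, min_size) for each pool in the JSON text."""
    rows = []
    for pool in json.loads(pools_json):
        ptype = POOL_TYPES.get(pool["type"], str(pool["type"]))
        rows.append((pool["pool_name"], ptype, pool["size"], pool["min_size"]))
    return rows


def fetch_pools_json() -> str:
    """Run the CLI on an admin node (requires a reachable cluster)."""
    return subprocess.run(
        ["ceph", "osd", "pool", "ls", "detail", "--format", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
```

On an admin node, `for row in summarize_pools(fetch_pools_json()): print(*row)` prints one line per pool.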
>
> > On 22 Feb 2022, at 14:15, Michel Niyoyita <micou12@xxxxxxxxx> wrote:
> >
> > Dear Ceph Users,
> >
> > Kindly help me repair my cluster. It has been down since yesterday and
> > I am still not able to get it up and running. Below are some findings:
> >
> >    id:     6ad86187-2738-42d8-8eec-48b2a43c298f
> >    health: HEALTH_ERR
> >            mons are allowing insecure global_id reclaim
> >            1/3 mons down, quorum ceph-mon1,ceph-mon3
> >            10/32332 objects unfound (0.031%)
> >            1 osds down
> >            3 scrub errors
> >            Reduced data availability: 124 pgs inactive, 60 pgs down, 411 pgs stale
> >            Possible data damage: 9 pgs recovery_unfound, 1 pg backfill_unfound, 1 pg inconsistent
> >            Degraded data redundancy: 6009/64664 objects degraded (9.293%), 55 pgs degraded, 80 pgs undersized
> >            11 pgs not deep-scrubbed in time
> >            5 slow ops, oldest one blocked for 1638 sec, osd.9 has slow ops
> >
> >  services:
> >    mon: 3 daemons, quorum ceph-mon1,ceph-mon3 (age 3h), out of quorum: ceph-mon2
> >    mgr: ceph-mon1(active, since 9h), standbys: ceph-mon2
> >    osd: 10 osds: 6 up (since 7h), 7 in (since 9h); 43 remapped pgs
> >
> >  data:
> >    pools:   11 pools, 560 pgs
> >    objects: 32.33k objects, 159 GiB
> >    usage:   261 GiB used, 939 GiB / 1.2 TiB avail
> >    pgs:     11.429% pgs unknown
> >             10.714% pgs not active
> >             6009/64664 objects degraded (9.293%)
> >             1384/64664 objects misplaced (2.140%)
> >             10/32332 objects unfound (0.031%)
> >             245 stale+active+clean
> >             70  active+clean
> >             64  unknown
> >             48  stale+down
> >             45  stale+active+undersized+degraded
> >             37  stale+active+clean+remapped
> >             28  stale+active+undersized
> >             12  down
> >             2   stale+active+recovery_unfound+degraded
> >             2   stale+active+recovery_unfound+undersized+degraded
> >             2   stale+active+recovery_unfound+undersized+degraded+remapped
> >             2   active+recovery_unfound+undersized+degraded+remapped
> >             1   active+clean+inconsistent
> >             1   stale+active+recovery_unfound+degraded+remapped
> >             1   stale+active+backfill_unfound+undersized+degraded+remapped
> >
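The PG state list above can be cross-checked against the health summary by counting each individual state flag. A minimal Python sketch; the counts are copied from the status output, and the variable names are illustrative only:

```python
from collections import Counter

# PG state counts copied from the `pgs:` section of the status above.
PG_STATES = {
    "stale+active+clean": 245,
    "active+clean": 70,
    "unknown": 64,
    "stale+down": 48,
    "stale+active+undersized+degraded": 45,
    "stale+active+clean+remapped": 37,
    "stale+active+undersized": 28,
    "down": 12,
    "stale+active+recovery_unfound+degraded": 2,
    "stale+active+recovery_unfound+undersized+degraded": 2,
    "stale+active+recovery_unfound+undersized+degraded+remapped": 2,
    "active+recovery_unfound+undersized+degraded+remapped": 2,
    "active+clean+inconsistent": 1,
    "stale+active+recovery_unfound+degraded+remapped": 1,
    "stale+active+backfill_unfound+undersized+degraded+remapped": 1,
}
OBJECTS = 32332        # "objects: 32.33k objects"
OBJECT_COPIES = 64664  # denominator of the degraded/misplaced ratios


def tally_flags(pg_states: dict[str, int]) -> Counter:
    """Count how many PGs carry each individual state flag."""
    flags: Counter = Counter()
    for combo, count in pg_states.items():
        for flag in combo.split("+"):
            flags[flag] += count
    return flags


flags = tally_flags(PG_STATES)
total = sum(PG_STATES.values())
# Known (not unknown) PGs that lack the "active" flag.
not_active = total - flags["active"] - flags["unknown"]

print("total PGs:", total)
for name in ("stale", "down", "undersized", "degraded", "remapped",
             "recovery_unfound", "backfill_unfound", "inconsistent"):
    print(f"{name}: {flags[name]}")
print(f"unknown: {flags['unknown'] / total:.3%}")
print(f"not active: {not_active / total:.3%}")
print("inactive incl. unknown:", not_active + flags["unknown"])
print(f"degraded objects: {6009 / OBJECT_COPIES:.3%}")
print(f"misplaced objects: {1384 / OBJECT_COPIES:.3%}")
print(f"unfound objects: {10 / OBJECTS:.3%}")
print("copies per object:", OBJECT_COPIES / OBJECTS)
```

With these numbers, the tally reproduces the health section: 560 PGs, 411 stale, 60 down, 80 undersized, 55 degraded, 43 remapped, 9 recovery_unfound, and 124 inactive (64 unknown plus 60 not active), with the 11.429%, 10.714%, 9.293%, 2.140% and 0.031% figures. The 2.0 copies per object is consistent with `osd_pool_default_size = 2`, assuming all pools use that default.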
> > If anyone has faced the same issue, please help me.
> >
> > Best Regards.
> >
> > Michel
> > _______________________________________________
> > ceph-users mailing list -- ceph-users@xxxxxxx
> > To unsubscribe send an email to ceph-users-leave@xxxxxxx
>