Re: pg xyz is stuck undersized for long time

Amudhan P <amudhan83@xxxxxxxxx> · Mon, 9 Nov 2020 07:46:40 +0530

Hi Frank,

You said only one OSD is down, but ceph status shows more than 20 OSDs
down.
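
The number comes from the "osd:" line of the quoted status. As a rough sketch, it can be computed like this; the helper name osd_counts is made up for illustration, and the regular expression only matches the line format shown in this thread (other Ceph releases may format this line differently):

```python
import re


def osd_counts(osd_line: str) -> dict:
    """Parse the 'osd:' summary line from plain-text `ceph status` output.

    Hypothetical helper: only handles the "<N> osds: <U> up, <I> in" form
    that appears in the status quoted in this thread.
    """
    m = re.search(r"(\d+) osds: (\d+) up, (\d+) in", osd_line)
    if m is None:
        raise ValueError(f"unrecognised osd summary line: {osd_line!r}")
    total, up, in_ = (int(g) for g in m.groups())
    return {
        "total": total,
        "up": up,
        "in": in_,
        "not_up": total - up,   # OSDs that are not reported as up
        "not_in": total - in_,  # OSDs that are not reported as in
    }


counts = osd_counts("osd: 299 osds: 275 up, 275 in; 301 remapped pgs")
print(counts["not_up"], counts["not_in"])
```

With the line from the status below, 299 - 275 gives 24 OSDs that are not up.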

Regards,
Amudhan

On Sun 8 Nov, 2020, 12:13 AM Frank Schilder, <frans@xxxxxx> wrote:

> Hi all,
>
> I moved the crush location of 8 OSDs and rebalancing went on happily
> (misplaced objects only). Today, osd.1 crashed, restarted and rejoined the
> cluster. However, it does not seem to have rejoined some of the PGs it was
> a member of. I now have undersized PGs for what I believe is no real reason:
>
> PG_DEGRADED Degraded data redundancy: 52173/2268789087 objects degraded (0.002%), 2 pgs degraded, 7 pgs undersized
>     pg 11.52 is stuck undersized for 663.929664, current state active+undersized+remapped+backfilling, last acting [237,60,2147483647,74,233,232,292,86]
>
> The up and acting sets are:
>
>     "up": [
>         237,
>         2,
>         74,
>         289,
>         233,
>         232,
>         292,
>         86
>     ],
>     "acting": [
>         237,
>         60,
>         2147483647,
>         74,
>         233,
>         232,
>         292,
>         86
>     ],
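
A small sketch for comparing the two sets quoted above. The value 2147483647 is 0x7fffffff, which Ceph prints in place of an OSD id when a slot has no OSD assigned (CRUSH_ITEM_NONE). The function name summarize and the key names are made up for illustration, and this only compares the lists; it does not say why a slot is empty:

```python
# 0x7fffffff: printed instead of an OSD id when a slot is empty
CRUSH_ITEM_NONE = 2147483647

# Values copied from the "up" and "acting" lists for pg 11.52 above
up = [237, 2, 74, 289, 233, 232, 292, 86]
acting = [237, 60, CRUSH_ITEM_NONE, 74, 233, 232, 292, 86]


def summarize(up, acting):
    """Compare up and acting sets (hypothetical helper, illustration only)."""
    acting_osds = {o for o in acting if o != CRUSH_ITEM_NONE}
    return {
        # positions in the acting list that hold no OSD
        "empty_acting_slots": [i for i, o in enumerate(acting)
                               if o == CRUSH_ITEM_NONE],
        # OSDs listed in up but not (yet) in acting
        "in_up_not_acting": sorted(set(up) - acting_osds),
        # OSDs listed in acting that are no longer in up
        "in_acting_not_up": sorted(acting_osds - set(up)),
    }


result = summarize(up, acting)
print(result)
print("osd.1 in up:", 1 in up, "| osd.1 in acting:", 1 in acting)
```

On the lists as quoted, slot 2 of the acting set is empty, OSDs 2 and 289 are in up only, OSD 60 is in acting only, and osd.1 does not appear in either list.
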
>
> How can I get the PG to complete peering and osd.1 to join? I have an
> unreasonable number of degraded objects where the missing part is on this
> OSD.
>
> For completeness, here is the cluster status:
>
> # ceph status
>   cluster:
>     id:     ...
>     health: HEALTH_ERR
>             noout,norebalance flag(s) set
>             1 large omap objects
>             35815902/2268938858 objects misplaced (1.579%)
>             Degraded data redundancy: 46122/2268938858 objects degraded (0.002%), 2 pgs degraded, 7 pgs undersized
>             Degraded data redundancy (low space): 28 pgs backfill_toofull
>
>   services:
>     mon: 3 daemons, quorum ceph-01,ceph-02,ceph-03
>     mgr: ceph-01(active), standbys: ceph-03, ceph-02
>     mds: con-fs2-1/1/1 up  {0=ceph-08=up:active}, 1 up:standby-replay
>     osd: 299 osds: 275 up, 275 in; 301 remapped pgs
>          flags noout,norebalance
>
>   data:
>     pools:   11 pools, 3215 pgs
>     objects: 268.8 M objects, 675 TiB
>     usage:   854 TiB used, 1.1 PiB / 1.9 PiB avail
>     pgs:     46122/2268938858 objects degraded (0.002%)
>              35815902/2268938858 objects misplaced (1.579%)
>              2907 active+clean
>              219  active+remapped+backfill_wait
>              47   active+remapped+backfilling
>              28   active+remapped+backfill_wait+backfill_toofull
>              6    active+clean+scrubbing+deep
>              5    active+undersized+remapped+backfilling
>              2    active+undersized+degraded+remapped+backfilling
>              1    active+clean+scrubbing
>
>   io:
>     client:   13 MiB/s rd, 196 MiB/s wr, 2.82 kop/s rd, 1.81 kop/s wr
>     recovery: 57 MiB/s, 14 objects/s
>
> Thanks and best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>