pg xyz is stuck undersized for a long time

Hi all,

I moved the CRUSH location of 8 OSDs and rebalancing went on happily (misplaced objects only). Today, osd.1 crashed, restarted and rejoined the cluster. However, it does not seem to rejoin some of the PGs it was a member of, and I now have undersized PGs for no apparent reason:

PG_DEGRADED Degraded data redundancy: 52173/2268789087 objects degraded (0.002%), 2 pgs degraded, 7 pgs undersized
    pg 11.52 is stuck undersized for 663.929664, current state active+undersized+remapped+backfilling, last acting [237,60,2147483647,74,233,232,292,86]
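
(The snippet above is from ceph health detail; if the full set of affected PGs is of interest, something like the following should list them as well:)

    # ceph pg dump_stuck undersized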

The up and acting sets are:

    "up": [
        237,
        2,
        74,
        289,
        233,
        232,
        292,
        86
    ],
    "acting": [
        237,
        60,
        2147483647,
        74,
        233,
        232,
        292,
        86
    ],
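
(For reference, these sets should be reproducible with ceph pg 11.52 query; a sketch of pulling just these fields, assuming jq is available:)

    # ceph pg 11.52 query | jq '{up: .up, acting: .acting}'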

How can I get the PG to complete peering and osd.1 to rejoin? I have an unreasonably large number of degraded objects whose missing copies are on this OSD.
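
Would forcing the PG to re-peer be the right way forward here? Just a sketch of what I had in mind (not yet run; osd id taken from above), either marking the OSD down briefly so the PG re-peers when it comes back:

    # ceph osd down 1

or restarting the daemon on its host (assuming a systemd deployment):

    # systemctl restart ceph-osd@1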

For completeness, here is the cluster status:

# ceph status
  cluster:
    id:     ...
    health: HEALTH_ERR
            noout,norebalance flag(s) set
            1 large omap objects
            35815902/2268938858 objects misplaced (1.579%)
            Degraded data redundancy: 46122/2268938858 objects degraded (0.002%), 2 pgs degraded, 7 pgs undersized
            Degraded data redundancy (low space): 28 pgs backfill_toofull
 
  services:
    mon: 3 daemons, quorum ceph-01,ceph-02,ceph-03
    mgr: ceph-01(active), standbys: ceph-03, ceph-02
    mds: con-fs2-1/1/1 up  {0=ceph-08=up:active}, 1 up:standby-replay
    osd: 299 osds: 275 up, 275 in; 301 remapped pgs
         flags noout,norebalance
 
  data:
    pools:   11 pools, 3215 pgs
    objects: 268.8 M objects, 675 TiB
    usage:   854 TiB used, 1.1 PiB / 1.9 PiB avail
    pgs:     46122/2268938858 objects degraded (0.002%)
             35815902/2268938858 objects misplaced (1.579%)
             2907 active+clean
             219  active+remapped+backfill_wait
             47   active+remapped+backfilling
             28   active+remapped+backfill_wait+backfill_toofull
             6    active+clean+scrubbing+deep
             5    active+undersized+remapped+backfilling
             2    active+undersized+degraded+remapped+backfilling
             1    active+clean+scrubbing
 
  io:
    client:   13 MiB/s rd, 196 MiB/s wr, 2.82 kop/s rd, 1.81 kop/s wr
    recovery: 57 MiB/s, 14 objects/s

Thanks and best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14