My PGs are healthy now, but the underlying problem itself is not fixed. I was asking whether someone knew a fast fix to get the PGs complete right away.

The down OSDs were shut down a long time ago and are sitting in a different crush root. The OSD this thread is about is one OSD in an HDD pool that I'm re-organising right now, and it was only temporarily down (1 out of the 275).

I should have mentioned that I know that a long-standing bug in ceph is the reason for this partial data loss (https://tracker.ceph.com/issues/46847). I thought I had a fully functional workaround, but it turned out that I was wrong: my workaround fixes all incomplete PGs except those that are in the state "backfilling" at the time of the OSD restart. I will file a new tracker item, as this looks like a catastrophic bug.

Any cluster that is rebalancing, whether after adding disks, after increasing pg[p]_num on a pool, or after similar operations, is in danger. You will find many threads related to this problem, but the actual underlying bug has never been addressed completely. Some people actually lost data because of it; in particular, EC pools can become damaged beyond repair. From all the threads I found, this seems to be the one and only long-standing bug in ceph/rados that can cause data loss. A lot of clusters are affected; most people have simply been lucky. Reports date back to Luminous and go all the way up to Nautilus.
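For anyone who finds this thread later: the usual way to give a stuck PG a kick is to make it go through peering again. A minimal sketch, assuming the restarted OSD is up and in again (the PG and OSD IDs are the ones from the example further down, and this is an illustration only, not the workaround I mentioned above):

    ceph pg 11.52 query            # show the PG's state and its up/acting sets
    ceph pg dump_stuck undersized  # list all PGs stuck in the undersized state
    ceph pg repeer 11.52           # ask this one PG to go through peering again (newer releases only)
    ceph osd down 1                # mark osd.1 down in the osdmap; a running daemon re-asserts itself and its PGs re-peer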
Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Amudhan P <amudhan83@xxxxxxxxx>
Sent: 09 November 2020 03:16:40
To: Frank Schilder
Cc: ceph-users
Subject: Re: pg xyz is stuck undersized for long time

Hi Frank,

You said that only one OSD is down, but ceph status shows more than 20 OSDs down.

Regards,
Amudhan

On Sun 8 Nov, 2020, 12:13 AM Frank Schilder, <frans@xxxxxx> wrote:

Hi all,

I moved the crush location of 8 OSDs and rebalancing went on happily (misplaced objects only). Today, osd.1 crashed, restarted and rejoined the cluster. However, it seems not to re-join some PGs it was a member of. I now have undersized PGs for no real reason, as far as I can tell:

PG_DEGRADED Degraded data redundancy: 52173/2268789087 objects degraded (0.002%), 2 pgs degraded, 7 pgs undersized
    pg 11.52 is stuck undersized for 663.929664, current state active+undersized+remapped+backfilling, last acting [237,60,2147483647,74,233,232,292,86]

The up and acting sets are (2147483647 is the placeholder CRUSH uses when no OSD is mapped to that shard):

    "up": [237, 2, 74, 289, 233, 232, 292, 86],
    "acting": [237, 60, 2147483647, 74, 233, 232, 292, 86],

How can I get the PG to complete peering and osd.1 to join? I have an unreasonable number of degraded objects where the missing part is on this OSD.

For completeness, here is the cluster status:

# ceph status
  cluster:
    id:     ...
    health: HEALTH_ERR
            noout,norebalance flag(s) set
            1 large omap objects
            35815902/2268938858 objects misplaced (1.579%)
            Degraded data redundancy: 46122/2268938858 objects degraded (0.002%), 2 pgs degraded, 7 pgs undersized
            Degraded data redundancy (low space): 28 pgs backfill_toofull

  services:
    mon: 3 daemons, quorum ceph-01,ceph-02,ceph-03
    mgr: ceph-01(active), standbys: ceph-03, ceph-02
    mds: con-fs2-1/1/1 up {0=ceph-08=up:active}, 1 up:standby-replay
    osd: 299 osds: 275 up, 275 in; 301 remapped pgs
         flags noout,norebalance

  data:
    pools:   11 pools, 3215 pgs
    objects: 268.8 M objects, 675 TiB
    usage:   854 TiB used, 1.1 PiB / 1.9 PiB avail
    pgs:     46122/2268938858 objects degraded (0.002%)
             35815902/2268938858 objects misplaced (1.579%)
             2907 active+clean
             219  active+remapped+backfill_wait
             47   active+remapped+backfilling
             28   active+remapped+backfill_wait+backfill_toofull
             6    active+clean+scrubbing+deep
             5    active+undersized+remapped+backfilling
             2    active+undersized+degraded+remapped+backfilling
             1    active+clean+scrubbing

  io:
    client:   13 MiB/s rd, 196 MiB/s wr, 2.82 kop/s rd, 1.81 kop/s wr
    recovery: 57 MiB/s, 14 objects/s

Thanks and best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx