Hi,

If I set the min_size of the pool to 4, will this PG be recovered? If not, how can I get the cluster out of a HEALTH_ERR state like this? Marking the unfound objects as lost seems risky based on some mailing-list reports (even after marking them lost, people still had issues), so I'm curious what the right way is to take the cluster out of this state and let it recover.

Example problematic PG:

dumped pgs_brief
PG_STAT  STATE                                                 UP                 UP_PRIMARY  ACTING                              ACTING_PRIMARY
28.5b    active+recovery_unfound+undersized+degraded+remapped  [18,33,10,0,48,1]  18          [2147483647,2147483647,29,21,4,47]  29

Cluster state:

  cluster:
    id:     5a07ec50-4eee-4336-aa11-46ca76edcc24
    health: HEALTH_ERR
            10 OSD(s) experiencing BlueFS spillover
            4/1055070542 objects unfound (0.000%)
            noout flag(s) set
            Possible data damage: 2 pgs recovery_unfound
            Degraded data redundancy: 64150765/6329079237 objects degraded (1.014%), 10 pgs degraded, 26 pgs undersized
            4 pgs not deep-scrubbed in time

  services:
    mon: 3 daemons, quorum mon-2s01,mon-2s02,mon-2s03 (age 2M)
    mgr: mon-2s01(active, since 2M), standbys: mon-2s03, mon-2s02
    osd: 49 osds: 49 up (since 36m), 49 in (since 4d); 28 remapped pgs
         flags noout
    rgw: 3 daemons active (mon-2s01.rgw0, mon-2s02.rgw0, mon-2s03.rgw0)

  task status:

  data:
    pools:   9 pools, 425 pgs
    objects: 1.06G objects, 66 TiB
    usage:   158 TiB used, 465 TiB / 623 TiB avail
    pgs:     64150765/6329079237 objects degraded (1.014%)
             38922319/6329079237 objects misplaced (0.615%)
             4/1055070542 objects unfound (0.000%)
             393 active+clean
             13  active+undersized+remapped+backfill_wait
             8   active+undersized+degraded+remapped+backfill_wait
             3   active+clean+scrubbing
             3   active+undersized+remapped+backfilling
             2   active+recovery_unfound+undersized+degraded+remapped
             2   active+remapped+backfill_wait
             1   active+clean+scrubbing+deep

  io:
    client:   181 MiB/s rd, 9.4 MiB/s wr, 5.38k op/s rd, 2.42k op/s wr
    recovery: 23 MiB/s, 389 objects/s

Thank you.

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
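For reference, the operations being discussed map to the following Ceph CLI commands. This is a sketch, not advice: `<pool>` is a placeholder for the affected pool's name, and `mark_unfound_lost` is the risky "mark as lost" step mentioned above.

```shell
# Inspect the problematic PG: which objects are unfound, and why
# recovery cannot locate them (shows peering/might_have_unfound info).
ceph pg 28.5b list_unfound
ceph pg 28.5b query

# The min_size change the question asks about (<pool> is a placeholder).
# Note: min_size only affects whether the PG accepts I/O while undersized;
# it does not by itself make unfound objects recoverable.
ceph osd pool set <pool> min_size 4

# The "mark as lost" step, with its two modes:
#   revert - roll unfound objects back to a prior version
#            (not supported on erasure-coded pools, as I understand it)
#   delete - forget the unfound objects entirely
ceph pg 28.5b mark_unfound_lost revert
ceph pg 28.5b mark_unfound_lost delete
```

These commands require a live cluster with the affected PG, so run them read-only (`list_unfound`, `query`) first before considering any of the destructive ones.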