Hi,
I'm not sure if setting min_size to 4 would also fix the PGs, but
client IO would probably be restored. Marking the objects as lost is
the last resort according to this list; luckily I haven't been in
such a situation yet. So give it a try with min_size = 4, but don't
forget to raise it again once the PGs have recovered. Keep in mind
that while min_size is lowered, losing another OSD could mean data
loss.
Are your OSDs still crashing unexpectedly?
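
In case it helps, a rough sketch of the steps, assuming an EC 4+2
pool (so min_size is normally 5) and a placeholder pool name
"mypool" -- adjust both to your setup:

  # confirm k/m and the current min_size before changing anything
  ceph osd pool ls detail | grep mypool
  ceph osd pool get mypool min_size

  # temporarily allow IO with only k shards available
  ceph osd pool set mypool min_size 4

  # once the PGs are active+clean again, restore the safer value
  ceph osd pool set mypool min_size 5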
Quoting "Szabo, Istvan (Agoda)" <Istvan.Szabo@xxxxxxxxx>:
Hi,
If I set the min_size of the pool to 4, will this PG be recovered?
Or how else can I get the cluster out of HEALTH_ERR? Marking the
objects as lost seems risky based on some mailing list experience;
even after marking them lost you can still have issues. So I'm
curious what the right way is to get the cluster out of this state
and let it recover.
Example problematic pg:
dumped pgs_brief
PG_STAT  STATE                                                 UP                 UP_PRIMARY  ACTING                              ACTING_PRIMARY
28.5b    active+recovery_unfound+undersized+degraded+remapped  [18,33,10,0,48,1]  18          [2147483647,2147483647,29,21,4,47]  29
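
(2147483647 in the acting set just means no OSD is currently mapped
to that shard.) For reference, these are the commands in question
for inspecting the PG and, as a last resort, marking the unfound
objects lost (PG id from above):

  ceph health detail
  ceph pg 28.5b query
  ceph pg 28.5b list_unfound

  # the risky last-resort step: revert unfound objects to a previous
  # version, or delete them if no previous version exists
  ceph pg 28.5b mark_unfound_lost revert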
Cluster state:
  cluster:
    id:     5a07ec50-4eee-4336-aa11-46ca76edcc24
    health: HEALTH_ERR
            10 OSD(s) experiencing BlueFS spillover
            4/1055070542 objects unfound (0.000%)
            noout flag(s) set
            Possible data damage: 2 pgs recovery_unfound
            Degraded data redundancy: 64150765/6329079237 objects degraded (1.014%), 10 pgs degraded, 26 pgs undersized
            4 pgs not deep-scrubbed in time

  services:
    mon: 3 daemons, quorum mon-2s01,mon-2s02,mon-2s03 (age 2M)
    mgr: mon-2s01(active, since 2M), standbys: mon-2s03, mon-2s02
    osd: 49 osds: 49 up (since 36m), 49 in (since 4d); 28 remapped pgs
         flags noout
    rgw: 3 daemons active (mon-2s01.rgw0, mon-2s02.rgw0, mon-2s03.rgw0)

  task status:

  data:
    pools:   9 pools, 425 pgs
    objects: 1.06G objects, 66 TiB
    usage:   158 TiB used, 465 TiB / 623 TiB avail
    pgs:     64150765/6329079237 objects degraded (1.014%)
             38922319/6329079237 objects misplaced (0.615%)
             4/1055070542 objects unfound (0.000%)
             393 active+clean
             13  active+undersized+remapped+backfill_wait
             8   active+undersized+degraded+remapped+backfill_wait
             3   active+clean+scrubbing
             3   active+undersized+remapped+backfilling
             2   active+recovery_unfound+undersized+degraded+remapped
             2   active+remapped+backfill_wait
             1   active+clean+scrubbing+deep

  io:
    client:   181 MiB/s rd, 9.4 MiB/s wr, 5.38k op/s rd, 2.42k op/s wr
    recovery: 23 MiB/s, 389 objects/s
Thank you.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx