Now the rebalance is continuing, but I still have 2 PGs in a degraded state. What is weird is that the up and acting OSDs are totally different :/

PG_STAT  STATE                                               UP                  UP_PRIMARY  ACTING                              ACTING_PRIMARY
28.3c    active+undersized+degraded+remapped+backfill_wait   [17,33,48,25,1,0]   17          [29,15,14,2147483647,26,44]         29
28.47    active+undersized+degraded+remapped+backfill_wait   [21,19,48,25,27,0]  21          [2147483647,45,17,25,12,30]         45

Is that normal? Backfilling is still ongoing, but now the degraded objects counter is increasing rather than decreasing ... Ehhh

-----Original Message-----
From: Eugen Block <eblock@xxxxxx>
Sent: Friday, October 1, 2021 2:45 PM
To: ceph-users@xxxxxxx
Subject: Re: dealing with unfound pg in 4:2 ec pool

Hi,

I'm not sure if setting min_size to 4 would also fix the PGs, but the client IO would probably be restored. Marking it as lost is the last resort according to this list; luckily I haven't been in such a situation yet. So give it a try with min_size = 4, but don't forget to increase it again after the PGs have recovered. Keep in mind that if you decrease min_size and lose another OSD, you could face data loss.

Are your OSDs still crashing unexpectedly?

Quoting "Szabo, Istvan (Agoda)" <Istvan.Szabo@xxxxxxxxx>:

> Hi,
>
> If I set the min_size of the pool to 4, will this PG be recovered?
> Or how can I take the cluster out of HEALTH_ERR like this?
> Marking it as lost seems risky based on some mailing list experiences:
> even after marking it lost you can still have issues. So I'm curious
> what the way is to get the cluster out of this state and let it
> recover.
>
> Example problematic PG:
>
> dumped pgs_brief
> PG_STAT  STATE                                                 UP                 UP_PRIMARY  ACTING                              ACTING_PRIMARY
> 28.5b    active+recovery_unfound+undersized+degraded+remapped  [18,33,10,0,48,1]  18          [2147483647,2147483647,29,21,4,47]  29
>
> Cluster state:
>
>   cluster:
>     id:     5a07ec50-4eee-4336-aa11-46ca76edcc24
>     health: HEALTH_ERR
>             10 OSD(s) experiencing BlueFS spillover
>             4/1055070542 objects unfound (0.000%)
>             noout flag(s) set
>             Possible data damage: 2 pgs recovery_unfound
>             Degraded data redundancy: 64150765/6329079237 objects degraded (1.014%), 10 pgs degraded, 26 pgs undersized
>             4 pgs not deep-scrubbed in time
>
>   services:
>     mon: 3 daemons, quorum mon-2s01,mon-2s02,mon-2s03 (age 2M)
>     mgr: mon-2s01(active, since 2M), standbys: mon-2s03, mon-2s02
>     osd: 49 osds: 49 up (since 36m), 49 in (since 4d); 28 remapped pgs
>          flags noout
>     rgw: 3 daemons active (mon-2s01.rgw0, mon-2s02.rgw0, mon-2s03.rgw0)
>
>   task status:
>
>   data:
>     pools:   9 pools, 425 pgs
>     objects: 1.06G objects, 66 TiB
>     usage:   158 TiB used, 465 TiB / 623 TiB avail
>     pgs:     64150765/6329079237 objects degraded (1.014%)
>              38922319/6329079237 objects misplaced (0.615%)
>              4/1055070542 objects unfound (0.000%)
>              393 active+clean
>              13  active+undersized+remapped+backfill_wait
>              8   active+undersized+degraded+remapped+backfill_wait
>              3   active+clean+scrubbing
>              3   active+undersized+remapped+backfilling
>              2   active+recovery_unfound+undersized+degraded+remapped
>              2   active+remapped+backfill_wait
>              1   active+clean+scrubbing+deep
>
>   io:
>     client:   181 MiB/s rd, 9.4 MiB/s wr, 5.38k op/s rd, 2.42k op/s wr
>     recovery: 23 MiB/s, 389 objects/s
>
> Thank you.
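
A side note on the pasted output: 2147483647 in an acting set is Ceph's placeholder for "no OSD", i.e. no OSD currently holds that shard, which is why those PGs show as undersized/degraded. Below is a minimal sketch of the commands behind the advice above, assuming the affected 4:2 EC pool is called <ec-pool> (a placeholder, substitute the real pool name) and that pg 28.5b is the one with unfound objects:

  # Inspect the PG and the unfound objects first
  ceph health detail
  ceph pg 28.5b query
  ceph pg 28.5b list_unfound

  # Temporarily allow IO/recovery with only k=4 shards available,
  # then restore min_size once the PGs are active+clean again
  # (k+1 = 5 is the usual default for a 4+2 profile)
  ceph osd pool set <ec-pool> min_size 4
  ceph osd pool set <ec-pool> min_size 5

  # Last resort only, as discussed above: give up the 4 unfound objects.
  # On erasure-coded pools only 'delete' is available, and it is irreversible.
  ceph pg 28.5b mark_unfound_lost delete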