Hi,
I'm not sure if setting min_size to 4 would also fix the PGs, but
client IO would probably be restored. Marking the objects as lost is
the last resort according to this list; luckily I haven't been in
such a situation yet. So give it a try with min_size = 4, but don't
forget to raise it again once the PGs have recovered. Keep in mind
that while min_size is lowered, losing another OSD could mean data
loss.
Are your OSDs still crashing unexpectedly?
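
In case it helps, a rough sketch of the steps, assuming an EC 4+2
pool (so min_size is normally 5) and a placeholder pool name
"mypool" -- adjust both to your setup:

  # confirm k/m and the current min_size before changing anything
  ceph osd pool ls detail | grep mypool
  ceph osd pool get mypool min_size

  # temporarily allow IO with only k shards available
  ceph osd pool set mypool min_size 4

  # once the PGs are active+clean again, restore the safer value
  ceph osd pool set mypool min_size 5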
Quoting "Szabo, Istvan (Agoda)" <Istvan.Szabo@xxxxxxxxx>:
Hi,
If I set the min_size of the pool to 4, will this PG be recovered?
Or how else can I get the cluster out of HEALTH_ERR? Marking the
objects as lost seems risky based on some mailing list experience;
even after marking them lost you can still have issues. So I'm
curious what the right way is to get the cluster out of this state
and let it recover.
Example problematic pg:
dumped pgs_brief
PG_STAT  STATE                                                 UP                 UP_PRIMARY  ACTING                              ACTING_PRIMARY
28.5b    active+recovery_unfound+undersized+degraded+remapped  [18,33,10,0,48,1]  18          [2147483647,2147483647,29,21,4,47]  29
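
(2147483647 in the acting set just means no OSD is currently mapped
to that shard.) For reference, these are the commands in question
for inspecting the PG and, as a last resort, marking the unfound
objects lost (PG id from above):

  ceph health detail
  ceph pg 28.5b query
  ceph pg 28.5b list_unfound

  # the risky last-resort step: revert unfound objects to a previous
  # version, or delete them if no previous version exists
  ceph pg 28.5b mark_unfound_lost revert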
Cluster state:
  cluster:
    id:     5a07ec50-4eee-4336-aa11-46ca76edcc24
    health: HEALTH_ERR
            10 OSD(s) experiencing BlueFS spillover
            4/1055070542 objects unfound (0.000%)
            noout flag(s) set
            Possible data damage: 2 pgs recovery_unfound
            Degraded data redundancy: 64150765/6329079237 objects degraded (1.014%), 10 pgs degraded, 26 pgs undersized
            4 pgs not deep-scrubbed in time

  services:
    mon: 3 daemons, quorum mon-2s01,mon-2s02,mon-2s03 (age 2M)
    mgr: mon-2s01(active, since 2M), standbys: mon-2s03, mon-2s02
    osd: 49 osds: 49 up (since 36m), 49 in (since 4d); 28 remapped pgs
         flags noout
    rgw: 3 daemons active (mon-2s01.rgw0, mon-2s02.rgw0, mon-2s03.rgw0)

  task status:

  data:
    pools:   9 pools, 425 pgs
    objects: 1.06G objects, 66 TiB
    usage:   158 TiB used, 465 TiB / 623 TiB avail
    pgs:     64150765/6329079237 objects degraded (1.014%)
             38922319/6329079237 objects misplaced (0.615%)
             4/1055070542 objects unfound (0.000%)
             393 active+clean
             13  active+undersized+remapped+backfill_wait
             8   active+undersized+degraded+remapped+backfill_wait
             3   active+clean+scrubbing
             3   active+undersized+remapped+backfilling
             2   active+recovery_unfound+undersized+degraded+remapped
             2   active+remapped+backfill_wait
             1   active+clean+scrubbing+deep

  io:
    client:   181 MiB/s rd, 9.4 MiB/s wr, 5.38k op/s rd, 2.42k op/s wr
    recovery: 23 MiB/s, 389 objects/s
Thank you.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx