On 3/13/20 4:09 PM, Peter Eisch wrote:
> Full cluster is 14.2.8.
>
> I had some OSDs drop overnight, which now results in 4 inactive PGs. The
> pools had three participating OSDs (2 ssd, 1 sas). In each pool at least 1
> ssd and 1 sas OSD is working without issue. I've run 'ceph pg repair <pg>'
> but it doesn't seem to make any changes.
>
> PG_AVAILABILITY Reduced data availability: 4 pgs inactive, 4 pgs incomplete
> pg 10.2e is incomplete, acting [59,67]
> pg 10.c3 is incomplete, acting [62,105]
> pg 10.f3 is incomplete, acting [62,59]
> pg 10.1d5 is incomplete, acting [87,106]
>
> Using `ceph pg <pg> query` I can see the participant OSDs in each case,
> including the ones which failed. Respectively they are:
> pg 10.2e participants: 59, 68, 77, 143
> pg 10.c3 participants: 60, 62, 85, 102, 105, 106
> pg 10.f3 participants: 59, 64, 75, 107
> pg 10.1d5 participants: 64, 77, 87, 106
>
> The OSDs which are now down/out, and which have been removed from the
> crush map and had their auth removed, are:
> 62, 64, 68
>
> Of course I now have lots of slow-request reports from the OSDs that are
> blocked on the inactive PGs.
>
> How do I properly kick these PGs to have them drop their usage of the
> OSDs which no longer exist?
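For reference, the peering details that show which down OSDs an incomplete PG is still waiting for can be pulled straight out of the query output. A minimal sketch, assuming jq is available and that the field names match Nautilus-era output:

for pg in 10.2e 10.c3 10.f3 10.1d5; do
  echo "== ${pg} =="
  # The Peering entry in recovery_state lists the down OSDs the PG would
  # still probe and, if peering is blocked, which OSDs are blocking it.
  ceph pg ${pg} query | jq '.recovery_state[]
    | select(.name | test("Peering"))
    | {down_osds_we_would_probe, peering_blocked_by}'
done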
You don't, because those OSDs hold the data you need.
Why did you remove them from the CRUSH map, OSD map and auth? You need
them to rebuild the PGs.
Wido
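If the failed drives, or at least their OSD data directories, can still be read, the usual route back for an incomplete PG is to export its shard from the dead OSD and import it into an OSD that does not already hold that PG, using ceph-objectstore-tool. A rough sketch only, assuming BlueStore OSDs and using osd.62 and pg 10.c3 from the thread; <target-id> and the paths are placeholders, and the tool's documentation should be followed carefully:

# Export the PG shard from the dead OSD's data directory (the OSD must be
# stopped; here it is already down).
ceph-objectstore-tool \
  --data-path /var/lib/ceph/osd/ceph-62 \
  --pgid 10.c3 \
  --op export --file /root/10.c3.export

# Import into a stopped OSD that does not already hold pg 10.c3, then start
# it again and let peering pick the copy up.
systemctl stop ceph-osd@<target-id>
ceph-objectstore-tool \
  --data-path /var/lib/ceph/osd/ceph-<target-id> \
  --pgid 10.c3 \
  --op import --file /root/10.c3.export
systemctl start ceph-osd@<target-id>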
The drives failed at a hardware level. I've replaced OSDs like this before, through both planned migration and failure, without issue. I didn't realize all the replicated copies were on just one drive in each pool.
What should my actions have been in this case?
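For comparison, the usual replacement sequence only removes an OSD after its data has re-replicated elsewhere. A rough sketch, with osd.62 as the example id (safe-to-destroy and purge have been available since Luminous):

ceph osd out 62                            # start draining; PGs re-replicate elsewhere
ceph -s                                    # wait until everything is active+clean again
ceph osd safe-to-destroy osd.62            # confirms no PG still depends on this OSD
ceph osd purge 62 --yes-i-really-mean-it   # only now remove it from crush, auth and the osdmap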
pool 10 'volumes' replicated size 2 min_size 1 crush_rule 1 object_hash rjenkins pg_num 512 pgp_num 512 autoscale_mode warn last_change 47570 lfor 0/0/40781 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd
Crush rule 1:
rule ssd_by_host {
        id 1
        type replicated
        min_size 1
        max_size 10
        step take default class ssd
        step chooseleaf firstn 0 type host
        step emit
}
peter
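One more point the pool line above makes clear: with replicated size 2 and min_size 1, every PG has only two copies, so one failed OSD leaves a single copy and a second failure leaves the PG incomplete. If the capacity is there, raising the replica count is the usual mitigation; a minimal sketch using the pool name from the dump:

ceph osd pool set volumes size 3       # keep three copies per PG
ceph osd pool set volumes min_size 2   # require at least two up copies before serving I/O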