The full cluster is on 14.2.8.
I had some OSDs drop overnight, which now leaves me with 4 inactive PGs. The pools had three participating (2 ssd, 1 sas) OSDs. In each pool at least 1 ssd and 1 sas OSD is still working without issue. I've run 'ceph pg repair <pg>' but it doesn't seem to make any changes.
PG_AVAILABILITY Reduced data availability: 4 pgs inactive, 4 pgs incomplete
pg 10.2e is incomplete, acting [59,67]
pg 10.c3 is incomplete, acting [62,105]
pg 10.f3 is incomplete, acting [62,59]
pg 10.1d5 is incomplete, acting [87,106]
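For completeness, this is roughly what I ran against those four PGs (just the plain repair command, nothing exotic):

    # Repair attempt per incomplete PG; it gets queued but nothing changes,
    # presumably because the PGs can't peer in the first place.
    for pg in 10.2e 10.c3 10.f3 10.1d5; do
        ceph pg repair "$pg"
    done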
Using `ceph pg <pg> query` I can see, for each PG, the participating OSDs, including the ones which failed. Respectively they are:
pg 10.2e participants: 59, 68, 77, 143
pg 10.c3 participants: 60, 62, 85, 102, 105, 106
pg 10.f3 participants: 59, 64, 75, 107
pg 10.1d5 participants: 64, 77, 87, 106
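In case the exact method matters, this is roughly how I pulled those lists out of the query output (field names from memory, so treat it as a sketch):

    # Show which OSDs the PG still wants to hear from before it can peer;
    # the removed OSDs turn up in down_osds_we_would_probe / peering_blocked_by.
    ceph pg 10.2e query | jq '.recovery_state[]
        | select(.name | test("Peering"))
        | {probing_osds, down_osds_we_would_probe, peering_blocked_by}'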
The OSDs which are now down/out, and which have been removed from the CRUSH map and had their auth entries removed, are:
62, 64, 68
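For the record, the removal was done the usual way for each of them, along these lines (osd.62 shown, same for 64 and 68):

    # Take the dead OSD out of the CRUSH map, drop its auth key and remove it.
    ceph osd crush remove osd.62
    ceph auth del osd.62
    ceph osd rm osd.62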
Of course I now have lots of slow-request reports from the OSDs affected by the inactive PGs.
How do I properly kick these PGs so that they drop their references to the OSDs which no longer exist?
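The options I've come across so far, but haven't dared to run without a sanity check from the list, look roughly like this (please tell me if either is the wrong hammer):

    # Option 1: let the surviving OSDs peer with a less complete history
    # (osd_find_best_info_ignore_history_les), then repeer the affected PGs.
    ceph config set osd osd_find_best_info_ignore_history_les true

    # Option 2: with the acting primary stopped, mark the PG complete directly
    # in its store, accepting that any unrecovered writes are lost.
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-59 \
        --pgid 10.2e --op mark-complete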
Thanks for your thoughts on this,
peter