Hi all,
I have a small 2-node cluster with 40 OSDs, using 4+1 erasure coding.
I lost osd38, and now I have 39 incomplete PGs.
---
PG_AVAILABILITY Reduced data availability: 39 pgs inactive, 39 pgs incomplete
pg 22.2 is incomplete, acting [19,33,10,8,29] (reducing pool media min_size from 5 may help; search ceph.com/docs for 'incomplete')
pg 22.f is incomplete, acting [17,9,23,14,15] (reducing pool media min_size from 5 may help; search ceph.com/docs for 'incomplete')
pg 22.12 is incomplete, acting [7,33,10,31,29] (reducing pool media min_size from 5 may help; search ceph.com/docs for 'incomplete')
pg 22.13 is incomplete, acting [23,0,15,33,13] (reducing pool media min_size from 5 may help; search ceph.com/docs for 'incomplete')
pg 22.23 is incomplete, acting [29,17,18,15,12] (reducing pool media min_size from 5 may help; search ceph.com/docs for 'incomplete')
<snip>
---
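The health warning suggests dropping min_size on the pool. If I understand it correctly, for this pool that would be something like the following (assuming it is acceptable to run these PGs with no redundancy until recovery finishes):
---
# "media" is the pool name from the health warning above; min_size 4 = k,
# so the PGs could go active with only the four data shards present
ceph osd pool set media min_size 4
---
Is that a safe thing to do here, or is there a better option?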
My EC profile is below:
---
root@prod1:~# ceph osd erasure-code-profile get ec-41-profile
crush-device-class=
crush-failure-domain=osd
crush-root=default
jerasure-per-chunk-alignment=false
k=4
m=1
plugin=jerasure
technique=reed_sol_van
w=8
---
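(For context, the profile was originally created with roughly the following; the remaining values are defaults:)
---
ceph osd erasure-code-profile set ec-41-profile \
    k=4 m=1 crush-failure-domain=osd
---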
When I query one of the incomplete PGs, I see this:
---
"recovery_state": [
{
"name": "Started/Primary/Peering/Incomplete",
"enter_time": "2018-12-11 20:46:11.645796",
"comment": "not enough complete instances of this PG"
},
---
And this:
---
"probing_osds": [
"0(4)",
"7(2)",
"9(1)",
"11(4)",
"22(3)",
"29(2)",
"36(0)"
],
"down_osds_we_would_probe": [
38
],
"peering_blocked_by": []
},
---
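(Both snippets above come from the same "ceph pg query" output, roughly:)
---
# 22.2 is one of the incomplete PGs from the health detail above;
# the jq filter is only there to trim the JSON down to the peering state
ceph pg 22.2 query | jq '.recovery_state'
---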
I have set this in /etc/ceph/ceph.conf to no effect:
osd_find_best_info_ignore_history_les = true
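Do I need to restart the OSDs (or inject the option at runtime) for that to take effect? I.e. something like:
---
# my guess at applying the peering workaround without a full restart
ceph tell osd.* injectargs '--osd_find_best_info_ignore_history_les=true'
---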
As a result of the incomplete PGs, I/O is currently frozen to at least part of my cephfs.
I expected to be able to tolerate the loss of an OSD without issue; is there anything I can do to restore these incomplete PGs?
When I bring back a new osd38, I see:
---
"probing_osds": [
"4(2)",
"11(3)",
"22(1)",
"24(1)",
"26(2)",
"36(4)",
"38(1)",
"39(0)"
],
"down_osds_we_would_probe": [],
"peering_blocked_by": []
},
{
"name": "Started",
"enter_time": "2018-12-11 21:06:35.307379"
}
---
But my recovery state is still:
---
"recovery_state": [
{
"name": "Started/Primary/Peering/Incomplete",
"enter_time": "2018-12-11 21:06:35.320292",
"comment": "not enough complete instances of this PG"
},
---
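If it comes to that, I have seen ceph-objectstore-tool's mark-complete operation mentioned as a last resort for incomplete PGs. Is something along these lines the right direction? (The OSD id, data path, and the s0 shard suffix below are my guesses.)
---
# run on the host of the PG's primary OSD, with that OSD stopped
systemctl stop ceph-osd@19
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-19 \
    --pgid 22.2s0 --op mark-complete
systemctl start ceph-osd@19
---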
Any ideas?
Thanks!
D