I don't want to "rescue" any OSDs. I want to clean up the incomplete PGs so that Ceph proceeds with PG re-creation and makes those groups active again.
In my case, on which OSDs should I set the "osd_find_best_info_ignore_history_les" option first?
This is the relevant part of the query output from one of the groups to be cleaned:
"probing_osds": [ "54(1)", "81(2)", "103(0)", "103(1)", "118(9)", "126(3)", "129(4)", "141(1)", "142(2)", "147(7)", "150(1)", "153(8)", "159(0)","165(6)", "168(5)", "171(0)","174(3)","177(9)","180(5)","262(2)","291(5)","313(1)","314(8)","315(7)","316(0)","318(6)"],
"down_osds_we_would_probe": [4,88,91,94,112,133]
"down_osds_we_would_probe": [4,88,91,94,112,133]
Maks
Tue, 28 Aug 2018 at 15:20 Paul Emmerich <paul.emmerich@xxxxxxxx> wrote:
I don't think it's documented.
It won't affect PGs that are active+clean.
It takes effect during peering; the easiest way is to set it in ceph.conf and
restart *all* of the OSDs that you want to rescue.
It is important not to forget to unset it afterwards.
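Roughly, as a sketch (the OSD IDs below are placeholders; use the ones that
actually hold the affected PGs):

    # add to ceph.conf on the affected hosts, then restart the OSDs
    [osd]
    osd_find_best_info_ignore_history_les = true

    # restart every OSD you want to rescue, e.g.
    systemctl restart ceph-osd@54

    # once the PGs have peered: remove the line again and restart
    systemctl restart ceph-osd@54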
Paul
2018-08-28 13:21 GMT+02:00 Maks Kowalik <maks_kowalik@xxxxxxxxx>:
> Thank you for answering.
> Where is this option documented?
> Do I set it in the config file, or using "tell osd.number" or admin-daemon?
> Do I set it on the primary OSD of the up set, on all OSDs of the up set, or
> maybe on all historical peers holding the shards of a particular group?
> Is this option dangerous to other groups on those OSDs (currently an OSD
> holds about 160 pgs)?
>
> Maks
>
> Tue, 28 Aug 2018 at 12:12 Paul Emmerich <paul.emmerich@xxxxxxxx> wrote:
>>
>> No need to delete it; that situation should be mostly salvageable by
>> setting osd_find_best_info_ignore_history_les temporarily on the
>> affected OSDs.
>> That should cause you to "just" lose some writes, resulting in
>> inconsistent data.
>>
>>
>> Paul
>>
>> 2018-08-28 11:08 GMT+02:00 Maks Kowalik <maks_kowalik@xxxxxxxxx>:
>> > What is the correct procedure for re-creating an incomplete placement group
>> > that belongs to an erasure-coded pool?
>> > I'm facing a situation where too many shards of 3 PGs were lost during OSD
>> > crashes. Accepting the data loss was decided, but I can't force Ceph to
>> > recreate those PGs. The query output shows:
>> > "peering_blocked_by_detail": [
>> > {"detail": "peering_blocked_by_history_les_bound"}
>> > What was tried:
>> > 1. manual deletion of all shards appearing in the "peers" section of the PG
>> > query output
>> > 2. marking all shards as complete using ceph-objectstore-tool
>> > 3. deleting peering history from OSDs keeping the shards
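>> > For reference, step 2 was roughly the following sketch (the OSD id and
>> > pgid are examples, not the real ones):
>> >
>> >     systemctl stop ceph-osd@54
>> >     ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-54 \
>> >         --pgid 2.18s1 --op mark-complete
>> >     systemctl start ceph-osd@54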
>> >
>>
--
Paul Emmerich
Looking for help with your Ceph cluster? Contact us at https://croit.io
croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com