You the man. I'm not sure how you figured that out yet. I've got a
little reading to do. Is this considered a bug, that the MDS is stuck
and unable to self-heal?

On Tue, Sep 12, 2017 at 6:54 PM, Sage Weil <sweil@xxxxxxxxxx> wrote:
> On Tue, 12 Sep 2017, Two Spirit wrote:
>> I attached the complete output with the previous email.
>>
>> ...
>>     "objects": [
>>         {
>>             "oid": {
>>                 "oid": "200.0000052d",
>
> This is an MDS journal object.. so the MDS is stuck replaying its
> journal because it is unfound.
>
> In this case I would do 'revert'.
>
> sage
>
>>                 "key": "",
>>                 "snapid": -2,
>>                 "hash": 2728386690,
>>                 "max": 0,
>>                 "pool": 6,
>>                 "namespace": ""
>>             },
>>             "need": "1496'15853",
>>             "have": "0'0",
>>             "flags": "none",
>>             "locations": []
>>         }
>>
>> So it goes Filename -> OID -> PG -> OSD? So if I trace down
>> "200.0000052d", I should be able to clear the problem? I seem to get
>> files in the lost+found directory, I think from fsck. Does the deep
>> scrubbing eventually clear these after a week, or will they always
>> require manual intervention?
>>
>> On Tue, Sep 12, 2017 at 3:48 PM, Sage Weil <sweil@xxxxxxxxxx> wrote:
>> > On Tue, 12 Sep 2017, Two Spirit wrote:
>> >> > On Tue, 12 Sep 2017, Two Spirit wrote:
>> >> >> I don't have any OSDs that are down, so the 1 unfound object I
>> >> >> think needs to be manually cleared. I ran across a webpage a
>> >> >> while ago that talked about how to clear it, but if you have a
>> >> >> reference, it would save me a little time.
>> >> >
>> >> > http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg/#failures-osd-unfound
>> >>
>> >> Thanks. That was the page I had read earlier.
>> >>
>> >> I've attached the full outputs to this mail and show just clips
>> >> below.
>> >>
>> >> # ceph health detail
>> >> OBJECT_UNFOUND 1/731529 unfound (0.000%)
>> >>     pg 6.2 has 1 unfound objects
>> >>
>> >> There looks like one number that shouldn't be there...
>> >> # ceph pg 6.2 list_missing
>> >> {
>> >>     "offset": {
>> >>         ...
>> >>         "pool": -9223372036854775808,
>> >>         "namespace": ""
>> >>     },
>> >>     ...
>> >
>> > I think you've snipped out the bit that has the name of the unfound
>> > object?
>> >
>> > sage
>> >
>> >> # ceph -s
>> >>     osd: 6 osds: 6 up, 6 in; 10 remapped pgs
>> >>
>> >> This shows, under the pg query, that something believes osd "2" is
>> >> down, but all OSDs are up, as seen in the previous ceph -s command.
>> >> # ceph pg 6.2 query
>> >>     "recovery_state": [
>> >>         {
>> >>             "name": "Started/Primary/Active",
>> >>             "enter_time": "2017-09-12 10:33:11.193486",
>> >>             "might_have_unfound": [
>> >>                 {
>> >>                     "osd": "0",
>> >>                     "status": "already probed"
>> >>                 },
>> >>                 {
>> >>                     "osd": "1",
>> >>                     "status": "already probed"
>> >>                 },
>> >>                 {
>> >>                     "osd": "2",
>> >>                     "status": "osd is down"
>> >>                 },
>> >>                 {
>> >>                     "osd": "4",
>> >>                     "status": "already probed"
>> >>                 },
>> >>                 {
>> >>                     "osd": "5",
>> >>                     "status": "already probed"
>> >>                 }
>> >>
>> >> If I go to a couple of other OSDs and run the same command, osd "2"
>> >> is listed as "already probed". They are not in sync. I double
>> >> checked that all the OSDs were up all 3 times I ran the command.
>> >>
>> >> Now, my question to debug this, to figure out if I want to
>> >> "revert|delete", is: what in the heck are the file(s)/object(s)
>> >> associated with the pg? I assume this might be in the MDS, but I'd
>> >> like to see a file name associated with this to make a further
>> >> determination of what I should do.
>> >> I don't have enough information at this point to figure out how I
>> >> should recover.
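
For reference, a minimal sketch of the tracing and recovery steps
discussed above. The pg id 6.2 and object name 200.0000052d come from
the outputs earlier in the thread; the mount point /mnt/cephfs, the
pool name cephfs_metadata, and the file path are placeholders to
substitute for your own cluster.

    ## Map a CephFS file to its RADOS object names: data objects are
    ## named "<inode in hex>.<object index>", so the hex inode number
    ## is the prefix to look for.
    # printf '%x\n' "$(stat -c %i /mnt/cephfs/path/to/file)"

    ## Map an object name to its PG and the OSDs that serve it (the
    ## CRUSH mapping is computed from the name, so this works even
    ## while the object is unfound).
    # ceph osd map cephfs_metadata 200.0000052d

    ## Confirm which object(s) the PG is missing.
    # ceph pg 6.2 list_missing

    ## Per Sage's suggestion, roll the unfound object back to its last
    ## known version (or forget it if no prior version exists);
    ## 'delete' instead of 'revert' would discard it entirely.
    # ceph pg 6.2 mark_unfound_lost revert

In this case no filename will map to 200.0000052d: objects whose names
start with 200. in the metadata pool belong to the rank-0 MDS journal
(inode 0x200) rather than to any file in the filesystem, which is why
the MDS sits in replay until that object is recovered or reverted.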