Re: clearing unfound objects

I attached the complete output to the previous email.

...
    "objects": [
        {
            "oid": {
                "oid": "200.0000052d",
                "key": "",
                "snapid": -2,
                "hash": 2728386690,
                "max": 0,
                "pool": 6,
                "namespace": ""
            },
            "need": "1496'15853",
            "have": "0'0",
            "flags": "none",
            "locations": []
        }
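
To double check the mapping, my understanding is that "ceph osd map"
should show which PG and acting OSDs that object name maps to (the
pool name below is a placeholder; "ceph osd lspools" shows the real
name of pool 6):

# ceph osd lspools
# ceph osd map <name-of-pool-6> 200.0000052d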


So the mapping goes Filename -> OID -> PG -> OSD? So if I trace down
"200.0000052d", I should be able to clear the problem? I also seem to
get files in the lost+found directory, I think from fsck. Does deep
scrubbing eventually clear these after a week, or will they always
require manual intervention?
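
Here is roughly what I plan to try to trace it, please correct me if
this is wrong. For a regular CephFS file, the data objects should be
named <inode-in-hex>.<block-in-hex>, so going file -> object would be
something like:

# ls -i /mnt/cephfs/path/to/file        <- inode number (decimal)
# printf '%x\n' <inode-number>          <- convert to hex
# rados -p <data-pool-name> ls | grep '^<inode-hex>\.'

and going object -> file for a data-pool object something like:

# find /mnt/cephfs -inum $((16#<inode-hex>))

(the mount point and pool names are placeholders for my setup). Since
"200.0000052d" lives in pool 6, which I believe is the metadata pool,
I suspect it is an MDS-internal object (inode 0x200 should be the
rank-0 MDS journal, if I understand correctly) rather than a regular
file's data. And per the troubleshooting page Sage linked, once I
decide, clearing it would be:

# ceph pg 6.2 mark_unfound_lost revert|delete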

On Tue, Sep 12, 2017 at 3:48 PM, Sage Weil <sweil@xxxxxxxxxx> wrote:
> On Tue, 12 Sep 2017, Two Spirit wrote:
>> >On Tue, 12 Sep 2017, Two Spirit wrote:
>> >> I don't have any OSDs that are down, so I think the 1 unfound object
>> >> needs to be manually cleared. I ran across a webpage a while ago that
>> >> talked about how to clear it, but if you have a reference, it would
>> >> save me a little time.
>> >
>> >http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg/#failures-osd-unfound
>>
>> Thanks. That was the page I had read earlier.
>>
>> I've attached the full outputs to this mail and show just clips below.
>>
>> # ceph health detail
>> OBJECT_UNFOUND 1/731529 unfound (0.000%)
>>     pg 6.2 has 1 unfound objects
>>
>> It looks like there is one number that shouldn't be there...
>> # ceph pg 6.2 list_missing
>> {
>>     "offset": {
>> ...
>>         "pool": -9223372036854775808,
>>         "namespace": ""
>>     },
>> ...
>
> I think you've snipped out the bit that has the name of the unfound
> object?
>
> sage
>
>>
>> # ceph -s
>>     osd: 6 osds: 6 up, 6 in; 10 remapped pgs
>>
>> The pg query below shows that something believes osd "2" is
>> down, even though all OSDs are up, as seen in the previous ceph -s output.
>> # ceph pg 6.2 query
>>     "recovery_state": [
>>         {
>>             "name": "Started/Primary/Active",
>>             "enter_time": "2017-09-12 10:33:11.193486",
>>             "might_have_unfound": [
>>                 {
>>                     "osd": "0",
>>                     "status": "already probed"
>>                 },
>>                 {
>>                     "osd": "1",
>>                     "status": "already probed"
>>                 },
>>                 {
>>                     "osd": "2",
>>                     "status": "osd is down"
>>                 },
>>                 {
>>                     "osd": "4",
>>                     "status": "already probed"
>>                 },
>>                 {
>>                     "osd": "5",
>>                     "status": "already probed"
>>                 }
>>
>>
>> If I go to a couple of other OSDs and run the same command,
>> osd "2" is listed as "already probed". They are not in sync. I
>> double-checked that all the OSDs were up all 3 times I ran the
>> command.
>>
>> Now, my question, to debug this and figure out whether I want to
>> "revert|delete", is: what in the heck are the file(s)/object(s)
>> associated with this pg? I assume this might be in the MDS, but I'd
>> like to see a file name associated with it to make a further
>> determination of what I should do. I don't have enough information at
>> this point to figure out how I should recover.
>>