Re: Inconsistent PG with 1 inconsistent object not referenced in the log

I replaced the object with rados as suggested and, right after, forced a
deep scrub, which got us back to HEALTH_OK.

However, we are now in another inconsistent PG state, for the same rbd
image but a different object: the object that was also mentioned in the
previous inconsistent PG. This time it is worse, because we have a
data_digest mismatch. I wonder whether this tells us anything about the
previous substitution, or whether I should just take the same path and
replace this object with rados.


pg 0.186 is active+clean+inconsistent, acting [36,26,44]

rados list-inconsistent-obj 0.186
{
    "epoch": 30586,
    "inconsistents": [
        {
            "object": {
                "name": "rbd_data.15cec2ae8944a.000000000004db0e",
                "nspace": "",
                "locator": "",
                "snap": "head",
                "version": 5493833
            },
            "errors": [
                "object_info_inconsistency",
                "data_digest_mismatch",
                "attr_value_mismatch"
            ],
            "union_shard_errors": [
                "data_digest_mismatch_oi"
            ],
            "selected_object_info": "0:09c2dd3e:::rbd_data.15cec2ae8944a.000000000015c7d6:head(30587'5493833 client.1246390.0:1 dirty|data_digest|omap_digest s 4194304 uv 5493833 dd 264b7d0d od ffffffff alloc_hint [0 0])",
            "shards": [
                {
                    "osd": 26,
                    "errors": [
                        "data_digest_mismatch_oi"
                    ],
                    "size": 4194304,
                    "omap_digest": "0xffffffff",
                    "data_digest": "0x7dd0d0bd",
                    "object_info": "0:618e3778:::rbd_data.15cec2ae8944a.000000000004db0e:head(30537'5509201 osd.36.0:8552301 dirty|data_digest|omap_digest s 4194304 uv 5498082 dd 7dd0d0bd od ffffffff alloc_hint [4194304 4194304])",
                    "attrs": [
                        {
                            "name": "_",
                            "value":
"EAggAQAABANIAAAAAAAAACcAAAByYmRfZGF0YS4xNWNlYzJhZTg5NDRhLjAwMDAwMDAwMDAwNGRiMGX+\/\/\/\/\/\/\/\/\/4Zx7B4AAAAAAAAAAAAAAAAABgMcAAAAAAAAAAAAAAD\/\/\/\/\/AAAAAAAAAAD\/\/\/\/\/\/\/\/\/\/wAAAABREFQAAAAAAEl3AADi5FMAAAAAAEl3AAACAhUAAAAEJAAAAAAAAABtf4IAAAAAAAAAAAAAAEAAAAAAAPpfSVvV\/VMKAgIVAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA4uRTAAAAAAAAAAAAAAAAAAA0AAAA+l9JW4x6Rw290NB9\/\/\/\/\/wAAQAAAAAAAAABAAAAAAAAAAAAA",
                            "Base64": true
                        },
                        {
                            "name": "snapset",
                            "value":
"AgIZAAAAAAAAAAAAAAABAAAAAAAAAAAAAAAAAAAAAA==",
                            "Base64": true
                        }
                    ]
                },
                {
                    "osd": 36,
                    "errors": [
                        "data_digest_mismatch_oi"
                    ],
                    "size": 4194304,
                    "omap_digest": "0xffffffff",
                    "data_digest": "0x7dd0d0bd",
                    "object_info": "0:618e3778:::rbd_data.15cec2ae8944a.000000000004db0e:head(30537'5509201 osd.36.0:8552301 dirty|data_digest|omap_digest s 4194304 uv 5498082 dd 7dd0d0bd od ffffffff alloc_hint [4194304 4194304])",
                    "attrs": [
                        {
                            "name": "_",
                            "value":
"EAggAQAABANIAAAAAAAAACcAAAByYmRfZGF0YS4xNWNlYzJhZTg5NDRhLjAwMDAwMDAwMDAwNGRiMGX+\/\/\/\/\/\/\/\/\/4Zx7B4AAAAAAAAAAAAAAAAABgMcAAAAAAAAAAAAAAD\/\/\/\/\/AAAAAAAAAAD\/\/\/\/\/\/\/\/\/\/wAAAABREFQAAAAAAEl3AADi5FMAAAAAAEl3AAACAhUAAAAEJAAAAAAAAABtf4IAAAAAAAAAAAAAAEAAAAAAAPpfSVvV\/VMKAgIVAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA4uRTAAAAAAAAAAAAAAAAAAA0AAAA+l9JW4x6Rw290NB9\/\/\/\/\/wAAQAAAAAAAAABAAAAAAAAAAAAA",
                            "Base64": true
                        }
                    ]
                },
                {
                    "osd": 44,
                    "errors": [],
                    "size": 4194304,
                    "omap_digest": "0xffffffff",
                    "data_digest": "0x264b7d0d",
                    "object_info": "0:09c2dd3e:::rbd_data.15cec2ae8944a.000000000015c7d6:head(30587'5493833 client.1246390.0:1 dirty|data_digest|omap_digest s 4194304 uv 5493833 dd 264b7d0d od ffffffff alloc_hint [0 0])",
                    "attrs": [
                        {
                            "name": "_",
                            "value":
"EAggAQAABANIAAAAAAAAACcAAAByYmRfZGF0YS4xNWNlYzJhZTg5NDRhLjAwMDAwMDAwMDAxNWM3ZDb+\/\/\/\/\/\/\/\/\/5BDu3wAAAAAAAAAAAAAAAAABgMcAAAAAAAAAAAAAAD\/\/\/\/\/AAAAAAAAAAD\/\/\/\/\/\/\/\/\/\/wAAAABJ1FMAAAAAAHt3AAD0eE4AAAAAALxoAAACAhUAAAAItgQTAAAAAAABAAAAAAAAAAAAAAAAAEAAAAAAAIbaUVtSy\/8jAgIVAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAASdRTAAAAAAAAAAAAAAAAAAA0AAAAhtpRW\/VN8CQNfUsm\/\/\/\/\/wAAAAAAAAAAAAAAAAAAAAAAAAAA",
                            "Base64": true
                        },
                        {
                            "name": "snapset",
                            "value":
"AgIZAAAAAAAAAAAAAAABAAAAAAAAAAAAAAAAAAAAAA==",
                            "Base64": true
                        }
                    ]
                }
            ]
        }
    ]
}
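A quick way to see which shard disagrees with what is to summarise the `rados list-inconsistent-obj` report programmatically. The following is a minimal Python sketch, not a Ceph tool: it assumes the JSON layout shown above and assumes the printed object_info string has the form `<pool>:<hash>:::<name>:<snap>(... dd <hex> ...)`; `summarize`, `oi_name` and `oi_data_digest` are hypothetical helpers. For each shard it reports whether the shard's data digest matches the `dd` in the selected object_info, and whether the object_info stored on that shard names the object being reported.

```python
import re

# Assumed printed form of an object_info, as it appears in the report above:
#   "<pool>:<hash>:::<name>:<snap>(<epoch>'<version> ... dd <data digest> od ...)"
OI_NAME_RE = re.compile(r":::(?P<name>[^:]+):")
OI_DD_RE = re.compile(r"\bdd (?P<dd>[0-9a-f]+)\b")


def oi_name(object_info: str) -> str:
    """Object name embedded in a printed object_info string ('' if not found)."""
    m = OI_NAME_RE.search(object_info)
    return m.group("name") if m else ""


def oi_data_digest(object_info: str) -> int:
    """Data digest ('dd') recorded in a printed object_info string (-1 if absent)."""
    m = OI_DD_RE.search(object_info)
    return int(m.group("dd"), 16) if m else -1


def summarize(report: dict) -> list:
    """One tuple per shard: (osd, digest matches selected oi, shard oi names this object)."""
    rows = []
    for inc in report["inconsistents"]:
        name = inc["object"]["name"]
        selected_dd = oi_data_digest(inc["selected_object_info"])
        for shard in inc["shards"]:
            rows.append((
                shard["osd"],
                int(shard["data_digest"], 16) == selected_dd,
                oi_name(shard["object_info"]) == name,
            ))
    return rows


# Values copied from the PG 0.186 report above (only the fields used here).
OI_4DB0E = ("0:618e3778:::rbd_data.15cec2ae8944a.000000000004db0e:head(30537'5509201 "
            "osd.36.0:8552301 dirty|data_digest|omap_digest s 4194304 uv 5498082 "
            "dd 7dd0d0bd od ffffffff alloc_hint [4194304 4194304])")
OI_15C7D6 = ("0:09c2dd3e:::rbd_data.15cec2ae8944a.000000000015c7d6:head(30587'5493833 "
             "client.1246390.0:1 dirty|data_digest|omap_digest s 4194304 uv 5493833 "
             "dd 264b7d0d od ffffffff alloc_hint [0 0])")
report = {"inconsistents": [{
    "object": {"name": "rbd_data.15cec2ae8944a.000000000004db0e"},
    "selected_object_info": OI_15C7D6,
    "shards": [
        {"osd": 26, "data_digest": "0x7dd0d0bd", "object_info": OI_4DB0E},
        {"osd": 36, "data_digest": "0x7dd0d0bd", "object_info": OI_4DB0E},
        {"osd": 44, "data_digest": "0x264b7d0d", "object_info": OI_15C7D6},
    ],
}]}

for row in summarize(report):
    print(row)
# (26, False, True)
# (36, False, True)
# (44, True, False)
```

On this sample, osd.44 is the only shard whose stored object_info names a different object (`rbd_data.15cec2ae8944a.000000000015c7d6`). Against a live cluster, `report` would instead come from `json.loads` on the command's output.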


On 20/07/18 00:27, Brad Hubbard wrote:
> On Fri, Jul 20, 2018 at 1:05 AM, Ana Aviles <ana@xxxxxxxxxxxx> wrote:
>>
>>
>> On 19/07/18 03:25, Brad Hubbard wrote:
>>> On Wed, Jul 18, 2018 at 6:25 PM, Ana Aviles <ana@xxxxxxxxxxxx> wrote:
>>>> Ah ok. Then I think it confirms what you are saying. Here it is:
>>>>
>>>> $ rados list-inconsistent-obj 0.190
>>>> {"epoch":30579,"inconsistents":[{"object":{"name":"rbd_data.15cec2ae8944a.000000000015c7d6","nspace":"","locator":"","snap":"head","version":5498082},"errors":["object_info_inconsistency","attr_value_mismatch"],"union_shard_errors":[],"selected_object_info":"0:618e3778:::rbd_data.15cec2ae8944a.000000000004db0e:head(30537'5509201
>>>> osd.36.0:8552301 dirty|data_digest|omap_digest s 4194304 uv 5498082 dd
>>>> 7dd0d0bd od ffffffff alloc_hint [4194304
>>>> 4194304])","shards":[{"osd":16,"errors":[],"size":4194304,"object_info":"0:09c2dd3e:::rbd_data.15cec2ae8944a.000000000015c7d6:head(26812'5142772
>>>> client.1044166.0:393154060 dirty|data_digest|omap_digest s 4194304 uv
>>>> 5142772 dd 264b7d0d od ffffffff alloc_hint [0
>>>> 0])","attrs":[{"name":"_","value":"DwgMAQAABANIAAAAAAAAACcAAAByYmRfZGF0YS4xNWNlYzJhZTg5NDRhLjAwMDAwMDAwMDAxNWM3ZDb+\/\/\/\/\/\/\/\/\/5BDu3wAAAAAAAAAAAAAAAAABgMcAAAAAAAAAAAAAAD\/\/\/\/\/AAAAAAAAAAD\/\/\/\/\/\/\/\/\/\/wAAAAD0eE4AAAAAALxoAADzeE4AAAAAALxoAAACAhUAAAAIxu4PAAAAAAAMDm8XAAAAAAAAAAAAAEAAAAAAAOJmPVsEa24SAgIVAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA9HhOAAAAAAAAAAAAAAAAAAA0AAAA4mY9W1Q\/lBwNfUsm\/\/\/\/\/w==","Base64":true},{"name":"snapset","value":"AgIZAAAAAAAAAAAAAAABAAAAAAAAAAAAAAAAAAAAAA==","Base64":true}]},{"osd":37,"errors":[],"size":4194304,"object_info":"0:09c2dd3e:::rbd_data.15cec2ae8944a.000000000015c7d6:head(26812'5142772
>>>> client.1044166.0:393154060 dirty|data_digest|omap_digest s 4194304 uv
>>>> 5142772 dd 264b7d0d od ffffffff alloc_hint [0
>>>> 0])","attrs":[{"name":"_","value":"DwgMAQAABANIAAAAAAAAACcAAAByYmRfZGF0YS4xNWNlYzJhZTg5NDRhLjAwMDAwMDAwMDAxNWM3ZDb+\/\/\/\/\/\/\/\/\/5BDu3wAAAAAAAAAAAAAAAAABgMcAAAAAAAAAAAAAAD\/\/\/\/\/AAAAAAAAAAD\/\/\/\/\/\/\/\/\/\/wAAAAD0eE4AAAAAALxoAADzeE4AAAAAALxoAAACAhUAAAAIxu4PAAAAAAAMDm8XAAAAAAAAAAAAAEAAAAAAAOJmPVsEa24SAgIVAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA9HhOAAAAAAAAAAAAAAAAAAA0AAAA4mY9W1Q\/lBwNfUsm\/\/\/\/\/w==","Base64":true},{"name":"snapset","value":"AgIZAAAAAAAAAAAAAAABAAAAAAAAAAAAAAAAAAAAAA==","Base64":true}]},{"osd":44,"errors":[],"size":4194304,"object_info":"0:618e3778:::rbd_data.15cec2ae8944a.000000000004db0e:head(30537'5509201
>>>> osd.36.0:8552301 dirty|data_digest|omap_digest s 4194304 uv 5498082 dd
>>>> 7dd0d0bd od ffffffff alloc_hint [4194304
>>>> 4194304])","attrs":[{"name":"_","value":"EAggAQAABANIAAAAAAAAACcAAAByYmRfZGF0YS4xNWNlYzJhZTg5NDRhLjAwMDAwMDAwMDAwNGRiMGX+\/\/\/\/\/\/\/\/\/4Zx7B4AAAAAAAAAAAAAAAAABgMcAAAAAAAAAAAAAAD\/\/\/\/\/AAAAAAAAAAD\/\/\/\/\/\/\/\/\/\/wAAAABREFQAAAAAAEl3AADi5FMAAAAAAEl3AAACAhUAAAAEJAAAAAAAAABtf4IAAAAAAAAAAAAAAEAAAAAAAPpfSVvV\/VMKAgIVAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA4uRTAAAAAAAAAAAAAAAAAAA0AAAA+l9JW4x6Rw290NB9\/\/\/\/\/wAAQAAAAAAAAABAAAAAAAAAAAAA","Base64":true},{"name":"snapset","value":"AgIZAAAAAAAAAAAAAAABAAAAAAAAAAAAAAAAAAAAAA==","Base64":true}]}]}]}
>>>>
>>>>
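The `_` attribute values in these reports are Base64-encoded binary object_info data. Without relying on the exact binary layout, one way to check which object a shard's `_` attr describes is to decode it and search for an RBD data object name. A minimal Python sketch, assuming names follow the `rbd_data.<image id>.<16 hex digits>` pattern seen in this thread; `embedded_rbd_names` is a hypothetical helper. The samples are the leading 80 characters of the osd.16 and osd.44 values above, cut at a 4-character boundary so they remain valid Base64; when working from the full JSON, take the values from the parsed JSON so that the `\/` escapes have already been turned back into `/`.

```python
import base64
import re

# RBD data object names as they appear in this thread,
# e.g. rbd_data.15cec2ae8944a.000000000004db0e (assumed pattern).
RBD_NAME_RE = re.compile(rb"rbd_data\.[0-9a-f]+\.[0-9a-f]{16}")


def embedded_rbd_names(attr_b64: str) -> list:
    """Decode a Base64 '_' attr value and return any RBD data object names inside it."""
    raw = base64.b64decode(attr_b64)
    return [m.decode("ascii") for m in RBD_NAME_RE.findall(raw)]


# Leading part of the '_' values reported for osd.16 and osd.44 in PG 0.190.
osd16 = "DwgMAQAABANIAAAAAAAAACcAAAByYmRfZGF0YS4xNWNlYzJhZTg5NDRhLjAwMDAwMDAwMDAxNWM3ZDb+"
osd44 = "EAggAQAABANIAAAAAAAAACcAAAByYmRfZGF0YS4xNWNlYzJhZTg5NDRhLjAwMDAwMDAwMDAwNGRiMGX+"
print(embedded_rbd_names(osd16))  # ['rbd_data.15cec2ae8944a.000000000015c7d6']
print(embedded_rbd_names(osd44))  # ['rbd_data.15cec2ae8944a.000000000004db0e']
```

Both results agree with the object names in the corresponding object_info strings of the same report.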
>>>> To determine which is the right version of the object, is there no
>>>> timestamp that can tell us? Maybe the object got updated on osd.37 and
>>>> osd.16 while osd.44 was down, and that is where the mismatch comes from?
>>>> Otherwise, shouldn't the authoritative osd be leading?
>>>
>>> The primary serves IO requests, so the version on osd 37 is what
>>> clients will read; I guess going with that is reasonable.
>>>
>>
>> OK good.
>>
>>> The version on osd 44 was actually modified after the others (epoch
>>> 30537, as opposed to epoch 26812), but the sizes are all the same, so
>>> the difference may be trivial (metadata only, perhaps). And, according
>>> to the last request id (osd.36.0:8552301), the change came from another
>>> osd (36), which is kind of unexpected. Is there, or was there, a cache
>>> tier involved?
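The fields Brad reads off the object_info strings (modification epoch, size, and the id of the last request) can be pulled out mechanically. A minimal Python sketch, assuming the printed object_info layout seen in this thread; the regular expression and `parse_oi` are hypothetical helpers, not a Ceph API.

```python
import re

# Assumed layout of a printed object_info (as in the reports in this thread):
#   <pool>:<hash>:::<name>:<snap>(<epoch>'<version> <reqid> <flags> s <size> uv <uv> dd ...)
OI_RE = re.compile(
    r":::(?P<name>[^:]+):(?P<snap>[^(]+)"
    r"\((?P<epoch>\d+)'(?P<version>\d+) (?P<reqid>\S+) \S+"
    r" s (?P<size>\d+) uv (?P<uv>\d+)"
)


def parse_oi(object_info: str) -> dict:
    """Split a printed object_info into the fields compared in the discussion."""
    m = OI_RE.search(object_info)
    if m is None:
        raise ValueError("unrecognised object_info: " + object_info)
    d = m.groupdict()
    for key in ("epoch", "version", "size", "uv"):
        d[key] = int(d[key])
    # reqid looks like "<entity>.<incarnation>:<tid>", e.g. "osd.36.0:8552301";
    # dropping the trailing ".<incarnation>:<tid>" leaves the entity that last wrote it.
    d["writer"] = d["reqid"].rsplit(".", 1)[0]
    return d


# Shard object_info strings from the PG 0.190 report quoted above.
osd44 = parse_oi("0:618e3778:::rbd_data.15cec2ae8944a.000000000004db0e:head(30537'5509201 "
                 "osd.36.0:8552301 dirty|data_digest|omap_digest s 4194304 uv 5498082 "
                 "dd 7dd0d0bd od ffffffff alloc_hint [4194304 4194304])")
osd37 = parse_oi("0:09c2dd3e:::rbd_data.15cec2ae8944a.000000000015c7d6:head(26812'5142772 "
                 "client.1044166.0:393154060 dirty|data_digest|omap_digest s 4194304 uv 5142772 "
                 "dd 264b7d0d od ffffffff alloc_hint [0 0])")

print(osd44["epoch"] > osd37["epoch"], osd44["size"] == osd37["size"])  # True True
print(osd44["writer"], osd37["writer"])  # osd.36 client.1044166
```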
>>
>> Ah OK, very interesting! No, no cache tier involved. So at one point
>> osd.36 was part of the PG set?
> 
> Maybe. All we know is that the last request came from osd.36, which is
> unusual because changes in this context generally only come from
> clients. A cache tier might explain it, which is why I mentioned it.
> 
>>
>>>
>>> If you want to go with the version that is currently being used (37
>>> and 16), you can just quiesce the rbd image clients and do a rados get,
>>> then a rados put of the object. I would suggest taking a backup of the
>>> object from osd 44 using ceph-objectstore-tool although, as I said,
>>> that version is not being used, so I doubt you will miss it.
>>>
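The get/put procedure above can be scripted so that each step is printed before anything runs. A minimal Python sketch that only builds (and optionally runs) the commands; the pool name `rbd`, the `/tmp` work files and the OSD data path are assumptions for illustration (PG 0.190 is in pool id 0, whose name is not given in the thread). The `ceph-objectstore-tool` step has to run on osd.44's host with that OSD stopped; on a FileStore OSD it may also need `--journal-path`, and if the tool cannot resolve the object by name, the JSON object spec printed by its `--op list` can be used instead. Quiescing the rbd image clients is not shown and stays a manual step.

```python
import shlex
import subprocess


def repair_plan(pool, obj, pgid, osd_data_path, workdir="/tmp"):
    """Return, in order, the commands for the get/put repair described above.

    Every argument is supplied by the caller; nothing is looked up from the cluster.
    """
    good_copy = f"{workdir}/{obj}.good"
    backup = f"{workdir}/{obj}.osd-backup"
    return [
        # 1. Back up the divergent copy from the OSD's object store.
        #    Run on that OSD's host with the OSD stopped; restart it afterwards.
        ["ceph-objectstore-tool", "--data-path", osd_data_path,
         "--pgid", pgid, obj, "get-bytes", backup],
        # 2. Read the copy that clients currently see (served by the primary).
        ["rados", "-p", pool, "get", obj, good_copy],
        # 3. Write the same bytes back so every replica is rewritten.
        ["rados", "-p", pool, "put", obj, good_copy],
        # 4. Ask for a deep scrub to confirm the PG is clean again.
        ["ceph", "pg", "deep-scrub", pgid],
    ]


def run(plan, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in plan:
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


# Hypothetical values: pool name "rbd" and the default OSD data path are assumptions.
plan = repair_plan("rbd", "rbd_data.15cec2ae8944a.000000000015c7d6", "0.190",
                   "/var/lib/ceph/osd/ceph-44")
run(plan)  # dry run: prints the commands only
```

Keeping the dry run as the default makes it easy to review the exact object name and PG id before touching the cluster.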
>>
>> Great, will do that. Thanks a lot for the help.
> 
> yw.
> 
>>
>>>>
>>>> Regards,
>>>> Ana
>>>>
>>>>
>>>> On 18/07/18 05:24, Brad Hubbard wrote:
>>>>> OK. What I *meant* to ask for was the output of "rados
>>>>> list-inconsistent-obj 0.190" (it might still be worth posting that, but
>>>>> it should just confirm the findings below).
>>>>>
>>>>>
>>>>> The relevant lines from the log are below.
>>>>>
>>>>> 2018-07-16 12:24:45.940910 7fb422340700 2 osd.37 pg_epoch: 30554
>>>>> pg[0.190( v 30554'5390084 (30537'5387075,30554'5390084]
>>>>> local-les=30554 n=4123 ec=1 les/c/f 30554/30554/0 30552/30553/30542)
>>>>> [37,44,16] r=0 lpr=30553 crt=30554'5390079 lcod 30554'5390083 mlcod
>>>>> 30554'5390083 active+clean+scrubbing+deep+inconsistent+repair] 0.190
>>>>> shard 16: soid 0:09c2dd3e:::rbd_data.15cec2ae8944a.000000000015c7d6:head
>>>>> data_digest 0x264b7d0d != data_digest 0x7dd0d0bd from shard 44,
>>>>> data_digest 0x264b7d0d != data_digest 0x7dd0d0bd from auth oi
>>>>> 0:618e3778:::rbd_data.15cec2ae8944a.000000000004db0e:head(30537'5509201
>>>>> osd.36.0:8552301 dirty|data_digest|omap_digest s 4194304 uv 5498082 dd
>>>>> 7dd0d0bd od ffffffff alloc_hint [4194304 4194304]), attr value
>>>>> mismatch '_'
>>>>> 2018-07-16 12:24:45.940941 7fb422340700 -1
>>>>> log_channel(cluster) log [ERR] : 0.190 shard 16: soid
>>>>> 0:09c2dd3e:::rbd_data.15cec2ae8944a.000000000015c7d6:head data_digest
>>>>> 0x264b7d0d != data_digest 0x7dd0d0bd from shard 44, data_digest
>>>>> 0x264b7d0d != data_digest 0x7dd0d0bd from auth oi
>>>>> 0:618e3778:::rbd_data.15cec2ae8944a.000000000004db0e:head(30537'5509201
>>>>> osd.36.0:8552301 dirty|data_digest|omap_digest s 4194304 uv 5498082 dd
>>>>> 7dd0d0bd od ffffffff alloc_hint [4194304 4194304]), attr value
>>>>> mismatch '_'
>>>>> 2018-07-16 12:24:45.940957 7fb422340700 -1
>>>>> log_channel(cluster) log [ERR] : 0.190 shard 37: soid
>>>>> 0:09c2dd3e:::rbd_data.15cec2ae8944a.000000000015c7d6:head data_digest
>>>>> 0x264b7d0d != data_digest 0x7dd0d0bd from shard 44, data_digest
>>>>> 0x264b7d0d != data_digest 0x7dd0d0bd from auth oi
>>>>> 0:618e3778:::rbd_data.15cec2ae8944a.000000000004db0e:head(30537'5509201
>>>>> osd.36.0:8552301 dirty|data_digest|omap_digest s 4194304 uv 5498082 dd
>>>>> 7dd0d0bd od ffffffff alloc_hint [4194304 4194304]), attr value
>>>>> mismatch '_'
>>>>>
>>>>> They show that osd 44 has been chosen as the authoritative shard,
>>>>> that it has a data digest for this object of 0x7dd0d0bd, and that the
>>>>> data digest in the authoritative object info is also 0x7dd0d0bd.
>>>>>
>>>>> Shard 16, however, has a data digest of 0x264b7d0d, and so does shard
>>>>> 37, so the data for this object on osds 16 and 37 differs from that on
>>>>> osd 44.
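The comparison made by eye above can be reproduced over the OSD log. A minimal Python sketch, assuming the message wording shown above (`shard N: soid <oid> data_digest 0xA != data_digest 0xB from shard M`); the regular expression and `digest_disagreements` are hypothetical helpers that may need adjusting for other Ceph releases, and the sample lines are trimmed after the first comparison.

```python
import re

LOG_RE = re.compile(
    r"shard (?P<shard>\d+): soid (?P<soid>\S+) "
    r"data_digest (?P<mine>0x[0-9a-f]+) != data_digest (?P<theirs>0x[0-9a-f]+) "
    r"from shard (?P<auth>\d+)"
)


def digest_disagreements(log_text: str) -> dict:
    """Map shard -> (its digest, authoritative shard, authoritative digest)."""
    found = {}
    for m in LOG_RE.finditer(log_text):
        found[int(m["shard"])] = (m["mine"], int(m["auth"]), m["theirs"])
    return found


# Two of the [ERR] lines quoted above, trimmed after the first comparison.
log = (
    "log [ERR] : 0.190 shard 16: soid 0:09c2dd3e:::rbd_data.15cec2ae8944a.000000000015c7d6:head "
    "data_digest 0x264b7d0d != data_digest 0x7dd0d0bd from shard 44,\n"
    "log [ERR] : 0.190 shard 37: soid 0:09c2dd3e:::rbd_data.15cec2ae8944a.000000000015c7d6:head "
    "data_digest 0x264b7d0d != data_digest 0x7dd0d0bd from shard 44,\n"
)
for shard, info in sorted(digest_disagreements(log).items()):
    print(shard, info)
# 16 ('0x264b7d0d', 44, '0x7dd0d0bd')
# 37 ('0x264b7d0d', 44, '0x7dd0d0bd')
```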
>>>>>
>>>>> Basically, you'll need to pick which is the "right" copy of the object
>>>>> (I can't tell you), quiesce traffic to/from that object (rbd image), and
>>>>> get/put that object back into the cluster to fix the mismatch. Since
>>>>> this appears to be an rbd image, this could potentially result in an
>>>>> image that needs an fsck or equivalent, IIUC.
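If an fsck does turn out to be needed, it helps to know which part of the image an affected object holds. For RBD format 2 images with the default (non-striped) layout, data objects are named `rbd_data.<image id>.<object number as 16 hex digits>`, and object N covers image bytes from N × object size to (N + 1) × object size - 1. A minimal Python sketch, assuming the default 4 MiB object size (consistent with the 4194304-byte size in the object_info above); `image_extent` is a hypothetical helper, and the real object size should be confirmed with `rbd info` before relying on the numbers.

```python
# Assumed: default RBD object size of 4 MiB (order 22), non-striped layout.
OBJECT_SIZE = 4 * 1024 * 1024


def image_extent(object_name, object_size=OBJECT_SIZE):
    """Return (object number, first byte, last byte) of the image covered by a data object."""
    prefix, _image_id, number_hex = object_name.rsplit(".", 2)
    if prefix != "rbd_data" or len(number_hex) != 16:
        raise ValueError("not an RBD format 2 data object name: " + object_name)
    number = int(number_hex, 16)
    first = number * object_size
    return number, first, first + object_size - 1


for name in ("rbd_data.15cec2ae8944a.000000000004db0e",
             "rbd_data.15cec2ae8944a.000000000015c7d6"):
    number, first, last = image_extent(name)
    print(f"{name}: object {number}, bytes {first}-{last} "
          f"(about {first / 2**30:.1f} GiB into the image)")
```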
>>>>>
>>>>>
>>>>> On Tue, Jul 17, 2018 at 10:06 PM, Ana Aviles <ana@xxxxxxxxxxxx> wrote:
>>>>>>
>>>>>> Hi Brad,
>>>>>>
>>>>>> Here it is:
>>>>>>
>>>>>> {
>>>>>>     "state": "active+clean+inconsistent",
>>>>>>     "snap_trimq": "[]",
>>>>>>     "epoch": 30581,
>>>>>>     "up": [
>>>>>>         37,
>>>>>>         44,
>>>>>>         16
>>>>>>     ],
>>>>>>     "acting": [
>>>>>>         37,
>>>>>>         44,
>>>>>>         16
>>>>>>     ],
>>>>>>     "actingbackfill": [
>>>>>>         "16",
>>>>>>         "37",
>>>>>>         "44"
>>>>>>     ],
>>>>>>     "info": {
>>>>>>         "pgid": "0.190",
>>>>>>         "last_update": "30581'5420535",
>>>>>>         "last_complete": "30581'5420535",
>>>>>>         "log_tail": "30581'5417484",
>>>>>>         "last_user_version": 5420535,
>>>>>>         "last_backfill": "MAX",
>>>>>>         "last_backfill_bitwise": 0,
>>>>>>         "purged_snaps": "[]",
>>>>>>         "history": {
>>>>>>             "epoch_created": 1,
>>>>>>             "last_epoch_started": 30580,
>>>>>>             "last_epoch_clean": 30581,
>>>>>>             "last_epoch_split": 0,
>>>>>>             "last_epoch_marked_full": 0,
>>>>>>             "same_up_since": 30578,
>>>>>>             "same_interval_since": 30579,
>>>>>>             "same_primary_since": 30565,
>>>>>>             "last_scrub": "30554'5390240",
>>>>>>             "last_scrub_stamp": "2018-07-16 12:27:03.547524",
>>>>>>             "last_deep_scrub": "30554'5390240",
>>>>>>             "last_deep_scrub_stamp": "2018-07-16 12:27:03.547524",
>>>>>>             "last_clean_scrub_stamp": "2018-07-13 08:45:32.622555"
>>>>>>         },
>>>>>>         "stats": {
>>>>>>             "version": "30581'5420535",
>>>>>>             "reported_seq": "5155553",
>>>>>>             "reported_epoch": "30581",
>>>>>>             "state": "active+clean+inconsistent",
>>>>>>             "last_fresh": "2018-07-17 12:02:13.002428",
>>>>>>             "last_change": "2018-07-16 13:37:24.020403",
>>>>>>             "last_active": "2018-07-17 12:02:13.002428",
>>>>>>             "last_peered": "2018-07-17 12:02:13.002428",
>>>>>>             "last_clean": "2018-07-17 12:02:13.002428",
>>>>>>             "last_became_active": "2018-07-16 13:37:13.173821",
>>>>>>             "last_became_peered": "2018-07-16 13:37:13.173821",
>>>>>>             "last_unstale": "2018-07-17 12:02:13.002428",
>>>>>>             "last_undegraded": "2018-07-17 12:02:13.002428",
>>>>>>             "last_fullsized": "2018-07-17 12:02:13.002428",
>>>>>>             "mapping_epoch": 30578,
>>>>>>             "log_start": "30581'5417484",
>>>>>>             "ondisk_log_start": "30581'5417484",
>>>>>>             "created": 1,
>>>>>>             "last_epoch_clean": 30581,
>>>>>>             "parent": "0.0",
>>>>>>             "parent_split_bits": 0,
>>>>>>             "last_scrub": "30554'5390240",
>>>>>>             "last_scrub_stamp": "2018-07-16 12:27:03.547524",
>>>>>>             "last_deep_scrub": "30554'5390240",
>>>>>>             "last_deep_scrub_stamp": "2018-07-16 12:27:03.547524",
>>>>>>             "last_clean_scrub_stamp": "2018-07-13 08:45:32.622555",
>>>>>>             "log_size": 3051,
>>>>>>             "ondisk_log_size": 3051,
>>>>>>             "stats_invalid": false,
>>>>>>             "dirty_stats_invalid": false,
>>>>>>             "omap_stats_invalid": false,
>>>>>>             "hitset_stats_invalid": false,
>>>>>>             "hitset_bytes_stats_invalid": false,
>>>>>>             "pin_stats_invalid": true,
>>>>>>             "stat_sum": {
>>>>>>                 "num_bytes": 16946139153,
>>>>>>                 "num_objects": 4148,
>>>>>>                 "num_object_clones": 0,
>>>>>>                 "num_object_copies": 12444,
>>>>>>                 "num_objects_missing_on_primary": 0,
>>>>>>                 "num_objects_missing": 0,
>>>>>>                 "num_objects_degraded": 0,
>>>>>>                 "num_objects_misplaced": 0,
>>>>>>                 "num_objects_unfound": 0,
>>>>>>                 "num_objects_dirty": 4148,
>>>>>>                 "num_whiteouts": 0,
>>>>>>                 "num_read": 6895104,
>>>>>>                 "num_read_kb": 292185552,
>>>>>>                 "num_write": 10032749,
>>>>>>                 "num_write_kb": 185167701,
>>>>>>                 "num_scrub_errors": 1,
>>>>>>                 "num_shallow_scrub_errors": 1,
>>>>>>                 "num_deep_scrub_errors": 0,
>>>>>>                 "num_objects_recovered": 103598,
>>>>>>                 "num_bytes_recovered": 424107954567,
>>>>>>                 "num_keys_recovered": 110,
>>>>>>                 "num_objects_omap": 1,
>>>>>>                 "num_objects_hit_set_archive": 0,
>>>>>>                 "num_bytes_hit_set_archive": 0,
>>>>>>                 "num_flush": 0,
>>>>>>                 "num_flush_kb": 0,
>>>>>>                 "num_evict": 0,
>>>>>>                 "num_evict_kb": 0,
>>>>>>                 "num_promote": 0,
>>>>>>                 "num_flush_mode_high": 0,
>>>>>>                 "num_flush_mode_low": 0,
>>>>>>                 "num_evict_mode_some": 0,
>>>>>>                 "num_evict_mode_full": 0,
>>>>>>                 "num_objects_pinned": 0
>>>>>>             },
>>>>>>             "up": [
>>>>>>                 37,
>>>>>>                 44,
>>>>>>                 16
>>>>>>             ],
>>>>>>             "acting": [
>>>>>>                 37,
>>>>>>                 44,
>>>>>>                 16
>>>>>>             ],
>>>>>>             "blocked_by": [],
>>>>>>             "up_primary": 37,
>>>>>>             "acting_primary": 37
>>>>>>         },
>>>>>>         "empty": 0,
>>>>>>         "dne": 0,
>>>>>>         "incomplete": 0,
>>>>>>         "last_epoch_started": 30580,
>>>>>>         "hit_set_history": {
>>>>>>             "current_last_update": "0'0",
>>>>>>             "history": []
>>>>>>         }
>>>>>>     },
>>>>>>     "peer_info": [
>>>>>>         {
>>>>>>             "peer": "16",
>>>>>>             "pgid": "0.190",
>>>>>>             "last_update": "30581'5420535",
>>>>>>             "last_complete": "30581'5420535",
>>>>>>             "log_tail": "30537'5387475",
>>>>>>             "last_user_version": 5390577,
>>>>>>             "last_backfill": "MAX",
>>>>>>             "last_backfill_bitwise": 1,
>>>>>>             "purged_snaps": "[]",
>>>>>>             "history": {
>>>>>>                 "epoch_created": 1,
>>>>>>                 "last_epoch_started": 30580,
>>>>>>                 "last_epoch_clean": 30581,
>>>>>>                 "last_epoch_split": 0,
>>>>>>                 "last_epoch_marked_full": 0,
>>>>>>                 "same_up_since": 30578,
>>>>>>                 "same_interval_since": 30579,
>>>>>>                 "same_primary_since": 30565,
>>>>>>                 "last_scrub": "30554'5390240",
>>>>>>                 "last_scrub_stamp": "2018-07-16 12:27:03.547524",
>>>>>>                 "last_deep_scrub": "30554'5390240",
>>>>>>                 "last_deep_scrub_stamp": "2018-07-16 12:27:03.547524",
>>>>>>                 "last_clean_scrub_stamp": "2018-07-13 08:45:32.622555"
>>>>>>             },
>>>>>>             "stats": {
>>>>>>                 "version": "30570'5390575",
>>>>>>                 "reported_seq": "5139870",
>>>>>>                 "reported_epoch": "30576",
>>>>>>                 "state": "active+undersized+degraded+inconsistent",
>>>>>>                 "last_fresh": "2018-07-16 13:36:40.284756",
>>>>>>                 "last_change": "2018-07-16 13:36:40.284277",
>>>>>>                 "last_active": "2018-07-16 13:36:40.284756",
>>>>>>                 "last_peered": "2018-07-16 13:36:40.284756",
>>>>>>                 "last_clean": "2018-07-16 13:36:23.558224",
>>>>>>                 "last_became_active": "2018-07-16 13:36:40.284277",
>>>>>>                 "last_became_peered": "2018-07-16 13:36:40.284277",
>>>>>>                 "last_unstale": "2018-07-16 13:36:40.284756",
>>>>>>                 "last_undegraded": "2018-07-16 13:36:40.203248",
>>>>>>                 "last_fullsized": "2018-07-16 13:36:40.203248",
>>>>>>                 "mapping_epoch": 30578,
>>>>>>                 "log_start": "30537'5387475",
>>>>>>                 "ondisk_log_start": "30537'5387475",
>>>>>>                 "created": 1,
>>>>>>                 "last_epoch_clean": 30576,
>>>>>>                 "parent": "0.0",
>>>>>>                 "parent_split_bits": 0,
>>>>>>                 "last_scrub": "30554'5390240",
>>>>>>                 "last_scrub_stamp": "2018-07-16 12:27:03.547524",
>>>>>>                 "last_deep_scrub": "30554'5390240",
>>>>>>                 "last_deep_scrub_stamp": "2018-07-16 12:27:03.547524",
>>>>>>                 "last_clean_scrub_stamp": "2018-07-13 08:45:32.622555",
>>>>>>                 "log_size": 3100,
>>>>>>                 "ondisk_log_size": 3100,
>>>>>>                 "stats_invalid": false,
>>>>>>                 "dirty_stats_invalid": false,
>>>>>>                 "omap_stats_invalid": false,
>>>>>>                 "hitset_stats_invalid": false,
>>>>>>                 "hitset_bytes_stats_invalid": false,
>>>>>>                 "pin_stats_invalid": true,
>>>>>>                 "stat_sum": {
>>>>>>                     "num_bytes": 16841281553,
>>>>>>                     "num_objects": 4123,
>>>>>>                     "num_object_clones": 0,
>>>>>>                     "num_object_copies": 12369,
>>>>>>                     "num_objects_missing_on_primary": 0,
>>>>>>                     "num_objects_missing": 0,
>>>>>>                     "num_objects_degraded": 4123,
>>>>>>                     "num_objects_misplaced": 0,
>>>>>>                     "num_objects_unfound": 0,
>>>>>>                     "num_objects_dirty": 4123,
>>>>>>                     "num_whiteouts": 0,
>>>>>>                     "num_read": 6870027,
>>>>>>                     "num_read_kb": 291425720,
>>>>>>                     "num_write": 9972836,
>>>>>>                     "num_write_kb": 184701865,
>>>>>>                     "num_scrub_errors": 1,
>>>>>>                     "num_shallow_scrub_errors": 1,
>>>>>>                     "num_deep_scrub_errors": 0,
>>>>>>                     "num_objects_recovered": 103596,
>>>>>>                     "num_bytes_recovered": 424099565959,
>>>>>>                     "num_keys_recovered": 110,
>>>>>>                     "num_objects_omap": 1,
>>>>>>                     "num_objects_hit_set_archive": 0,
>>>>>>                     "num_bytes_hit_set_archive": 0,
>>>>>>                     "num_flush": 0,
>>>>>>                     "num_flush_kb": 0,
>>>>>>                     "num_evict": 0,
>>>>>>                     "num_evict_kb": 0,
>>>>>>                     "num_promote": 0,
>>>>>>                     "num_flush_mode_high": 0,
>>>>>>                     "num_flush_mode_low": 0,
>>>>>>                     "num_evict_mode_some": 0,
>>>>>>                     "num_evict_mode_full": 0,
>>>>>>                     "num_objects_pinned": 0
>>>>>>                 },
>>>>>>                 "up": [
>>>>>>                     37,
>>>>>>                     44,
>>>>>>                     16
>>>>>>                 ],
>>>>>>                 "acting": [
>>>>>>                     37,
>>>>>>                     44,
>>>>>>                     16
>>>>>>                 ],
>>>>>>                 "blocked_by": [],
>>>>>>                 "up_primary": 37,
>>>>>>                 "acting_primary": 37
>>>>>>             },
>>>>>>             "empty": 0,
>>>>>>             "dne": 0,
>>>>>>             "incomplete": 0,
>>>>>>             "last_epoch_started": 30580,
>>>>>>             "hit_set_history": {
>>>>>>                 "current_last_update": "0'0",
>>>>>>                 "history": []
>>>>>>             }
>>>>>>         },
>>>>>>         {
>>>>>>             "peer": "44",
>>>>>>             "pgid": "0.190",
>>>>>>             "last_update": "30581'5420535",
>>>>>>             "last_complete": "30570'5390575",
>>>>>>             "log_tail": "30537'5387475",
>>>>>>             "last_user_version": 5390575,
>>>>>>             "last_backfill": "MAX",
>>>>>>             "last_backfill_bitwise": 1,
>>>>>>             "purged_snaps": "[]",
>>>>>>             "history": {
>>>>>>                 "epoch_created": 1,
>>>>>>                 "last_epoch_started": 30580,
>>>>>>                 "last_epoch_clean": 30581,
>>>>>>                 "last_epoch_split": 0,
>>>>>>                 "last_epoch_marked_full": 0,
>>>>>>                 "same_up_since": 30578,
>>>>>>                 "same_interval_since": 30579,
>>>>>>                 "same_primary_since": 30565,
>>>>>>                 "last_scrub": "30554'5390240",
>>>>>>                 "last_scrub_stamp": "2018-07-16 12:27:03.547524",
>>>>>>                 "last_deep_scrub": "30554'5390240",
>>>>>>                 "last_deep_scrub_stamp": "2018-07-16 12:27:03.547524",
>>>>>>                 "last_clean_scrub_stamp": "2018-07-13 08:45:32.622555"
>>>>>>             },
>>>>>>             "stats": {
>>>>>>                 "version": "30568'5390574",
>>>>>>                 "reported_seq": "5139846",
>>>>>>                 "reported_epoch": "30570",
>>>>>>                 "state": "active+undersized+degraded+inconsistent",
>>>>>>                 "last_fresh": "2018-07-16 13:36:07.003551",
>>>>>>                 "last_change": "2018-07-16 13:36:07.002580",
>>>>>>                 "last_active": "2018-07-16 13:36:07.003551",
>>>>>>                 "last_peered": "2018-07-16 13:36:07.003551",
>>>>>>                 "last_clean": "2018-07-16 13:35:50.922619",
>>>>>>                 "last_became_active": "2018-07-16 13:36:07.002580",
>>>>>>                 "last_became_peered": "2018-07-16 13:36:07.002580",
>>>>>>                 "last_unstale": "2018-07-16 13:36:07.003551",
>>>>>>                 "last_undegraded": "2018-07-16 13:36:05.922413",
>>>>>>                 "last_fullsized": "2018-07-16 13:36:05.922413",
>>>>>>                 "mapping_epoch": 30578,
>>>>>>                 "log_start": "30537'5387475",
>>>>>>                 "ondisk_log_start": "30537'5387475",
>>>>>>                 "created": 1,
>>>>>>                 "last_epoch_clean": 30570,
>>>>>>                 "parent": "0.0",
>>>>>>                 "parent_split_bits": 0,
>>>>>>                 "last_scrub": "30554'5390240",
>>>>>>                 "last_scrub_stamp": "2018-07-16 12:27:03.547524",
>>>>>>                 "last_deep_scrub": "30554'5390240",
>>>>>>                 "last_deep_scrub_stamp": "2018-07-16 12:27:03.547524",
>>>>>>                 "last_clean_scrub_stamp": "2018-07-13 08:45:32.622555",
>>>>>>                 "log_size": 3099,
>>>>>>                 "ondisk_log_size": 3099,
>>>>>>                 "stats_invalid": false,
>>>>>>                 "dirty_stats_invalid": false,
>>>>>>                 "omap_stats_invalid": false,
>>>>>>                 "hitset_stats_invalid": false,
>>>>>>                 "hitset_bytes_stats_invalid": false,
>>>>>>                 "pin_stats_invalid": true,
>>>>>>                 "stat_sum": {
>>>>>>                     "num_bytes": 16841281553,
>>>>>>                     "num_objects": 4123,
>>>>>>                     "num_object_clones": 0,
>>>>>>                     "num_object_copies": 12369,
>>>>>>                     "num_objects_missing_on_primary": 0,
>>>>>>                     "num_objects_missing": 0,
>>>>>>                     "num_objects_degraded": 4123,
>>>>>>                     "num_objects_misplaced": 0,
>>>>>>                     "num_objects_unfound": 0,
>>>>>>                     "num_objects_dirty": 4123,
>>>>>>                     "num_whiteouts": 0,
>>>>>>                     "num_read": 6870027,
>>>>>>                     "num_read_kb": 291425720,
>>>>>>                     "num_write": 9972832,
>>>>>>                     "num_write_kb": 184701853,
>>>>>>                     "num_scrub_errors": 1,
>>>>>>                     "num_shallow_scrub_errors": 1,
>>>>>>                     "num_deep_scrub_errors": 0,
>>>>>>                     "num_objects_recovered": 103594,
>>>>>>                     "num_bytes_recovered": 424091177351,
>>>>>>                     "num_keys_recovered": 110,
>>>>>>                     "num_objects_omap": 1,
>>>>>>                     "num_objects_hit_set_archive": 0,
>>>>>>                     "num_bytes_hit_set_archive": 0,
>>>>>>                     "num_flush": 0,
>>>>>>                     "num_flush_kb": 0,
>>>>>>                     "num_evict": 0,
>>>>>>                     "num_evict_kb": 0,
>>>>>>                     "num_promote": 0,
>>>>>>                     "num_flush_mode_high": 0,
>>>>>>                     "num_flush_mode_low": 0,
>>>>>>                     "num_evict_mode_some": 0,
>>>>>>                     "num_evict_mode_full": 0,
>>>>>>                     "num_objects_pinned": 0
>>>>>>                 },
>>>>>>                 "up": [
>>>>>>                     37,
>>>>>>                     44,
>>>>>>                     16
>>>>>>                 ],
>>>>>>                 "acting": [
>>>>>>                     37,
>>>>>>                     44,
>>>>>>                     16
>>>>>>                 ],
>>>>>>                 "blocked_by": [],
>>>>>>                 "up_primary": 37,
>>>>>>                 "acting_primary": 37
>>>>>>             },
>>>>>>             "empty": 0,
>>>>>>             "dne": 0,
>>>>>>             "incomplete": 0,
>>>>>>             "last_epoch_started": 30580,
>>>>>>             "hit_set_history": {
>>>>>>                 "current_last_update": "0'0",
>>>>>>                 "history": []
>>>>>>             }
>>>>>>         }
>>>>>>     ],
>>>>>>     "recovery_state": [
>>>>>>         {
>>>>>>             "name": "Started\/Primary\/Active",
>>>>>>             "enter_time": "2018-07-16 13:37:13.050211",
>>>>>>             "might_have_unfound": [
>>>>>>                 {
>>>>>>                     "osd": "16",
>>>>>>                     "status": "already probed"
>>>>>>                 },
>>>>>>                 {
>>>>>>                     "osd": "44",
>>>>>>                     "status": "already probed"
>>>>>>                 }
>>>>>>             ],
>>>>>>             "recovery_progress": {
>>>>>>                 "backfill_targets": [],
>>>>>>                 "waiting_on_backfill": [],
>>>>>>                 "last_backfill_started": "MIN",
>>>>>>                 "backfill_info": {
>>>>>>                     "begin": "MIN",
>>>>>>                     "end": "MIN",
>>>>>>                     "objects": []
>>>>>>                 },
>>>>>>                 "peer_backfill_info": [],
>>>>>>                 "backfills_in_flight": [],
>>>>>>                 "recovering": [],
>>>>>>                 "pg_backend": {
>>>>>>                     "pull_from_peer": [],
>>>>>>                     "pushing": []
>>>>>>                 }
>>>>>>             },
>>>>>>             "scrub": {
>>>>>>                 "scrubber.epoch_start": "0",
>>>>>>                 "scrubber.active": 0,
>>>>>>                 "scrubber.state": "INACTIVE",
>>>>>>                 "scrubber.start": "MIN",
>>>>>>                 "scrubber.end": "MIN",
>>>>>>                 "scrubber.subset_last_update": "0'0",
>>>>>>                 "scrubber.deep": false,
>>>>>>                 "scrubber.seed": 0,
>>>>>>                 "scrubber.waiting_on": 0,
>>>>>>                 "scrubber.waiting_on_whom": []
>>>>>>             }
>>>>>>         },
>>>>>>         {
>>>>>>             "name": "Started",
>>>>>>             "enter_time": "2018-07-16 13:37:11.980264"
>>>>>>         }
>>>>>>     ],
>>>>>>     "agent_state": {}
>>>>>> }
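The output of `ceph pg 0.190 query` is long; for this kind of investigation the interesting parts are the acting set, the primary, the scrub error count, and each peer's `last_update`/`last_complete`. A minimal Python sketch, assuming the JSON layout shown above; `summarize_pg_query` is a hypothetical helper, and the sample keeps only the fields it reads.

```python
import json


def summarize_pg_query(query: dict) -> dict:
    """Pick out the fields of `ceph pg <pgid> query` output used in this thread."""
    stats = query["info"]["stats"]
    peers = {}
    for p in query.get("peer_info", []):
        peers[p["peer"]] = {
            "last_update": p["last_update"],
            "last_complete": p["last_complete"],
            # As recorded in the primary's peer_info, not re-checked on the peer itself.
            "complete_lags_update": p["last_complete"] != p["last_update"],
        }
    return {
        "state": query["state"],
        "acting": query["acting"],
        "acting_primary": stats["acting_primary"],
        "num_scrub_errors": stats["stat_sum"]["num_scrub_errors"],
        "peers": peers,
    }


# Trimmed copy of the PG 0.190 query output above (only the fields used).
sample = json.loads("""
{"state": "active+clean+inconsistent",
 "acting": [37, 44, 16],
 "info": {"stats": {"acting_primary": 37, "stat_sum": {"num_scrub_errors": 1}}},
 "peer_info": [
   {"peer": "16", "last_update": "30581'5420535", "last_complete": "30581'5420535"},
   {"peer": "44", "last_update": "30581'5420535", "last_complete": "30570'5390575"}
 ]}
""")
print(json.dumps(summarize_pg_query(sample), indent=2))
```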
>>>>>>
>>>>>>
>>>>>> On 17/07/18 02:19, Brad Hubbard wrote:
>>>>>>> Can we see a pg query of 0.190 ?
>>>>>>>
>>>>>>> On Tue, Jul 17, 2018 at 1:05 AM, Ana Aviles <ana@xxxxxxxxxxxx> wrote:
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> We have a cluster that was running hammer (0.94.10). We hit a bug where,
>>>>>>>> right after seemingly fixing an inconsistent PG, the primary OSD would
>>>>>>>> crash and restart. The next deep-scrub would again report an inconsistent PG.
>>>>>>>>
>>>>>>>> We filed a bug report,
>>>>>>>> https://tracker.ceph.com/issues/24652#change-115654, which was closed
>>>>>>>> since it was a known bug fixed in newer versions of Ceph.
>>>>>>>>
>>>>>>>> Now the cluster is running jewel (10.2.11). There is again one
>>>>>>>> inconsistent PG with 1 error, which we are not able to fix, and the
>>>>>>>> log has no reference to the inconsistent object.
>>>>>>>>
>>>>>>>>
>>>>>>>> scrub 0 missing, 1 inconsistent objects
>>>>>>>> scrub 1 errors
>>>>>>>>
>>>>>>>>
>>>>>>>> We captured logs at debug level 20 while repairing the PG. The one
>>>>>>>> for the primary OSD is: 94e20123-fcda-49d7-98a2-919507dfbc92
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>> Kind regards,
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Ana Avilés
>>>>>>>> Greenhost - sustainable hosting & digital security
>>>>>>>> E: ana@xxxxxxxxxxxx
>>>>>>>> T: +31 20 4890444
>>>>>>>> W: https://greenhost.nl
>>>>>>>> --
>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>>
>>
> 
> 
> 

-- 
Ana Avilés
Greenhost - sustainable hosting & digital security
E: ana@xxxxxxxxxxxx
T: +31 20 4890444
W: https://greenhost.nl