Re: Inconsistent PG with 1 inconsistent object not referenced in the log

On 24/07/18 04:12, Brad Hubbard wrote:
> Is there anything unusual or different about osd 44? Is there anything
> different about its history?
> 

Nothing that we are aware of. The log debug level is also very low, so
there is not much we can see.

> It seems 44 rarely agrees with others. Yes, the same procedure should
> fix it with the same caveats.

We fixed it again. Thanks a lot. I guess if it happens again we will
just put OSD 44 out.
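
For anyone comparing shards by hand, here is a minimal Python sketch (not
part of the procedure described in this thread) that summarizes parsed
`rados list-inconsistent-obj` output. It groups OSDs by reported
data_digest and flags shards whose object_info string names a different
object than the one being reported. The function name `summarize` and the
regex are illustrative assumptions. It also assumes object_info is a
string in the form shown in the quoted output and that object names
contain no colons.

```python
import re
from collections import defaultdict

# Assumed layout of an object_info string, taken from the quoted output:
#   "0:618e3778:::rbd_data.<image id>.<offset>:head(...)"
# The object name is the text between ":::" and the next ":".
OI_NAME = re.compile(r":::(?P<name>[^:]+):")


def summarize(report):
    """Summarize a parsed list-inconsistent-obj report (a dict).

    For each inconsistent object, return the OSDs grouped by data_digest
    and the OSDs whose object_info refers to a different object name.
    """
    results = []
    for item in report.get("inconsistents", []):
        name = item["object"]["name"]
        by_digest = defaultdict(list)
        foreign_oi = []
        for shard in item.get("shards", []):
            # data_digest may be absent (for example after a shallow scrub).
            by_digest[shard.get("data_digest", "unknown")].append(shard["osd"])
            oi = shard.get("object_info", "")
            match = OI_NAME.search(oi) if isinstance(oi, str) else None
            if match and match.group("name") != name:
                foreign_oi.append(shard["osd"])
        results.append({
            "object": name,
            "by_digest": dict(by_digest),
            "foreign_object_info": foreign_oi,
        })
    return results

# Usage (hypothetical file name): save the command output as JSON, then
#   import json
#   print(summarize(json.load(open("pg-0.186.json"))))
```

Applied to the shard fields in the quoted 0.186 output, this puts osds 26
and 36 under 0x7dd0d0bd and osd 44 under 0x264b7d0d, and flags osd 44 as
carrying object_info for ...15c7d6 rather than ...04db0e.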

> 
> On Mon, Jul 23, 2018 at 11:48 PM, Ana Aviles <ana@xxxxxxxxxxxx> wrote:
>> I replaced the object with rados as suggested, and right afterwards
>> forced a deep scrub, which got us back to HEALTH_OK.
>>
>> However, we now have another inconsistent PG. It involves the same rbd
>> image but a different object, one that was also mentioned in the
>> previous inconsistent PG. This time it's worse because we have a
>> data_digest mismatch. I wonder whether this says anything about the
>> previous substitution, or whether I should just follow the same path
>> and replace this object with rados.
>>
>>
>> pg 0.186 is active+clean+inconsistent, acting [36,26,44]
>>
>> rados list-inconsistent-obj 0.186
>> {
>>     "epoch": 30586,
>>     "inconsistents": [
>>         {
>>             "object": {
>>                 "name": "rbd_data.15cec2ae8944a.000000000004db0e",
>>                 "nspace": "",
>>                 "locator": "",
>>                 "snap": "head",
>>                 "version": 5493833
>>             },
>>             "errors": [
>>                 "object_info_inconsistency",
>>                 "data_digest_mismatch",
>>                 "attr_value_mismatch"
>>             ],
>>             "union_shard_errors": [
>>                 "data_digest_mismatch_oi"
>>             ],
>>             "selected_object_info":
>> "0:09c2dd3e:::rbd_data.15cec2ae8944a.000000000015c7d6:head(30587'5493833
>> client.1246390.0:1 dirty|data_digest|omap_digest s 4194304 uv 5493833 dd
>> 264b7d0d od ffffffff alloc_hint [0 0])",
>>             "shards": [
>>                 {
>>                     "osd": 26,
>>                     "errors": [
>>                         "data_digest_mismatch_oi"
>>                     ],
>>                     "size": 4194304,
>>                     "omap_digest": "0xffffffff",
>>                     "data_digest": "0x7dd0d0bd",
>>                     "object_info":
>> "0:618e3778:::rbd_data.15cec2ae8944a.000000000004db0e:head(30537'5509201
>> osd.36.0:8552301 dirty|data_digest|omap_digest s 4194304 uv 5498082 dd
>> 7dd0d0bd od ffffffff alloc_hint [4194304 4194304])",
>>                     "attrs": [
>>                         {
>>                             "name": "_",
>>                             "value":
>> "EAggAQAABANIAAAAAAAAACcAAAByYmRfZGF0YS4xNWNlYzJhZTg5NDRhLjAwMDAwMDAwMDAwNGRiMGX+\/\/\/\/\/\/\/\/\/4Zx7B4AAAAAAAAAAAAAAAAABgMcAAAAAAAAAAAAAAD\/\/\/\/\/AAAAAAAAAAD\/\/\/\/\/\/\/\/\/\/wAAAABREFQAAAAAAEl3AADi5FMAAAAAAEl3AAACAhUAAAAEJAAAAAAAAABtf4IAAAAAAAAAAAAAAEAAAAAAAPpfSVvV\/VMKAgIVAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA4uRTAAAAAAAAAAAAAAAAAAA0AAAA+l9JW4x6Rw290NB9\/\/\/\/\/wAAQAAAAAAAAABAAAAAAAAAAAAA",
>>                             "Base64": true
>>                         },
>>                         {
>>                             "name": "snapset",
>>                             "value":
>> "AgIZAAAAAAAAAAAAAAABAAAAAAAAAAAAAAAAAAAAAA==",
>>                             "Base64": true
>>                         }
>>                     ]
>>                 },
>>                 {
>>                     "osd": 36,
>>                     "errors": [
>>                         "data_digest_mismatch_oi"
>>                     ],
>>                     "size": 4194304,
>>                     "omap_digest": "0xffffffff",
>>                     "data_digest": "0x7dd0d0bd",
>>                     "object_info":
>> "0:618e3778:::rbd_data.15cec2ae8944a.000000000004db0e:head(30537'5509201
>> osd.36.0:8552301 dirty|data_digest|omap_digest s 4194304 uv 5498082 dd
>> 7dd0d0bd od ffffffff alloc_hint [4194304 4194304])",
>>                     "attrs": [
>>                         {
>>                             "name": "_",
>>                             "value":
>> "EAggAQAABANIAAAAAAAAACcAAAByYmRfZGF0YS4xNWNlYzJhZTg5NDRhLjAwMDAwMDAwMDAwNGRiMGX+\/\/\/\/\/\/\/\/\/4Zx7B4AAAAAAAAAAAAAAAAABgMcAAAAAAAAAAAAAAD\/\/\/\/\/AAAAAAAAAAD\/\/\/\/\/\/\/\/\/\/wAAAABREFQAAAAAAEl3AADi5FMAAAAAAEl3AAACAhUAAAAEJAAAAAAAAABtf4IAAAAAAAAAAAAAAEAAAAAAAPpfSVvV\/VMKAgIVAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA4uRTAAAAAAAAAAAAAAAAAAA0AAAA+l9JW4x6Rw290NB9\/\/\/\/\/wAAQAAAAAAAAABAAAAAAAAAAAAA",
>>                             "Base64": true
>>                         },
>>                     ]
>>                 },
>>                 {
>>                     "osd": 44,
>>                     "errors": [],
>>                     "size": 4194304,
>>                     "omap_digest": "0xffffffff",
>>                     "data_digest": "0x264b7d0d",
>>                     "object_info":
>> "0:09c2dd3e:::rbd_data.15cec2ae8944a.000000000015c7d6:head(30587'5493833
>> client.1246390.0:1 dirty|data_digest|omap_digest s 4194304 uv 5493833 dd
>> 264b7d0d od ffffffff alloc_hint [0 0])",
>>                     "attrs": [
>>                         {
>>                             "name": "_",
>>                             "value":
>> "EAggAQAABANIAAAAAAAAACcAAAByYmRfZGF0YS4xNWNlYzJhZTg5NDRhLjAwMDAwMDAwMDAxNWM3ZDb+\/\/\/\/\/\/\/\/\/5BDu3wAAAAAAAAAAAAAAAAABgMcAAAAAAAAAAAAAAD\/\/\/\/\/AAAAAAAAAAD\/\/\/\/\/\/\/\/\/\/wAAAABJ1FMAAAAAAHt3AAD0eE4AAAAAALxoAAACAhUAAAAItgQTAAAAAAABAAAAAAAAAAAAAAAAAEAAAAAAAIbaUVtSy\/8jAgIVAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAASdRTAAAAAAAAAAAAAAAAAAA0AAAAhtpRW\/VN8CQNfUsm\/\/\/\/\/wAAAAAAAAAAAAAAAAAAAAAAAAAA",
>>                             "Base64": true
>>                         },
>>                         {
>>                             "name": "snapset",
>>                             "value":
>> "AgIZAAAAAAAAAAAAAAABAAAAAAAAAAAAAAAAAAAAAA==",
>>                             "Base64": true
>>                         }
>>                     ]
>>                 }
>>             ]
>>         }
>>     ]
>> }
>>
>>
>> On 20/07/18 00:27, Brad Hubbard wrote:
>>> On Fri, Jul 20, 2018 at 1:05 AM, Ana Aviles <ana@xxxxxxxxxxxx> wrote:
>>>>
>>>>
>>>> On 19/07/18 03:25, Brad Hubbard wrote:
>>>>> On Wed, Jul 18, 2018 at 6:25 PM, Ana Aviles <ana@xxxxxxxxxxxx> wrote:
>>>>>> Ah ok. Then I think it confirms what you are saying. Here it is:
>>>>>>
>>>>>> $ rados list-inconsistent-obj 0.190
>>>>>> {"epoch":30579,"inconsistents":[{"object":{"name":"rbd_data.15cec2ae8944a.000000000015c7d6","nspace":"","locator":"","snap":"head","version":5498082},"errors":["object_info_inconsistency","attr_value_mismatch"],"union_shard_errors":[],"selected_object_info":"0:618e3778:::rbd_data.15cec2ae8944a.000000000004db0e:head(30537'5509201
>>>>>> osd.36.0:8552301 dirty|data_digest|omap_digest s 4194304 uv 5498082 dd
>>>>>> 7dd0d0bd od ffffffff alloc_hint [4194304
>>>>>> 4194304])","shards":[{"osd":16,"errors":[],"size":4194304,"object_info":"0:09c2dd3e:::rbd_data.15cec2ae8944a.000000000015c7d6:head(26812'5142772
>>>>>> client.1044166.0:393154060 dirty|data_digest|omap_digest s 4194304 uv
>>>>>> 5142772 dd 264b7d0d od ffffffff alloc_hint [0
>>>>>> 0])","attrs":[{"name":"_","value":"DwgMAQAABANIAAAAAAAAACcAAAByYmRfZGF0YS4xNWNlYzJhZTg5NDRhLjAwMDAwMDAwMDAxNWM3ZDb+\/\/\/\/\/\/\/\/\/5BDu3wAAAAAAAAAAAAAAAAABgMcAAAAAAAAAAAAAAD\/\/\/\/\/AAAAAAAAAAD\/\/\/\/\/\/\/\/\/\/wAAAAD0eE4AAAAAALxoAADzeE4AAAAAALxoAAACAhUAAAAIxu4PAAAAAAAMDm8XAAAAAAAAAAAAAEAAAAAAAOJmPVsEa24SAgIVAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA9HhOAAAAAAAAAAAAAAAAAAA0AAAA4mY9W1Q\/lBwNfUsm\/\/\/\/\/w==","Base64":true},{"name":"snapset","value":"AgIZAAAAAAAAAAAAAAABAAAAAAAAAAAAAAAAAAAAAA==","Base64":true}]},{"osd":37,"errors":[],"size":4194304,"object_info":"0:09c2dd3e:::rbd_data.15cec2ae8944a.000000000015c7d6:head(26812'5142772
>>>>>> client.1044166.0:393154060 dirty|data_digest|omap_digest s 4194304 uv
>>>>>> 5142772 dd 264b7d0d od ffffffff alloc_hint [0
>>>>>> 0])","attrs":[{"name":"_","value":"DwgMAQAABANIAAAAAAAAACcAAAByYmRfZGF0YS4xNWNlYzJhZTg5NDRhLjAwMDAwMDAwMDAxNWM3ZDb+\/\/\/\/\/\/\/\/\/5BDu3wAAAAAAAAAAAAAAAAABgMcAAAAAAAAAAAAAAD\/\/\/\/\/AAAAAAAAAAD\/\/\/\/\/\/\/\/\/\/wAAAAD0eE4AAAAAALxoAADzeE4AAAAAALxoAAACAhUAAAAIxu4PAAAAAAAMDm8XAAAAAAAAAAAAAEAAAAAAAOJmPVsEa24SAgIVAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA9HhOAAAAAAAAAAAAAAAAAAA0AAAA4mY9W1Q\/lBwNfUsm\/\/\/\/\/w==","Base64":true},{"name":"snapset","value":"AgIZAAAAAAAAAAAAAAABAAAAAAAAAAAAAAAAAAAAAA==","Base64":true}]},{"osd":44,"errors":[],"size":4194304,"object_info":"0:618e3778:::rbd_data.15cec2ae8944a.000000000004db0e:head(30537'5509201
>>>>>> osd.36.0:8552301 dirty|data_digest|omap_digest s 4194304 uv 5498082 dd
>>>>>> 7dd0d0bd od ffffffff alloc_hint [4194304
>>>>>> 4194304])","attrs":[{"name":"_","value":"EAggAQAABANIAAAAAAAAACcAAAByYmRfZGF0YS4xNWNlYzJhZTg5NDRhLjAwMDAwMDAwMDAwNGRiMGX+\/\/\/\/\/\/\/\/\/4Zx7B4AAAAAAAAAAAAAAAAABgMcAAAAAAAAAAAAAAD\/\/\/\/\/AAAAAAAAAAD\/\/\/\/\/\/\/\/\/\/wAAAABREFQAAAAAAEl3AADi5FMAAAAAAEl3AAACAhUAAAAEJAAAAAAAAABtf4IAAAAAAAAAAAAAAEAAAAAAAPpfSVvV\/VMKAgIVAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA4uRTAAAAAAAAAAAAAAAAAAA0AAAA+l9JW4x6Rw290NB9\/\/\/\/\/wAAQAAAAAAAAABAAAAAAAAAAAAA","Base64":true},{"name":"snapset","value":"AgIZAAAAAAAAAAAAAAABAAAAAAAAAAAAAAAAAAAAAA==","Base64":true}]}]}]}
>>>>>>
>>>>>>
>>>>>> To determine which is the right version of the object, is there no
>>>>>> timestamp that can tell us? Maybe the object got updated on osd.37 and
>>>>>> osd.16 while osd.44 was down, and that is where the mismatch comes
>>>>>> from? Otherwise, shouldn't the authoritative osd be leading?
>>>>>
>>>>> The primary serves IO requests, so the version on osd 37 is what
>>>>> clients will read; going with that seems reasonable.
>>>>>
>>>>
>>>> OK good.
>>>>
>>>>> The version on osd 44 was actually modified after the others (epoch
>>>>> 30537, as opposed to epoch 26812), but the sizes are all the same, so
>>>>> the difference may be trivial (metadata only, perhaps). According to
>>>>> the last request id (osd.36.0:8552301), the change came from another
>>>>> osd (36), which is kind of unexpected. Is there, or was there, a cache
>>>>> tier involved?
>>>>
>>>> Ah OK, very interesting! No, no cache tier involved. So at one point
>>>> osd.36 was part of the PG set?
>>>
>>> Maybe. All we know is that the last request came from osd.36, which is
>>> unusual because changes in this context generally only come from
>>> clients. A cache tier might explain it, which is why I mentioned it.
>>>
>>>>
>>>>>
>>>>> If you want to go with the version that is currently being used (37
>>>>> and 16), you can just quiesce the rbd image clients and do a rados get,
>>>>> then a rados put of the object. I would suggest taking a backup of the
>>>>> object from osd 44 using the ceph-objectstore-tool although, as I
>>>>> said, that version is not being used, so I doubt you will miss it.
>>>>>
>>>>
>>>> Great, will do that. Thanks a lot for the help.
>>>
>>> yw.
>>>
>>>>
>>>>>>
>>>>>> Regards,
>>>>>> Ana
>>>>>>
>>>>>>
>>>>>> On 18/07/18 05:24, Brad Hubbard wrote:
>>>>>>> OK. What I *meant* to ask for was the output of "rados
>>>>>>> list-inconsistent-obj 0.190" (might still be worth posting that but it
>>>>>>> should just confirm findings below).
>>>>>>>
>>>>>>>
>>>>>>> The relevant lines from the log are below.
>>>>>>>
>>>>>>> 2018-07-16 12:24:45.940910 7fb422340700 2 osd.37 pg_epoch: 30554
>>>>>>> pg[0.190( v 30554'5390084 (30537'5387075,30554'5390084]
>>>>>>> local-les=30554 n=4123 ec=1 les/c/f 30554/30554/0 30552/30553/30542)
>>>>>>> [37,44,16] r=0 lpr=30553 crt=30554'5390079 lcod 30554'5390083 mlcod
>>>>>>> 30554'5390083 active+clean+scrubbing+deep+inconsistent+repair] 0.190
>>>>>>> shard 16: soid 0:09c2dd3e:::rbd_data.15cec2ae8944a.000000000015c7d6:head
>>>>>>> data_digest 0x264b7d0d != data_digest 0x7dd0d0bd from shard 44,
>>>>>>> data_digest 0x264b7d0d != data_digest 0x7dd0d0bd from auth oi
>>>>>>> 0:618e3778:::rbd_data.15cec2ae8944a.000000000004db0e:head(30537'5509201
>>>>>>> osd.36.0:8552301 dirty|data_digest|omap_digest s 4194304 uv 5498082 dd
>>>>>>> 7dd0d0bd od ffffffff alloc_hint [4194304 4194304]), attr value
>>>>>>> mismatch '_'
>>>>>>>
>>>>>>> 2018-07-16 12:24:45.940941 7fb422340700 -1
>>>>>>> log_channel(cluster) log [ERR] : 0.190 shard 16: soid
>>>>>>> 0:09c2dd3e:::rbd_data.15cec2ae8944a.000000000015c7d6:head data_digest
>>>>>>> 0x264b7d0d != data_digest 0x7dd0d0bd from shard 44, data_digest
>>>>>>> 0x264b7d0d != data_digest 0x7dd0d0bd from auth oi
>>>>>>> 0:618e3778:::rbd_data.15cec2ae8944a.000000000004db0e:head(30537'5509201
>>>>>>> osd.36.0:8552301 dirty|data_digest|omap_digest s 4194304 uv 5498082 dd
>>>>>>> 7dd0d0bd od ffffffff alloc_hint [4194304 4194304]), attr value
>>>>>>> mismatch '_'
>>>>>>>
>>>>>>> 2018-07-16 12:24:45.940957 7fb422340700 -1
>>>>>>> log_channel(cluster) log [ERR] : 0.190 shard 37: soid
>>>>>>> 0:09c2dd3e:::rbd_data.15cec2ae8944a.000000000015c7d6:head data_digest
>>>>>>> 0x264b7d0d != data_digest 0x7dd0d0bd from shard 44, data_digest
>>>>>>> 0x264b7d0d != data_digest 0x7dd0d0bd from auth oi
>>>>>>> 0:618e3778:::rbd_data.15cec2ae8944a.000000000004db0e:head(30537'5509201
>>>>>>> osd.36.0:8552301 dirty|data_digest|omap_digest s 4194304 uv 5498082 dd
>>>>>>> 7dd0d0bd od ffffffff alloc_hint [4194304 4194304]), attr value
>>>>>>> mismatch '_'
>>>>>>>
>>>>>>> They show that osd 44 has been chosen as the authoritative shard,
>>>>>>> that it has a data digest for this object of 0x7dd0d0bd, and that the
>>>>>>> data digest in the authoritative object info is also 0x7dd0d0bd.
>>>>>>>
>>>>>>> Shard 16, however, has a data digest of 0x264b7d0d, and so does shard
>>>>>>> 37, so the data for this object on osds 16 and 37 differs from that on
>>>>>>> osd 44.
>>>>>>>
>>>>>>> Basically, you'll need to pick which is the "right" copy of the object
>>>>>>> (I can't tell you), quiesce traffic to/from that object (rbd image), and
>>>>>>> get/put that object back into the cluster to fix the mismatch. Since
>>>>>>> this appears to be an rbd image, this could potentially result in an
>>>>>>> image that needs an fsck or equivalent, IIUC.
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Jul 17, 2018 at 10:06 PM, Ana Aviles <ana@xxxxxxxxxxxx> wrote:
>>>>>>>>
>>>>>>>> Hi Brad,
>>>>>>>>
>>>>>>>> Here it is:
>>>>>>>>
>>>>>>>> {
>>>>>>>>     "state": "active+clean+inconsistent",
>>>>>>>>     "snap_trimq": "[]",
>>>>>>>>     "epoch": 30581,
>>>>>>>>     "up": [
>>>>>>>>         37,
>>>>>>>>         44,
>>>>>>>>         16
>>>>>>>>     ],
>>>>>>>>     "acting": [
>>>>>>>>         37,
>>>>>>>>         44,
>>>>>>>>         16
>>>>>>>>     ],
>>>>>>>>     "actingbackfill": [
>>>>>>>>         "16",
>>>>>>>>         "37",
>>>>>>>>         "44"
>>>>>>>>     ],
>>>>>>>>     "info": {
>>>>>>>>         "pgid": "0.190",
>>>>>>>>         "last_update": "30581'5420535",
>>>>>>>>         "last_complete": "30581'5420535",
>>>>>>>>         "log_tail": "30581'5417484",
>>>>>>>>         "last_user_version": 5420535,
>>>>>>>>         "last_backfill": "MAX",
>>>>>>>>         "last_backfill_bitwise": 0,
>>>>>>>>         "purged_snaps": "[]",
>>>>>>>>         "history": {
>>>>>>>>             "epoch_created": 1,
>>>>>>>>             "last_epoch_started": 30580,
>>>>>>>>             "last_epoch_clean": 30581,
>>>>>>>>             "last_epoch_split": 0,
>>>>>>>>             "last_epoch_marked_full": 0,
>>>>>>>>             "same_up_since": 30578,
>>>>>>>>             "same_interval_since": 30579,
>>>>>>>>             "same_primary_since": 30565,
>>>>>>>>             "last_scrub": "30554'5390240",
>>>>>>>>             "last_scrub_stamp": "2018-07-16 12:27:03.547524",
>>>>>>>>             "last_deep_scrub": "30554'5390240",
>>>>>>>>             "last_deep_scrub_stamp": "2018-07-16 12:27:03.547524",
>>>>>>>>             "last_clean_scrub_stamp": "2018-07-13 08:45:32.622555"
>>>>>>>>         },
>>>>>>>>         "stats": {
>>>>>>>>             "version": "30581'5420535",
>>>>>>>>             "reported_seq": "5155553",
>>>>>>>>             "reported_epoch": "30581",
>>>>>>>>             "state": "active+clean+inconsistent",
>>>>>>>>             "last_fresh": "2018-07-17 12:02:13.002428",
>>>>>>>>             "last_change": "2018-07-16 13:37:24.020403",
>>>>>>>>             "last_active": "2018-07-17 12:02:13.002428",
>>>>>>>>             "last_peered": "2018-07-17 12:02:13.002428",
>>>>>>>>             "last_clean": "2018-07-17 12:02:13.002428",
>>>>>>>>             "last_became_active": "2018-07-16 13:37:13.173821",
>>>>>>>>             "last_became_peered": "2018-07-16 13:37:13.173821",
>>>>>>>>             "last_unstale": "2018-07-17 12:02:13.002428",
>>>>>>>>             "last_undegraded": "2018-07-17 12:02:13.002428",
>>>>>>>>             "last_fullsized": "2018-07-17 12:02:13.002428",
>>>>>>>>             "mapping_epoch": 30578,
>>>>>>>>             "log_start": "30581'5417484",
>>>>>>>>             "ondisk_log_start": "30581'5417484",
>>>>>>>>             "created": 1,
>>>>>>>>             "last_epoch_clean": 30581,
>>>>>>>>             "parent": "0.0",
>>>>>>>>             "parent_split_bits": 0,
>>>>>>>>             "last_scrub": "30554'5390240",
>>>>>>>>             "last_scrub_stamp": "2018-07-16 12:27:03.547524",
>>>>>>>>             "last_deep_scrub": "30554'5390240",
>>>>>>>>             "last_deep_scrub_stamp": "2018-07-16 12:27:03.547524",
>>>>>>>>             "last_clean_scrub_stamp": "2018-07-13 08:45:32.622555",
>>>>>>>>             "log_size": 3051,
>>>>>>>>             "ondisk_log_size": 3051,
>>>>>>>>             "stats_invalid": false,
>>>>>>>>             "dirty_stats_invalid": false,
>>>>>>>>             "omap_stats_invalid": false,
>>>>>>>>             "hitset_stats_invalid": false,
>>>>>>>>             "hitset_bytes_stats_invalid": false,
>>>>>>>>             "pin_stats_invalid": true,
>>>>>>>>             "stat_sum": {
>>>>>>>>                 "num_bytes": 16946139153,
>>>>>>>>                 "num_objects": 4148,
>>>>>>>>                 "num_object_clones": 0,
>>>>>>>>                 "num_object_copies": 12444,
>>>>>>>>                 "num_objects_missing_on_primary": 0,
>>>>>>>>                 "num_objects_missing": 0,
>>>>>>>>                 "num_objects_degraded": 0,
>>>>>>>>                 "num_objects_misplaced": 0,
>>>>>>>>                 "num_objects_unfound": 0,
>>>>>>>>                 "num_objects_dirty": 4148,
>>>>>>>>                 "num_whiteouts": 0,
>>>>>>>>                 "num_read": 6895104,
>>>>>>>>                 "num_read_kb": 292185552,
>>>>>>>>                 "num_write": 10032749,
>>>>>>>>                 "num_write_kb": 185167701,
>>>>>>>>                 "num_scrub_errors": 1,
>>>>>>>>                 "num_shallow_scrub_errors": 1,
>>>>>>>>                 "num_deep_scrub_errors": 0,
>>>>>>>>                 "num_objects_recovered": 103598,
>>>>>>>>                 "num_bytes_recovered": 424107954567,
>>>>>>>>                 "num_keys_recovered": 110,
>>>>>>>>                 "num_objects_omap": 1,
>>>>>>>>                 "num_objects_hit_set_archive": 0,
>>>>>>>>                 "num_bytes_hit_set_archive": 0,
>>>>>>>>                 "num_flush": 0,
>>>>>>>>                 "num_flush_kb": 0,
>>>>>>>>                 "num_evict": 0,
>>>>>>>>                 "num_evict_kb": 0,
>>>>>>>>                 "num_promote": 0,
>>>>>>>>                 "num_flush_mode_high": 0,
>>>>>>>>                 "num_flush_mode_low": 0,
>>>>>>>>                 "num_evict_mode_some": 0,
>>>>>>>>                 "num_evict_mode_full": 0,
>>>>>>>>                 "num_objects_pinned": 0
>>>>>>>>             },
>>>>>>>>             "up": [
>>>>>>>>                 37,
>>>>>>>>                 44,
>>>>>>>>                 16
>>>>>>>>             ],
>>>>>>>>             "acting": [
>>>>>>>>                 37,
>>>>>>>>                 44,
>>>>>>>>                 16
>>>>>>>>             ],
>>>>>>>>             "blocked_by": [],
>>>>>>>>             "up_primary": 37,
>>>>>>>>             "acting_primary": 37
>>>>>>>>         },
>>>>>>>>         "empty": 0,
>>>>>>>>         "dne": 0,
>>>>>>>>         "incomplete": 0,
>>>>>>>>         "last_epoch_started": 30580,
>>>>>>>>         "hit_set_history": {
>>>>>>>>             "current_last_update": "0'0",
>>>>>>>>             "history": []
>>>>>>>>         }
>>>>>>>>     },
>>>>>>>>     "peer_info": [
>>>>>>>>         {
>>>>>>>>             "peer": "16",
>>>>>>>>             "pgid": "0.190",
>>>>>>>>             "last_update": "30581'5420535",
>>>>>>>>             "last_complete": "30581'5420535",
>>>>>>>>             "log_tail": "30537'5387475",
>>>>>>>>             "last_user_version": 5390577,
>>>>>>>>             "last_backfill": "MAX",
>>>>>>>>             "last_backfill_bitwise": 1,
>>>>>>>>             "purged_snaps": "[]",
>>>>>>>>             "history": {
>>>>>>>>                 "epoch_created": 1,
>>>>>>>>                 "last_epoch_started": 30580,
>>>>>>>>                 "last_epoch_clean": 30581,
>>>>>>>>                 "last_epoch_split": 0,
>>>>>>>>                 "last_epoch_marked_full": 0,
>>>>>>>>                 "same_up_since": 30578,
>>>>>>>>                 "same_interval_since": 30579,
>>>>>>>>                 "same_primary_since": 30565,
>>>>>>>>                 "last_scrub": "30554'5390240",
>>>>>>>>                 "last_scrub_stamp": "2018-07-16 12:27:03.547524",
>>>>>>>>                 "last_deep_scrub": "30554'5390240",
>>>>>>>>                 "last_deep_scrub_stamp": "2018-07-16 12:27:03.547524",
>>>>>>>>                 "last_clean_scrub_stamp": "2018-07-13 08:45:32.622555"
>>>>>>>>             },
>>>>>>>>             "stats": {
>>>>>>>>                 "version": "30570'5390575",
>>>>>>>>                 "reported_seq": "5139870",
>>>>>>>>                 "reported_epoch": "30576",
>>>>>>>>                 "state": "active+undersized+degraded+inconsistent",
>>>>>>>>                 "last_fresh": "2018-07-16 13:36:40.284756",
>>>>>>>>                 "last_change": "2018-07-16 13:36:40.284277",
>>>>>>>>                 "last_active": "2018-07-16 13:36:40.284756",
>>>>>>>>                 "last_peered": "2018-07-16 13:36:40.284756",
>>>>>>>>                 "last_clean": "2018-07-16 13:36:23.558224",
>>>>>>>>                 "last_became_active": "2018-07-16 13:36:40.284277",
>>>>>>>>                 "last_became_peered": "2018-07-16 13:36:40.284277",
>>>>>>>>                 "last_unstale": "2018-07-16 13:36:40.284756",
>>>>>>>>                 "last_undegraded": "2018-07-16 13:36:40.203248",
>>>>>>>>                 "last_fullsized": "2018-07-16 13:36:40.203248",
>>>>>>>>                 "mapping_epoch": 30578,
>>>>>>>>                 "log_start": "30537'5387475",
>>>>>>>>                 "ondisk_log_start": "30537'5387475",
>>>>>>>>                 "created": 1,
>>>>>>>>                 "last_epoch_clean": 30576,
>>>>>>>>                 "parent": "0.0",
>>>>>>>>                 "parent_split_bits": 0,
>>>>>>>>                 "last_scrub": "30554'5390240",
>>>>>>>>                 "last_scrub_stamp": "2018-07-16 12:27:03.547524",
>>>>>>>>                 "last_deep_scrub": "30554'5390240",
>>>>>>>>                 "last_deep_scrub_stamp": "2018-07-16 12:27:03.547524",
>>>>>>>>                 "last_clean_scrub_stamp": "2018-07-13 08:45:32.622555",
>>>>>>>>                 "log_size": 3100,
>>>>>>>>                 "ondisk_log_size": 3100,
>>>>>>>>                 "stats_invalid": false,
>>>>>>>>                 "dirty_stats_invalid": false,
>>>>>>>>                 "omap_stats_invalid": false,
>>>>>>>>                 "hitset_stats_invalid": false,
>>>>>>>>                 "hitset_bytes_stats_invalid": false,
>>>>>>>>                 "pin_stats_invalid": true,
>>>>>>>>                 "stat_sum": {
>>>>>>>>                     "num_bytes": 16841281553,
>>>>>>>>                     "num_objects": 4123,
>>>>>>>>                     "num_object_clones": 0,
>>>>>>>>                     "num_object_copies": 12369,
>>>>>>>>                     "num_objects_missing_on_primary": 0,
>>>>>>>>                     "num_objects_missing": 0,
>>>>>>>>                     "num_objects_degraded": 4123,
>>>>>>>>                     "num_objects_misplaced": 0,
>>>>>>>>                     "num_objects_unfound": 0,
>>>>>>>>                     "num_objects_dirty": 4123,
>>>>>>>>                     "num_whiteouts": 0,
>>>>>>>>                     "num_read": 6870027,
>>>>>>>>                     "num_read_kb": 291425720,
>>>>>>>>                     "num_write": 9972836,
>>>>>>>>                     "num_write_kb": 184701865,
>>>>>>>>                     "num_scrub_errors": 1,
>>>>>>>>                     "num_shallow_scrub_errors": 1,
>>>>>>>>                     "num_deep_scrub_errors": 0,
>>>>>>>>                     "num_objects_recovered": 103596,
>>>>>>>>                     "num_bytes_recovered": 424099565959,
>>>>>>>>                     "num_keys_recovered": 110,
>>>>>>>>                     "num_objects_omap": 1,
>>>>>>>>                     "num_objects_hit_set_archive": 0,
>>>>>>>>                     "num_bytes_hit_set_archive": 0,
>>>>>>>>                     "num_flush": 0,
>>>>>>>>                     "num_flush_kb": 0,
>>>>>>>>                     "num_evict": 0,
>>>>>>>>                     "num_evict_kb": 0,
>>>>>>>>                     "num_promote": 0,
>>>>>>>>                     "num_flush_mode_high": 0,
>>>>>>>>                     "num_flush_mode_low": 0,
>>>>>>>>                     "num_evict_mode_some": 0,
>>>>>>>>                     "num_evict_mode_full": 0,
>>>>>>>>                     "num_objects_pinned": 0
>>>>>>>>                 },
>>>>>>>>                 "up": [
>>>>>>>>                     37,
>>>>>>>>                     44,
>>>>>>>>                     16
>>>>>>>>                 ],
>>>>>>>>                 "acting": [
>>>>>>>>                     37,
>>>>>>>>                     44,
>>>>>>>>                     16
>>>>>>>>                 ],
>>>>>>>>                 "blocked_by": [],
>>>>>>>>                 "up_primary": 37,
>>>>>>>>                 "acting_primary": 37
>>>>>>>>             },
>>>>>>>>             "empty": 0,
>>>>>>>>             "dne": 0,
>>>>>>>>             "incomplete": 0,
>>>>>>>>             "last_epoch_started": 30580,
>>>>>>>>             "hit_set_history": {
>>>>>>>>                 "current_last_update": "0'0",
>>>>>>>>                 "history": []
>>>>>>>>             }
>>>>>>>>         },
>>>>>>>>         {
>>>>>>>>             "peer": "44",
>>>>>>>>             "pgid": "0.190",
>>>>>>>>             "last_update": "30581'5420535",
>>>>>>>>             "last_complete": "30570'5390575",
>>>>>>>>             "log_tail": "30537'5387475",
>>>>>>>>             "last_user_version": 5390575,
>>>>>>>>             "last_backfill": "MAX",
>>>>>>>>             "last_backfill_bitwise": 1,
>>>>>>>>             "purged_snaps": "[]",
>>>>>>>>             "history": {
>>>>>>>>                 "epoch_created": 1,
>>>>>>>>                 "last_epoch_started": 30580,
>>>>>>>>                 "last_epoch_clean": 30581,
>>>>>>>>                 "last_epoch_split": 0,
>>>>>>>>                 "last_epoch_marked_full": 0,
>>>>>>>>                 "same_up_since": 30578,
>>>>>>>>                 "same_interval_since": 30579,
>>>>>>>>                 "same_primary_since": 30565,
>>>>>>>>                 "last_scrub": "30554'5390240",
>>>>>>>>                 "last_scrub_stamp": "2018-07-16 12:27:03.547524",
>>>>>>>>                 "last_deep_scrub": "30554'5390240",
>>>>>>>>                 "last_deep_scrub_stamp": "2018-07-16 12:27:03.547524",
>>>>>>>>                 "last_clean_scrub_stamp": "2018-07-13 08:45:32.622555"
>>>>>>>>             },
>>>>>>>>             "stats": {
>>>>>>>>                 "version": "30568'5390574",
>>>>>>>>                 "reported_seq": "5139846",
>>>>>>>>                 "reported_epoch": "30570",
>>>>>>>>                 "state": "active+undersized+degraded+inconsistent",
>>>>>>>>                 "last_fresh": "2018-07-16 13:36:07.003551",
>>>>>>>>                 "last_change": "2018-07-16 13:36:07.002580",
>>>>>>>>                 "last_active": "2018-07-16 13:36:07.003551",
>>>>>>>>                 "last_peered": "2018-07-16 13:36:07.003551",
>>>>>>>>                 "last_clean": "2018-07-16 13:35:50.922619",
>>>>>>>>                 "last_became_active": "2018-07-16 13:36:07.002580",
>>>>>>>>                 "last_became_peered": "2018-07-16 13:36:07.002580",
>>>>>>>>                 "last_unstale": "2018-07-16 13:36:07.003551",
>>>>>>>>                 "last_undegraded": "2018-07-16 13:36:05.922413",
>>>>>>>>                 "last_fullsized": "2018-07-16 13:36:05.922413",
>>>>>>>>                 "mapping_epoch": 30578,
>>>>>>>>                 "log_start": "30537'5387475",
>>>>>>>>                 "ondisk_log_start": "30537'5387475",
>>>>>>>>                 "created": 1,
>>>>>>>>                 "last_epoch_clean": 30570,
>>>>>>>>                 "parent": "0.0",
>>>>>>>>                 "parent_split_bits": 0,
>>>>>>>>                 "last_scrub": "30554'5390240",
>>>>>>>>                 "last_scrub_stamp": "2018-07-16 12:27:03.547524",
>>>>>>>>                 "last_deep_scrub": "30554'5390240",
>>>>>>>>                 "last_deep_scrub_stamp": "2018-07-16 12:27:03.547524",
>>>>>>>>                 "last_clean_scrub_stamp": "2018-07-13 08:45:32.622555",
>>>>>>>>                 "log_size": 3099,
>>>>>>>>                 "ondisk_log_size": 3099,
>>>>>>>>                 "stats_invalid": false,
>>>>>>>>                 "dirty_stats_invalid": false,
>>>>>>>>                 "omap_stats_invalid": false,
>>>>>>>>                 "hitset_stats_invalid": false,
>>>>>>>>                 "hitset_bytes_stats_invalid": false,
>>>>>>>>                 "pin_stats_invalid": true,
>>>>>>>>                 "stat_sum": {
>>>>>>>>                     "num_bytes": 16841281553,
>>>>>>>>                     "num_objects": 4123,
>>>>>>>>                     "num_object_clones": 0,
>>>>>>>>                     "num_object_copies": 12369,
>>>>>>>>                     "num_objects_missing_on_primary": 0,
>>>>>>>>                     "num_objects_missing": 0,
>>>>>>>>                     "num_objects_degraded": 4123,
>>>>>>>>                     "num_objects_misplaced": 0,
>>>>>>>>                     "num_objects_unfound": 0,
>>>>>>>>                     "num_objects_dirty": 4123,
>>>>>>>>                     "num_whiteouts": 0,
>>>>>>>>                     "num_read": 6870027,
>>>>>>>>                     "num_read_kb": 291425720,
>>>>>>>>                     "num_write": 9972832,
>>>>>>>>                     "num_write_kb": 184701853,
>>>>>>>>                     "num_scrub_errors": 1,
>>>>>>>>                     "num_shallow_scrub_errors": 1,
>>>>>>>>                     "num_deep_scrub_errors": 0,
>>>>>>>>                     "num_objects_recovered": 103594,
>>>>>>>>                     "num_bytes_recovered": 424091177351,
>>>>>>>>                     "num_keys_recovered": 110,
>>>>>>>>                     "num_objects_omap": 1,
>>>>>>>>                     "num_objects_hit_set_archive": 0,
>>>>>>>>                     "num_bytes_hit_set_archive": 0,
>>>>>>>>                     "num_flush": 0,
>>>>>>>>                     "num_flush_kb": 0,
>>>>>>>>                     "num_evict": 0,
>>>>>>>>                     "num_evict_kb": 0,
>>>>>>>>                     "num_promote": 0,
>>>>>>>>                     "num_flush_mode_high": 0,
>>>>>>>>                     "num_flush_mode_low": 0,
>>>>>>>>                     "num_evict_mode_some": 0,
>>>>>>>>                     "num_evict_mode_full": 0,
>>>>>>>>                     "num_objects_pinned": 0
>>>>>>>>                 },
>>>>>>>>                 "up": [
>>>>>>>>                     37,
>>>>>>>>                     44,
>>>>>>>>                     16
>>>>>>>>                 ],
>>>>>>>>                 "acting": [
>>>>>>>>                     37,
>>>>>>>>                     44,
>>>>>>>>                     16
>>>>>>>>                 ],
>>>>>>>>                 "blocked_by": [],
>>>>>>>>                 "up_primary": 37,
>>>>>>>>                 "acting_primary": 37
>>>>>>>>             },
>>>>>>>>             "empty": 0,
>>>>>>>>             "dne": 0,
>>>>>>>>             "incomplete": 0,
>>>>>>>>             "last_epoch_started": 30580,
>>>>>>>>             "hit_set_history": {
>>>>>>>>                 "current_last_update": "0'0",
>>>>>>>>                 "history": []
>>>>>>>>             }
>>>>>>>>         }
>>>>>>>>     ],
>>>>>>>>     "recovery_state": [
>>>>>>>>         {
>>>>>>>>             "name": "Started\/Primary\/Active",
>>>>>>>>             "enter_time": "2018-07-16 13:37:13.050211",
>>>>>>>>             "might_have_unfound": [
>>>>>>>>                 {
>>>>>>>>                     "osd": "16",
>>>>>>>>                     "status": "already probed"
>>>>>>>>                 },
>>>>>>>>                 {
>>>>>>>>                     "osd": "44",
>>>>>>>>                     "status": "already probed"
>>>>>>>>                 }
>>>>>>>>             ],
>>>>>>>>             "recovery_progress": {
>>>>>>>>                 "backfill_targets": [],
>>>>>>>>                 "waiting_on_backfill": [],
>>>>>>>>                 "last_backfill_started": "MIN",
>>>>>>>>                 "backfill_info": {
>>>>>>>>                     "begin": "MIN",
>>>>>>>>                     "end": "MIN",
>>>>>>>>                     "objects": []
>>>>>>>>                 },
>>>>>>>>                 "peer_backfill_info": [],
>>>>>>>>                 "backfills_in_flight": [],
>>>>>>>>                 "recovering": [],
>>>>>>>>                 "pg_backend": {
>>>>>>>>                     "pull_from_peer": [],
>>>>>>>>                     "pushing": []
>>>>>>>>                 }
>>>>>>>>             },
>>>>>>>>             "scrub": {
>>>>>>>>                 "scrubber.epoch_start": "0",
>>>>>>>>                 "scrubber.active": 0,
>>>>>>>>                 "scrubber.state": "INACTIVE",
>>>>>>>>                 "scrubber.start": "MIN",
>>>>>>>>                 "scrubber.end": "MIN",
>>>>>>>>                 "scrubber.subset_last_update": "0'0",
>>>>>>>>                 "scrubber.deep": false,
>>>>>>>>                 "scrubber.seed": 0,
>>>>>>>>                 "scrubber.waiting_on": 0,
>>>>>>>>                 "scrubber.waiting_on_whom": []
>>>>>>>>             }
>>>>>>>>         },
>>>>>>>>         {
>>>>>>>>             "name": "Started",
>>>>>>>>             "enter_time": "2018-07-16 13:37:11.980264"
>>>>>>>>         }
>>>>>>>>     ],
>>>>>>>>     "agent_state": {}
>>>>>>>> }
>>>>>>>>
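A minimal sketch of how the counters in the quoted "stats" block can be cross-checked. The values are copied from the output above. The function name `summarize_scrub` is hypothetical, and the rule that `num_object_copies` equals `num_objects` times the size of the acting set is an assumption that happens to hold for these numbers, not a documented guarantee.

```python
import json

# Values copied from the "stat_sum" and "acting" blocks quoted above.
PG_STATS_JSON = """
{
    "stat_sum": {
        "num_objects": 4123,
        "num_object_copies": 12369,
        "num_scrub_errors": 1,
        "num_shallow_scrub_errors": 1,
        "num_deep_scrub_errors": 0
    },
    "acting": [37, 44, 16]
}
"""


def summarize_scrub(stats):
    """Cross-check the scrub counters in one "stats" block of a pg query."""
    s = stats["stat_sum"]
    replicas = len(stats["acting"])
    return {
        "replicas": replicas,
        # Assumption: each object is counted once per OSD in the acting set.
        "copies_match_acting": s["num_object_copies"] == s["num_objects"] * replicas,
        # Assumption: the total is the sum of the shallow and deep counters.
        "errors_add_up": s["num_scrub_errors"]
        == s["num_shallow_scrub_errors"] + s["num_deep_scrub_errors"],
        "shallow_errors": s["num_shallow_scrub_errors"],
        "deep_errors": s["num_deep_scrub_errors"],
    }


summary = summarize_scrub(json.loads(PG_STATS_JSON))
print(summary)
```

For the values quoted above, both checks hold, and the single scrub error is counted as a shallow scrub error rather than a deep one.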
>>>>>>>>
>>>>>>>> On 17/07/18 02:19, Brad Hubbard wrote:
>>>>>>>>> Can we see a pg query of 0.190?
>>>>>>>>>
>>>>>>>>> On Tue, Jul 17, 2018 at 1:05 AM, Ana Aviles <ana@xxxxxxxxxxxx> wrote:
>>>>>>>>>> Hello,
>>>>>>>>>>
>>>>>>>>>> We have a cluster that was running hammer (0.94.10). We hit a bug where,
>>>>>>>>>> right after seemingly fixing an inconsistent PG, the primary OSD would
>>>>>>>>>> crash and restart. The next deep-scrub would again report the PG as
>>>>>>>>>> inconsistent.
>>>>>>>>>>
>>>>>>>>>> We filed a bug report,
>>>>>>>>>> https://tracker.ceph.com/issues/24652#change-115654, which was closed
>>>>>>>>>> because it was a known bug fixed in newer versions of Ceph.
>>>>>>>>>>
>>>>>>>>>> Now the cluster is running jewel (10.2.11). There is again one
>>>>>>>>>> inconsistent PG with 1 error, which we are not able to fix, and the
>>>>>>>>>> log contains no reference to the inconsistent object.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> scrub 0 missing, 1 inconsistent objects
>>>>>>>>>> scrub 1 errors
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> We have the logs with debug level 20 while repairing the PG. The one for
>>>>>>>>>> the primary OSD is: 94e20123-fcda-49d7-98a2-919507dfbc92
>>>>>>>>>>
>>>>>>>>>> Thanks!
>>>>>>>>>> Kind regards,
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Ana Avilés
>>>>>>>>>> Greenhost - sustainable hosting & digital security
>>>>>>>>>> E: ana@xxxxxxxxxxxx
>>>>>>>>>> T: +31 20 4890444
>>>>>>>>>> W: https://greenhost.nl
>>>>>>>>>> --
>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
Ana Avilés
Greenhost - sustainable hosting & digital security
E: ana@xxxxxxxxxxxx
T: +31 20 4890444
W: https://greenhost.nl


