Re: Inconsistent PG with 1 inconsistent object not referenced in the log

On Fri, Jul 20, 2018 at 1:05 AM, Ana Aviles <ana@xxxxxxxxxxxx> wrote:
>
>
> On 19/07/18 03:25, Brad Hubbard wrote:
>> On Wed, Jul 18, 2018 at 6:25 PM, Ana Aviles <ana@xxxxxxxxxxxx> wrote:
>>> Ah ok. Then I think it confirms what you are saying. Here it is:
>>>
>>> $ rados list-inconsistent-obj 0.190
>>> {"epoch":30579,"inconsistents":[{"object":{"name":"rbd_data.15cec2ae8944a.000000000015c7d6","nspace":"","locator":"","snap":"head","version":5498082},"errors":["object_info_inconsistency","attr_value_mismatch"],"union_shard_errors":[],"selected_object_info":"0:618e3778:::rbd_data.15cec2ae8944a.000000000004db0e:head(30537'5509201
>>> osd.36.0:8552301 dirty|data_digest|omap_digest s 4194304 uv 5498082 dd
>>> 7dd0d0bd od ffffffff alloc_hint [4194304
>>> 4194304])","shards":[{"osd":16,"errors":[],"size":4194304,"object_info":"0:09c2dd3e:::rbd_data.15cec2ae8944a.000000000015c7d6:head(26812'5142772
>>> client.1044166.0:393154060 dirty|data_digest|omap_digest s 4194304 uv
>>> 5142772 dd 264b7d0d od ffffffff alloc_hint [0
>>> 0])","attrs":[{"name":"_","value":"DwgMAQAABANIAAAAAAAAACcAAAByYmRfZGF0YS4xNWNlYzJhZTg5NDRhLjAwMDAwMDAwMDAxNWM3ZDb+\/\/\/\/\/\/\/\/\/5BDu3wAAAAAAAAAAAAAAAAABgMcAAAAAAAAAAAAAAD\/\/\/\/\/AAAAAAAAAAD\/\/\/\/\/\/\/\/\/\/wAAAAD0eE4AAAAAALxoAADzeE4AAAAAALxoAAACAhUAAAAIxu4PAAAAAAAMDm8XAAAAAAAAAAAAAEAAAAAAAOJmPVsEa24SAgIVAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA9HhOAAAAAAAAAAAAAAAAAAA0AAAA4mY9W1Q\/lBwNfUsm\/\/\/\/\/w==","Base64":true},{"name":"snapset","value":"AgIZAAAAAAAAAAAAAAABAAAAAAAAAAAAAAAAAAAAAA==","Base64":true}]},{"osd":37,"errors":[],"size":4194304,"object_info":"0:09c2dd3e:::rbd_data.15cec2ae8944a.000000000015c7d6:head(26812'5142772
>>> client.1044166.0:393154060 dirty|data_digest|omap_digest s 4194304 uv
>>> 5142772 dd 264b7d0d od ffffffff alloc_hint [0
>>> 0])","attrs":[{"name":"_","value":"DwgMAQAABANIAAAAAAAAACcAAAByYmRfZGF0YS4xNWNlYzJhZTg5NDRhLjAwMDAwMDAwMDAxNWM3ZDb+\/\/\/\/\/\/\/\/\/5BDu3wAAAAAAAAAAAAAAAAABgMcAAAAAAAAAAAAAAD\/\/\/\/\/AAAAAAAAAAD\/\/\/\/\/\/\/\/\/\/wAAAAD0eE4AAAAAALxoAADzeE4AAAAAALxoAAACAhUAAAAIxu4PAAAAAAAMDm8XAAAAAAAAAAAAAEAAAAAAAOJmPVsEa24SAgIVAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA9HhOAAAAAAAAAAAAAAAAAAA0AAAA4mY9W1Q\/lBwNfUsm\/\/\/\/\/w==","Base64":true},{"name":"snapset","value":"AgIZAAAAAAAAAAAAAAABAAAAAAAAAAAAAAAAAAAAAA==","Base64":true}]},{"osd":44,"errors":[],"size":4194304,"object_info":"0:618e3778:::rbd_data.15cec2ae8944a.000000000004db0e:head(30537'5509201
>>> osd.36.0:8552301 dirty|data_digest|omap_digest s 4194304 uv 5498082 dd
>>> 7dd0d0bd od ffffffff alloc_hint [4194304
>>> 4194304])","attrs":[{"name":"_","value":"EAggAQAABANIAAAAAAAAACcAAAByYmRfZGF0YS4xNWNlYzJhZTg5NDRhLjAwMDAwMDAwMDAwNGRiMGX+\/\/\/\/\/\/\/\/\/4Zx7B4AAAAAAAAAAAAAAAAABgMcAAAAAAAAAAAAAAD\/\/\/\/\/AAAAAAAAAAD\/\/\/\/\/\/\/\/\/\/wAAAABREFQAAAAAAEl3AADi5FMAAAAAAEl3AAACAhUAAAAEJAAAAAAAAABtf4IAAAAAAAAAAAAAAEAAAAAAAPpfSVvV\/VMKAgIVAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA4uRTAAAAAAAAAAAAAAAAAAA0AAAA+l9JW4x6Rw290NB9\/\/\/\/\/wAAQAAAAAAAAABAAAAAAAAAAAAA","Base64":true},{"name":"snapset","value":"AgIZAAAAAAAAAAAAAAABAAAAAAAAAAAAAAAAAAAAAA==","Base64":true}]}]}]}
>>>
>>>
>>> To determine which is the right version of the object, is there no
>>> timestamp that can tell us? Maybe the object got updated on osd.37 and
>>> osd.16 while osd.44 was down, and that is where the mismatch comes
>>> from? Otherwise, shouldn't the authoritative OSD be leading?
>>
>> The primary serves IO requests, so the version on osd 37 is what
>> clients will read. Going with that version seems reasonable.
>>
>
> OK good.
>
>> The version on osd 44 was actually modified after the others (epoch
>> 30537, as opposed to epoch 26812), but the sizes are all the same, so
>> the difference may be trivial (metadata only, perhaps). According to
>> the last request id (osd.36.0:8552301), that change came from another
>> osd (36), which is kind of unexpected. Is there, or was there, a cache
>> tier involved?
>
> Ah OK, very interesting! No, no cache tier involved. So at one point
> osd.36 was part of the PG set?

Maybe. All we know is that the last request came from osd.36, which is
unusual because changes in this context generally only come from
clients. A cache tier might explain it, which is why I mentioned it.
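
For anyone following along, here is a minimal Python sketch of how the
fields discussed above (modification epoch, last request id, data digest)
can be pulled out of the object_info strings in the list-inconsistent-obj
output. The regular expression and the names parse_object_info and SHARDS
are illustrative only and not part of any Ceph tooling; the string layout
is assumed from the output quoted in this thread and may differ between
releases.

```python
import re
from collections import defaultdict

# Matches the textual object_info as it appears in this thread, e.g.
# "...:head(26812'5142772 client.1044166.0:393154060 ... s 4194304 uv 5142772 dd 264b7d0d ..."
# This layout is an assumption based on the quoted output only.
OI_RE = re.compile(
    r"\((?P<epoch>\d+)'(?P<version>\d+)\s+"
    r"(?P<reqid>\S+)\s+.*?"
    r"\bs (?P<size>\d+)\s+uv (?P<user_version>\d+)\s+dd (?P<dd>[0-9a-f]+)"
)

def parse_object_info(oi):
    """Extract epoch, request id, size and data digest from one object_info string."""
    m = OI_RE.search(oi)
    if m is None:
        raise ValueError("unrecognised object_info: %r" % oi)
    return {
        "epoch": int(m.group("epoch")),
        "version": int(m.group("version")),
        "reqid": m.group("reqid"),
        # True when the last request id names an OSD rather than a client.
        "from_osd": m.group("reqid").startswith("osd."),
        "size": int(m.group("size")),
        "data_digest": m.group("dd"),
    }

# object_info per shard, copied from the output in this thread
# (line wraps introduced by the mail client replaced with spaces).
OI_16_37 = ("0:09c2dd3e:::rbd_data.15cec2ae8944a.000000000015c7d6:head("
            "26812'5142772 client.1044166.0:393154060 "
            "dirty|data_digest|omap_digest s 4194304 uv 5142772 "
            "dd 264b7d0d od ffffffff alloc_hint [0 0])")
OI_44 = ("0:618e3778:::rbd_data.15cec2ae8944a.000000000004db0e:head("
         "30537'5509201 osd.36.0:8552301 "
         "dirty|data_digest|omap_digest s 4194304 uv 5498082 "
         "dd 7dd0d0bd od ffffffff alloc_hint [4194304 4194304])")
SHARDS = {16: OI_16_37, 37: OI_16_37, 44: OI_44}

# Group the shards by data digest to see which copies agree.
by_digest = defaultdict(list)
for osd, oi in sorted(SHARDS.items()):
    by_digest[parse_object_info(oi)["data_digest"]].append(osd)

for digest, osds in sorted(by_digest.items()):
    print(digest, osds)
```

Grouping by digest makes the 2-versus-1 split between the shards visible,
and the from_osd flag marks the copy whose last request id names an OSD
instead of a client.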

>
>>
>> If you want to go with the version that is currently being used (37
>> and 16), you can just quiesce the rbd image clients and do a rados get,
>> then a rados put of the object. I would suggest taking a backup of the
>> object from osd 44 using the ceph-objectstore-tool although, as I
>> said, that version is not the one being read, so I doubt you will miss it.
>>
>
> Great, will do that. Thanks a lot for the help.

yw.
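
For the archive, the get/put step quoted above could be sketched roughly
as follows. This is a dry run that only builds and prints the command
lines; it does not execute anything. The pool name "rbd" and the temporary
path are assumptions (the thread only shows pool id 0), plan_rewrite is a
hypothetical helper name, and the final deep-scrub to re-check the PG is
my own addition rather than something stated above. The
ceph-objectstore-tool backup of the osd 44 copy is left out because its
invocation depends on the OSD's data path and requires that OSD to be
stopped.

```python
import shlex

def plan_rewrite(pool, obj, pgid, tmp_path):
    """Return, in order, the commands for a get/put rewrite of one object.

    The rbd image clients are assumed to be quiesced before these are run.
    """
    return [
        # Read the object as currently served by the primary.
        ["rados", "-p", pool, "get", obj, tmp_path],
        # Write it back so all replicas receive the same data again.
        ["rados", "-p", pool, "put", obj, tmp_path],
        # Assumption: re-run a deep scrub afterwards to re-check the PG.
        ["ceph", "pg", "deep-scrub", pgid],
    ]

plan = plan_rewrite(
    pool="rbd",  # assumed pool name, not confirmed in this thread
    obj="rbd_data.15cec2ae8944a.000000000015c7d6",
    pgid="0.190",
    tmp_path="/tmp/obj-0.190.bin",  # hypothetical scratch file
)

for argv in plan:
    print(" ".join(shlex.quote(arg) for arg in argv))
```

Printing the plan instead of running it leaves room to review the object
name and pool before touching the cluster.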

>
>>>
>>> Regards,
>>> Ana
>>>
>>>
>>> On 18/07/18 05:24, Brad Hubbard wrote:
>>>> OK. What I *meant* to ask for was the output of "rados
>>>> list-inconsistent-obj 0.190" (might still be worth posting that but it
>>>> should just confirm findings below).
>>>>
>>>>
>>>> The relevant lines from the log are below.
>>>>
>>>> 2018-07-16 12:24:45.940910 7fb422340700 2 osd.37 pg_epoch: 30554
>>>> pg[0.190( v 30554'5390084 (30537'5387075,30554'5390084]
>>>> local-les=30554 n=4123 ec=1 les/c/f 30554/30554/0 30552/30553/30542)
>>>> [37,44,16] r=0 lpr=30553 crt=30554'5390079 lcod 30554'5390083 mlcod
>>>> 30554'5390083 active+clean+scrubbing+deep+inconsistent+repair] 0.190
>>>> shard 16: soid 0:09c2dd3e:::rbd_data.15cec2ae8944a.000000000015c7d6:head
>>>> data_digest 0x264b7d0d != data_digest 0x7dd0d0bd from shard 44,
>>>> data_digest 0x264b7d0d != data_digest 0x7dd0d0bd from auth oi
>>>> 0:618e3778:::rbd_data.15cec2ae8944a.000000000004db0e:head(30537'5509201
>>>> osd.36.0:8552301 dirty|data_digest|omap_digest s 4194304 uv 5498082 dd
>>>> 7dd0d0bd od ffffffff alloc_hint [4194304 4194304]), attr value
>>>> mismatch '_'
>>>>
>>>> 2018-07-16 12:24:45.940941 7fb422340700 -1
>>>> log_channel(cluster) log [ERR] : 0.190 shard 16: soid
>>>> 0:09c2dd3e:::rbd_data.15cec2ae8944a.000000000015c7d6:head data_digest
>>>> 0x264b7d0d != data_digest 0x7dd0d0bd from shard 44, data_digest
>>>> 0x264b7d0d != data_digest 0x7dd0d0bd from auth oi
>>>> 0:618e3778:::rbd_data.15cec2ae8944a.000000000004db0e:head(30537'5509201
>>>> osd.36.0:8552301 dirty|data_digest|omap_digest s 4194304 uv 5498082 dd
>>>> 7dd0d0bd od ffffffff alloc_hint [4194304 4194304]), attr value
>>>> mismatch '_'
>>>>
>>>> 2018-07-16 12:24:45.940957 7fb422340700 -1
>>>> log_channel(cluster) log [ERR] : 0.190 shard 37: soid
>>>> 0:09c2dd3e:::rbd_data.15cec2ae8944a.000000000015c7d6:head data_digest
>>>> 0x264b7d0d != data_digest 0x7dd0d0bd from shard 44, data_digest
>>>> 0x264b7d0d != data_digest 0x7dd0d0bd from auth oi
>>>> 0:618e3778:::rbd_data.15cec2ae8944a.000000000004db0e:head(30537'5509201
>>>> osd.36.0:8552301 dirty|data_digest|omap_digest s 4194304 uv 5498082 dd
>>>> 7dd0d0bd od ffffffff alloc_hint [4194304 4194304]), attr value
>>>> mismatch '_'
>>>>
>>>> They show that osd 44 has been chosen as the authoritative shard,
>>>> that it has a data digest of 0x7dd0d0bd for this object, and that the
>>>> data digest in the authoritative object info is also 0x7dd0d0bd.
>>>>
>>>> Shard 16, however, has a data digest of 0x264b7d0d, and so does shard
>>>> 37, so the data for this object on osds 16 and 37 differs from that on
>>>> osd 44.
>>>>
>>>> Basically, you'll need to pick which is the "right" copy of the object
>>>> (I can't tell you), quiesce traffic to/from that object (rbd image), and
>>>> get/put that object back into the cluster to fix the mismatch. Since
>>>> this appears to be an rbd image, this could potentially result in an
>>>> image that needs an fsck or equivalent, IIUC.
>>>>
>>>>
>>>> On Tue, Jul 17, 2018 at 10:06 PM, Ana Aviles <ana@xxxxxxxxxxxx> wrote:
>>>>>
>>>>> Hi Brad,
>>>>>
>>>>> Here it is:
>>>>>
>>>>> {
>>>>>     "state": "active+clean+inconsistent",
>>>>>     "snap_trimq": "[]",
>>>>>     "epoch": 30581,
>>>>>     "up": [
>>>>>         37,
>>>>>         44,
>>>>>         16
>>>>>     ],
>>>>>     "acting": [
>>>>>         37,
>>>>>         44,
>>>>>         16
>>>>>     ],
>>>>>     "actingbackfill": [
>>>>>         "16",
>>>>>         "37",
>>>>>         "44"
>>>>>     ],
>>>>>     "info": {
>>>>>         "pgid": "0.190",
>>>>>         "last_update": "30581'5420535",
>>>>>         "last_complete": "30581'5420535",
>>>>>         "log_tail": "30581'5417484",
>>>>>         "last_user_version": 5420535,
>>>>>         "last_backfill": "MAX",
>>>>>         "last_backfill_bitwise": 0,
>>>>>         "purged_snaps": "[]",
>>>>>         "history": {
>>>>>             "epoch_created": 1,
>>>>>             "last_epoch_started": 30580,
>>>>>             "last_epoch_clean": 30581,
>>>>>             "last_epoch_split": 0,
>>>>>             "last_epoch_marked_full": 0,
>>>>>             "same_up_since": 30578,
>>>>>             "same_interval_since": 30579,
>>>>>             "same_primary_since": 30565,
>>>>>             "last_scrub": "30554'5390240",
>>>>>             "last_scrub_stamp": "2018-07-16 12:27:03.547524",
>>>>>             "last_deep_scrub": "30554'5390240",
>>>>>             "last_deep_scrub_stamp": "2018-07-16 12:27:03.547524",
>>>>>             "last_clean_scrub_stamp": "2018-07-13 08:45:32.622555"
>>>>>         },
>>>>>         "stats": {
>>>>>             "version": "30581'5420535",
>>>>>             "reported_seq": "5155553",
>>>>>             "reported_epoch": "30581",
>>>>>             "state": "active+clean+inconsistent",
>>>>>             "last_fresh": "2018-07-17 12:02:13.002428",
>>>>>             "last_change": "2018-07-16 13:37:24.020403",
>>>>>             "last_active": "2018-07-17 12:02:13.002428",
>>>>>             "last_peered": "2018-07-17 12:02:13.002428",
>>>>>             "last_clean": "2018-07-17 12:02:13.002428",
>>>>>             "last_became_active": "2018-07-16 13:37:13.173821",
>>>>>             "last_became_peered": "2018-07-16 13:37:13.173821",
>>>>>             "last_unstale": "2018-07-17 12:02:13.002428",
>>>>>             "last_undegraded": "2018-07-17 12:02:13.002428",
>>>>>             "last_fullsized": "2018-07-17 12:02:13.002428",
>>>>>             "mapping_epoch": 30578,
>>>>>             "log_start": "30581'5417484",
>>>>>             "ondisk_log_start": "30581'5417484",
>>>>>             "created": 1,
>>>>>             "last_epoch_clean": 30581,
>>>>>             "parent": "0.0",
>>>>>             "parent_split_bits": 0,
>>>>>             "last_scrub": "30554'5390240",
>>>>>             "last_scrub_stamp": "2018-07-16 12:27:03.547524",
>>>>>             "last_deep_scrub": "30554'5390240",
>>>>>             "last_deep_scrub_stamp": "2018-07-16 12:27:03.547524",
>>>>>             "last_clean_scrub_stamp": "2018-07-13 08:45:32.622555",
>>>>>             "log_size": 3051,
>>>>>             "ondisk_log_size": 3051,
>>>>>             "stats_invalid": false,
>>>>>             "dirty_stats_invalid": false,
>>>>>             "omap_stats_invalid": false,
>>>>>             "hitset_stats_invalid": false,
>>>>>             "hitset_bytes_stats_invalid": false,
>>>>>             "pin_stats_invalid": true,
>>>>>             "stat_sum": {
>>>>>                 "num_bytes": 16946139153,
>>>>>                 "num_objects": 4148,
>>>>>                 "num_object_clones": 0,
>>>>>                 "num_object_copies": 12444,
>>>>>                 "num_objects_missing_on_primary": 0,
>>>>>                 "num_objects_missing": 0,
>>>>>                 "num_objects_degraded": 0,
>>>>>                 "num_objects_misplaced": 0,
>>>>>                 "num_objects_unfound": 0,
>>>>>                 "num_objects_dirty": 4148,
>>>>>                 "num_whiteouts": 0,
>>>>>                 "num_read": 6895104,
>>>>>                 "num_read_kb": 292185552,
>>>>>                 "num_write": 10032749,
>>>>>                 "num_write_kb": 185167701,
>>>>>                 "num_scrub_errors": 1,
>>>>>                 "num_shallow_scrub_errors": 1,
>>>>>                 "num_deep_scrub_errors": 0,
>>>>>                 "num_objects_recovered": 103598,
>>>>>                 "num_bytes_recovered": 424107954567,
>>>>>                 "num_keys_recovered": 110,
>>>>>                 "num_objects_omap": 1,
>>>>>                 "num_objects_hit_set_archive": 0,
>>>>>                 "num_bytes_hit_set_archive": 0,
>>>>>                 "num_flush": 0,
>>>>>                 "num_flush_kb": 0,
>>>>>                 "num_evict": 0,
>>>>>                 "num_evict_kb": 0,
>>>>>                 "num_promote": 0,
>>>>>                 "num_flush_mode_high": 0,
>>>>>                 "num_flush_mode_low": 0,
>>>>>                 "num_evict_mode_some": 0,
>>>>>                 "num_evict_mode_full": 0,
>>>>>                 "num_objects_pinned": 0
>>>>>             },
>>>>>             "up": [
>>>>>                 37,
>>>>>                 44,
>>>>>                 16
>>>>>             ],
>>>>>             "acting": [
>>>>>                 37,
>>>>>                 44,
>>>>>                 16
>>>>>             ],
>>>>>             "blocked_by": [],
>>>>>             "up_primary": 37,
>>>>>             "acting_primary": 37
>>>>>         },
>>>>>         "empty": 0,
>>>>>         "dne": 0,
>>>>>         "incomplete": 0,
>>>>>         "last_epoch_started": 30580,
>>>>>         "hit_set_history": {
>>>>>             "current_last_update": "0'0",
>>>>>             "history": []
>>>>>         }
>>>>>     },
>>>>>     "peer_info": [
>>>>>         {
>>>>>             "peer": "16",
>>>>>             "pgid": "0.190",
>>>>>             "last_update": "30581'5420535",
>>>>>             "last_complete": "30581'5420535",
>>>>>             "log_tail": "30537'5387475",
>>>>>             "last_user_version": 5390577,
>>>>>             "last_backfill": "MAX",
>>>>>             "last_backfill_bitwise": 1,
>>>>>             "purged_snaps": "[]",
>>>>>             "history": {
>>>>>                 "epoch_created": 1,
>>>>>                 "last_epoch_started": 30580,
>>>>>                 "last_epoch_clean": 30581,
>>>>>                 "last_epoch_split": 0,
>>>>>                 "last_epoch_marked_full": 0,
>>>>>                 "same_up_since": 30578,
>>>>>                 "same_interval_since": 30579,
>>>>>                 "same_primary_since": 30565,
>>>>>                 "last_scrub": "30554'5390240",
>>>>>                 "last_scrub_stamp": "2018-07-16 12:27:03.547524",
>>>>>                 "last_deep_scrub": "30554'5390240",
>>>>>                 "last_deep_scrub_stamp": "2018-07-16 12:27:03.547524",
>>>>>                 "last_clean_scrub_stamp": "2018-07-13 08:45:32.622555"
>>>>>             },
>>>>>             "stats": {
>>>>>                 "version": "30570'5390575",
>>>>>                 "reported_seq": "5139870",
>>>>>                 "reported_epoch": "30576",
>>>>>                 "state": "active+undersized+degraded+inconsistent",
>>>>>                 "last_fresh": "2018-07-16 13:36:40.284756",
>>>>>                 "last_change": "2018-07-16 13:36:40.284277",
>>>>>                 "last_active": "2018-07-16 13:36:40.284756",
>>>>>                 "last_peered": "2018-07-16 13:36:40.284756",
>>>>>                 "last_clean": "2018-07-16 13:36:23.558224",
>>>>>                 "last_became_active": "2018-07-16 13:36:40.284277",
>>>>>                 "last_became_peered": "2018-07-16 13:36:40.284277",
>>>>>                 "last_unstale": "2018-07-16 13:36:40.284756",
>>>>>                 "last_undegraded": "2018-07-16 13:36:40.203248",
>>>>>                 "last_fullsized": "2018-07-16 13:36:40.203248",
>>>>>                 "mapping_epoch": 30578,
>>>>>                 "log_start": "30537'5387475",
>>>>>                 "ondisk_log_start": "30537'5387475",
>>>>>                 "created": 1,
>>>>>                 "last_epoch_clean": 30576,
>>>>>                 "parent": "0.0",
>>>>>                 "parent_split_bits": 0,
>>>>>                 "last_scrub": "30554'5390240",
>>>>>                 "last_scrub_stamp": "2018-07-16 12:27:03.547524",
>>>>>                 "last_deep_scrub": "30554'5390240",
>>>>>                 "last_deep_scrub_stamp": "2018-07-16 12:27:03.547524",
>>>>>                 "last_clean_scrub_stamp": "2018-07-13 08:45:32.622555",
>>>>>                 "log_size": 3100,
>>>>>                 "ondisk_log_size": 3100,
>>>>>                 "stats_invalid": false,
>>>>>                 "dirty_stats_invalid": false,
>>>>>                 "omap_stats_invalid": false,
>>>>>                 "hitset_stats_invalid": false,
>>>>>                 "hitset_bytes_stats_invalid": false,
>>>>>                 "pin_stats_invalid": true,
>>>>>                 "stat_sum": {
>>>>>                     "num_bytes": 16841281553,
>>>>>                     "num_objects": 4123,
>>>>>                     "num_object_clones": 0,
>>>>>                     "num_object_copies": 12369,
>>>>>                     "num_objects_missing_on_primary": 0,
>>>>>                     "num_objects_missing": 0,
>>>>>                     "num_objects_degraded": 4123,
>>>>>                     "num_objects_misplaced": 0,
>>>>>                     "num_objects_unfound": 0,
>>>>>                     "num_objects_dirty": 4123,
>>>>>                     "num_whiteouts": 0,
>>>>>                     "num_read": 6870027,
>>>>>                     "num_read_kb": 291425720,
>>>>>                     "num_write": 9972836,
>>>>>                     "num_write_kb": 184701865,
>>>>>                     "num_scrub_errors": 1,
>>>>>                     "num_shallow_scrub_errors": 1,
>>>>>                     "num_deep_scrub_errors": 0,
>>>>>                     "num_objects_recovered": 103596,
>>>>>                     "num_bytes_recovered": 424099565959,
>>>>>                     "num_keys_recovered": 110,
>>>>>                     "num_objects_omap": 1,
>>>>>                     "num_objects_hit_set_archive": 0,
>>>>>                     "num_bytes_hit_set_archive": 0,
>>>>>                     "num_flush": 0,
>>>>>                     "num_flush_kb": 0,
>>>>>                     "num_evict": 0,
>>>>>                     "num_evict_kb": 0,
>>>>>                     "num_promote": 0,
>>>>>                     "num_flush_mode_high": 0,
>>>>>                     "num_flush_mode_low": 0,
>>>>>                     "num_evict_mode_some": 0,
>>>>>                     "num_evict_mode_full": 0,
>>>>>                     "num_objects_pinned": 0
>>>>>                 },
>>>>>                 "up": [
>>>>>                     37,
>>>>>                     44,
>>>>>                     16
>>>>>                 ],
>>>>>                 "acting": [
>>>>>                     37,
>>>>>                     44,
>>>>>                     16
>>>>>                 ],
>>>>>                 "blocked_by": [],
>>>>>                 "up_primary": 37,
>>>>>                 "acting_primary": 37
>>>>>             },
>>>>>             "empty": 0,
>>>>>             "dne": 0,
>>>>>             "incomplete": 0,
>>>>>             "last_epoch_started": 30580,
>>>>>             "hit_set_history": {
>>>>>                 "current_last_update": "0'0",
>>>>>                 "history": []
>>>>>             }
>>>>>         },
>>>>>         {
>>>>>             "peer": "44",
>>>>>             "pgid": "0.190",
>>>>>             "last_update": "30581'5420535",
>>>>>             "last_complete": "30570'5390575",
>>>>>             "log_tail": "30537'5387475",
>>>>>             "last_user_version": 5390575,
>>>>>             "last_backfill": "MAX",
>>>>>             "last_backfill_bitwise": 1,
>>>>>             "purged_snaps": "[]",
>>>>>             "history": {
>>>>>                 "epoch_created": 1,
>>>>>                 "last_epoch_started": 30580,
>>>>>                 "last_epoch_clean": 30581,
>>>>>                 "last_epoch_split": 0,
>>>>>                 "last_epoch_marked_full": 0,
>>>>>                 "same_up_since": 30578,
>>>>>                 "same_interval_since": 30579,
>>>>>                 "same_primary_since": 30565,
>>>>>                 "last_scrub": "30554'5390240",
>>>>>                 "last_scrub_stamp": "2018-07-16 12:27:03.547524",
>>>>>                 "last_deep_scrub": "30554'5390240",
>>>>>                 "last_deep_scrub_stamp": "2018-07-16 12:27:03.547524",
>>>>>                 "last_clean_scrub_stamp": "2018-07-13 08:45:32.622555"
>>>>>             },
>>>>>             "stats": {
>>>>>                 "version": "30568'5390574",
>>>>>                 "reported_seq": "5139846",
>>>>>                 "reported_epoch": "30570",
>>>>>                 "state": "active+undersized+degraded+inconsistent",
>>>>>                 "last_fresh": "2018-07-16 13:36:07.003551",
>>>>>                 "last_change": "2018-07-16 13:36:07.002580",
>>>>>                 "last_active": "2018-07-16 13:36:07.003551",
>>>>>                 "last_peered": "2018-07-16 13:36:07.003551",
>>>>>                 "last_clean": "2018-07-16 13:35:50.922619",
>>>>>                 "last_became_active": "2018-07-16 13:36:07.002580",
>>>>>                 "last_became_peered": "2018-07-16 13:36:07.002580",
>>>>>                 "last_unstale": "2018-07-16 13:36:07.003551",
>>>>>                 "last_undegraded": "2018-07-16 13:36:05.922413",
>>>>>                 "last_fullsized": "2018-07-16 13:36:05.922413",
>>>>>                 "mapping_epoch": 30578,
>>>>>                 "log_start": "30537'5387475",
>>>>>                 "ondisk_log_start": "30537'5387475",
>>>>>                 "created": 1,
>>>>>                 "last_epoch_clean": 30570,
>>>>>                 "parent": "0.0",
>>>>>                 "parent_split_bits": 0,
>>>>>                 "last_scrub": "30554'5390240",
>>>>>                 "last_scrub_stamp": "2018-07-16 12:27:03.547524",
>>>>>                 "last_deep_scrub": "30554'5390240",
>>>>>                 "last_deep_scrub_stamp": "2018-07-16 12:27:03.547524",
>>>>>                 "last_clean_scrub_stamp": "2018-07-13 08:45:32.622555",
>>>>>                 "log_size": 3099,
>>>>>                 "ondisk_log_size": 3099,
>>>>>                 "stats_invalid": false,
>>>>>                 "dirty_stats_invalid": false,
>>>>>                 "omap_stats_invalid": false,
>>>>>                 "hitset_stats_invalid": false,
>>>>>                 "hitset_bytes_stats_invalid": false,
>>>>>                 "pin_stats_invalid": true,
>>>>>                 "stat_sum": {
>>>>>                     "num_bytes": 16841281553,
>>>>>                     "num_objects": 4123,
>>>>>                     "num_object_clones": 0,
>>>>>                     "num_object_copies": 12369,
>>>>>                     "num_objects_missing_on_primary": 0,
>>>>>                     "num_objects_missing": 0,
>>>>>                     "num_objects_degraded": 4123,
>>>>>                     "num_objects_misplaced": 0,
>>>>>                     "num_objects_unfound": 0,
>>>>>                     "num_objects_dirty": 4123,
>>>>>                     "num_whiteouts": 0,
>>>>>                     "num_read": 6870027,
>>>>>                     "num_read_kb": 291425720,
>>>>>                     "num_write": 9972832,
>>>>>                     "num_write_kb": 184701853,
>>>>>                     "num_scrub_errors": 1,
>>>>>                     "num_shallow_scrub_errors": 1,
>>>>>                     "num_deep_scrub_errors": 0,
>>>>>                     "num_objects_recovered": 103594,
>>>>>                     "num_bytes_recovered": 424091177351,
>>>>>                     "num_keys_recovered": 110,
>>>>>                     "num_objects_omap": 1,
>>>>>                     "num_objects_hit_set_archive": 0,
>>>>>                     "num_bytes_hit_set_archive": 0,
>>>>>                     "num_flush": 0,
>>>>>                     "num_flush_kb": 0,
>>>>>                     "num_evict": 0,
>>>>>                     "num_evict_kb": 0,
>>>>>                     "num_promote": 0,
>>>>>                     "num_flush_mode_high": 0,
>>>>>                     "num_flush_mode_low": 0,
>>>>>                     "num_evict_mode_some": 0,
>>>>>                     "num_evict_mode_full": 0,
>>>>>                     "num_objects_pinned": 0
>>>>>                 },
>>>>>                 "up": [
>>>>>                     37,
>>>>>                     44,
>>>>>                     16
>>>>>                 ],
>>>>>                 "acting": [
>>>>>                     37,
>>>>>                     44,
>>>>>                     16
>>>>>                 ],
>>>>>                 "blocked_by": [],
>>>>>                 "up_primary": 37,
>>>>>                 "acting_primary": 37
>>>>>             },
>>>>>             "empty": 0,
>>>>>             "dne": 0,
>>>>>             "incomplete": 0,
>>>>>             "last_epoch_started": 30580,
>>>>>             "hit_set_history": {
>>>>>                 "current_last_update": "0'0",
>>>>>                 "history": []
>>>>>             }
>>>>>         }
>>>>>     ],
>>>>>     "recovery_state": [
>>>>>         {
>>>>>             "name": "Started\/Primary\/Active",
>>>>>             "enter_time": "2018-07-16 13:37:13.050211",
>>>>>             "might_have_unfound": [
>>>>>                 {
>>>>>                     "osd": "16",
>>>>>                     "status": "already probed"
>>>>>                 },
>>>>>                 {
>>>>>                     "osd": "44",
>>>>>                     "status": "already probed"
>>>>>                 }
>>>>>             ],
>>>>>             "recovery_progress": {
>>>>>                 "backfill_targets": [],
>>>>>                 "waiting_on_backfill": [],
>>>>>                 "last_backfill_started": "MIN",
>>>>>                 "backfill_info": {
>>>>>                     "begin": "MIN",
>>>>>                     "end": "MIN",
>>>>>                     "objects": []
>>>>>                 },
>>>>>                 "peer_backfill_info": [],
>>>>>                 "backfills_in_flight": [],
>>>>>                 "recovering": [],
>>>>>                 "pg_backend": {
>>>>>                     "pull_from_peer": [],
>>>>>                     "pushing": []
>>>>>                 }
>>>>>             },
>>>>>             "scrub": {
>>>>>                 "scrubber.epoch_start": "0",
>>>>>                 "scrubber.active": 0,
>>>>>                 "scrubber.state": "INACTIVE",
>>>>>                 "scrubber.start": "MIN",
>>>>>                 "scrubber.end": "MIN",
>>>>>                 "scrubber.subset_last_update": "0'0",
>>>>>                 "scrubber.deep": false,
>>>>>                 "scrubber.seed": 0,
>>>>>                 "scrubber.waiting_on": 0,
>>>>>                 "scrubber.waiting_on_whom": []
>>>>>             }
>>>>>         },
>>>>>         {
>>>>>             "name": "Started",
>>>>>             "enter_time": "2018-07-16 13:37:11.980264"
>>>>>         }
>>>>>     ],
>>>>>     "agent_state": {}
>>>>> }
>>>>>
>>>>>
>>>>> On 17/07/18 02:19, Brad Hubbard wrote:
>>>>>> Can we see a pg query of 0.190 ?
>>>>>>
>>>>>> On Tue, Jul 17, 2018 at 1:05 AM, Ana Aviles <ana@xxxxxxxxxxxx> wrote:
>>>>>>> Hello,
>>>>>>>
>>>>>>> We have a cluster that was running hammer (0.94.10). We hit a bug where,
>>>>>>> right after seemingly fixing an inconsistent PG, the primary OSD would
>>>>>>> crash and restart. The next deep-scrub would again report the PG as
>>>>>>> inconsistent.
>>>>>>>
>>>>>>> We filed a bug report,
>>>>>>> https://tracker.ceph.com/issues/24652#change-115654, which was closed
>>>>>>> since it was a known bug fixed in newer versions of Ceph.
>>>>>>>
>>>>>>> Now the cluster is running jewel (10.2.11). There is again one
>>>>>>> inconsistent PG with 1 error, which we are not able to fix, and the
>>>>>>> log has no reference to the inconsistent object.
>>>>>>>
>>>>>>>
>>>>>>> scrub 0 missing, 1 inconsistent objects
>>>>>>> scrub 1 errors
>>>>>>>
>>>>>>>
>>>>>>> We have the logs with debug level 20 while repairing the PG. The one for
>>>>>>> the primary OSD is: 94e20123-fcda-49d7-98a2-919507dfbc92
>>>>>>>
>>>>>>> Thanks!
>>>>>>> Kind regards,
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Ana Avilés
>>>>>>> Greenhost - sustainable hosting & digital security
>>>>>>> E: ana@xxxxxxxxxxxx
>>>>>>> T: +31 20 4890444
>>>>>>> W: https://greenhost.nl
>>>>>>> --
>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Cheers,
Brad


