Re: Inconsistent PG with 1 inconsistent object not referenced in the log


 




On 19/07/18 03:25, Brad Hubbard wrote:
> On Wed, Jul 18, 2018 at 6:25 PM, Ana Aviles <ana@xxxxxxxxxxxx> wrote:
>> Ah ok. Then I think it confirms what you are saying. Here it is:
>>
>> $ rados list-inconsistent-obj 0.190
>> {"epoch":30579,"inconsistents":[{"object":{"name":"rbd_data.15cec2ae8944a.000000000015c7d6","nspace":"","locator":"","snap":"head","version":5498082},"errors":["object_info_inconsistency","attr_value_mismatch"],"union_shard_errors":[],"selected_object_info":"0:618e3778:::rbd_data.15cec2ae8944a.000000000004db0e:head(30537'5509201
>> osd.36.0:8552301 dirty|data_digest|omap_digest s 4194304 uv 5498082 dd
>> 7dd0d0bd od ffffffff alloc_hint [4194304
>> 4194304])","shards":[{"osd":16,"errors":[],"size":4194304,"object_info":"0:09c2dd3e:::rbd_data.15cec2ae8944a.000000000015c7d6:head(26812'5142772
>> client.1044166.0:393154060 dirty|data_digest|omap_digest s 4194304 uv
>> 5142772 dd 264b7d0d od ffffffff alloc_hint [0
>> 0])","attrs":[{"name":"_","value":"DwgMAQAABANIAAAAAAAAACcAAAByYmRfZGF0YS4xNWNlYzJhZTg5NDRhLjAwMDAwMDAwMDAxNWM3ZDb+\/\/\/\/\/\/\/\/\/5BDu3wAAAAAAAAAAAAAAAAABgMcAAAAAAAAAAAAAAD\/\/\/\/\/AAAAAAAAAAD\/\/\/\/\/\/\/\/\/\/wAAAAD0eE4AAAAAALxoAADzeE4AAAAAALxoAAACAhUAAAAIxu4PAAAAAAAMDm8XAAAAAAAAAAAAAEAAAAAAAOJmPVsEa24SAgIVAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA9HhOAAAAAAAAAAAAAAAAAAA0AAAA4mY9W1Q\/lBwNfUsm\/\/\/\/\/w==","Base64":true},{"name":"snapset","value":"AgIZAAAAAAAAAAAAAAABAAAAAAAAAAAAAAAAAAAAAA==","Base64":true}]},{"osd":37,"errors":[],"size":4194304,"object_info":"0:09c2dd3e:::rbd_data.15cec2ae8944a.000000000015c7d6:head(26812'5142772
>> client.1044166.0:393154060 dirty|data_digest|omap_digest s 4194304 uv
>> 5142772 dd 264b7d0d od ffffffff alloc_hint [0
>> 0])","attrs":[{"name":"_","value":"DwgMAQAABANIAAAAAAAAACcAAAByYmRfZGF0YS4xNWNlYzJhZTg5NDRhLjAwMDAwMDAwMDAxNWM3ZDb+\/\/\/\/\/\/\/\/\/5BDu3wAAAAAAAAAAAAAAAAABgMcAAAAAAAAAAAAAAD\/\/\/\/\/AAAAAAAAAAD\/\/\/\/\/\/\/\/\/\/wAAAAD0eE4AAAAAALxoAADzeE4AAAAAALxoAAACAhUAAAAIxu4PAAAAAAAMDm8XAAAAAAAAAAAAAEAAAAAAAOJmPVsEa24SAgIVAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA9HhOAAAAAAAAAAAAAAAAAAA0AAAA4mY9W1Q\/lBwNfUsm\/\/\/\/\/w==","Base64":true},{"name":"snapset","value":"AgIZAAAAAAAAAAAAAAABAAAAAAAAAAAAAAAAAAAAAA==","Base64":true}]},{"osd":44,"errors":[],"size":4194304,"object_info":"0:618e3778:::rbd_data.15cec2ae8944a.000000000004db0e:head(30537'5509201
>> osd.36.0:8552301 dirty|data_digest|omap_digest s 4194304 uv 5498082 dd
>> 7dd0d0bd od ffffffff alloc_hint [4194304
>> 4194304])","attrs":[{"name":"_","value":"EAggAQAABANIAAAAAAAAACcAAAByYmRfZGF0YS4xNWNlYzJhZTg5NDRhLjAwMDAwMDAwMDAwNGRiMGX+\/\/\/\/\/\/\/\/\/4Zx7B4AAAAAAAAAAAAAAAAABgMcAAAAAAAAAAAAAAD\/\/\/\/\/AAAAAAAAAAD\/\/\/\/\/\/\/\/\/\/wAAAABREFQAAAAAAEl3AADi5FMAAAAAAEl3AAACAhUAAAAEJAAAAAAAAABtf4IAAAAAAAAAAAAAAEAAAAAAAPpfSVvV\/VMKAgIVAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA4uRTAAAAAAAAAAAAAAAAAAA0AAAA+l9JW4x6Rw290NB9\/\/\/\/\/wAAQAAAAAAAAABAAAAAAAAAAAAA","Base64":true},{"name":"snapset","value":"AgIZAAAAAAAAAAAAAAABAAAAAAAAAAAAAAAAAAAAAA==","Base64":true}]}]}]}
>>
>>
>> To determine which is the right version of the object, is there no
>> timestamp that can tell us? Maybe the object got updated on osd.37 and
>> osd.16 while osd.44 was down, and that is where the mismatch comes from?
>> Otherwise, shouldn't the authoritative OSD be leading?
> 
> The primary will be serving IO requests, so the version on osd 37 is
> what clients will read. Going with that version seems reasonable.
> 

OK good.
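
As a rough illustration of the reasoning above, the following Python sketch groups the shards from the `rados list-inconsistent-obj` output by the data digest recorded in each shard's object info and marks the group that contains the primary. The `report` dict is a hand-trimmed copy of the output quoted above (attrs omitted), the regular expression assumes the textual `object_info` format seen in that output, and `shards_by_digest` is a hypothetical helper name, not part of any Ceph tool.

```python
import re
from collections import defaultdict

# Hand-trimmed copy of the list-inconsistent-obj output quoted above.
report = {
    "inconsistents": [{
        "object": {"name": "rbd_data.15cec2ae8944a.000000000015c7d6"},
        "shards": [
            {"osd": 16, "object_info":
                "0:09c2dd3e:::rbd_data.15cec2ae8944a.000000000015c7d6:head(26812'5142772 "
                "client.1044166.0:393154060 dirty|data_digest|omap_digest s 4194304 "
                "uv 5142772 dd 264b7d0d od ffffffff alloc_hint [0 0])"},
            {"osd": 37, "object_info":
                "0:09c2dd3e:::rbd_data.15cec2ae8944a.000000000015c7d6:head(26812'5142772 "
                "client.1044166.0:393154060 dirty|data_digest|omap_digest s 4194304 "
                "uv 5142772 dd 264b7d0d od ffffffff alloc_hint [0 0])"},
            {"osd": 44, "object_info":
                "0:618e3778:::rbd_data.15cec2ae8944a.000000000004db0e:head(30537'5509201 "
                "osd.36.0:8552301 dirty|data_digest|omap_digest s 4194304 "
                "uv 5498082 dd 7dd0d0bd od ffffffff alloc_hint [4194304 4194304])"},
        ],
    }]
}

# Assumption: the recorded data digest appears as "dd <hex>" in the string.
DD_RE = re.compile(r"\bdd ([0-9a-f]+)\b")


def shards_by_digest(inconsistent):
    """Map each recorded data digest to the list of OSDs reporting it."""
    groups = defaultdict(list)
    for shard in inconsistent["shards"]:
        match = DD_RE.search(shard["object_info"])
        digest = match.group(1) if match else None
        groups[digest].append(shard["osd"])
    return dict(groups)


PRIMARY = 37  # acting primary, taken from the pg query earlier in the thread

groups = shards_by_digest(report["inconsistents"][0])
for digest, osds in groups.items():
    note = " (includes primary)" if PRIMARY in osds else ""
    print(f"dd {digest}: osds {sorted(osds)}{note}")
```

This only compares the digests stored in the object info strings; it does not read or checksum the object data itself.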

> The version on osd 44 was actually modified after the others (epoch
> 30537, as opposed to epoch 26812), but the sizes are all the same, so
> the difference may be trivial (metadata only, perhaps). According to
> the last request id (osd.36.0:8552301), that modification came from
> another osd (36), which is kind of unexpected. Is there, or was there,
> a cache tier involved?

Ah OK, very interesting! No, no cache tier involved. So at one point
osd.36 was part of the PG set?
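
The comparison Brad describes (modification epoch, size, and who issued the last request) can be sketched in Python as below. The two strings are copied from the `list-inconsistent-obj` output above; the regular expression is an assumption about the textual object info layout seen there, and `ObjectInfo` and `parse_object_info` are hypothetical names used only for this illustration.

```python
import re
from typing import NamedTuple


class ObjectInfo(NamedTuple):
    epoch: int          # epoch part of the "epoch'version" pair
    version: int        # version part of the "epoch'version" pair
    issuer_type: str    # "client" or "osd", from the last request id
    issuer_id: int
    size: int
    data_digest: str


# Assumed layout: (<epoch>'<version> <type>.<id>.<inc>:<tid> <flags> s <size> uv <n> dd <hex> ...
OI_RE = re.compile(
    r"\((?P<epoch>\d+)'(?P<version>\d+) "
    r"(?P<type>[a-z]+)\.(?P<id>\d+)\.\d+:\d+ "
    r".*? s (?P<size>\d+) uv \d+ dd (?P<dd>[0-9a-f]+)"
)


def parse_object_info(text):
    """Pull the fields discussed in the thread out of an object info string."""
    m = OI_RE.search(text)
    if m is None:
        raise ValueError("unrecognised object info string")
    return ObjectInfo(int(m["epoch"]), int(m["version"]), m["type"],
                      int(m["id"]), int(m["size"]), m["dd"])


on_osd_37 = parse_object_info(
    "0:09c2dd3e:::rbd_data.15cec2ae8944a.000000000015c7d6:head(26812'5142772 "
    "client.1044166.0:393154060 dirty|data_digest|omap_digest s 4194304 "
    "uv 5142772 dd 264b7d0d od ffffffff alloc_hint [0 0])")
on_osd_44 = parse_object_info(
    "0:618e3778:::rbd_data.15cec2ae8944a.000000000004db0e:head(30537'5509201 "
    "osd.36.0:8552301 dirty|data_digest|omap_digest s 4194304 "
    "uv 5498082 dd 7dd0d0bd od ffffffff alloc_hint [4194304 4194304])")

newer_on_44 = (on_osd_44.epoch, on_osd_44.version) > (on_osd_37.epoch, on_osd_37.version)
print("osd 44 copy modified later:", newer_on_44)
print("same size:", on_osd_44.size == on_osd_37.size)
print("last writer on osd 44 copy:", on_osd_44.issuer_type, on_osd_44.issuer_id)
```

A later epoch only says which object info was written last; as noted above, it does not by itself say which copy of the data is the right one.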

> 
> If you want to go with the version that is currently being used (37
> and 16), you can just quiesce the rbd image clients and do a rados get,
> then a rados put of the object. I would suggest taking a backup of the
> object from osd 44 using the ceph-objectstore-tool although, as I
> said, that version is not the one in use, so I doubt you will miss it.
> 

Great, will do that. Thanks a lot for the help.
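
For reference, a minimal Python sketch of the get/put sequence follows. It only builds and prints the command lines and runs nothing. The pool name `rbd` and the scratch path are assumptions (the thread never names the pool behind PG 0.190), `rewrite_commands` is a hypothetical helper, and the final deep-scrub is an extra verification step that was not part of the advice above.

```python
import shlex

POOL = "rbd"  # assumption: the pool name is not given in the thread
OBJECT = "rbd_data.15cec2ae8944a.000000000015c7d6"
PGID = "0.190"
LOCAL_COPY = "/tmp/" + OBJECT  # hypothetical scratch location


def rewrite_commands(pool, obj, pgid, path):
    """Return the argv lists for reading the object and writing it back."""
    return [
        # Read the copy that clients currently see (served by the primary).
        ["rados", "-p", pool, "get", obj, path],
        # Write it back so that all replicas receive the same data.
        ["rados", "-p", pool, "put", obj, path],
        # Extra step (not from the thread): deep-scrub the PG to re-check it.
        ["ceph", "pg", "deep-scrub", pgid],
    ]


commands = rewrite_commands(POOL, OBJECT, PGID, LOCAL_COPY)
for argv in commands:
    print(shlex.join(argv))
```

The rbd image clients should be quiesced before any of these are run, and the backup of the osd 44 copy should be taken beforehand.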

>>
>> Regards,
>> Ana
>>
>>
>> On 18/07/18 05:24, Brad Hubbard wrote:
>>> OK. What I *meant* to ask for was the output of "rados
>>> list-inconsistent-obj 0.190" (might still be worth posting that but it
>>> should just confirm findings below).
>>>
>>>
>>> The relevant lines from the log are below.
>>>
>>> 2018-07-16 12:24:45.940910 7fb422340700 2 osd.37 pg_epoch: 30554
>>> pg[0.190( v 30554'5390084 (30537'5387075,30554'5390084]
>>> local-les=30554 n=4123 ec=1 les/c/f 30554/30554/0 30552/30553/30542)
>>> [37,44,16] r=0 lpr=30553 crt=30554'5390079 lcod 30554'5390083 mlcod
>>> 30554'5390083 active+clean+scrubbing+deep+inconsistent+repair] 0.190
>>> shard 16: soid 0:09c2dd3e:::rbd_data.15cec2ae8944a.000000000015c7d6:head
>>> data_digest 0x264b7d0d != data_digest 0x7dd0d0bd from shard 44,
>>> data_digest 0x264b7d0d != data_digest 0x7dd0d0bd from auth oi
>>> 0:618e3778:::rbd_data.15cec2ae8944a.000000000004db0e:head(30537'5509201
>>> osd.36.0:8552301 dirty|data_digest|omap_digest s 4194304 uv 5498082 dd
>>> 7dd0d0bd od ffffffff alloc_hint [4194304 4194304]), attr value
>>> mismatch '_'
>>>
>>> 2018-07-16 12:24:45.940941 7fb422340700 -1
>>> log_channel(cluster) log [ERR] : 0.190 shard 16: soid
>>> 0:09c2dd3e:::rbd_data.15cec2ae8944a.000000000015c7d6:head data_digest
>>> 0x264b7d0d != data_digest 0x7dd0d0bd from shard 44, data_digest
>>> 0x264b7d0d != data_digest 0x7dd0d0bd from auth oi
>>> 0:618e3778:::rbd_data.15cec2ae8944a.000000000004db0e:head(30537'5509201
>>> osd.36.0:8552301 dirty|data_digest|omap_digest s 4194304 uv 5498082 dd
>>> 7dd0d0bd od ffffffff alloc_hint [4194304 4194304]), attr value
>>> mismatch '_'
>>>
>>> 2018-07-16 12:24:45.940957 7fb422340700 -1
>>> log_channel(cluster) log [ERR] : 0.190 shard 37: soid
>>> 0:09c2dd3e:::rbd_data.15cec2ae8944a.000000000015c7d6:head data_digest
>>> 0x264b7d0d != data_digest 0x7dd0d0bd from shard 44, data_digest
>>> 0x264b7d0d != data_digest 0x7dd0d0bd from auth oi
>>> 0:618e3778:::rbd_data.15cec2ae8944a.000000000004db0e:head(30537'5509201
>>> osd.36.0:8552301 dirty|data_digest|omap_digest s 4194304 uv 5498082 dd
>>> 7dd0d0bd od ffffffff alloc_hint [4194304 4194304]), attr value
>>> mismatch '_'
>>>
>>> They show that osd 44 has been chosen as the authoritative shard,
>>> that it has a data digest for this object of 0x7dd0d0bd, and that the
>>> data digest in the authoritative object info is also 0x7dd0d0bd.
>>>
>>> Shard 16, however, has a data digest of 0x264b7d0d and so does shard 37,
>>> so the data for this object on osds 16 and 37 is different from that on
>>> osd 44.
>>>
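
The same reading of the log can be sketched in Python. The two messages below are trimmed copies of the [ERR] entries quoted above, cut off after the first comparison. The regular expression is an assumption based on those two entries only, and `digest_mismatches` is a hypothetical helper name.

```python
import re

# Trimmed copies of the two [ERR] messages quoted above.
LOG_LINES = [
    "0.190 shard 16: soid "
    "0:09c2dd3e:::rbd_data.15cec2ae8944a.000000000015c7d6:head "
    "data_digest 0x264b7d0d != data_digest 0x7dd0d0bd from shard 44",
    "0.190 shard 37: soid "
    "0:09c2dd3e:::rbd_data.15cec2ae8944a.000000000015c7d6:head "
    "data_digest 0x264b7d0d != data_digest 0x7dd0d0bd from shard 44",
]

# Assumed message layout, derived from the entries above.
MISMATCH_RE = re.compile(
    r"shard (?P<shard>\d+): soid (?P<soid>\S+) "
    r"data_digest 0x(?P<found>[0-9a-f]+) != "
    r"data_digest 0x(?P<expected>[0-9a-f]+) from shard (?P<auth>\d+)"
)


def digest_mismatches(lines):
    """Return (shard, its digest, authoritative shard, authoritative digest)."""
    found = []
    for line in lines:
        m = MISMATCH_RE.search(line)
        if m:
            found.append((int(m["shard"]), m["found"],
                          int(m["auth"]), m["expected"]))
    return found


mismatches = digest_mismatches(LOG_LINES)
for shard, digest, auth, auth_digest in mismatches:
    print(f"osd {shard}: 0x{digest} differs from osd {auth}: 0x{auth_digest}")
```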
>>> Basically, you'll need to pick which is the "right" copy of the object
>>> (I can't tell you), quiesce traffic to/from that object (rbd image), and
>>> get/put that object back into the cluster to fix the mismatch. Since
>>> this appears to be an rbd image, this could potentially result in an
>>> image that needs an fsck or equivalent, IIUC.
>>>
>>>
>>> On Tue, Jul 17, 2018 at 10:06 PM, Ana Aviles <ana@xxxxxxxxxxxx> wrote:
>>>>
>>>> Hi Brad,
>>>>
>>>> Here it is:
>>>>
>>>> {
>>>>     "state": "active+clean+inconsistent",
>>>>     "snap_trimq": "[]",
>>>>     "epoch": 30581,
>>>>     "up": [
>>>>         37,
>>>>         44,
>>>>         16
>>>>     ],
>>>>     "acting": [
>>>>         37,
>>>>         44,
>>>>         16
>>>>     ],
>>>>     "actingbackfill": [
>>>>         "16",
>>>>         "37",
>>>>         "44"
>>>>     ],
>>>>     "info": {
>>>>         "pgid": "0.190",
>>>>         "last_update": "30581'5420535",
>>>>         "last_complete": "30581'5420535",
>>>>         "log_tail": "30581'5417484",
>>>>         "last_user_version": 5420535,
>>>>         "last_backfill": "MAX",
>>>>         "last_backfill_bitwise": 0,
>>>>         "purged_snaps": "[]",
>>>>         "history": {
>>>>             "epoch_created": 1,
>>>>             "last_epoch_started": 30580,
>>>>             "last_epoch_clean": 30581,
>>>>             "last_epoch_split": 0,
>>>>             "last_epoch_marked_full": 0,
>>>>             "same_up_since": 30578,
>>>>             "same_interval_since": 30579,
>>>>             "same_primary_since": 30565,
>>>>             "last_scrub": "30554'5390240",
>>>>             "last_scrub_stamp": "2018-07-16 12:27:03.547524",
>>>>             "last_deep_scrub": "30554'5390240",
>>>>             "last_deep_scrub_stamp": "2018-07-16 12:27:03.547524",
>>>>             "last_clean_scrub_stamp": "2018-07-13 08:45:32.622555"
>>>>         },
>>>>         "stats": {
>>>>             "version": "30581'5420535",
>>>>             "reported_seq": "5155553",
>>>>             "reported_epoch": "30581",
>>>>             "state": "active+clean+inconsistent",
>>>>             "last_fresh": "2018-07-17 12:02:13.002428",
>>>>             "last_change": "2018-07-16 13:37:24.020403",
>>>>             "last_active": "2018-07-17 12:02:13.002428",
>>>>             "last_peered": "2018-07-17 12:02:13.002428",
>>>>             "last_clean": "2018-07-17 12:02:13.002428",
>>>>             "last_became_active": "2018-07-16 13:37:13.173821",
>>>>             "last_became_peered": "2018-07-16 13:37:13.173821",
>>>>             "last_unstale": "2018-07-17 12:02:13.002428",
>>>>             "last_undegraded": "2018-07-17 12:02:13.002428",
>>>>             "last_fullsized": "2018-07-17 12:02:13.002428",
>>>>             "mapping_epoch": 30578,
>>>>             "log_start": "30581'5417484",
>>>>             "ondisk_log_start": "30581'5417484",
>>>>             "created": 1,
>>>>             "last_epoch_clean": 30581,
>>>>             "parent": "0.0",
>>>>             "parent_split_bits": 0,
>>>>             "last_scrub": "30554'5390240",
>>>>             "last_scrub_stamp": "2018-07-16 12:27:03.547524",
>>>>             "last_deep_scrub": "30554'5390240",
>>>>             "last_deep_scrub_stamp": "2018-07-16 12:27:03.547524",
>>>>             "last_clean_scrub_stamp": "2018-07-13 08:45:32.622555",
>>>>             "log_size": 3051,
>>>>             "ondisk_log_size": 3051,
>>>>             "stats_invalid": false,
>>>>             "dirty_stats_invalid": false,
>>>>             "omap_stats_invalid": false,
>>>>             "hitset_stats_invalid": false,
>>>>             "hitset_bytes_stats_invalid": false,
>>>>             "pin_stats_invalid": true,
>>>>             "stat_sum": {
>>>>                 "num_bytes": 16946139153,
>>>>                 "num_objects": 4148,
>>>>                 "num_object_clones": 0,
>>>>                 "num_object_copies": 12444,
>>>>                 "num_objects_missing_on_primary": 0,
>>>>                 "num_objects_missing": 0,
>>>>                 "num_objects_degraded": 0,
>>>>                 "num_objects_misplaced": 0,
>>>>                 "num_objects_unfound": 0,
>>>>                 "num_objects_dirty": 4148,
>>>>                 "num_whiteouts": 0,
>>>>                 "num_read": 6895104,
>>>>                 "num_read_kb": 292185552,
>>>>                 "num_write": 10032749,
>>>>                 "num_write_kb": 185167701,
>>>>                 "num_scrub_errors": 1,
>>>>                 "num_shallow_scrub_errors": 1,
>>>>                 "num_deep_scrub_errors": 0,
>>>>                 "num_objects_recovered": 103598,
>>>>                 "num_bytes_recovered": 424107954567,
>>>>                 "num_keys_recovered": 110,
>>>>                 "num_objects_omap": 1,
>>>>                 "num_objects_hit_set_archive": 0,
>>>>                 "num_bytes_hit_set_archive": 0,
>>>>                 "num_flush": 0,
>>>>                 "num_flush_kb": 0,
>>>>                 "num_evict": 0,
>>>>                 "num_evict_kb": 0,
>>>>                 "num_promote": 0,
>>>>                 "num_flush_mode_high": 0,
>>>>                 "num_flush_mode_low": 0,
>>>>                 "num_evict_mode_some": 0,
>>>>                 "num_evict_mode_full": 0,
>>>>                 "num_objects_pinned": 0
>>>>             },
>>>>             "up": [
>>>>                 37,
>>>>                 44,
>>>>                 16
>>>>             ],
>>>>             "acting": [
>>>>                 37,
>>>>                 44,
>>>>                 16
>>>>             ],
>>>>             "blocked_by": [],
>>>>             "up_primary": 37,
>>>>             "acting_primary": 37
>>>>         },
>>>>         "empty": 0,
>>>>         "dne": 0,
>>>>         "incomplete": 0,
>>>>         "last_epoch_started": 30580,
>>>>         "hit_set_history": {
>>>>             "current_last_update": "0'0",
>>>>             "history": []
>>>>         }
>>>>     },
>>>>     "peer_info": [
>>>>         {
>>>>             "peer": "16",
>>>>             "pgid": "0.190",
>>>>             "last_update": "30581'5420535",
>>>>             "last_complete": "30581'5420535",
>>>>             "log_tail": "30537'5387475",
>>>>             "last_user_version": 5390577,
>>>>             "last_backfill": "MAX",
>>>>             "last_backfill_bitwise": 1,
>>>>             "purged_snaps": "[]",
>>>>             "history": {
>>>>                 "epoch_created": 1,
>>>>                 "last_epoch_started": 30580,
>>>>                 "last_epoch_clean": 30581,
>>>>                 "last_epoch_split": 0,
>>>>                 "last_epoch_marked_full": 0,
>>>>                 "same_up_since": 30578,
>>>>                 "same_interval_since": 30579,
>>>>                 "same_primary_since": 30565,
>>>>                 "last_scrub": "30554'5390240",
>>>>                 "last_scrub_stamp": "2018-07-16 12:27:03.547524",
>>>>                 "last_deep_scrub": "30554'5390240",
>>>>                 "last_deep_scrub_stamp": "2018-07-16 12:27:03.547524",
>>>>                 "last_clean_scrub_stamp": "2018-07-13 08:45:32.622555"
>>>>             },
>>>>             "stats": {
>>>>                 "version": "30570'5390575",
>>>>                 "reported_seq": "5139870",
>>>>                 "reported_epoch": "30576",
>>>>                 "state": "active+undersized+degraded+inconsistent",
>>>>                 "last_fresh": "2018-07-16 13:36:40.284756",
>>>>                 "last_change": "2018-07-16 13:36:40.284277",
>>>>                 "last_active": "2018-07-16 13:36:40.284756",
>>>>                 "last_peered": "2018-07-16 13:36:40.284756",
>>>>                 "last_clean": "2018-07-16 13:36:23.558224",
>>>>                 "last_became_active": "2018-07-16 13:36:40.284277",
>>>>                 "last_became_peered": "2018-07-16 13:36:40.284277",
>>>>                 "last_unstale": "2018-07-16 13:36:40.284756",
>>>>                 "last_undegraded": "2018-07-16 13:36:40.203248",
>>>>                 "last_fullsized": "2018-07-16 13:36:40.203248",
>>>>                 "mapping_epoch": 30578,
>>>>                 "log_start": "30537'5387475",
>>>>                 "ondisk_log_start": "30537'5387475",
>>>>                 "created": 1,
>>>>                 "last_epoch_clean": 30576,
>>>>                 "parent": "0.0",
>>>>                 "parent_split_bits": 0,
>>>>                 "last_scrub": "30554'5390240",
>>>>                 "last_scrub_stamp": "2018-07-16 12:27:03.547524",
>>>>                 "last_deep_scrub": "30554'5390240",
>>>>                 "last_deep_scrub_stamp": "2018-07-16 12:27:03.547524",
>>>>                 "last_clean_scrub_stamp": "2018-07-13 08:45:32.622555",
>>>>                 "log_size": 3100,
>>>>                 "ondisk_log_size": 3100,
>>>>                 "stats_invalid": false,
>>>>                 "dirty_stats_invalid": false,
>>>>                 "omap_stats_invalid": false,
>>>>                 "hitset_stats_invalid": false,
>>>>                 "hitset_bytes_stats_invalid": false,
>>>>                 "pin_stats_invalid": true,
>>>>                 "stat_sum": {
>>>>                     "num_bytes": 16841281553,
>>>>                     "num_objects": 4123,
>>>>                     "num_object_clones": 0,
>>>>                     "num_object_copies": 12369,
>>>>                     "num_objects_missing_on_primary": 0,
>>>>                     "num_objects_missing": 0,
>>>>                     "num_objects_degraded": 4123,
>>>>                     "num_objects_misplaced": 0,
>>>>                     "num_objects_unfound": 0,
>>>>                     "num_objects_dirty": 4123,
>>>>                     "num_whiteouts": 0,
>>>>                     "num_read": 6870027,
>>>>                     "num_read_kb": 291425720,
>>>>                     "num_write": 9972836,
>>>>                     "num_write_kb": 184701865,
>>>>                     "num_scrub_errors": 1,
>>>>                     "num_shallow_scrub_errors": 1,
>>>>                     "num_deep_scrub_errors": 0,
>>>>                     "num_objects_recovered": 103596,
>>>>                     "num_bytes_recovered": 424099565959,
>>>>                     "num_keys_recovered": 110,
>>>>                     "num_objects_omap": 1,
>>>>                     "num_objects_hit_set_archive": 0,
>>>>                     "num_bytes_hit_set_archive": 0,
>>>>                     "num_flush": 0,
>>>>                     "num_flush_kb": 0,
>>>>                     "num_evict": 0,
>>>>                     "num_evict_kb": 0,
>>>>                     "num_promote": 0,
>>>>                     "num_flush_mode_high": 0,
>>>>                     "num_flush_mode_low": 0,
>>>>                     "num_evict_mode_some": 0,
>>>>                     "num_evict_mode_full": 0,
>>>>                     "num_objects_pinned": 0
>>>>                 },
>>>>                 "up": [
>>>>                     37,
>>>>                     44,
>>>>                     16
>>>>                 ],
>>>>                 "acting": [
>>>>                     37,
>>>>                     44,
>>>>                     16
>>>>                 ],
>>>>                 "blocked_by": [],
>>>>                 "up_primary": 37,
>>>>                 "acting_primary": 37
>>>>             },
>>>>             "empty": 0,
>>>>             "dne": 0,
>>>>             "incomplete": 0,
>>>>             "last_epoch_started": 30580,
>>>>             "hit_set_history": {
>>>>                 "current_last_update": "0'0",
>>>>                 "history": []
>>>>             }
>>>>         },
>>>>         {
>>>>             "peer": "44",
>>>>             "pgid": "0.190",
>>>>             "last_update": "30581'5420535",
>>>>             "last_complete": "30570'5390575",
>>>>             "log_tail": "30537'5387475",
>>>>             "last_user_version": 5390575,
>>>>             "last_backfill": "MAX",
>>>>             "last_backfill_bitwise": 1,
>>>>             "purged_snaps": "[]",
>>>>             "history": {
>>>>                 "epoch_created": 1,
>>>>                 "last_epoch_started": 30580,
>>>>                 "last_epoch_clean": 30581,
>>>>                 "last_epoch_split": 0,
>>>>                 "last_epoch_marked_full": 0,
>>>>                 "same_up_since": 30578,
>>>>                 "same_interval_since": 30579,
>>>>                 "same_primary_since": 30565,
>>>>                 "last_scrub": "30554'5390240",
>>>>                 "last_scrub_stamp": "2018-07-16 12:27:03.547524",
>>>>                 "last_deep_scrub": "30554'5390240",
>>>>                 "last_deep_scrub_stamp": "2018-07-16 12:27:03.547524",
>>>>                 "last_clean_scrub_stamp": "2018-07-13 08:45:32.622555"
>>>>             },
>>>>             "stats": {
>>>>                 "version": "30568'5390574",
>>>>                 "reported_seq": "5139846",
>>>>                 "reported_epoch": "30570",
>>>>                 "state": "active+undersized+degraded+inconsistent",
>>>>                 "last_fresh": "2018-07-16 13:36:07.003551",
>>>>                 "last_change": "2018-07-16 13:36:07.002580",
>>>>                 "last_active": "2018-07-16 13:36:07.003551",
>>>>                 "last_peered": "2018-07-16 13:36:07.003551",
>>>>                 "last_clean": "2018-07-16 13:35:50.922619",
>>>>                 "last_became_active": "2018-07-16 13:36:07.002580",
>>>>                 "last_became_peered": "2018-07-16 13:36:07.002580",
>>>>                 "last_unstale": "2018-07-16 13:36:07.003551",
>>>>                 "last_undegraded": "2018-07-16 13:36:05.922413",
>>>>                 "last_fullsized": "2018-07-16 13:36:05.922413",
>>>>                 "mapping_epoch": 30578,
>>>>                 "log_start": "30537'5387475",
>>>>                 "ondisk_log_start": "30537'5387475",
>>>>                 "created": 1,
>>>>                 "last_epoch_clean": 30570,
>>>>                 "parent": "0.0",
>>>>                 "parent_split_bits": 0,
>>>>                 "last_scrub": "30554'5390240",
>>>>                 "last_scrub_stamp": "2018-07-16 12:27:03.547524",
>>>>                 "last_deep_scrub": "30554'5390240",
>>>>                 "last_deep_scrub_stamp": "2018-07-16 12:27:03.547524",
>>>>                 "last_clean_scrub_stamp": "2018-07-13 08:45:32.622555",
>>>>                 "log_size": 3099,
>>>>                 "ondisk_log_size": 3099,
>>>>                 "stats_invalid": false,
>>>>                 "dirty_stats_invalid": false,
>>>>                 "omap_stats_invalid": false,
>>>>                 "hitset_stats_invalid": false,
>>>>                 "hitset_bytes_stats_invalid": false,
>>>>                 "pin_stats_invalid": true,
>>>>                 "stat_sum": {
>>>>                     "num_bytes": 16841281553,
>>>>                     "num_objects": 4123,
>>>>                     "num_object_clones": 0,
>>>>                     "num_object_copies": 12369,
>>>>                     "num_objects_missing_on_primary": 0,
>>>>                     "num_objects_missing": 0,
>>>>                     "num_objects_degraded": 4123,
>>>>                     "num_objects_misplaced": 0,
>>>>                     "num_objects_unfound": 0,
>>>>                     "num_objects_dirty": 4123,
>>>>                     "num_whiteouts": 0,
>>>>                     "num_read": 6870027,
>>>>                     "num_read_kb": 291425720,
>>>>                     "num_write": 9972832,
>>>>                     "num_write_kb": 184701853,
>>>>                     "num_scrub_errors": 1,
>>>>                     "num_shallow_scrub_errors": 1,
>>>>                     "num_deep_scrub_errors": 0,
>>>>                     "num_objects_recovered": 103594,
>>>>                     "num_bytes_recovered": 424091177351,
>>>>                     "num_keys_recovered": 110,
>>>>                     "num_objects_omap": 1,
>>>>                     "num_objects_hit_set_archive": 0,
>>>>                     "num_bytes_hit_set_archive": 0,
>>>>                     "num_flush": 0,
>>>>                     "num_flush_kb": 0,
>>>>                     "num_evict": 0,
>>>>                     "num_evict_kb": 0,
>>>>                     "num_promote": 0,
>>>>                     "num_flush_mode_high": 0,
>>>>                     "num_flush_mode_low": 0,
>>>>                     "num_evict_mode_some": 0,
>>>>                     "num_evict_mode_full": 0,
>>>>                     "num_objects_pinned": 0
>>>>                 },
>>>>                 "up": [
>>>>                     37,
>>>>                     44,
>>>>                     16
>>>>                 ],
>>>>                 "acting": [
>>>>                     37,
>>>>                     44,
>>>>                     16
>>>>                 ],
>>>>                 "blocked_by": [],
>>>>                 "up_primary": 37,
>>>>                 "acting_primary": 37
>>>>             },
>>>>             "empty": 0,
>>>>             "dne": 0,
>>>>             "incomplete": 0,
>>>>             "last_epoch_started": 30580,
>>>>             "hit_set_history": {
>>>>                 "current_last_update": "0'0",
>>>>                 "history": []
>>>>             }
>>>>         }
>>>>     ],
>>>>     "recovery_state": [
>>>>         {
>>>>             "name": "Started\/Primary\/Active",
>>>>             "enter_time": "2018-07-16 13:37:13.050211",
>>>>             "might_have_unfound": [
>>>>                 {
>>>>                     "osd": "16",
>>>>                     "status": "already probed"
>>>>                 },
>>>>                 {
>>>>                     "osd": "44",
>>>>                     "status": "already probed"
>>>>                 }
>>>>             ],
>>>>             "recovery_progress": {
>>>>                 "backfill_targets": [],
>>>>                 "waiting_on_backfill": [],
>>>>                 "last_backfill_started": "MIN",
>>>>                 "backfill_info": {
>>>>                     "begin": "MIN",
>>>>                     "end": "MIN",
>>>>                     "objects": []
>>>>                 },
>>>>                 "peer_backfill_info": [],
>>>>                 "backfills_in_flight": [],
>>>>                 "recovering": [],
>>>>                 "pg_backend": {
>>>>                     "pull_from_peer": [],
>>>>                     "pushing": []
>>>>                 }
>>>>             },
>>>>             "scrub": {
>>>>                 "scrubber.epoch_start": "0",
>>>>                 "scrubber.active": 0,
>>>>                 "scrubber.state": "INACTIVE",
>>>>                 "scrubber.start": "MIN",
>>>>                 "scrubber.end": "MIN",
>>>>                 "scrubber.subset_last_update": "0'0",
>>>>                 "scrubber.deep": false,
>>>>                 "scrubber.seed": 0,
>>>>                 "scrubber.waiting_on": 0,
>>>>                 "scrubber.waiting_on_whom": []
>>>>             }
>>>>         },
>>>>         {
>>>>             "name": "Started",
>>>>             "enter_time": "2018-07-16 13:37:11.980264"
>>>>         }
>>>>     ],
>>>>     "agent_state": {}
>>>> }
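
To make the relevant parts of this pg query output easier to spot, here is a small Python sketch that pulls out the state, the primary, the replicas and the scrub error counters. The embedded JSON is a hand-trimmed subset of the output above with the same key names, and `summarize` is a hypothetical helper written for this illustration.

```python
import json

# Hand-trimmed subset of the pg query output above.
PG_QUERY = json.loads("""
{
  "state": "active+clean+inconsistent",
  "up": [37, 44, 16],
  "acting": [37, 44, 16],
  "info": {
    "pgid": "0.190",
    "stats": {
      "acting_primary": 37,
      "last_deep_scrub_stamp": "2018-07-16 12:27:03.547524",
      "last_clean_scrub_stamp": "2018-07-13 08:45:32.622555",
      "stat_sum": {
        "num_objects": 4148,
        "num_scrub_errors": 1,
        "num_shallow_scrub_errors": 1,
        "num_deep_scrub_errors": 0
      }
    }
  }
}
""")


def summarize(query):
    """Collect the fields that matter when chasing an inconsistent PG."""
    stats = query["info"]["stats"]
    primary = stats["acting_primary"]
    return {
        "pgid": query["info"]["pgid"],
        "inconsistent": "inconsistent" in query["state"].split("+"),
        "primary": primary,
        "replicas": [osd for osd in query["acting"] if osd != primary],
        "scrub_errors": stats["stat_sum"]["num_scrub_errors"],
        "last_deep_scrub": stats["last_deep_scrub_stamp"],
        "last_clean_scrub": stats["last_clean_scrub_stamp"],
    }


summary = summarize(PG_QUERY)
for key, value in summary.items():
    print(f"{key}: {value}")
```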
>>>>
>>>>
>>>> On 17/07/18 02:19, Brad Hubbard wrote:
>>>>> Can we see a pg query of 0.190 ?
>>>>>
>>>>> On Tue, Jul 17, 2018 at 1:05 AM, Ana Aviles <ana@xxxxxxxxxxxx> wrote:
>>>>>> Hello,
>>>>>>
>>>>>> We have a cluster that was running hammer (0.94.10). We hit a bug where,
>>>>>> right after seemingly fixing an inconsistent PG, the primary OSD would
>>>>>> crash and restart. The next deep-scrub would again report the PG as
>>>>>> inconsistent.
>>>>>>
>>>>>> We filed a bug report,
>>>>>> https://tracker.ceph.com/issues/24652#change-115654, which was closed
>>>>>> since it was a known bug fixed in newer versions of Ceph.
>>>>>>
>>>>>> Now the cluster is running jewel (10.2.11). There is again one
>>>>>> inconsistent PG with 1 error, which we are not able to fix, and the
>>>>>> log has no reference to the inconsistent object.
>>>>>>
>>>>>>
>>>>>> scrub 0 missing, 1 inconsistent objects
>>>>>> scrub 1 errors
>>>>>>
>>>>>>
>>>>>> We have the logs with debug level 20 while repairing the PG. The one for
>>>>>> the primary OSD is: 94e20123-fcda-49d7-98a2-919507dfbc92
>>>>>>
>>>>>> Thanks!
>>>>>> Kind regards,
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Ana Avilés
>>>>>> Greenhost - sustainable hosting & digital security
>>>>>> E: ana@xxxxxxxxxxxx
>>>>>> T: +31 20 4890444
>>>>>> W: https://greenhost.nl
>>>>>> --
>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>
>>>>>
>>>>>
>>>>
>>>> --
>>>> Ana Avilés
>>>> Greenhost - sustainable hosting & digital security
>>>> E: ana@xxxxxxxxxxxx
>>>> T: +31 20 4890444
>>>> W: https://greenhost.nl
>>>
>>>
>>>
>>
>> --
>> Ana Avilés
>> Greenhost - sustainable hosting & digital security
>> E: ana@xxxxxxxxxxxx
>> T: +31 20 4890444
>> W: https://greenhost.nl
> 
> 
> 

-- 
Ana Avilés
Greenhost - sustainable hosting & digital security
E: ana@xxxxxxxxxxxx
T: +31 20 4890444
W: https://greenhost.nl


