I replaced the object with rados as suggested, and right afterwards forced a deep scrub, which got us back to HEALTH_OK. However, we now have another inconsistent PG. It is for the same rbd image but a different object: the object that was also mentioned in the previous inconsistent PG. This time it's worse, because we have a data_digest mismatch. I wonder whether this tells us anything about the previous substitution, or whether I should just go the same path and replace this object with rados.

pg 0.186 is active+clean+inconsistent, acting [36,26,44]

rados list-inconsistent-obj 0.186
{ "epoch": 30586, "inconsistents": [ { "object": { "name": "rbd_data.15cec2ae8944a.000000000004db0e", "nspace": "", "locator": "", "snap": "head", "version": 5493833 }, "errors": [ "object_info_inconsistency", "data_digest_mismatch", "attr_value_mismatch" ], "union_shard_errors": [ "data_digest_mismatch_oi" ], "selected_object_info": "0:09c2dd3e:::rbd_data.15cec2ae8944a.000000000015c7d6:head(30587'5493833 client.1246390.0:1 dirty|data_digest|omap_digest s 4194304 uv 5493833 dd 264b7d0d od ffffffff alloc_hint [0 0])", "shards": [ { "osd": 26, "errors": [ "data_digest_mismatch_oi" ], "size": 4194304, "omap_digest": "0xffffffff", "data_digest": "0x7dd0d0bd", "object_info": "0:618e3778:::rbd_data.15cec2ae8944a.000000000004db0e:head(30537'5509201 osd.36.0:8552301 dirty|data_digest|omap_digest s 4194304 uv 5498082 dd 7dd0d0bd od ffffffff alloc_hint [4194304 4194304])", "attrs": [ { "name": "_", "value": "EAggAQAABANIAAAAAAAAACcAAAByYmRfZGF0YS4xNWNlYzJhZTg5NDRhLjAwMDAwMDAwMDAwNGRiMGX+\/\/\/\/\/\/\/\/\/4Zx7B4AAAAAAAAAAAAAAAAABgMcAAAAAAAAAAAAAAD\/\/\/\/\/AAAAAAAAAAD\/\/\/\/\/\/\/\/\/\/wAAAABREFQAAAAAAEl3AADi5FMAAAAAAEl3AAACAhUAAAAEJAAAAAAAAABtf4IAAAAAAAAAAAAAAEAAAAAAAPpfSVvV\/VMKAgIVAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA4uRTAAAAAAAAAAAAAAAAAAA0AAAA+l9JW4x6Rw290NB9\/\/\/\/\/wAAQAAAAAAAAABAAAAAAAAAAAAA", "Base64": true }, { "name": "snapset", "value":
"AgIZAAAAAAAAAAAAAAABAAAAAAAAAAAAAAAAAAAAAA==", "Base64": true } ] }, { "osd": 36, "errors": [ "data_digest_mismatch_oi" ], "size": 4194304, "omap_digest": "0xffffffff", "data_digest": "0x7dd0d0bd", "object_info": "0:618e3778:::rbd_data.15cec2ae8944a.000000000004db0e:head(30537'5509201 osd.36.0:8552301 dirty|data_digest|omap_digest s 4194304 uv 5498082 dd 7dd0d0bd od ffffffff alloc_hint [4194304 4194304])", "attrs": [ { "name": "_", "value": "EAggAQAABANIAAAAAAAAACcAAAByYmRfZGF0YS4xNWNlYzJhZTg5NDRhLjAwMDAwMDAwMDAwNGRiMGX+\/\/\/\/\/\/\/\/\/4Zx7B4AAAAAAAAAAAAAAAAABgMcAAAAAAAAAAAAAAD\/\/\/\/\/AAAAAAAAAAD\/\/\/\/\/\/\/\/\/\/wAAAABREFQAAAAAAEl3AADi5FMAAAAAAEl3AAACAhUAAAAEJAAAAAAAAABtf4IAAAAAAAAAAAAAAEAAAAAAAPpfSVvV\/VMKAgIVAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA4uRTAAAAAAAAAAAAAAAAAAA0AAAA+l9JW4x6Rw290NB9\/\/\/\/\/wAAQAAAAAAAAABAAAAAAAAAAAAA", "Base64": true }, ] }, { "osd": 44, "errors": [], "size": 4194304, "omap_digest": "0xffffffff", "data_digest": "0x264b7d0d", "object_info": "0:09c2dd3e:::rbd_data.15cec2ae8944a.000000000015c7d6:head(30587'5493833 client.1246390.0:1 dirty|data_digest|omap_digest s 4194304 uv 5493833 dd 264b7d0d od ffffffff alloc_hint [0 0])", "attrs": [ { "name": "_", "value": "EAggAQAABANIAAAAAAAAACcAAAByYmRfZGF0YS4xNWNlYzJhZTg5NDRhLjAwMDAwMDAwMDAxNWM3ZDb+\/\/\/\/\/\/\/\/\/5BDu3wAAAAAAAAAAAAAAAAABgMcAAAAAAAAAAAAAAD\/\/\/\/\/AAAAAAAAAAD\/\/\/\/\/\/\/\/\/\/wAAAABJ1FMAAAAAAHt3AAD0eE4AAAAAALxoAAACAhUAAAAItgQTAAAAAAABAAAAAAAAAAAAAAAAAEAAAAAAAIbaUVtSy\/8jAgIVAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAASdRTAAAAAAAAAAAAAAAAAAA0AAAAhtpRW\/VN8CQNfUsm\/\/\/\/\/wAAAAAAAAAAAAAAAAAAAAAAAAAA", "Base64": true }, { "name": "snapset", "value": "AgIZAAAAAAAAAAAAAAABAAAAAAAAAAAAAAAAAAAAAA==", "Base64": true } ] } ] } ] } On 20/07/18 00:27, Brad Hubbard wrote: > On Fri, Jul 20, 2018 at 1:05 AM, Ana Aviles <ana@xxxxxxxxxxxx> wrote: >> >> >> On 19/07/18 03:25, Brad Hubbard wrote: >>> On Wed, Jul 18, 2018 at 6:25 PM, Ana Aviles 
<ana@xxxxxxxxxxxx> wrote: >>>> Ah ok. Then I think it confirms what you are saying. Here it is: >>>> >>>> $ rados list-inconsistent-obj 0.190 >>>> {"epoch":30579,"inconsistents":[{"object":{"name":"rbd_data.15cec2ae8944a.000000000015c7d6","nspace":"","locator":"","snap":"head","version":5498082},"errors":["object_info_inconsistency","attr_value_mismatch"],"union_shard_errors":[],"selected_object_info":"0:618e3778:::rbd_data.15cec2ae8944a.000000000004db0e:head(30537'5509201 >>>> osd.36.0:8552301 dirty|data_digest|omap_digest s 4194304 uv 5498082 dd >>>> 7dd0d0bd od ffffffff alloc_hint [4194304 >>>> 4194304])","shards":[{"osd":16,"errors":[],"size":4194304,"object_info":"0:09c2dd3e:::rbd_data.15cec2ae8944a.000000000015c7d6:head(26812'5142772 >>>> client.1044166.0:393154060 dirty|data_digest|omap_digest s 4194304 uv >>>> 5142772 dd 264b7d0d od ffffffff alloc_hint [0 >>>> 0])","attrs":[{"name":"_","value":"DwgMAQAABANIAAAAAAAAACcAAAByYmRfZGF0YS4xNWNlYzJhZTg5NDRhLjAwMDAwMDAwMDAxNWM3ZDb+\/\/\/\/\/\/\/\/\/5BDu3wAAAAAAAAAAAAAAAAABgMcAAAAAAAAAAAAAAD\/\/\/\/\/AAAAAAAAAAD\/\/\/\/\/\/\/\/\/\/wAAAAD0eE4AAAAAALxoAADzeE4AAAAAALxoAAACAhUAAAAIxu4PAAAAAAAMDm8XAAAAAAAAAAAAAEAAAAAAAOJmPVsEa24SAgIVAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA9HhOAAAAAAAAAAAAAAAAAAA0AAAA4mY9W1Q\/lBwNfUsm\/\/\/\/\/w==","Base64":true},{"name":"snapset","value":"AgIZAAAAAAAAAAAAAAABAAAAAAAAAAAAAAAAAAAAAA==","Base64":true}]},{"osd":37,"errors":[],"size":4194304,"object_info":"0:09c2dd3e:::rbd_data.15cec2ae8944a.000000000015c7d6:head(26812'5142772 >>>> client.1044166.0:393154060 dirty|data_digest|omap_digest s 4194304 uv >>>> 5142772 dd 264b7d0d od ffffffff alloc_hint [0 >>>> 
0])","attrs":[{"name":"_","value":"DwgMAQAABANIAAAAAAAAACcAAAByYmRfZGF0YS4xNWNlYzJhZTg5NDRhLjAwMDAwMDAwMDAxNWM3ZDb+\/\/\/\/\/\/\/\/\/5BDu3wAAAAAAAAAAAAAAAAABgMcAAAAAAAAAAAAAAD\/\/\/\/\/AAAAAAAAAAD\/\/\/\/\/\/\/\/\/\/wAAAAD0eE4AAAAAALxoAADzeE4AAAAAALxoAAACAhUAAAAIxu4PAAAAAAAMDm8XAAAAAAAAAAAAAEAAAAAAAOJmPVsEa24SAgIVAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA9HhOAAAAAAAAAAAAAAAAAAA0AAAA4mY9W1Q\/lBwNfUsm\/\/\/\/\/w==","Base64":true},{"name":"snapset","value":"AgIZAAAAAAAAAAAAAAABAAAAAAAAAAAAAAAAAAAAAA==","Base64":true}]},{"osd":44,"errors":[],"size":4194304,"object_info":"0:618e3778:::rbd_data.15cec2ae8944a.000000000004db0e:head(30537'5509201 >>>> osd.36.0:8552301 dirty|data_digest|omap_digest s 4194304 uv 5498082 dd >>>> 7dd0d0bd od ffffffff alloc_hint [4194304 >>>> 4194304])","attrs":[{"name":"_","value":"EAggAQAABANIAAAAAAAAACcAAAByYmRfZGF0YS4xNWNlYzJhZTg5NDRhLjAwMDAwMDAwMDAwNGRiMGX+\/\/\/\/\/\/\/\/\/4Zx7B4AAAAAAAAAAAAAAAAABgMcAAAAAAAAAAAAAAD\/\/\/\/\/AAAAAAAAAAD\/\/\/\/\/\/\/\/\/\/wAAAABREFQAAAAAAEl3AADi5FMAAAAAAEl3AAACAhUAAAAEJAAAAAAAAABtf4IAAAAAAAAAAAAAAEAAAAAAAPpfSVvV\/VMKAgIVAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA4uRTAAAAAAAAAAAAAAAAAAA0AAAA+l9JW4x6Rw290NB9\/\/\/\/\/wAAQAAAAAAAAABAAAAAAAAAAAAA","Base64":true},{"name":"snapset","value":"AgIZAAAAAAAAAAAAAAABAAAAAAAAAAAAAAAAAAAAAA==","Base64":true}]}]}]} >>>> >>>> >>>> To determine which is the right version of the object, is there no >>>> timestamp that can tell us? maybe the object got updated to osd.37 and >>>> osd.16 while osd.44 was down, and there comes the missmatch? because >>>> otherwise, shouldn't the authoritative osd be leading? >>> >>> The primary will be serving IO requests so the version on osd 37 is >>> what will be read by clients so I guess going with that is reasonable. >>> >> >> OK good. 
>>
>>> The version on osd 44 was actually modified after the others (epoch
>>> 30537, as opposed to epoch 26812) but the sizes are all the same, so
>>> the difference may be trivial (metadata only, perhaps) and, according to
>>> the last request id (osd.36.0:8552301), the change came from another osd
>>> (36), which is kind of unexpected. Is there, or was there, a cache tier
>>> involved?
>>
>> Ah OK, very interesting! No, no cache tier involved. So at one point
>> osd.36 was part of the PG set?
>
> Maybe. All we know is that the last request came from osd.36, which is
> unusual because changes in this context generally only come from
> clients. A cache tier might explain it, which is why I mentioned it.
>
>>
>>>
>>> If you want to go with the version that is currently being used (37
>>> and 16) you can just quiesce the rbd image clients and do a rados get,
>>> then a rados put of the object. I would suggest taking a backup of the
>>> object from osd 44 using the ceph-objectstore-tool although, as I
>>> said, that version is not the one in use, so I doubt you will miss it.
>>>
>>
>> Great, will do that. Thanks a lot for the help.
>
> yw.
>
>>
>>>>
>>>> Regards,
>>>> Ana
>>>>
>>>>
>>>> On 18/07/18 05:24, Brad Hubbard wrote:
>>>>> OK. What I *meant* to ask for was the output of "rados
>>>>> list-inconsistent-obj 0.190" (might still be worth posting that but it
>>>>> should just confirm findings below).
>>>>>
>>>>>
>>>>> The relevant lines from the log are below.
>>>>>
>>>>> 2018-07-16 12:24:45.940910 7fb422340700 2 osd.37 pg_epoch: 30554
>>>>> pg[0.190( v 30554'5390084 (30537'5387075,30554'5390084]
>>>>> local-les=30554 n=4123 ec=1 les/c/f 30554/30554/0 30552/30553/30542)
>>>>> [37,44,16] r=0 lpr=30553 crt=30554'5390079 lcod 30554'5390083 mlcod
>>>>> 30554'5390083 active+clean+scrubbing+deep+inconsistent+repair] 0.190
>>>>> shard 16: soid 0:09c2dd3e:::rbd_data.15cec2ae8944a.000000000015c7d6:head
>>>>> data_digest 0x264b7d0d != data_digest 0x7dd0d0bd from shard 44,
>>>>> data_digest 0x264b7d0d != data_digest 0x7dd0d0bd from auth oi
>>>>> 0:618e3778:::rbd_data.15cec2ae8944a.000000000004db0e:head(30537'5509201
>>>>> osd.36.0:8552301 dirty|data_digest|omap_digest s 4194304 uv 5498082 dd
>>>>> 7dd0d0bd od ffffffff alloc_hint [4194304 4194304]), attr value
>>>>> mismatch '_'
>>>>> 2018-07-16 12:24:45.940941 7fb422340700 -1
>>>>> log_channel(cluster) log [ERR] : 0.190 shard 16: soid
>>>>> 0:09c2dd3e:::rbd_data.15cec2ae8944a.000000000015c7d6:head data_digest
>>>>> 0x264b7d0d != data_digest 0x7dd0d0bd from shard 44, data_digest
>>>>> 0x264b7d0d != data_digest 0x7dd0d0bd from auth oi
>>>>> 0:618e3778:::rbd_data.15cec2ae8944a.000000000004db0e:head(30537'5509201
>>>>> osd.36.0:8552301 dirty|data_digest|omap_digest s 4194304 uv 5498082 dd
>>>>> 7dd0d0bd od ffffffff alloc_hint [4194304 4194304]), attr value
>>>>> mismatch '_'
>>>>> 2018-07-16 12:24:45.940957 7fb422340700 -1
>>>>> log_channel(cluster) log [ERR] : 0.190 shard 37: soid
>>>>> 0:09c2dd3e:::rbd_data.15cec2ae8944a.000000000015c7d6:head data_digest
>>>>> 0x264b7d0d != data_digest 0x7dd0d0bd from shard 44, data_digest
>>>>> 0x264b7d0d != data_digest 0x7dd0d0bd from auth oi
>>>>> 0:618e3778:::rbd_data.15cec2ae8944a.000000000004db0e:head(30537'5509201
>>>>> osd.36.0:8552301 dirty|data_digest|omap_digest s 4194304 uv 5498082 dd
>>>>> 7dd0d0bd od ffffffff alloc_hint [4194304 4194304]), attr value
>>>>> mismatch '_'
>>>>>
>>>>> They show that osd 44 has been chosen as the authoritative shard
>>>>> and it has a data digest for this object of 0x7dd0d0bd and that the >>>>> data digest in the authoritative object info is also 0x7dd0d0bd. >>>>> >>>>> Shard 16 however, has a data digest of 0x264b7d0d and so does shard 37 >>>>> so the data for this object on osds 16 and 37 is different to that on >>>>> osd 44. >>>>> >>>>> Basically, you'll need to pick which is the "right" copy of the object >>>>> (I can't tell you) quiesce traffic to/from that object (rbd image) and >>>>> get/put that object back into the cluster to fix the mismatch. Since >>>>> this appears to be an rbd image this could potentially result in an >>>>> image that needs an fsck or equivalent IIUC. >>>>> >>>>> >>>>> On Tue, Jul 17, 2018 at 10:06 PM, Ana Aviles <ana@xxxxxxxxxxxx> wrote: >>>>>> >>>>>> Hi Brad, >>>>>> >>>>>> Here it is: >>>>>> >>>>>> { >>>>>> "state": "active+clean+inconsistent", >>>>>> "snap_trimq": "[]", >>>>>> "epoch": 30581, >>>>>> "up": [ >>>>>> 37, >>>>>> 44, >>>>>> 16 >>>>>> ], >>>>>> "acting": [ >>>>>> 37, >>>>>> 44, >>>>>> 16 >>>>>> ], >>>>>> "actingbackfill": [ >>>>>> "16", >>>>>> "37", >>>>>> "44" >>>>>> ], >>>>>> "info": { >>>>>> "pgid": "0.190", >>>>>> "last_update": "30581'5420535", >>>>>> "last_complete": "30581'5420535", >>>>>> "log_tail": "30581'5417484", >>>>>> "last_user_version": 5420535, >>>>>> "last_backfill": "MAX", >>>>>> "last_backfill_bitwise": 0, >>>>>> "purged_snaps": "[]", >>>>>> "history": { >>>>>> "epoch_created": 1, >>>>>> "last_epoch_started": 30580, >>>>>> "last_epoch_clean": 30581, >>>>>> "last_epoch_split": 0, >>>>>> "last_epoch_marked_full": 0, >>>>>> "same_up_since": 30578, >>>>>> "same_interval_since": 30579, >>>>>> "same_primary_since": 30565, >>>>>> "last_scrub": "30554'5390240", >>>>>> "last_scrub_stamp": "2018-07-16 12:27:03.547524", >>>>>> "last_deep_scrub": "30554'5390240", >>>>>> "last_deep_scrub_stamp": "2018-07-16 12:27:03.547524", >>>>>> "last_clean_scrub_stamp": "2018-07-13 08:45:32.622555" >>>>>> }, >>>>>> "stats": { >>>>>> 
"version": "30581'5420535", >>>>>> "reported_seq": "5155553", >>>>>> "reported_epoch": "30581", >>>>>> "state": "active+clean+inconsistent", >>>>>> "last_fresh": "2018-07-17 12:02:13.002428", >>>>>> "last_change": "2018-07-16 13:37:24.020403", >>>>>> "last_active": "2018-07-17 12:02:13.002428", >>>>>> "last_peered": "2018-07-17 12:02:13.002428", >>>>>> "last_clean": "2018-07-17 12:02:13.002428", >>>>>> "last_became_active": "2018-07-16 13:37:13.173821", >>>>>> "last_became_peered": "2018-07-16 13:37:13.173821", >>>>>> "last_unstale": "2018-07-17 12:02:13.002428", >>>>>> "last_undegraded": "2018-07-17 12:02:13.002428", >>>>>> "last_fullsized": "2018-07-17 12:02:13.002428", >>>>>> "mapping_epoch": 30578, >>>>>> "log_start": "30581'5417484", >>>>>> "ondisk_log_start": "30581'5417484", >>>>>> "created": 1, >>>>>> "last_epoch_clean": 30581, >>>>>> "parent": "0.0", >>>>>> "parent_split_bits": 0, >>>>>> "last_scrub": "30554'5390240", >>>>>> "last_scrub_stamp": "2018-07-16 12:27:03.547524", >>>>>> "last_deep_scrub": "30554'5390240", >>>>>> "last_deep_scrub_stamp": "2018-07-16 12:27:03.547524", >>>>>> "last_clean_scrub_stamp": "2018-07-13 08:45:32.622555", >>>>>> "log_size": 3051, >>>>>> "ondisk_log_size": 3051, >>>>>> "stats_invalid": false, >>>>>> "dirty_stats_invalid": false, >>>>>> "omap_stats_invalid": false, >>>>>> "hitset_stats_invalid": false, >>>>>> "hitset_bytes_stats_invalid": false, >>>>>> "pin_stats_invalid": true, >>>>>> "stat_sum": { >>>>>> "num_bytes": 16946139153, >>>>>> "num_objects": 4148, >>>>>> "num_object_clones": 0, >>>>>> "num_object_copies": 12444, >>>>>> "num_objects_missing_on_primary": 0, >>>>>> "num_objects_missing": 0, >>>>>> "num_objects_degraded": 0, >>>>>> "num_objects_misplaced": 0, >>>>>> "num_objects_unfound": 0, >>>>>> "num_objects_dirty": 4148, >>>>>> "num_whiteouts": 0, >>>>>> "num_read": 6895104, >>>>>> "num_read_kb": 292185552, >>>>>> "num_write": 10032749, >>>>>> "num_write_kb": 185167701, >>>>>> "num_scrub_errors": 1, >>>>>> 
"num_shallow_scrub_errors": 1, >>>>>> "num_deep_scrub_errors": 0, >>>>>> "num_objects_recovered": 103598, >>>>>> "num_bytes_recovered": 424107954567, >>>>>> "num_keys_recovered": 110, >>>>>> "num_objects_omap": 1, >>>>>> "num_objects_hit_set_archive": 0, >>>>>> "num_bytes_hit_set_archive": 0, >>>>>> "num_flush": 0, >>>>>> "num_flush_kb": 0, >>>>>> "num_evict": 0, >>>>>> "num_evict_kb": 0, >>>>>> "num_promote": 0, >>>>>> "num_flush_mode_high": 0, >>>>>> "num_flush_mode_low": 0, >>>>>> "num_evict_mode_some": 0, >>>>>> "num_evict_mode_full": 0, >>>>>> "num_objects_pinned": 0 >>>>>> }, >>>>>> "up": [ >>>>>> 37, >>>>>> 44, >>>>>> 16 >>>>>> ], >>>>>> "acting": [ >>>>>> 37, >>>>>> 44, >>>>>> 16 >>>>>> ], >>>>>> "blocked_by": [], >>>>>> "up_primary": 37, >>>>>> "acting_primary": 37 >>>>>> }, >>>>>> "empty": 0, >>>>>> "dne": 0, >>>>>> "incomplete": 0, >>>>>> "last_epoch_started": 30580, >>>>>> "hit_set_history": { >>>>>> "current_last_update": "0'0", >>>>>> "history": [] >>>>>> } >>>>>> }, >>>>>> "peer_info": [ >>>>>> { >>>>>> "peer": "16", >>>>>> "pgid": "0.190", >>>>>> "last_update": "30581'5420535", >>>>>> "last_complete": "30581'5420535", >>>>>> "log_tail": "30537'5387475", >>>>>> "last_user_version": 5390577, >>>>>> "last_backfill": "MAX", >>>>>> "last_backfill_bitwise": 1, >>>>>> "purged_snaps": "[]", >>>>>> "history": { >>>>>> "epoch_created": 1, >>>>>> "last_epoch_started": 30580, >>>>>> "last_epoch_clean": 30581, >>>>>> "last_epoch_split": 0, >>>>>> "last_epoch_marked_full": 0, >>>>>> "same_up_since": 30578, >>>>>> "same_interval_since": 30579, >>>>>> "same_primary_since": 30565, >>>>>> "last_scrub": "30554'5390240", >>>>>> "last_scrub_stamp": "2018-07-16 12:27:03.547524", >>>>>> "last_deep_scrub": "30554'5390240", >>>>>> "last_deep_scrub_stamp": "2018-07-16 12:27:03.547524", >>>>>> "last_clean_scrub_stamp": "2018-07-13 08:45:32.622555" >>>>>> }, >>>>>> "stats": { >>>>>> "version": "30570'5390575", >>>>>> "reported_seq": "5139870", >>>>>> "reported_epoch": "30576", 
>>>>>> "state": "active+undersized+degraded+inconsistent", >>>>>> "last_fresh": "2018-07-16 13:36:40.284756", >>>>>> "last_change": "2018-07-16 13:36:40.284277", >>>>>> "last_active": "2018-07-16 13:36:40.284756", >>>>>> "last_peered": "2018-07-16 13:36:40.284756", >>>>>> "last_clean": "2018-07-16 13:36:23.558224", >>>>>> "last_became_active": "2018-07-16 13:36:40.284277", >>>>>> "last_became_peered": "2018-07-16 13:36:40.284277", >>>>>> "last_unstale": "2018-07-16 13:36:40.284756", >>>>>> "last_undegraded": "2018-07-16 13:36:40.203248", >>>>>> "last_fullsized": "2018-07-16 13:36:40.203248", >>>>>> "mapping_epoch": 30578, >>>>>> "log_start": "30537'5387475", >>>>>> "ondisk_log_start": "30537'5387475", >>>>>> "created": 1, >>>>>> "last_epoch_clean": 30576, >>>>>> "parent": "0.0", >>>>>> "parent_split_bits": 0, >>>>>> "last_scrub": "30554'5390240", >>>>>> "last_scrub_stamp": "2018-07-16 12:27:03.547524", >>>>>> "last_deep_scrub": "30554'5390240", >>>>>> "last_deep_scrub_stamp": "2018-07-16 12:27:03.547524", >>>>>> "last_clean_scrub_stamp": "2018-07-13 08:45:32.622555", >>>>>> "log_size": 3100, >>>>>> "ondisk_log_size": 3100, >>>>>> "stats_invalid": false, >>>>>> "dirty_stats_invalid": false, >>>>>> "omap_stats_invalid": false, >>>>>> "hitset_stats_invalid": false, >>>>>> "hitset_bytes_stats_invalid": false, >>>>>> "pin_stats_invalid": true, >>>>>> "stat_sum": { >>>>>> "num_bytes": 16841281553, >>>>>> "num_objects": 4123, >>>>>> "num_object_clones": 0, >>>>>> "num_object_copies": 12369, >>>>>> "num_objects_missing_on_primary": 0, >>>>>> "num_objects_missing": 0, >>>>>> "num_objects_degraded": 4123, >>>>>> "num_objects_misplaced": 0, >>>>>> "num_objects_unfound": 0, >>>>>> "num_objects_dirty": 4123, >>>>>> "num_whiteouts": 0, >>>>>> "num_read": 6870027, >>>>>> "num_read_kb": 291425720, >>>>>> "num_write": 9972836, >>>>>> "num_write_kb": 184701865, >>>>>> "num_scrub_errors": 1, >>>>>> "num_shallow_scrub_errors": 1, >>>>>> "num_deep_scrub_errors": 0, >>>>>> 
"num_objects_recovered": 103596, >>>>>> "num_bytes_recovered": 424099565959, >>>>>> "num_keys_recovered": 110, >>>>>> "num_objects_omap": 1, >>>>>> "num_objects_hit_set_archive": 0, >>>>>> "num_bytes_hit_set_archive": 0, >>>>>> "num_flush": 0, >>>>>> "num_flush_kb": 0, >>>>>> "num_evict": 0, >>>>>> "num_evict_kb": 0, >>>>>> "num_promote": 0, >>>>>> "num_flush_mode_high": 0, >>>>>> "num_flush_mode_low": 0, >>>>>> "num_evict_mode_some": 0, >>>>>> "num_evict_mode_full": 0, >>>>>> "num_objects_pinned": 0 >>>>>> }, >>>>>> "up": [ >>>>>> 37, >>>>>> 44, >>>>>> 16 >>>>>> ], >>>>>> "acting": [ >>>>>> 37, >>>>>> 44, >>>>>> 16 >>>>>> ], >>>>>> "blocked_by": [], >>>>>> "up_primary": 37, >>>>>> "acting_primary": 37 >>>>>> }, >>>>>> "empty": 0, >>>>>> "dne": 0, >>>>>> "incomplete": 0, >>>>>> "last_epoch_started": 30580, >>>>>> "hit_set_history": { >>>>>> "current_last_update": "0'0", >>>>>> "history": [] >>>>>> } >>>>>> }, >>>>>> { >>>>>> "peer": "44", >>>>>> "pgid": "0.190", >>>>>> "last_update": "30581'5420535", >>>>>> "last_complete": "30570'5390575", >>>>>> "log_tail": "30537'5387475", >>>>>> "last_user_version": 5390575, >>>>>> "last_backfill": "MAX", >>>>>> "last_backfill_bitwise": 1, >>>>>> "purged_snaps": "[]", >>>>>> "history": { >>>>>> "epoch_created": 1, >>>>>> "last_epoch_started": 30580, >>>>>> "last_epoch_clean": 30581, >>>>>> "last_epoch_split": 0, >>>>>> "last_epoch_marked_full": 0, >>>>>> "same_up_since": 30578, >>>>>> "same_interval_since": 30579, >>>>>> "same_primary_since": 30565, >>>>>> "last_scrub": "30554'5390240", >>>>>> "last_scrub_stamp": "2018-07-16 12:27:03.547524", >>>>>> "last_deep_scrub": "30554'5390240", >>>>>> "last_deep_scrub_stamp": "2018-07-16 12:27:03.547524", >>>>>> "last_clean_scrub_stamp": "2018-07-13 08:45:32.622555" >>>>>> }, >>>>>> "stats": { >>>>>> "version": "30568'5390574", >>>>>> "reported_seq": "5139846", >>>>>> "reported_epoch": "30570", >>>>>> "state": "active+undersized+degraded+inconsistent", >>>>>> "last_fresh": "2018-07-16 
13:36:07.003551", >>>>>> "last_change": "2018-07-16 13:36:07.002580", >>>>>> "last_active": "2018-07-16 13:36:07.003551", >>>>>> "last_peered": "2018-07-16 13:36:07.003551", >>>>>> "last_clean": "2018-07-16 13:35:50.922619", >>>>>> "last_became_active": "2018-07-16 13:36:07.002580", >>>>>> "last_became_peered": "2018-07-16 13:36:07.002580", >>>>>> "last_unstale": "2018-07-16 13:36:07.003551", >>>>>> "last_undegraded": "2018-07-16 13:36:05.922413", >>>>>> "last_fullsized": "2018-07-16 13:36:05.922413", >>>>>> "mapping_epoch": 30578, >>>>>> "log_start": "30537'5387475", >>>>>> "ondisk_log_start": "30537'5387475", >>>>>> "created": 1, >>>>>> "last_epoch_clean": 30570, >>>>>> "parent": "0.0", >>>>>> "parent_split_bits": 0, >>>>>> "last_scrub": "30554'5390240", >>>>>> "last_scrub_stamp": "2018-07-16 12:27:03.547524", >>>>>> "last_deep_scrub": "30554'5390240", >>>>>> "last_deep_scrub_stamp": "2018-07-16 12:27:03.547524", >>>>>> "last_clean_scrub_stamp": "2018-07-13 08:45:32.622555", >>>>>> "log_size": 3099, >>>>>> "ondisk_log_size": 3099, >>>>>> "stats_invalid": false, >>>>>> "dirty_stats_invalid": false, >>>>>> "omap_stats_invalid": false, >>>>>> "hitset_stats_invalid": false, >>>>>> "hitset_bytes_stats_invalid": false, >>>>>> "pin_stats_invalid": true, >>>>>> "stat_sum": { >>>>>> "num_bytes": 16841281553, >>>>>> "num_objects": 4123, >>>>>> "num_object_clones": 0, >>>>>> "num_object_copies": 12369, >>>>>> "num_objects_missing_on_primary": 0, >>>>>> "num_objects_missing": 0, >>>>>> "num_objects_degraded": 4123, >>>>>> "num_objects_misplaced": 0, >>>>>> "num_objects_unfound": 0, >>>>>> "num_objects_dirty": 4123, >>>>>> "num_whiteouts": 0, >>>>>> "num_read": 6870027, >>>>>> "num_read_kb": 291425720, >>>>>> "num_write": 9972832, >>>>>> "num_write_kb": 184701853, >>>>>> "num_scrub_errors": 1, >>>>>> "num_shallow_scrub_errors": 1, >>>>>> "num_deep_scrub_errors": 0, >>>>>> "num_objects_recovered": 103594, >>>>>> "num_bytes_recovered": 424091177351, >>>>>> "num_keys_recovered": 
110, >>>>>> "num_objects_omap": 1, >>>>>> "num_objects_hit_set_archive": 0, >>>>>> "num_bytes_hit_set_archive": 0, >>>>>> "num_flush": 0, >>>>>> "num_flush_kb": 0, >>>>>> "num_evict": 0, >>>>>> "num_evict_kb": 0, >>>>>> "num_promote": 0, >>>>>> "num_flush_mode_high": 0, >>>>>> "num_flush_mode_low": 0, >>>>>> "num_evict_mode_some": 0, >>>>>> "num_evict_mode_full": 0, >>>>>> "num_objects_pinned": 0 >>>>>> }, >>>>>> "up": [ >>>>>> 37, >>>>>> 44, >>>>>> 16 >>>>>> ], >>>>>> "acting": [ >>>>>> 37, >>>>>> 44, >>>>>> 16 >>>>>> ], >>>>>> "blocked_by": [], >>>>>> "up_primary": 37, >>>>>> "acting_primary": 37 >>>>>> }, >>>>>> "empty": 0, >>>>>> "dne": 0, >>>>>> "incomplete": 0, >>>>>> "last_epoch_started": 30580, >>>>>> "hit_set_history": { >>>>>> "current_last_update": "0'0", >>>>>> "history": [] >>>>>> } >>>>>> } >>>>>> ], >>>>>> "recovery_state": [ >>>>>> { >>>>>> "name": "Started\/Primary\/Active", >>>>>> "enter_time": "2018-07-16 13:37:13.050211", >>>>>> "might_have_unfound": [ >>>>>> { >>>>>> "osd": "16", >>>>>> "status": "already probed" >>>>>> }, >>>>>> { >>>>>> "osd": "44", >>>>>> "status": "already probed" >>>>>> } >>>>>> ], >>>>>> "recovery_progress": { >>>>>> "backfill_targets": [], >>>>>> "waiting_on_backfill": [], >>>>>> "last_backfill_started": "MIN", >>>>>> "backfill_info": { >>>>>> "begin": "MIN", >>>>>> "end": "MIN", >>>>>> "objects": [] >>>>>> }, >>>>>> "peer_backfill_info": [], >>>>>> "backfills_in_flight": [], >>>>>> "recovering": [], >>>>>> "pg_backend": { >>>>>> "pull_from_peer": [], >>>>>> "pushing": [] >>>>>> } >>>>>> }, >>>>>> "scrub": { >>>>>> "scrubber.epoch_start": "0", >>>>>> "scrubber.active": 0, >>>>>> "scrubber.state": "INACTIVE", >>>>>> "scrubber.start": "MIN", >>>>>> "scrubber.end": "MIN", >>>>>> "scrubber.subset_last_update": "0'0", >>>>>> "scrubber.deep": false, >>>>>> "scrubber.seed": 0, >>>>>> "scrubber.waiting_on": 0, >>>>>> "scrubber.waiting_on_whom": [] >>>>>> } >>>>>> }, >>>>>> { >>>>>> "name": "Started", >>>>>> "enter_time": 
"2018-07-16 13:37:11.980264"
>>>>>> }
>>>>>> ],
>>>>>> "agent_state": {}
>>>>>> }
>>>>>>
>>>>>>
>>>>>> On 17/07/18 02:19, Brad Hubbard wrote:
>>>>>>> Can we see a pg query of 0.190?
>>>>>>>
>>>>>>> On Tue, Jul 17, 2018 at 1:05 AM, Ana Aviles <ana@xxxxxxxxxxxx> wrote:
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> We have a cluster that was running hammer (0.94.10). We hit a bug where,
>>>>>>>> right after seemingly fixing an inconsistent PG, the primary OSD would
>>>>>>>> crash and restart. The next deep-scrub would again report an
>>>>>>>> inconsistent PG.
>>>>>>>>
>>>>>>>> We filed a bug report,
>>>>>>>> https://tracker.ceph.com/issues/24652#change-115654, which was closed
>>>>>>>> because it was a known bug fixed in newer versions of Ceph.
>>>>>>>>
>>>>>>>> Now the cluster is running jewel (10.2.11). There is again one
>>>>>>>> inconsistent PG with 1 error, which we are not able to fix and which
>>>>>>>> has no reference to the inconsistent object.
>>>>>>>>
>>>>>>>>
>>>>>>>> scrub 0 missing, 1 inconsistent objects
>>>>>>>> scrub 1 errors
>>>>>>>>
>>>>>>>>
>>>>>>>> We have the logs with debug level 20 while repairing the PG. The one for
>>>>>>>> the primary OSD is: 94e20123-fcda-49d7-98a2-919507dfbc92
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>> Kind regards,
>>>>>>>>
>>>>>>>> --
>>>>>>>> Ana Avilés
>>>>>>>> Greenhost - sustainable hosting & digital security
>>>>>>>> E: ana@xxxxxxxxxxxx
>>>>>>>> T: +31 20 4890444
>>>>>>>> W: https://greenhost.nl
>>>>>>>> --
>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>>>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html

--
Ana Avilés
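
The checks made by hand in this thread (comparing each shard's data digest, and reading the epoch and last request id out of object_info) can be sketched as a short script. This is only a minimal sketch, not a Ceph tool: parse_object_info, summarize, OI_RE and SAMPLE are names made up for illustration, the regular expression assumes object_info strings shaped like the ones quoted above, and SAMPLE keeps just two shards and a few fields of the 0.186 output.

```python
import json
import re

# Assumed layout, taken from the strings quoted in this thread:
#   <hoid>(<epoch>'<version> <last request id> ... dd <hex digest> ...)
OI_RE = re.compile(
    r"^(?P<hoid>[^(]+)\((?P<epoch>\d+)'(?P<version>\d+) (?P<reqid>\S+) "
    r".*\bdd (?P<dd>[0-9a-f]+)\b"
)


def parse_object_info(text):
    """Split an object_info string into its fields, or return None."""
    match = OI_RE.match(text)
    if match is None:
        return None
    info = match.groupdict()
    info["epoch"] = int(info["epoch"])
    info["version"] = int(info["version"])
    return info


def summarize(report):
    """Yield one row per shard of every inconsistent object in the report."""
    for item in report["inconsistents"]:
        name = item["object"]["name"]
        for shard in item["shards"]:
            oi = parse_object_info(shard.get("object_info", "")) or {}
            yield {
                "object": name,
                "osd": shard["osd"],
                # May be absent, as in the 0.190 output earlier in the thread.
                "data_digest": shard.get("data_digest"),
                "oi_digest": oi.get("dd"),
                "oi_epoch": oi.get("epoch"),
                "oi_reqid": oi.get("reqid"),
                # Does the stored object_info name the object being reported?
                "oi_names_object": (":" + name + ":") in oi.get("hoid", ""),
            }


# Trimmed copy of the 0.186 report above (osd 26 and osd 44 only).
SAMPLE = json.loads("""
{"epoch": 30586, "inconsistents": [{
  "object": {"name": "rbd_data.15cec2ae8944a.000000000004db0e", "snap": "head"},
  "shards": [
    {"osd": 26, "data_digest": "0x7dd0d0bd",
     "object_info": "0:618e3778:::rbd_data.15cec2ae8944a.000000000004db0e:head(30537'5509201 osd.36.0:8552301 dirty|data_digest|omap_digest s 4194304 uv 5498082 dd 7dd0d0bd od ffffffff alloc_hint [4194304 4194304])"},
    {"osd": 44, "data_digest": "0x264b7d0d",
     "object_info": "0:09c2dd3e:::rbd_data.15cec2ae8944a.000000000015c7d6:head(30587'5493833 client.1246390.0:1 dirty|data_digest|omap_digest s 4194304 uv 5493833 dd 264b7d0d od ffffffff alloc_hint [0 0])"}
  ]}]}
""")

if __name__ == "__main__":
    for row in summarize(SAMPLE):
        print(row)
```

For real output, save the JSON printed by "rados list-inconsistent-obj 0.186" to a file and pass the result of json.load() to summarize(). In the sample, the row for osd 44 has oi_names_object set to False, because the object_info stored on that shard names ...15c7d6 rather than ...4db0e.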