On 19/07/18 03:25, Brad Hubbard wrote: > On Wed, Jul 18, 2018 at 6:25 PM, Ana Aviles <ana@xxxxxxxxxxxx> wrote: >> Ah ok. Then I think it confirms what you are saying. Here it is: >> >> $ rados list-inconsistent-obj 0.190 >> {"epoch":30579,"inconsistents":[{"object":{"name":"rbd_data.15cec2ae8944a.000000000015c7d6","nspace":"","locator":"","snap":"head","version":5498082},"errors":["object_info_inconsistency","attr_value_mismatch"],"union_shard_errors":[],"selected_object_info":"0:618e3778:::rbd_data.15cec2ae8944a.000000000004db0e:head(30537'5509201 >> osd.36.0:8552301 dirty|data_digest|omap_digest s 4194304 uv 5498082 dd >> 7dd0d0bd od ffffffff alloc_hint [4194304 >> 4194304])","shards":[{"osd":16,"errors":[],"size":4194304,"object_info":"0:09c2dd3e:::rbd_data.15cec2ae8944a.000000000015c7d6:head(26812'5142772 >> client.1044166.0:393154060 dirty|data_digest|omap_digest s 4194304 uv >> 5142772 dd 264b7d0d od ffffffff alloc_hint [0 >> 0])","attrs":[{"name":"_","value":"DwgMAQAABANIAAAAAAAAACcAAAByYmRfZGF0YS4xNWNlYzJhZTg5NDRhLjAwMDAwMDAwMDAxNWM3ZDb+\/\/\/\/\/\/\/\/\/5BDu3wAAAAAAAAAAAAAAAAABgMcAAAAAAAAAAAAAAD\/\/\/\/\/AAAAAAAAAAD\/\/\/\/\/\/\/\/\/\/wAAAAD0eE4AAAAAALxoAADzeE4AAAAAALxoAAACAhUAAAAIxu4PAAAAAAAMDm8XAAAAAAAAAAAAAEAAAAAAAOJmPVsEa24SAgIVAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA9HhOAAAAAAAAAAAAAAAAAAA0AAAA4mY9W1Q\/lBwNfUsm\/\/\/\/\/w==","Base64":true},{"name":"snapset","value":"AgIZAAAAAAAAAAAAAAABAAAAAAAAAAAAAAAAAAAAAA==","Base64":true}]},{"osd":37,"errors":[],"size":4194304,"object_info":"0:09c2dd3e:::rbd_data.15cec2ae8944a.000000000015c7d6:head(26812'5142772 >> client.1044166.0:393154060 dirty|data_digest|omap_digest s 4194304 uv >> 5142772 dd 264b7d0d od ffffffff alloc_hint [0 >> 
0])","attrs":[{"name":"_","value":"DwgMAQAABANIAAAAAAAAACcAAAByYmRfZGF0YS4xNWNlYzJhZTg5NDRhLjAwMDAwMDAwMDAxNWM3ZDb+\/\/\/\/\/\/\/\/\/5BDu3wAAAAAAAAAAAAAAAAABgMcAAAAAAAAAAAAAAD\/\/\/\/\/AAAAAAAAAAD\/\/\/\/\/\/\/\/\/\/wAAAAD0eE4AAAAAALxoAADzeE4AAAAAALxoAAACAhUAAAAIxu4PAAAAAAAMDm8XAAAAAAAAAAAAAEAAAAAAAOJmPVsEa24SAgIVAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA9HhOAAAAAAAAAAAAAAAAAAA0AAAA4mY9W1Q\/lBwNfUsm\/\/\/\/\/w==","Base64":true},{"name":"snapset","value":"AgIZAAAAAAAAAAAAAAABAAAAAAAAAAAAAAAAAAAAAA==","Base64":true}]},{"osd":44,"errors":[],"size":4194304,"object_info":"0:618e3778:::rbd_data.15cec2ae8944a.000000000004db0e:head(30537'5509201 >> osd.36.0:8552301 dirty|data_digest|omap_digest s 4194304 uv 5498082 dd >> 7dd0d0bd od ffffffff alloc_hint [4194304 >> 4194304])","attrs":[{"name":"_","value":"EAggAQAABANIAAAAAAAAACcAAAByYmRfZGF0YS4xNWNlYzJhZTg5NDRhLjAwMDAwMDAwMDAwNGRiMGX+\/\/\/\/\/\/\/\/\/4Zx7B4AAAAAAAAAAAAAAAAABgMcAAAAAAAAAAAAAAD\/\/\/\/\/AAAAAAAAAAD\/\/\/\/\/\/\/\/\/\/wAAAABREFQAAAAAAEl3AADi5FMAAAAAAEl3AAACAhUAAAAEJAAAAAAAAABtf4IAAAAAAAAAAAAAAEAAAAAAAPpfSVvV\/VMKAgIVAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA4uRTAAAAAAAAAAAAAAAAAAA0AAAA+l9JW4x6Rw290NB9\/\/\/\/\/wAAQAAAAAAAAABAAAAAAAAAAAAA","Base64":true},{"name":"snapset","value":"AgIZAAAAAAAAAAAAAAABAAAAAAAAAAAAAAAAAAAAAA==","Base64":true}]}]}]} >> >> >> To determine which is the right version of the object, is there no >> timestamp that can tell us? maybe the object got updated to osd.37 and >> osd.16 while osd.44 was down, and there comes the missmatch? because >> otherwise, shouldn't the authoritative osd be leading? > > The primary will be serving IO requests so the version on osd 37 is > what will be read by clients so I guess going with that is reasonable. > OK good. 
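As a rough illustration of how the shards in the list-inconsistent-obj output above can be compared, here is a minimal Python sketch. It groups OSDs by the epoch'version, last request id and data digest ("dd") found in each shard's object_info string. The regular expression is an assumption based only on the strings shown in this thread, not a documented format, and `summarize_shards`, `OI_OLD`, `OI_NEW` and `SAMPLE` are hypothetical names; `SAMPLE` is a trimmed copy of the report with the "attrs" and other fields left out.

```python
import re
from collections import defaultdict

# Pulls epoch'version, the last request id and the data digest ("dd") out of
# the human-readable object_info string. This pattern is an assumption based
# only on the strings shown in this thread, not a documented format.
OI_RE = re.compile(
    r"\((?P<epoch>\d+)'(?P<version>\d+)\s+(?P<reqid>\S+)\s.*?\bdd\s+(?P<dd>[0-9a-f]+)"
)


def summarize_shards(report):
    """Group OSD ids by (epoch, version, last reqid, data digest)."""
    groups = defaultdict(list)
    for obj in report["inconsistents"]:
        for shard in obj["shards"]:
            m = OI_RE.search(shard["object_info"])
            key = None  # shards whose object_info could not be parsed
            if m:
                key = (int(m["epoch"]), int(m["version"]), m["reqid"], m["dd"])
            groups[key].append(shard["osd"])
    return dict(groups)


# Trimmed copy of the report: only "osd" and "object_info" are kept.
OI_OLD = ("0:09c2dd3e:::rbd_data.15cec2ae8944a.000000000015c7d6:head(26812'5142772 "
          "client.1044166.0:393154060 dirty|data_digest|omap_digest s 4194304 uv "
          "5142772 dd 264b7d0d od ffffffff alloc_hint [0 0])")
OI_NEW = ("0:618e3778:::rbd_data.15cec2ae8944a.000000000004db0e:head(30537'5509201 "
          "osd.36.0:8552301 dirty|data_digest|omap_digest s 4194304 uv 5498082 dd "
          "7dd0d0bd od ffffffff alloc_hint [4194304 4194304])")
SAMPLE = {"epoch": 30579, "inconsistents": [{"shards": [
    {"osd": 16, "object_info": OI_OLD},
    {"osd": 37, "object_info": OI_OLD},
    {"osd": 44, "object_info": OI_NEW},
]}]}

if __name__ == "__main__":
    for key, osds in summarize_shards(SAMPLE).items():
        print(osds, key)
```

On the real cluster the same function could be fed `json.loads()` of the full command output; it only reads the "osd" and "object_info" fields of each shard.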
> The version on osd 44 was actually modified after the others (epoch
> 30537, as opposed to epoch 26812), but the sizes are all the same, so
> the difference may be trivial (metadata only, perhaps). According to
> the last request id (osd.36.0:8552301), the write came from another
> osd (36), which is kind of unexpected. Is there, or was there, a cache
> tier involved?

Ah OK, very interesting! No, no cache tier involved. So at one point
osd.36 was part of the PG set?

> If you want to go with the version that is currently being used (37
> and 16) you can just quiesce the rbd image clients and do a rados get,
> then a rados put of the object. I would suggest taking a backup of the
> object from osd 44 using the ceph-objectstore-tool although, as I
> said, that version is not the one being used, so I doubt you will miss it.

Great, will do that. Thanks a lot for the help.

>> Regards,
>> Ana
>>
>> On 18/07/18 05:24, Brad Hubbard wrote:
>>> OK. What I *meant* to ask for was the output of "rados
>>> list-inconsistent-obj 0.190" (might still be worth posting that, but it
>>> should just confirm the findings below).
>>>
>>> The relevant lines from the log are below.
>>>
>>> 2018-07-16 12:24:45.940910 7fb422340700 2 osd.37 pg_epoch: 30554
>>> pg[0.190( v 30554'5390084 (30537'5387075,30554'5390084]
>>> local-les=30554 n=4123 ec=1 les/c/f 30554/30554/0 30552/30553/30542)
>>> [37,44,16] r=0 lpr=30553 crt=30554'5390079 lcod 30554'5390083 mlcod
>>> 30554'5390083 active+clean+scrubbing+deep+inconsistent+repair] 0.190
>>> shard 16: soid 0:09c2dd3e:::rbd_data.15cec2ae8944a.000000000015c7d6:head
>>> data_digest 0x264b7d0d != data_digest 0x7dd0d0bd from shard 44,
>>> data_digest 0x264b7d0d != data_digest 0x7dd0d0bd from auth oi
>>> 0:618e3778:::rbd_data.15cec2ae8944a.000000000004db0e:head(30537'5509201
>>> osd.36.0:8552301 dirty|data_digest|omap_digest s 4194304 uv 5498082 dd
>>> 7dd0d0bd od ffffffff alloc_hint [4194304 4194304]), attr value
>>> mismatch '_'
>>> 2018-07-16 12:24:45.940941 7fb422340700 -1
>>> log_channel(cluster) log [ERR] : 0.190 shard 16: soid
>>> 0:09c2dd3e:::rbd_data.15cec2ae8944a.000000000015c7d6:head data_digest
>>> 0x264b7d0d != data_digest 0x7dd0d0bd from shard 44, data_digest
>>> 0x264b7d0d != data_digest 0x7dd0d0bd from auth oi
>>> 0:618e3778:::rbd_data.15cec2ae8944a.000000000004db0e:head(30537'5509201
>>> osd.36.0:8552301 dirty|data_digest|omap_digest s 4194304 uv 5498082 dd
>>> 7dd0d0bd od ffffffff alloc_hint [4194304 4194304]), attr value
>>> mismatch '_'
>>> 2018-07-16 12:24:45.940957 7fb422340700 -1
>>> log_channel(cluster) log [ERR] : 0.190 shard 37: soid
>>> 0:09c2dd3e:::rbd_data.15cec2ae8944a.000000000015c7d6:head data_digest
>>> 0x264b7d0d != data_digest 0x7dd0d0bd from shard 44, data_digest
>>> 0x264b7d0d != data_digest 0x7dd0d0bd from auth oi
>>> 0:618e3778:::rbd_data.15cec2ae8944a.000000000004db0e:head(30537'5509201
>>> osd.36.0:8552301 dirty|data_digest|omap_digest s 4194304 uv 5498082 dd
>>> 7dd0d0bd od ffffffff alloc_hint [4194304 4194304]), attr value
>>> mismatch '_'
>>>
>>> They show that osd 44 has been chosen as the authoritative shard,
>>> that it has a data digest for this object of 0x7dd0d0bd
>>> and that the data digest in the authoritative object info is also
>>> 0x7dd0d0bd.
>>>
>>> Shard 16, however, has a data digest of 0x264b7d0d, and so does shard 37,
>>> so the data for this object on osds 16 and 37 is different from that on
>>> osd 44.
>>>
>>> Basically, you'll need to pick which is the "right" copy of the object
>>> (I can't tell you), quiesce traffic to/from that object (rbd image), and
>>> get/put that object back into the cluster to fix the mismatch. Since
>>> this appears to be an rbd image, this could potentially result in an
>>> image that needs an fsck or equivalent, IIUC.
>>>
>>>
>>> On Tue, Jul 17, 2018 at 10:06 PM, Ana Aviles <ana@xxxxxxxxxxxx> wrote:
>>>>
>>>> Hi Brad,
>>>>
>>>> Here it is:
>>>>
>>>> {
>>>>     "state": "active+clean+inconsistent",
>>>>     "snap_trimq": "[]",
>>>>     "epoch": 30581,
>>>>     "up": [
>>>>         37,
>>>>         44,
>>>>         16
>>>>     ],
>>>>     "acting": [
>>>>         37,
>>>>         44,
>>>>         16
>>>>     ],
>>>>     "actingbackfill": [
>>>>         "16",
>>>>         "37",
>>>>         "44"
>>>>     ],
>>>>     "info": {
>>>>         "pgid": "0.190",
>>>>         "last_update": "30581'5420535",
>>>>         "last_complete": "30581'5420535",
>>>>         "log_tail": "30581'5417484",
>>>>         "last_user_version": 5420535,
>>>>         "last_backfill": "MAX",
>>>>         "last_backfill_bitwise": 0,
>>>>         "purged_snaps": "[]",
>>>>         "history": {
>>>>             "epoch_created": 1,
>>>>             "last_epoch_started": 30580,
>>>>             "last_epoch_clean": 30581,
>>>>             "last_epoch_split": 0,
>>>>             "last_epoch_marked_full": 0,
>>>>             "same_up_since": 30578,
>>>>             "same_interval_since": 30579,
>>>>             "same_primary_since": 30565,
>>>>             "last_scrub": "30554'5390240",
>>>>             "last_scrub_stamp": "2018-07-16 12:27:03.547524",
>>>>             "last_deep_scrub": "30554'5390240",
>>>>             "last_deep_scrub_stamp": "2018-07-16 12:27:03.547524",
>>>>             "last_clean_scrub_stamp": "2018-07-13 08:45:32.622555"
>>>>         },
>>>>         "stats": {
>>>>             "version": "30581'5420535",
>>>>             "reported_seq": "5155553",
>>>>             "reported_epoch": "30581",
>>>>             "state": "active+clean+inconsistent",
>>>>             "last_fresh": "2018-07-17 12:02:13.002428",
>>>>             "last_change": "2018-07-16 13:37:24.020403",
>>>>             "last_active": "2018-07-17 12:02:13.002428",
>>>>             "last_peered": "2018-07-17 12:02:13.002428",
>>>>             "last_clean": "2018-07-17 12:02:13.002428",
>>>>             "last_became_active": "2018-07-16 13:37:13.173821",
>>>>             "last_became_peered": "2018-07-16 13:37:13.173821",
>>>>             "last_unstale": "2018-07-17 12:02:13.002428",
>>>>             "last_undegraded": "2018-07-17 12:02:13.002428",
>>>>             "last_fullsized": "2018-07-17 12:02:13.002428",
>>>>             "mapping_epoch": 30578,
>>>>             "log_start": "30581'5417484",
>>>>             "ondisk_log_start": "30581'5417484",
>>>>             "created": 1,
>>>>             "last_epoch_clean": 30581,
>>>>             "parent": "0.0",
>>>>             "parent_split_bits": 0,
>>>>             "last_scrub": "30554'5390240",
>>>>             "last_scrub_stamp": "2018-07-16 12:27:03.547524",
>>>>             "last_deep_scrub": "30554'5390240",
>>>>             "last_deep_scrub_stamp": "2018-07-16 12:27:03.547524",
>>>>             "last_clean_scrub_stamp": "2018-07-13 08:45:32.622555",
>>>>             "log_size": 3051,
>>>>             "ondisk_log_size": 3051,
>>>>             "stats_invalid": false,
>>>>             "dirty_stats_invalid": false,
>>>>             "omap_stats_invalid": false,
>>>>             "hitset_stats_invalid": false,
>>>>             "hitset_bytes_stats_invalid": false,
>>>>             "pin_stats_invalid": true,
>>>>             "stat_sum": {
>>>>                 "num_bytes": 16946139153,
>>>>                 "num_objects": 4148,
>>>>                 "num_object_clones": 0,
>>>>                 "num_object_copies": 12444,
>>>>                 "num_objects_missing_on_primary": 0,
>>>>                 "num_objects_missing": 0,
>>>>                 "num_objects_degraded": 0,
>>>>                 "num_objects_misplaced": 0,
>>>>                 "num_objects_unfound": 0,
>>>>                 "num_objects_dirty": 4148,
>>>>                 "num_whiteouts": 0,
>>>>                 "num_read": 6895104,
>>>>                 "num_read_kb": 292185552,
>>>>                 "num_write": 10032749,
>>>>                 "num_write_kb": 185167701,
>>>>                 "num_scrub_errors": 1,
>>>>                 "num_shallow_scrub_errors": 1,
>>>>                 "num_deep_scrub_errors": 0,
>>>>                 "num_objects_recovered": 103598,
>>>>                 "num_bytes_recovered": 424107954567,
>>>>                 "num_keys_recovered": 110,
>>>>                 "num_objects_omap": 1,
>>>>                 "num_objects_hit_set_archive": 0,
>>>>                 "num_bytes_hit_set_archive": 0,
>>>>                 "num_flush": 0,
>>>>                 "num_flush_kb": 0,
>>>>                 "num_evict": 0,
>>>>                 "num_evict_kb": 0,
>>>>                 "num_promote": 0,
>>>>                 "num_flush_mode_high": 0,
>>>>                 "num_flush_mode_low": 0,
>>>>                 "num_evict_mode_some": 0,
>>>>                 "num_evict_mode_full": 0,
>>>>                 "num_objects_pinned": 0
>>>>             },
>>>>             "up": [
>>>>                 37,
>>>>                 44,
>>>>                 16
>>>>             ],
>>>>             "acting": [
>>>>                 37,
>>>>                 44,
>>>>                 16
>>>>             ],
>>>>             "blocked_by": [],
>>>>             "up_primary": 37,
>>>>             "acting_primary": 37
>>>>         },
>>>>         "empty": 0,
>>>>         "dne": 0,
>>>>         "incomplete": 0,
>>>>         "last_epoch_started": 30580,
>>>>         "hit_set_history": {
>>>>             "current_last_update": "0'0",
>>>>             "history": []
>>>>         }
>>>>     },
>>>>     "peer_info": [
>>>>         {
>>>>             "peer": "16",
>>>>             "pgid": "0.190",
>>>>             "last_update": "30581'5420535",
>>>>             "last_complete": "30581'5420535",
>>>>             "log_tail": "30537'5387475",
>>>>             "last_user_version": 5390577,
>>>>             "last_backfill": "MAX",
>>>>             "last_backfill_bitwise": 1,
>>>>             "purged_snaps": "[]",
>>>>             "history": {
>>>>                 "epoch_created": 1,
>>>>                 "last_epoch_started": 30580,
>>>>                 "last_epoch_clean": 30581,
>>>>                 "last_epoch_split": 0,
>>>>                 "last_epoch_marked_full": 0,
>>>>                 "same_up_since": 30578,
>>>>                 "same_interval_since": 30579,
>>>>                 "same_primary_since": 30565,
>>>>                 "last_scrub": "30554'5390240",
>>>>                 "last_scrub_stamp": "2018-07-16 12:27:03.547524",
>>>>                 "last_deep_scrub": "30554'5390240",
>>>>                 "last_deep_scrub_stamp": "2018-07-16 12:27:03.547524",
>>>>                 "last_clean_scrub_stamp": "2018-07-13 08:45:32.622555"
>>>>             },
>>>>             "stats": {
>>>>                 "version": "30570'5390575",
>>>>                 "reported_seq": "5139870",
>>>>                 "reported_epoch": "30576",
>>>>                 "state": "active+undersized+degraded+inconsistent",
>>>>                 "last_fresh": "2018-07-16 13:36:40.284756",
>>>>                 "last_change": "2018-07-16 13:36:40.284277",
>>>>                 "last_active": "2018-07-16 13:36:40.284756",
>>>>                 "last_peered": "2018-07-16 13:36:40.284756",
>>>>                 "last_clean": "2018-07-16 13:36:23.558224",
>>>>                 "last_became_active": "2018-07-16 13:36:40.284277",
>>>>                 "last_became_peered": "2018-07-16 13:36:40.284277",
>>>>                 "last_unstale": "2018-07-16 13:36:40.284756",
>>>>                 "last_undegraded": "2018-07-16 13:36:40.203248",
>>>>                 "last_fullsized": "2018-07-16 13:36:40.203248",
>>>>                 "mapping_epoch": 30578,
>>>>                 "log_start": "30537'5387475",
>>>>                 "ondisk_log_start": "30537'5387475",
>>>>                 "created": 1,
>>>>                 "last_epoch_clean": 30576,
>>>>                 "parent": "0.0",
>>>>                 "parent_split_bits": 0,
>>>>                 "last_scrub": "30554'5390240",
>>>>                 "last_scrub_stamp": "2018-07-16 12:27:03.547524",
>>>>                 "last_deep_scrub": "30554'5390240",
>>>>                 "last_deep_scrub_stamp": "2018-07-16 12:27:03.547524",
>>>>                 "last_clean_scrub_stamp": "2018-07-13 08:45:32.622555",
>>>>                 "log_size": 3100,
>>>>                 "ondisk_log_size": 3100,
>>>>                 "stats_invalid": false,
>>>>                 "dirty_stats_invalid": false,
>>>>                 "omap_stats_invalid": false,
>>>>                 "hitset_stats_invalid": false,
>>>>                 "hitset_bytes_stats_invalid": false,
>>>>                 "pin_stats_invalid": true,
>>>>                 "stat_sum": {
>>>>                     "num_bytes": 16841281553,
>>>>                     "num_objects": 4123,
>>>>                     "num_object_clones": 0,
>>>>                     "num_object_copies": 12369,
>>>>                     "num_objects_missing_on_primary": 0,
>>>>                     "num_objects_missing": 0,
>>>>                     "num_objects_degraded": 4123,
>>>>                     "num_objects_misplaced": 0,
>>>>                     "num_objects_unfound": 0,
>>>>                     "num_objects_dirty": 4123,
>>>>                     "num_whiteouts": 0,
>>>>                     "num_read": 6870027,
>>>>                     "num_read_kb": 291425720,
>>>>                     "num_write": 9972836,
>>>>                     "num_write_kb": 184701865,
>>>>                     "num_scrub_errors": 1,
>>>>                     "num_shallow_scrub_errors": 1,
>>>>                     "num_deep_scrub_errors": 0,
>>>>                     "num_objects_recovered": 103596,
>>>>                     "num_bytes_recovered": 424099565959,
>>>>                     "num_keys_recovered": 110,
>>>>                     "num_objects_omap": 1,
>>>>                     "num_objects_hit_set_archive": 0,
>>>>                     "num_bytes_hit_set_archive": 0,
>>>>                     "num_flush": 0,
>>>>                     "num_flush_kb": 0,
>>>>                     "num_evict": 0,
>>>>                     "num_evict_kb": 0,
>>>>                     "num_promote": 0,
>>>>                     "num_flush_mode_high": 0,
>>>>                     "num_flush_mode_low": 0,
>>>>                     "num_evict_mode_some": 0,
>>>>                     "num_evict_mode_full": 0,
>>>>                     "num_objects_pinned": 0
>>>>                 },
>>>>                 "up": [
>>>>                     37,
>>>>                     44,
>>>>                     16
>>>>                 ],
>>>>                 "acting": [
>>>>                     37,
>>>>                     44,
>>>>                     16
>>>>                 ],
>>>>                 "blocked_by": [],
>>>>                 "up_primary": 37,
>>>>                 "acting_primary": 37
>>>>             },
>>>>             "empty": 0,
>>>>             "dne": 0,
>>>>             "incomplete": 0,
>>>>             "last_epoch_started": 30580,
>>>>             "hit_set_history": {
>>>>                 "current_last_update": "0'0",
>>>>                 "history": []
>>>>             }
>>>>         },
>>>>         {
>>>>             "peer": "44",
>>>>             "pgid": "0.190",
>>>>             "last_update": "30581'5420535",
>>>>             "last_complete": "30570'5390575",
>>>>             "log_tail": "30537'5387475",
>>>>             "last_user_version": 5390575,
>>>>             "last_backfill": "MAX",
>>>>             "last_backfill_bitwise": 1,
>>>>             "purged_snaps": "[]",
>>>>             "history": {
>>>>                 "epoch_created": 1,
>>>>                 "last_epoch_started": 30580,
>>>>                 "last_epoch_clean": 30581,
>>>>                 "last_epoch_split": 0,
>>>>                 "last_epoch_marked_full": 0,
>>>>                 "same_up_since": 30578,
>>>>                 "same_interval_since": 30579,
>>>>                 "same_primary_since": 30565,
>>>>                 "last_scrub": "30554'5390240",
>>>>                 "last_scrub_stamp": "2018-07-16 12:27:03.547524",
>>>>                 "last_deep_scrub": "30554'5390240",
>>>>                 "last_deep_scrub_stamp": "2018-07-16 12:27:03.547524",
>>>>                 "last_clean_scrub_stamp": "2018-07-13 08:45:32.622555"
>>>>             },
>>>>             "stats": {
>>>>                 "version": "30568'5390574",
>>>>                 "reported_seq": "5139846",
>>>>                 "reported_epoch": "30570",
>>>>                 "state": "active+undersized+degraded+inconsistent",
>>>>                 "last_fresh": "2018-07-16 13:36:07.003551",
>>>>                 "last_change": "2018-07-16 13:36:07.002580",
>>>>                 "last_active": "2018-07-16 13:36:07.003551",
>>>>                 "last_peered": "2018-07-16 13:36:07.003551",
>>>>                 "last_clean": "2018-07-16 13:35:50.922619",
>>>>                 "last_became_active": "2018-07-16 13:36:07.002580",
>>>>                 "last_became_peered": "2018-07-16 13:36:07.002580",
>>>>                 "last_unstale": "2018-07-16 13:36:07.003551",
>>>>                 "last_undegraded": "2018-07-16 13:36:05.922413",
>>>>                 "last_fullsized": "2018-07-16 13:36:05.922413",
>>>>                 "mapping_epoch": 30578,
>>>>                 "log_start": "30537'5387475",
>>>>                 "ondisk_log_start": "30537'5387475",
>>>>                 "created": 1,
>>>>                 "last_epoch_clean": 30570,
>>>>                 "parent": "0.0",
>>>>                 "parent_split_bits": 0,
>>>>                 "last_scrub": "30554'5390240",
>>>>                 "last_scrub_stamp": "2018-07-16 12:27:03.547524",
>>>>                 "last_deep_scrub": "30554'5390240",
>>>>                 "last_deep_scrub_stamp": "2018-07-16 12:27:03.547524",
>>>>                 "last_clean_scrub_stamp": "2018-07-13 08:45:32.622555",
>>>>                 "log_size": 3099,
>>>>                 "ondisk_log_size": 3099,
>>>>                 "stats_invalid": false,
>>>>                 "dirty_stats_invalid": false,
>>>>                 "omap_stats_invalid": false,
>>>>                 "hitset_stats_invalid": false,
>>>>                 "hitset_bytes_stats_invalid": false,
>>>>                 "pin_stats_invalid": true,
>>>>                 "stat_sum": {
>>>>                     "num_bytes": 16841281553,
>>>>                     "num_objects": 4123,
>>>>                     "num_object_clones": 0,
>>>>                     "num_object_copies": 12369,
>>>>                     "num_objects_missing_on_primary": 0,
>>>>                     "num_objects_missing": 0,
>>>>                     "num_objects_degraded": 4123,
>>>>                     "num_objects_misplaced": 0,
>>>>                     "num_objects_unfound": 0,
>>>>                     "num_objects_dirty": 4123,
>>>>                     "num_whiteouts": 0,
>>>>                     "num_read": 6870027,
>>>>                     "num_read_kb": 291425720,
>>>>                     "num_write": 9972832,
>>>>                     "num_write_kb": 184701853,
>>>>                     "num_scrub_errors": 1,
>>>>                     "num_shallow_scrub_errors": 1,
>>>>                     "num_deep_scrub_errors": 0,
>>>>                     "num_objects_recovered": 103594,
>>>>                     "num_bytes_recovered": 424091177351,
>>>>                     "num_keys_recovered": 110,
>>>>                     "num_objects_omap": 1,
>>>>                     "num_objects_hit_set_archive": 0,
>>>>                     "num_bytes_hit_set_archive": 0,
>>>>                     "num_flush": 0,
>>>>                     "num_flush_kb": 0,
>>>>                     "num_evict": 0,
>>>>                     "num_evict_kb": 0,
>>>>                     "num_promote": 0,
>>>>                     "num_flush_mode_high": 0,
>>>>                     "num_flush_mode_low": 0,
>>>>                     "num_evict_mode_some": 0,
>>>>                     "num_evict_mode_full": 0,
>>>>                     "num_objects_pinned": 0
>>>>                 },
>>>>                 "up": [
>>>>                     37,
>>>>                     44,
>>>>                     16
>>>>                 ],
>>>>                 "acting": [
>>>>                     37,
>>>>                     44,
>>>>                     16
>>>>                 ],
>>>>                 "blocked_by": [],
>>>>                 "up_primary": 37,
>>>>                 "acting_primary": 37
>>>>             },
>>>>             "empty": 0,
>>>>             "dne": 0,
>>>>             "incomplete": 0,
>>>>             "last_epoch_started": 30580,
>>>>             "hit_set_history": {
>>>>                 "current_last_update": "0'0",
>>>>                 "history": []
>>>>             }
>>>>         }
>>>>     ],
>>>>     "recovery_state": [
>>>>         {
>>>>             "name": "Started\/Primary\/Active",
>>>>             "enter_time": "2018-07-16 13:37:13.050211",
>>>>             "might_have_unfound": [
>>>>                 {
>>>>                     "osd": "16",
>>>>                     "status": "already probed"
>>>>                 },
>>>>                 {
>>>>                     "osd": "44",
>>>>                     "status": "already probed"
>>>>                 }
>>>>             ],
>>>>             "recovery_progress": {
>>>>                 "backfill_targets": [],
>>>>                 "waiting_on_backfill": [],
>>>>                 "last_backfill_started": "MIN",
>>>>                 "backfill_info": {
>>>>                     "begin": "MIN",
>>>>                     "end": "MIN",
>>>>                     "objects": []
>>>>                 },
>>>>                 "peer_backfill_info": [],
>>>>                 "backfills_in_flight": [],
>>>>                 "recovering": [],
>>>>                 "pg_backend": {
>>>>                     "pull_from_peer": [],
>>>>                     "pushing": []
>>>>                 }
>>>>             },
>>>>             "scrub": {
>>>>                 "scrubber.epoch_start": "0",
>>>>                 "scrubber.active": 0,
>>>>                 "scrubber.state": "INACTIVE",
>>>>                 "scrubber.start": "MIN",
>>>>                 "scrubber.end": "MIN",
>>>>                 "scrubber.subset_last_update": "0'0",
>>>>                 "scrubber.deep": false,
>>>>                 "scrubber.seed": 0,
>>>>                 "scrubber.waiting_on": 0,
>>>>                 "scrubber.waiting_on_whom": []
>>>>             }
>>>>         },
>>>>         {
>>>>             "name": "Started",
>>>>             "enter_time": "2018-07-16 13:37:11.980264"
>>>>         }
>>>>     ],
>>>>     "agent_state": {}
>>>> }
>>>>
>>>>
>>>> On 17/07/18 02:19, Brad Hubbard wrote:
>>>>> Can we see a pg query of 0.190?
>>>>>
>>>>> On Tue, Jul 17, 2018 at 1:05 AM, Ana Aviles <ana@xxxxxxxxxxxx> wrote:
>>>>>> Hello,
>>>>>>
>>>>>> We have a cluster that was running hammer (0.94.10). We hit a bug where,
>>>>>> right after seemingly fixing an inconsistent PG, the primary OSD would
>>>>>> crash and restart. The next deep-scrub would again report the PG as
>>>>>> inconsistent.
>>>>>>
>>>>>> We filed a bug report,
>>>>>> https://tracker.ceph.com/issues/24652#change-115654, which was closed
>>>>>> since it was a known bug fixed in newer versions of Ceph.
>>>>>>
>>>>>> Now the cluster is running jewel (10.2.11). There is again one
>>>>>> inconsistent PG with 1 error, which we are not able to fix, and with no
>>>>>> reference to the inconsistent object.
>>>>>>
>>>>>> scrub 0 missing, 1 inconsistent objects
>>>>>> scrub 1 errors
>>>>>>
>>>>>> We have the logs with debug level 20 while repairing the PG. The one for
>>>>>> the primary OSD is: 94e20123-fcda-49d7-98a2-919507dfbc92
>>>>>>
>>>>>> Thanks!
>>>>>> Kind regards,
>>>>>>
>>>>>> --
>>>>>> Ana Avilés
>>>>>> Greenhost - sustainable hosting & digital security
>>>>>> E: ana@xxxxxxxxxxxx
>>>>>> T: +31 20 4890444
>>>>>> W: https://greenhost.nl
>>>>>> --
>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html

--
Ana Avilés
Greenhost - sustainable hosting & digital security
E: ana@xxxxxxxxxxxx
T: +31 20 4890444
W: https://greenhost.nl
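
The get/put approach Brad describes earlier in the thread could be sketched as a dry-run plan like the one below. This is only an illustration under stated assumptions: the pool name "rbd" and the backup path are hypothetical (the thread never names the pool), the final deep-scrub step is an assumption added to re-check the PG and is not part of Brad's instructions, and `repair_plan` is a made-up helper that only builds the command strings without running anything. Clients of the rbd image should be quiesced first, as Brad says.

```python
import shlex


def repair_plan(pool, obj, pgid, backup_path):
    """Return the shell commands for the get/put approach as strings.

    Nothing is executed here; the caller decides whether and when to run
    each step. The pool name and backup path are supplied by the caller.
    """
    steps = [
        # Read the object as clients currently see it (served by the primary).
        ["rados", "-p", pool, "get", obj, backup_path],
        # Write the same bytes back so all replicas receive one version.
        ["rados", "-p", pool, "put", obj, backup_path],
        # Assumption: re-run a deep scrub afterwards to re-check the PG.
        ["ceph", "pg", "deep-scrub", pgid],
    ]
    return [" ".join(shlex.quote(arg) for arg in cmd) for cmd in steps]


if __name__ == "__main__":
    for line in repair_plan(
        "rbd",  # hypothetical pool name
        "rbd_data.15cec2ae8944a.000000000015c7d6",
        "0.190",
        "/tmp/obj.bin",  # hypothetical backup path
    ):
        print(line)
```

The backup of the osd 44 copy with ceph-objectstore-tool that Brad suggests is deliberately left out of this sketch, since it has to be run on the OSD host and its exact invocation depends on the deployment.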