Re: Inconsistent PG with 1 inconsistent object not referenced in the log



Ah ok. Then I think it confirms what you are saying. Here it is:

$ rados list-inconsistent-obj 0.190
osd.36.0:8552301 dirty|data_digest|omap_digest s 4194304 uv 5498082 dd
7dd0d0bd od ffffffff alloc_hint [4194304
client.1044166.0:393154060 dirty|data_digest|omap_digest s 4194304 uv
5142772 dd 264b7d0d od ffffffff alloc_hint [0
client.1044166.0:393154060 dirty|data_digest|omap_digest s 4194304 uv
5142772 dd 264b7d0d od ffffffff alloc_hint [0
osd.36.0:8552301 dirty|data_digest|omap_digest s 4194304 uv 5498082 dd
7dd0d0bd od ffffffff alloc_hint [4194304

To determine which is the right version of the object, is there no
timestamp that can tell us? Maybe the object got updated on osd.37 and
osd.16 while osd.44 was down, and that is where the mismatch comes from?
Otherwise, shouldn't the authoritative OSD be leading?
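
One way to compare the two versions in the output above is to split each object-info summary into fields. This is only a rough sketch: it assumes that "s" is the object size, "uv" the user version, "dd" the data digest and "od" the omap digest, which is my reading of the output and has not been confirmed in this thread. The function names are made up for illustration.

```python
import re

# Field meanings (size, user version, data digest, omap digest) are an
# assumption based on the labels in the pasted output.
OI_PATTERN = re.compile(
    r"(?P<last_reqid>\S+)\s+(?P<flags>\S+)"
    r"\s+s\s+(?P<size>\d+)\s+uv\s+(?P<user_version>\d+)"
    r"\s+dd\s+(?P<data_digest>[0-9a-f]+)\s+od\s+(?P<omap_digest>[0-9a-f]+)"
)


def parse_object_info(text):
    """Return one dict per object-info summary found in the text."""
    joined = " ".join(text.split())  # undo the mail client's line wrapping
    records = []
    for match in OI_PATTERN.finditer(joined):
        record = match.groupdict()
        record["size"] = int(record["size"])
        record["user_version"] = int(record["user_version"])
        records.append(record)
    return records


def distinct_versions(records):
    """Map each (user_version, data_digest) pair to the request id that wrote it."""
    seen = {}
    for record in records:
        key = (record["user_version"], record["data_digest"])
        seen.setdefault(key, record["last_reqid"])
    return seen


# The four summaries exactly as pasted above (truncated alloc_hint included).
SAMPLE = """
osd.36.0:8552301 dirty|data_digest|omap_digest s 4194304 uv 5498082 dd
7dd0d0bd od ffffffff alloc_hint [4194304
client.1044166.0:393154060 dirty|data_digest|omap_digest s 4194304 uv
5142772 dd 264b7d0d od ffffffff alloc_hint [0
client.1044166.0:393154060 dirty|data_digest|omap_digest s 4194304 uv
5142772 dd 264b7d0d od ffffffff alloc_hint [0
osd.36.0:8552301 dirty|data_digest|omap_digest s 4194304 uv 5498082 dd
7dd0d0bd od ffffffff alloc_hint [4194304
"""

if __name__ == "__main__":
    for (user_version, digest), reqid in distinct_versions(
        parse_object_info(SAMPLE)
    ).items():
        print(user_version, digest, reqid)
```

This only groups what the output already says. A higher user version or a digest shared by more replicas does not by itself prove which copy holds the right data.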


On 18/07/18 05:24, Brad Hubbard wrote:
> OK. What I *meant* to ask for was the output of "rados
> list-inconsistent-obj 0.190" (might still be worth posting that but it
> should just confirm findings below).
> The relevant lines from the log are below.
> 2018-07-16 12:24:45.940910 7fb422340700 2 osd.37 pg_epoch: 30554
> pg[0.190( v 30554'5390084 (30537'5387075,30554'5390084]
> local-les=30554 n=4123 ec=1 les/c/f 30554/30554/0 30552/30553/30542)
> [37,44,16] r=0 lpr=30553 crt=30554'5390079 lcod 30554'5390083 mlcod
> 30554'5390083 active+clean+scrubbing+deep+inconsistent+repair] 0.190
> shard 16: soid 0:09c2dd3e:::rbd_data.15cec2ae8944a.000000000015c7d6:head
> data_digest 0x264b7d0d != data_digest 0x7dd0d0bd from shard 44,
> data_digest 0x264b7d0d != data_digest 0x7dd0d0bd from auth oi
> 0:618e3778:::rbd_data.15cec2ae8944a.000000000004db0e:head(30537'5509201
> osd.36.0:8552301 dirty|data_digest|omap_digest s 4194304 uv 5498082 dd
> 7dd0d0bd od ffffffff alloc_hint [4194304 4194304]), attr value
> mismatch '_'
> 2018-07-16 12:24:45.940941 7fb422340700 -1
> log_channel(cluster) log [ERR] : 0.190 shard 16: soid
> 0:09c2dd3e:::rbd_data.15cec2ae8944a.000000000015c7d6:head data_digest
> 0x264b7d0d != data_digest 0x7dd0d0bd from shard 44, data_digest
> 0x264b7d0d != data_digest 0x7dd0d0bd from auth oi
> 0:618e3778:::rbd_data.15cec2ae8944a.000000000004db0e:head(30537'5509201
> osd.36.0:8552301 dirty|data_digest|omap_digest s 4194304 uv 5498082 dd
> 7dd0d0bd od ffffffff alloc_hint [4194304 4194304]), attr value
> mismatch '_'
> 2018-07-16 12:24:45.940957 7fb422340700 -1
> log_channel(cluster) log [ERR] : 0.190 shard 37: soid
> 0:09c2dd3e:::rbd_data.15cec2ae8944a.000000000015c7d6:head data_digest
> 0x264b7d0d != data_digest 0x7dd0d0bd from shard 44, data_digest
> 0x264b7d0d != data_digest 0x7dd0d0bd from auth oi
> 0:618e3778:::rbd_data.15cec2ae8944a.000000000004db0e:head(30537'5509201
> osd.36.0:8552301 dirty|data_digest|omap_digest s 4194304 uv 5498082 dd
> 7dd0d0bd od ffffffff alloc_hint [4194304 4194304]), attr value
> mismatch '_'
> They show that osd 44 has been chosen as the authoritative shard,
> that it has a data digest for this object of 0x7dd0d0bd, and that the
> data digest in the authoritative object info is also 0x7dd0d0bd.
> Shard 16 however, has a data digest of 0x264b7d0d and so does shard 37
> so the data for this object on osds 16 and 37 is different to that on
> osd 44.
> Basically, you'll need to pick which is the "right" copy of the object
> (I can't tell you) quiesce traffic to/from that object (rbd image) and
> get/put that object back into the cluster to fix the mismatch. Since
> this appears to be an rbd image this could potentially result in an
> image that needs an fsck or equivalent IIUC.
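
The get/put step described above could be sketched as a dry run that only builds the commands and prints them for review. Treat it as an illustration under assumptions, not a tested procedure: the pool name "rbd" is a guess (the thread only shows pool id 0), the scratch path is arbitrary, and the helper name is hypothetical. The object name is taken from the soid in the log lines. Reads are normally served by the primary, so "rados get" would be expected to return osd.37's copy; fetching a different replica's copy is not covered here.

```python
import shlex


def rewrite_object_commands(pool, object_name, pgid, scratch_path):
    """Build (but do not run) the commands for a get/put rewrite of one object."""
    return [
        # Fetch the object into a scratch file so it can be inspected first.
        ["rados", "-p", pool, "get", object_name, scratch_path],
        # Write the chosen copy back so that all replicas receive the same data.
        ["rados", "-p", pool, "put", object_name, scratch_path],
        # Ask for a fresh deep scrub of the PG to re-check the digests.
        ["ceph", "pg", "deep-scrub", pgid],
    ]


if __name__ == "__main__":
    commands = rewrite_object_commands(
        pool="rbd",  # assumption: not stated in the thread
        object_name="rbd_data.15cec2ae8944a.000000000015c7d6",
        pgid="0.190",
        scratch_path="/tmp/0.190-object.bin",  # arbitrary scratch location
    )
    for argv in commands:
        print(shlex.join(argv))
```

Traffic to the rbd image should be quiesced before the put, as noted above, and the scratch file checked by whatever means are available to decide that it really is the copy to keep.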
> On Tue, Jul 17, 2018 at 10:06 PM, Ana Aviles <ana@xxxxxxxxxxxx> wrote:
>> Hi Brad,
>> Here it is:
>> {
>>     "state": "active+clean+inconsistent",
>>     "snap_trimq": "[]",
>>     "epoch": 30581,
>>     "up": [
>>         37,
>>         44,
>>         16
>>     ],
>>     "acting": [
>>         37,
>>         44,
>>         16
>>     ],
>>     "actingbackfill": [
>>         "16",
>>         "37",
>>         "44"
>>     ],
>>     "info": {
>>         "pgid": "0.190",
>>         "last_update": "30581'5420535",
>>         "last_complete": "30581'5420535",
>>         "log_tail": "30581'5417484",
>>         "last_user_version": 5420535,
>>         "last_backfill": "MAX",
>>         "last_backfill_bitwise": 0,
>>         "purged_snaps": "[]",
>>         "history": {
>>             "epoch_created": 1,
>>             "last_epoch_started": 30580,
>>             "last_epoch_clean": 30581,
>>             "last_epoch_split": 0,
>>             "last_epoch_marked_full": 0,
>>             "same_up_since": 30578,
>>             "same_interval_since": 30579,
>>             "same_primary_since": 30565,
>>             "last_scrub": "30554'5390240",
>>             "last_scrub_stamp": "2018-07-16 12:27:03.547524",
>>             "last_deep_scrub": "30554'5390240",
>>             "last_deep_scrub_stamp": "2018-07-16 12:27:03.547524",
>>             "last_clean_scrub_stamp": "2018-07-13 08:45:32.622555"
>>         },
>>         "stats": {
>>             "version": "30581'5420535",
>>             "reported_seq": "5155553",
>>             "reported_epoch": "30581",
>>             "state": "active+clean+inconsistent",
>>             "last_fresh": "2018-07-17 12:02:13.002428",
>>             "last_change": "2018-07-16 13:37:24.020403",
>>             "last_active": "2018-07-17 12:02:13.002428",
>>             "last_peered": "2018-07-17 12:02:13.002428",
>>             "last_clean": "2018-07-17 12:02:13.002428",
>>             "last_became_active": "2018-07-16 13:37:13.173821",
>>             "last_became_peered": "2018-07-16 13:37:13.173821",
>>             "last_unstale": "2018-07-17 12:02:13.002428",
>>             "last_undegraded": "2018-07-17 12:02:13.002428",
>>             "last_fullsized": "2018-07-17 12:02:13.002428",
>>             "mapping_epoch": 30578,
>>             "log_start": "30581'5417484",
>>             "ondisk_log_start": "30581'5417484",
>>             "created": 1,
>>             "last_epoch_clean": 30581,
>>             "parent": "0.0",
>>             "parent_split_bits": 0,
>>             "last_scrub": "30554'5390240",
>>             "last_scrub_stamp": "2018-07-16 12:27:03.547524",
>>             "last_deep_scrub": "30554'5390240",
>>             "last_deep_scrub_stamp": "2018-07-16 12:27:03.547524",
>>             "last_clean_scrub_stamp": "2018-07-13 08:45:32.622555",
>>             "log_size": 3051,
>>             "ondisk_log_size": 3051,
>>             "stats_invalid": false,
>>             "dirty_stats_invalid": false,
>>             "omap_stats_invalid": false,
>>             "hitset_stats_invalid": false,
>>             "hitset_bytes_stats_invalid": false,
>>             "pin_stats_invalid": true,
>>             "stat_sum": {
>>                 "num_bytes": 16946139153,
>>                 "num_objects": 4148,
>>                 "num_object_clones": 0,
>>                 "num_object_copies": 12444,
>>                 "num_objects_missing_on_primary": 0,
>>                 "num_objects_missing": 0,
>>                 "num_objects_degraded": 0,
>>                 "num_objects_misplaced": 0,
>>                 "num_objects_unfound": 0,
>>                 "num_objects_dirty": 4148,
>>                 "num_whiteouts": 0,
>>                 "num_read": 6895104,
>>                 "num_read_kb": 292185552,
>>                 "num_write": 10032749,
>>                 "num_write_kb": 185167701,
>>                 "num_scrub_errors": 1,
>>                 "num_shallow_scrub_errors": 1,
>>                 "num_deep_scrub_errors": 0,
>>                 "num_objects_recovered": 103598,
>>                 "num_bytes_recovered": 424107954567,
>>                 "num_keys_recovered": 110,
>>                 "num_objects_omap": 1,
>>                 "num_objects_hit_set_archive": 0,
>>                 "num_bytes_hit_set_archive": 0,
>>                 "num_flush": 0,
>>                 "num_flush_kb": 0,
>>                 "num_evict": 0,
>>                 "num_evict_kb": 0,
>>                 "num_promote": 0,
>>                 "num_flush_mode_high": 0,
>>                 "num_flush_mode_low": 0,
>>                 "num_evict_mode_some": 0,
>>                 "num_evict_mode_full": 0,
>>                 "num_objects_pinned": 0
>>             },
>>             "up": [
>>                 37,
>>                 44,
>>                 16
>>             ],
>>             "acting": [
>>                 37,
>>                 44,
>>                 16
>>             ],
>>             "blocked_by": [],
>>             "up_primary": 37,
>>             "acting_primary": 37
>>         },
>>         "empty": 0,
>>         "dne": 0,
>>         "incomplete": 0,
>>         "last_epoch_started": 30580,
>>         "hit_set_history": {
>>             "current_last_update": "0'0",
>>             "history": []
>>         }
>>     },
>>     "peer_info": [
>>         {
>>             "peer": "16",
>>             "pgid": "0.190",
>>             "last_update": "30581'5420535",
>>             "last_complete": "30581'5420535",
>>             "log_tail": "30537'5387475",
>>             "last_user_version": 5390577,
>>             "last_backfill": "MAX",
>>             "last_backfill_bitwise": 1,
>>             "purged_snaps": "[]",
>>             "history": {
>>                 "epoch_created": 1,
>>                 "last_epoch_started": 30580,
>>                 "last_epoch_clean": 30581,
>>                 "last_epoch_split": 0,
>>                 "last_epoch_marked_full": 0,
>>                 "same_up_since": 30578,
>>                 "same_interval_since": 30579,
>>                 "same_primary_since": 30565,
>>                 "last_scrub": "30554'5390240",
>>                 "last_scrub_stamp": "2018-07-16 12:27:03.547524",
>>                 "last_deep_scrub": "30554'5390240",
>>                 "last_deep_scrub_stamp": "2018-07-16 12:27:03.547524",
>>                 "last_clean_scrub_stamp": "2018-07-13 08:45:32.622555"
>>             },
>>             "stats": {
>>                 "version": "30570'5390575",
>>                 "reported_seq": "5139870",
>>                 "reported_epoch": "30576",
>>                 "state": "active+undersized+degraded+inconsistent",
>>                 "last_fresh": "2018-07-16 13:36:40.284756",
>>                 "last_change": "2018-07-16 13:36:40.284277",
>>                 "last_active": "2018-07-16 13:36:40.284756",
>>                 "last_peered": "2018-07-16 13:36:40.284756",
>>                 "last_clean": "2018-07-16 13:36:23.558224",
>>                 "last_became_active": "2018-07-16 13:36:40.284277",
>>                 "last_became_peered": "2018-07-16 13:36:40.284277",
>>                 "last_unstale": "2018-07-16 13:36:40.284756",
>>                 "last_undegraded": "2018-07-16 13:36:40.203248",
>>                 "last_fullsized": "2018-07-16 13:36:40.203248",
>>                 "mapping_epoch": 30578,
>>                 "log_start": "30537'5387475",
>>                 "ondisk_log_start": "30537'5387475",
>>                 "created": 1,
>>                 "last_epoch_clean": 30576,
>>                 "parent": "0.0",
>>                 "parent_split_bits": 0,
>>                 "last_scrub": "30554'5390240",
>>                 "last_scrub_stamp": "2018-07-16 12:27:03.547524",
>>                 "last_deep_scrub": "30554'5390240",
>>                 "last_deep_scrub_stamp": "2018-07-16 12:27:03.547524",
>>                 "last_clean_scrub_stamp": "2018-07-13 08:45:32.622555",
>>                 "log_size": 3100,
>>                 "ondisk_log_size": 3100,
>>                 "stats_invalid": false,
>>                 "dirty_stats_invalid": false,
>>                 "omap_stats_invalid": false,
>>                 "hitset_stats_invalid": false,
>>                 "hitset_bytes_stats_invalid": false,
>>                 "pin_stats_invalid": true,
>>                 "stat_sum": {
>>                     "num_bytes": 16841281553,
>>                     "num_objects": 4123,
>>                     "num_object_clones": 0,
>>                     "num_object_copies": 12369,
>>                     "num_objects_missing_on_primary": 0,
>>                     "num_objects_missing": 0,
>>                     "num_objects_degraded": 4123,
>>                     "num_objects_misplaced": 0,
>>                     "num_objects_unfound": 0,
>>                     "num_objects_dirty": 4123,
>>                     "num_whiteouts": 0,
>>                     "num_read": 6870027,
>>                     "num_read_kb": 291425720,
>>                     "num_write": 9972836,
>>                     "num_write_kb": 184701865,
>>                     "num_scrub_errors": 1,
>>                     "num_shallow_scrub_errors": 1,
>>                     "num_deep_scrub_errors": 0,
>>                     "num_objects_recovered": 103596,
>>                     "num_bytes_recovered": 424099565959,
>>                     "num_keys_recovered": 110,
>>                     "num_objects_omap": 1,
>>                     "num_objects_hit_set_archive": 0,
>>                     "num_bytes_hit_set_archive": 0,
>>                     "num_flush": 0,
>>                     "num_flush_kb": 0,
>>                     "num_evict": 0,
>>                     "num_evict_kb": 0,
>>                     "num_promote": 0,
>>                     "num_flush_mode_high": 0,
>>                     "num_flush_mode_low": 0,
>>                     "num_evict_mode_some": 0,
>>                     "num_evict_mode_full": 0,
>>                     "num_objects_pinned": 0
>>                 },
>>                 "up": [
>>                     37,
>>                     44,
>>                     16
>>                 ],
>>                 "acting": [
>>                     37,
>>                     44,
>>                     16
>>                 ],
>>                 "blocked_by": [],
>>                 "up_primary": 37,
>>                 "acting_primary": 37
>>             },
>>             "empty": 0,
>>             "dne": 0,
>>             "incomplete": 0,
>>             "last_epoch_started": 30580,
>>             "hit_set_history": {
>>                 "current_last_update": "0'0",
>>                 "history": []
>>             }
>>         },
>>         {
>>             "peer": "44",
>>             "pgid": "0.190",
>>             "last_update": "30581'5420535",
>>             "last_complete": "30570'5390575",
>>             "log_tail": "30537'5387475",
>>             "last_user_version": 5390575,
>>             "last_backfill": "MAX",
>>             "last_backfill_bitwise": 1,
>>             "purged_snaps": "[]",
>>             "history": {
>>                 "epoch_created": 1,
>>                 "last_epoch_started": 30580,
>>                 "last_epoch_clean": 30581,
>>                 "last_epoch_split": 0,
>>                 "last_epoch_marked_full": 0,
>>                 "same_up_since": 30578,
>>                 "same_interval_since": 30579,
>>                 "same_primary_since": 30565,
>>                 "last_scrub": "30554'5390240",
>>                 "last_scrub_stamp": "2018-07-16 12:27:03.547524",
>>                 "last_deep_scrub": "30554'5390240",
>>                 "last_deep_scrub_stamp": "2018-07-16 12:27:03.547524",
>>                 "last_clean_scrub_stamp": "2018-07-13 08:45:32.622555"
>>             },
>>             "stats": {
>>                 "version": "30568'5390574",
>>                 "reported_seq": "5139846",
>>                 "reported_epoch": "30570",
>>                 "state": "active+undersized+degraded+inconsistent",
>>                 "last_fresh": "2018-07-16 13:36:07.003551",
>>                 "last_change": "2018-07-16 13:36:07.002580",
>>                 "last_active": "2018-07-16 13:36:07.003551",
>>                 "last_peered": "2018-07-16 13:36:07.003551",
>>                 "last_clean": "2018-07-16 13:35:50.922619",
>>                 "last_became_active": "2018-07-16 13:36:07.002580",
>>                 "last_became_peered": "2018-07-16 13:36:07.002580",
>>                 "last_unstale": "2018-07-16 13:36:07.003551",
>>                 "last_undegraded": "2018-07-16 13:36:05.922413",
>>                 "last_fullsized": "2018-07-16 13:36:05.922413",
>>                 "mapping_epoch": 30578,
>>                 "log_start": "30537'5387475",
>>                 "ondisk_log_start": "30537'5387475",
>>                 "created": 1,
>>                 "last_epoch_clean": 30570,
>>                 "parent": "0.0",
>>                 "parent_split_bits": 0,
>>                 "last_scrub": "30554'5390240",
>>                 "last_scrub_stamp": "2018-07-16 12:27:03.547524",
>>                 "last_deep_scrub": "30554'5390240",
>>                 "last_deep_scrub_stamp": "2018-07-16 12:27:03.547524",
>>                 "last_clean_scrub_stamp": "2018-07-13 08:45:32.622555",
>>                 "log_size": 3099,
>>                 "ondisk_log_size": 3099,
>>                 "stats_invalid": false,
>>                 "dirty_stats_invalid": false,
>>                 "omap_stats_invalid": false,
>>                 "hitset_stats_invalid": false,
>>                 "hitset_bytes_stats_invalid": false,
>>                 "pin_stats_invalid": true,
>>                 "stat_sum": {
>>                     "num_bytes": 16841281553,
>>                     "num_objects": 4123,
>>                     "num_object_clones": 0,
>>                     "num_object_copies": 12369,
>>                     "num_objects_missing_on_primary": 0,
>>                     "num_objects_missing": 0,
>>                     "num_objects_degraded": 4123,
>>                     "num_objects_misplaced": 0,
>>                     "num_objects_unfound": 0,
>>                     "num_objects_dirty": 4123,
>>                     "num_whiteouts": 0,
>>                     "num_read": 6870027,
>>                     "num_read_kb": 291425720,
>>                     "num_write": 9972832,
>>                     "num_write_kb": 184701853,
>>                     "num_scrub_errors": 1,
>>                     "num_shallow_scrub_errors": 1,
>>                     "num_deep_scrub_errors": 0,
>>                     "num_objects_recovered": 103594,
>>                     "num_bytes_recovered": 424091177351,
>>                     "num_keys_recovered": 110,
>>                     "num_objects_omap": 1,
>>                     "num_objects_hit_set_archive": 0,
>>                     "num_bytes_hit_set_archive": 0,
>>                     "num_flush": 0,
>>                     "num_flush_kb": 0,
>>                     "num_evict": 0,
>>                     "num_evict_kb": 0,
>>                     "num_promote": 0,
>>                     "num_flush_mode_high": 0,
>>                     "num_flush_mode_low": 0,
>>                     "num_evict_mode_some": 0,
>>                     "num_evict_mode_full": 0,
>>                     "num_objects_pinned": 0
>>                 },
>>                 "up": [
>>                     37,
>>                     44,
>>                     16
>>                 ],
>>                 "acting": [
>>                     37,
>>                     44,
>>                     16
>>                 ],
>>                 "blocked_by": [],
>>                 "up_primary": 37,
>>                 "acting_primary": 37
>>             },
>>             "empty": 0,
>>             "dne": 0,
>>             "incomplete": 0,
>>             "last_epoch_started": 30580,
>>             "hit_set_history": {
>>                 "current_last_update": "0'0",
>>                 "history": []
>>             }
>>         }
>>     ],
>>     "recovery_state": [
>>         {
>>             "name": "Started\/Primary\/Active",
>>             "enter_time": "2018-07-16 13:37:13.050211",
>>             "might_have_unfound": [
>>                 {
>>                     "osd": "16",
>>                     "status": "already probed"
>>                 },
>>                 {
>>                     "osd": "44",
>>                     "status": "already probed"
>>                 }
>>             ],
>>             "recovery_progress": {
>>                 "backfill_targets": [],
>>                 "waiting_on_backfill": [],
>>                 "last_backfill_started": "MIN",
>>                 "backfill_info": {
>>                     "begin": "MIN",
>>                     "end": "MIN",
>>                     "objects": []
>>                 },
>>                 "peer_backfill_info": [],
>>                 "backfills_in_flight": [],
>>                 "recovering": [],
>>                 "pg_backend": {
>>                     "pull_from_peer": [],
>>                     "pushing": []
>>                 }
>>             },
>>             "scrub": {
>>                 "scrubber.epoch_start": "0",
>>                 "": 0,
>>                 "scrubber.state": "INACTIVE",
>>                 "scrubber.start": "MIN",
>>                 "scrubber.end": "MIN",
>>                 "scrubber.subset_last_update": "0'0",
>>                 "scrubber.deep": false,
>>                 "scrubber.seed": 0,
>>                 "scrubber.waiting_on": 0,
>>                 "scrubber.waiting_on_whom": []
>>             }
>>         },
>>         {
>>             "name": "Started",
>>             "enter_time": "2018-07-16 13:37:11.980264"
>>         }
>>     ],
>>     "agent_state": {}
>> }
>> On 17/07/18 02:19, Brad Hubbard wrote:
>>> Can we see a pg query of 0.190 ?
>>> On Tue, Jul 17, 2018 at 1:05 AM, Ana Aviles <ana@xxxxxxxxxxxx> wrote:
>>>> Hello,
>>>> We have a cluster that was running hammer (0.94.10). We hit a bug where,
>>>> right after seemingly fixing an inconsistent PG, the primary OSD would
>>>> crash and restart. The next deep-scrub would again report the PG as
>>>> inconsistent. We filed a bug report, which was closed
>>>> since it was a known bug fixed in newer versions of Ceph.
>>>> Now the cluster is running jewel (10.2.11). There is again one
>>>> inconsistent PG with 1 error, which we are not able to fix and which
>>>> comes with no reference to the inconsistent object.
>>>> scrub 0 missing, 1 inconsistent objects
>>>> scrub 1 errors
>>>> We have the logs with debug level 20 while repairing the PG. The one for
>>>> the primary OSD is: 94e20123-fcda-49d7-98a2-919507dfbc92
>>>> Thanks!
>>>> Kind regards,
>>>> --
>>>> Ana Avilés
>>>> Greenhost - sustainable hosting & digital security
>>>> E: ana@xxxxxxxxxxxx
>>>> T: +31 20 4890444
>>>> W:
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>>>> More majordomo info at

Ana Avilés
Greenhost - sustainable hosting & digital security
E: ana@xxxxxxxxxxxx
T: +31 20 4890444