Re: Inconsistent PG with 1 inconsistent object not referenced in the log

OK. What I *meant* to ask for was the output of "rados
list-inconsistent-obj 0.190" (it might still be worth posting that, but
it should just confirm the findings below).


The relevant lines from the log are below.

2018-07-16 12:24:45.940910 7fb422340700 2 osd.37 pg_epoch: 30554
pg[0.190( v 30554'5390084 (30537'5387075,30554'5390084]
local-les=30554 n=4123 ec=1 les/c/f 30554/30554/0 30552/30553/30542)
[37,44,16] r=0 lpr=30553 crt=30554'5390079 lcod 30554'5390083 mlcod
30554'5390083 active+clean+scrubbing+deep+inconsistent+repair] 0.190
shard 16: soid 0:09c2dd3e:::rbd_data.15cec2ae8944a.000000000015c7d6:head
data_digest 0x264b7d0d != data_digest 0x7dd0d0bd from shard 44,
data_digest 0x264b7d0d != data_digest 0x7dd0d0bd from auth oi
0:618e3778:::rbd_data.15cec2ae8944a.000000000004db0e:head(30537'5509201
osd.36.0:8552301 dirty|data_digest|omap_digest s 4194304 uv 5498082 dd
7dd0d0bd od ffffffff alloc_hint [4194304 4194304]), attr value
mismatch '_'

2018-07-16 12:24:45.940941 7fb422340700 -1
log_channel(cluster) log [ERR] : 0.190 shard 16: soid
0:09c2dd3e:::rbd_data.15cec2ae8944a.000000000015c7d6:head data_digest
0x264b7d0d != data_digest 0x7dd0d0bd from shard 44, data_digest
0x264b7d0d != data_digest 0x7dd0d0bd from auth oi
0:618e3778:::rbd_data.15cec2ae8944a.000000000004db0e:head(30537'5509201
osd.36.0:8552301 dirty|data_digest|omap_digest s 4194304 uv 5498082 dd
7dd0d0bd od ffffffff alloc_hint [4194304 4194304]), attr value
mismatch '_'

2018-07-16 12:24:45.940957 7fb422340700 -1
log_channel(cluster) log [ERR] : 0.190 shard 37: soid
0:09c2dd3e:::rbd_data.15cec2ae8944a.000000000015c7d6:head data_digest
0x264b7d0d != data_digest 0x7dd0d0bd from shard 44, data_digest
0x264b7d0d != data_digest 0x7dd0d0bd from auth oi
0:618e3778:::rbd_data.15cec2ae8944a.000000000004db0e:head(30537'5509201
osd.36.0:8552301 dirty|data_digest|omap_digest s 4194304 uv 5498082 dd
7dd0d0bd od ffffffff alloc_hint [4194304 4194304]), attr value
mismatch '_'

They show that osd 44 has been chosen as the authoritative shard, that
its data digest for this object is 0x7dd0d0bd, and that the data digest
in the authoritative object info is also 0x7dd0d0bd.

Shards 16 and 37, however, both have a data digest of 0x264b7d0d, so
the data for this object on osds 16 and 37 differs from that on
osd 44.
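
To make the grouping above easier to see, here is a minimal sketch that
pulls the digests out of scrub error text of the shape quoted above and
groups OSDs by digest. The function name and the regular expression are
my own and only cover the "shard N: soid ... data_digest A !=
data_digest B from shard M" wording seen in this log; other error
formats are not handled.

```python
import re
from collections import defaultdict

# Matches one report of the form seen in the log above:
# "shard N: soid <oid> data_digest 0xA != data_digest 0xB from shard M"
SHARD_RE = re.compile(
    r"shard (?P<shard>\d+): soid (?P<soid>\S+) "
    r"data_digest (?P<own>0x[0-9a-f]+) != "
    r"data_digest (?P<auth>0x[0-9a-f]+) from shard (?P<auth_shard>\d+)"
)


def group_shards_by_digest(log_text):
    """Return {data_digest: sorted list of OSD ids} (hypothetical helper)."""
    # Log lines are hard-wrapped in mail, so collapse all whitespace first.
    flat = " ".join(log_text.split())
    groups = defaultdict(set)
    for m in SHARD_RE.finditer(flat):
        groups[m.group("own")].add(int(m.group("shard")))
        groups[m.group("auth")].add(int(m.group("auth_shard")))
    return {digest: sorted(osds) for digest, osds in groups.items()}


sample = """
0.190 shard 16: soid 0:09c2dd3e:::rbd_data.15cec2ae8944a.000000000015c7d6:head
data_digest 0x264b7d0d != data_digest 0x7dd0d0bd from shard 44
0.190 shard 37: soid 0:09c2dd3e:::rbd_data.15cec2ae8944a.000000000015c7d6:head
data_digest 0x264b7d0d != data_digest 0x7dd0d0bd from shard 44
"""

for digest, osds in group_shards_by_digest(sample).items():
    print(digest, osds)
```

Two OSDs agreeing on a digest does not by itself say which copy is
correct; the sketch only shows who agrees with whom.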

Basically, you'll need to pick which is the "right" copy of the object
(I can't tell you), quiesce traffic to/from that object (rbd image), and
get/put that object back into the cluster to fix the mismatch. Since
this appears to be an rbd image, this could potentially result in an
image that needs an fsck or equivalent, IIUC.
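
As a rough illustration of the get/put step, the sketch below only
builds the command lines and prints them; it does not run anything. The
pool name "rbd" and the scratch path are assumptions (the thread only
gives the pool id 0), the helper name is hypothetical, and the trailing
deep-scrub is my own suggestion for re-checking the PG afterwards. It
also assumes the copy that "rados get" returns is the one you decided is
right; pulling a specific replica's copy is outside this sketch.

```python
import shlex


def build_rewrite_commands(pool, obj, scratch_path, pgid):
    """Return the commands, in order, as strings. Nothing is executed."""
    steps = [
        # Read the object out of the cluster into a scratch file.
        ["rados", "-p", pool, "get", obj, scratch_path],
        # Write it back so all replicas receive the same data.
        ["rados", "-p", pool, "put", obj, scratch_path],
        # Assumption: re-run a deep scrub to confirm the PG is clean.
        ["ceph", "pg", "deep-scrub", pgid],
    ]
    return [" ".join(shlex.quote(part) for part in step) for step in steps]


commands = build_rewrite_commands(
    pool="rbd",  # assumed pool name, not stated in the thread
    obj="rbd_data.15cec2ae8944a.000000000015c7d6",
    scratch_path="/tmp/obj.bin",  # hypothetical scratch location
    pgid="0.190",
)
for cmd in commands:
    print(cmd)
```

Review the printed commands and make sure traffic to the image is
quiesced before running any of them by hand.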


On Tue, Jul 17, 2018 at 10:06 PM, Ana Aviles <ana@xxxxxxxxxxxx> wrote:
>
> Hi Brad,
>
> Here it is:
>
> {
>     "state": "active+clean+inconsistent",
>     "snap_trimq": "[]",
>     "epoch": 30581,
>     "up": [
>         37,
>         44,
>         16
>     ],
>     "acting": [
>         37,
>         44,
>         16
>     ],
>     "actingbackfill": [
>         "16",
>         "37",
>         "44"
>     ],
>     "info": {
>         "pgid": "0.190",
>         "last_update": "30581'5420535",
>         "last_complete": "30581'5420535",
>         "log_tail": "30581'5417484",
>         "last_user_version": 5420535,
>         "last_backfill": "MAX",
>         "last_backfill_bitwise": 0,
>         "purged_snaps": "[]",
>         "history": {
>             "epoch_created": 1,
>             "last_epoch_started": 30580,
>             "last_epoch_clean": 30581,
>             "last_epoch_split": 0,
>             "last_epoch_marked_full": 0,
>             "same_up_since": 30578,
>             "same_interval_since": 30579,
>             "same_primary_since": 30565,
>             "last_scrub": "30554'5390240",
>             "last_scrub_stamp": "2018-07-16 12:27:03.547524",
>             "last_deep_scrub": "30554'5390240",
>             "last_deep_scrub_stamp": "2018-07-16 12:27:03.547524",
>             "last_clean_scrub_stamp": "2018-07-13 08:45:32.622555"
>         },
>         "stats": {
>             "version": "30581'5420535",
>             "reported_seq": "5155553",
>             "reported_epoch": "30581",
>             "state": "active+clean+inconsistent",
>             "last_fresh": "2018-07-17 12:02:13.002428",
>             "last_change": "2018-07-16 13:37:24.020403",
>             "last_active": "2018-07-17 12:02:13.002428",
>             "last_peered": "2018-07-17 12:02:13.002428",
>             "last_clean": "2018-07-17 12:02:13.002428",
>             "last_became_active": "2018-07-16 13:37:13.173821",
>             "last_became_peered": "2018-07-16 13:37:13.173821",
>             "last_unstale": "2018-07-17 12:02:13.002428",
>             "last_undegraded": "2018-07-17 12:02:13.002428",
>             "last_fullsized": "2018-07-17 12:02:13.002428",
>             "mapping_epoch": 30578,
>             "log_start": "30581'5417484",
>             "ondisk_log_start": "30581'5417484",
>             "created": 1,
>             "last_epoch_clean": 30581,
>             "parent": "0.0",
>             "parent_split_bits": 0,
>             "last_scrub": "30554'5390240",
>             "last_scrub_stamp": "2018-07-16 12:27:03.547524",
>             "last_deep_scrub": "30554'5390240",
>             "last_deep_scrub_stamp": "2018-07-16 12:27:03.547524",
>             "last_clean_scrub_stamp": "2018-07-13 08:45:32.622555",
>             "log_size": 3051,
>             "ondisk_log_size": 3051,
>             "stats_invalid": false,
>             "dirty_stats_invalid": false,
>             "omap_stats_invalid": false,
>             "hitset_stats_invalid": false,
>             "hitset_bytes_stats_invalid": false,
>             "pin_stats_invalid": true,
>             "stat_sum": {
>                 "num_bytes": 16946139153,
>                 "num_objects": 4148,
>                 "num_object_clones": 0,
>                 "num_object_copies": 12444,
>                 "num_objects_missing_on_primary": 0,
>                 "num_objects_missing": 0,
>                 "num_objects_degraded": 0,
>                 "num_objects_misplaced": 0,
>                 "num_objects_unfound": 0,
>                 "num_objects_dirty": 4148,
>                 "num_whiteouts": 0,
>                 "num_read": 6895104,
>                 "num_read_kb": 292185552,
>                 "num_write": 10032749,
>                 "num_write_kb": 185167701,
>                 "num_scrub_errors": 1,
>                 "num_shallow_scrub_errors": 1,
>                 "num_deep_scrub_errors": 0,
>                 "num_objects_recovered": 103598,
>                 "num_bytes_recovered": 424107954567,
>                 "num_keys_recovered": 110,
>                 "num_objects_omap": 1,
>                 "num_objects_hit_set_archive": 0,
>                 "num_bytes_hit_set_archive": 0,
>                 "num_flush": 0,
>                 "num_flush_kb": 0,
>                 "num_evict": 0,
>                 "num_evict_kb": 0,
>                 "num_promote": 0,
>                 "num_flush_mode_high": 0,
>                 "num_flush_mode_low": 0,
>                 "num_evict_mode_some": 0,
>                 "num_evict_mode_full": 0,
>                 "num_objects_pinned": 0
>             },
>             "up": [
>                 37,
>                 44,
>                 16
>             ],
>             "acting": [
>                 37,
>                 44,
>                 16
>             ],
>             "blocked_by": [],
>             "up_primary": 37,
>             "acting_primary": 37
>         },
>         "empty": 0,
>         "dne": 0,
>         "incomplete": 0,
>         "last_epoch_started": 30580,
>         "hit_set_history": {
>             "current_last_update": "0'0",
>             "history": []
>         }
>     },
>     "peer_info": [
>         {
>             "peer": "16",
>             "pgid": "0.190",
>             "last_update": "30581'5420535",
>             "last_complete": "30581'5420535",
>             "log_tail": "30537'5387475",
>             "last_user_version": 5390577,
>             "last_backfill": "MAX",
>             "last_backfill_bitwise": 1,
>             "purged_snaps": "[]",
>             "history": {
>                 "epoch_created": 1,
>                 "last_epoch_started": 30580,
>                 "last_epoch_clean": 30581,
>                 "last_epoch_split": 0,
>                 "last_epoch_marked_full": 0,
>                 "same_up_since": 30578,
>                 "same_interval_since": 30579,
>                 "same_primary_since": 30565,
>                 "last_scrub": "30554'5390240",
>                 "last_scrub_stamp": "2018-07-16 12:27:03.547524",
>                 "last_deep_scrub": "30554'5390240",
>                 "last_deep_scrub_stamp": "2018-07-16 12:27:03.547524",
>                 "last_clean_scrub_stamp": "2018-07-13 08:45:32.622555"
>             },
>             "stats": {
>                 "version": "30570'5390575",
>                 "reported_seq": "5139870",
>                 "reported_epoch": "30576",
>                 "state": "active+undersized+degraded+inconsistent",
>                 "last_fresh": "2018-07-16 13:36:40.284756",
>                 "last_change": "2018-07-16 13:36:40.284277",
>                 "last_active": "2018-07-16 13:36:40.284756",
>                 "last_peered": "2018-07-16 13:36:40.284756",
>                 "last_clean": "2018-07-16 13:36:23.558224",
>                 "last_became_active": "2018-07-16 13:36:40.284277",
>                 "last_became_peered": "2018-07-16 13:36:40.284277",
>                 "last_unstale": "2018-07-16 13:36:40.284756",
>                 "last_undegraded": "2018-07-16 13:36:40.203248",
>                 "last_fullsized": "2018-07-16 13:36:40.203248",
>                 "mapping_epoch": 30578,
>                 "log_start": "30537'5387475",
>                 "ondisk_log_start": "30537'5387475",
>                 "created": 1,
>                 "last_epoch_clean": 30576,
>                 "parent": "0.0",
>                 "parent_split_bits": 0,
>                 "last_scrub": "30554'5390240",
>                 "last_scrub_stamp": "2018-07-16 12:27:03.547524",
>                 "last_deep_scrub": "30554'5390240",
>                 "last_deep_scrub_stamp": "2018-07-16 12:27:03.547524",
>                 "last_clean_scrub_stamp": "2018-07-13 08:45:32.622555",
>                 "log_size": 3100,
>                 "ondisk_log_size": 3100,
>                 "stats_invalid": false,
>                 "dirty_stats_invalid": false,
>                 "omap_stats_invalid": false,
>                 "hitset_stats_invalid": false,
>                 "hitset_bytes_stats_invalid": false,
>                 "pin_stats_invalid": true,
>                 "stat_sum": {
>                     "num_bytes": 16841281553,
>                     "num_objects": 4123,
>                     "num_object_clones": 0,
>                     "num_object_copies": 12369,
>                     "num_objects_missing_on_primary": 0,
>                     "num_objects_missing": 0,
>                     "num_objects_degraded": 4123,
>                     "num_objects_misplaced": 0,
>                     "num_objects_unfound": 0,
>                     "num_objects_dirty": 4123,
>                     "num_whiteouts": 0,
>                     "num_read": 6870027,
>                     "num_read_kb": 291425720,
>                     "num_write": 9972836,
>                     "num_write_kb": 184701865,
>                     "num_scrub_errors": 1,
>                     "num_shallow_scrub_errors": 1,
>                     "num_deep_scrub_errors": 0,
>                     "num_objects_recovered": 103596,
>                     "num_bytes_recovered": 424099565959,
>                     "num_keys_recovered": 110,
>                     "num_objects_omap": 1,
>                     "num_objects_hit_set_archive": 0,
>                     "num_bytes_hit_set_archive": 0,
>                     "num_flush": 0,
>                     "num_flush_kb": 0,
>                     "num_evict": 0,
>                     "num_evict_kb": 0,
>                     "num_promote": 0,
>                     "num_flush_mode_high": 0,
>                     "num_flush_mode_low": 0,
>                     "num_evict_mode_some": 0,
>                     "num_evict_mode_full": 0,
>                     "num_objects_pinned": 0
>                 },
>                 "up": [
>                     37,
>                     44,
>                     16
>                 ],
>                 "acting": [
>                     37,
>                     44,
>                     16
>                 ],
>                 "blocked_by": [],
>                 "up_primary": 37,
>                 "acting_primary": 37
>             },
>             "empty": 0,
>             "dne": 0,
>             "incomplete": 0,
>             "last_epoch_started": 30580,
>             "hit_set_history": {
>                 "current_last_update": "0'0",
>                 "history": []
>             }
>         },
>         {
>             "peer": "44",
>             "pgid": "0.190",
>             "last_update": "30581'5420535",
>             "last_complete": "30570'5390575",
>             "log_tail": "30537'5387475",
>             "last_user_version": 5390575,
>             "last_backfill": "MAX",
>             "last_backfill_bitwise": 1,
>             "purged_snaps": "[]",
>             "history": {
>                 "epoch_created": 1,
>                 "last_epoch_started": 30580,
>                 "last_epoch_clean": 30581,
>                 "last_epoch_split": 0,
>                 "last_epoch_marked_full": 0,
>                 "same_up_since": 30578,
>                 "same_interval_since": 30579,
>                 "same_primary_since": 30565,
>                 "last_scrub": "30554'5390240",
>                 "last_scrub_stamp": "2018-07-16 12:27:03.547524",
>                 "last_deep_scrub": "30554'5390240",
>                 "last_deep_scrub_stamp": "2018-07-16 12:27:03.547524",
>                 "last_clean_scrub_stamp": "2018-07-13 08:45:32.622555"
>             },
>             "stats": {
>                 "version": "30568'5390574",
>                 "reported_seq": "5139846",
>                 "reported_epoch": "30570",
>                 "state": "active+undersized+degraded+inconsistent",
>                 "last_fresh": "2018-07-16 13:36:07.003551",
>                 "last_change": "2018-07-16 13:36:07.002580",
>                 "last_active": "2018-07-16 13:36:07.003551",
>                 "last_peered": "2018-07-16 13:36:07.003551",
>                 "last_clean": "2018-07-16 13:35:50.922619",
>                 "last_became_active": "2018-07-16 13:36:07.002580",
>                 "last_became_peered": "2018-07-16 13:36:07.002580",
>                 "last_unstale": "2018-07-16 13:36:07.003551",
>                 "last_undegraded": "2018-07-16 13:36:05.922413",
>                 "last_fullsized": "2018-07-16 13:36:05.922413",
>                 "mapping_epoch": 30578,
>                 "log_start": "30537'5387475",
>                 "ondisk_log_start": "30537'5387475",
>                 "created": 1,
>                 "last_epoch_clean": 30570,
>                 "parent": "0.0",
>                 "parent_split_bits": 0,
>                 "last_scrub": "30554'5390240",
>                 "last_scrub_stamp": "2018-07-16 12:27:03.547524",
>                 "last_deep_scrub": "30554'5390240",
>                 "last_deep_scrub_stamp": "2018-07-16 12:27:03.547524",
>                 "last_clean_scrub_stamp": "2018-07-13 08:45:32.622555",
>                 "log_size": 3099,
>                 "ondisk_log_size": 3099,
>                 "stats_invalid": false,
>                 "dirty_stats_invalid": false,
>                 "omap_stats_invalid": false,
>                 "hitset_stats_invalid": false,
>                 "hitset_bytes_stats_invalid": false,
>                 "pin_stats_invalid": true,
>                 "stat_sum": {
>                     "num_bytes": 16841281553,
>                     "num_objects": 4123,
>                     "num_object_clones": 0,
>                     "num_object_copies": 12369,
>                     "num_objects_missing_on_primary": 0,
>                     "num_objects_missing": 0,
>                     "num_objects_degraded": 4123,
>                     "num_objects_misplaced": 0,
>                     "num_objects_unfound": 0,
>                     "num_objects_dirty": 4123,
>                     "num_whiteouts": 0,
>                     "num_read": 6870027,
>                     "num_read_kb": 291425720,
>                     "num_write": 9972832,
>                     "num_write_kb": 184701853,
>                     "num_scrub_errors": 1,
>                     "num_shallow_scrub_errors": 1,
>                     "num_deep_scrub_errors": 0,
>                     "num_objects_recovered": 103594,
>                     "num_bytes_recovered": 424091177351,
>                     "num_keys_recovered": 110,
>                     "num_objects_omap": 1,
>                     "num_objects_hit_set_archive": 0,
>                     "num_bytes_hit_set_archive": 0,
>                     "num_flush": 0,
>                     "num_flush_kb": 0,
>                     "num_evict": 0,
>                     "num_evict_kb": 0,
>                     "num_promote": 0,
>                     "num_flush_mode_high": 0,
>                     "num_flush_mode_low": 0,
>                     "num_evict_mode_some": 0,
>                     "num_evict_mode_full": 0,
>                     "num_objects_pinned": 0
>                 },
>                 "up": [
>                     37,
>                     44,
>                     16
>                 ],
>                 "acting": [
>                     37,
>                     44,
>                     16
>                 ],
>                 "blocked_by": [],
>                 "up_primary": 37,
>                 "acting_primary": 37
>             },
>             "empty": 0,
>             "dne": 0,
>             "incomplete": 0,
>             "last_epoch_started": 30580,
>             "hit_set_history": {
>                 "current_last_update": "0'0",
>                 "history": []
>             }
>         }
>     ],
>     "recovery_state": [
>         {
>             "name": "Started\/Primary\/Active",
>             "enter_time": "2018-07-16 13:37:13.050211",
>             "might_have_unfound": [
>                 {
>                     "osd": "16",
>                     "status": "already probed"
>                 },
>                 {
>                     "osd": "44",
>                     "status": "already probed"
>                 }
>             ],
>             "recovery_progress": {
>                 "backfill_targets": [],
>                 "waiting_on_backfill": [],
>                 "last_backfill_started": "MIN",
>                 "backfill_info": {
>                     "begin": "MIN",
>                     "end": "MIN",
>                     "objects": []
>                 },
>                 "peer_backfill_info": [],
>                 "backfills_in_flight": [],
>                 "recovering": [],
>                 "pg_backend": {
>                     "pull_from_peer": [],
>                     "pushing": []
>                 }
>             },
>             "scrub": {
>                 "scrubber.epoch_start": "0",
>                 "scrubber.active": 0,
>                 "scrubber.state": "INACTIVE",
>                 "scrubber.start": "MIN",
>                 "scrubber.end": "MIN",
>                 "scrubber.subset_last_update": "0'0",
>                 "scrubber.deep": false,
>                 "scrubber.seed": 0,
>                 "scrubber.waiting_on": 0,
>                 "scrubber.waiting_on_whom": []
>             }
>         },
>         {
>             "name": "Started",
>             "enter_time": "2018-07-16 13:37:11.980264"
>         }
>     ],
>     "agent_state": {}
> }
>
>
> On 17/07/18 02:19, Brad Hubbard wrote:
>> Can we see a pg query of 0.190 ?
>>
>> On Tue, Jul 17, 2018 at 1:05 AM, Ana Aviles <ana@xxxxxxxxxxxx> wrote:
>>> Hello,
>>>
>>> We have a cluster that was running hammer (0.94.10). We hit a bug where
>>> right after seemingly fixing an inconsistent PG, the primary OSD would
>>> crash and restart. The next deep-scrub would again report the PG as
>>> inconsistent.
>>>
>>> We filed a bug report,
>>> https://tracker.ceph.com/issues/24652#change-115654, which was closed
>>> because it was a known bug fixed in newer versions of Ceph.
>>>
>>> Now the cluster is running jewel (10.2.11). There is again one
>>> inconsistent PG with 1 error, which we are not able to fix and which
>>> comes with no reference to the inconsistent object.
>>>
>>>
>>> scrub 0 missing, 1 inconsistent objects
>>> scrub 1 errors
>>>
>>>
>>> We have the logs with debug level 20 while repairing the PG. The one for
>>> the primary OSD is: 94e20123-fcda-49d7-98a2-919507dfbc92
>>>
>>> Thanks!
>>> Kind regards,
>>>
>>>
>>> --
>>> Ana Avilés
>>> Greenhost - sustainable hosting & digital security
>>> E: ana@xxxxxxxxxxxx
>>> T: +31 20 4890444
>>> W: https://greenhost.nl
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>>
>>
>
> --
> Ana Avilés
> Greenhost - sustainable hosting & digital security
> E: ana@xxxxxxxxxxxx
> T: +31 20 4890444
> W: https://greenhost.nl



-- 
Cheers,
Brad


