Re: fixing unrepairable inconsistent PG

On Fri, Jun 29, 2018 at 2:38 AM, Andrei Mikhailovsky <andrei@xxxxxxxxxx> wrote:
> Hi Brad,
>
> This has helped to repair the issue. Many thanks for your help on this!!!

No problem.

>
> I had so many objects with broken omap checksums that I spent at least a few hours identifying them and repairing them with the commands you've listed. They were all related to one pool called .rgw.buckets.index. All other pools look okay so far.

So originally you said you were having trouble with "one inconsistent
and stubborn PG". When did that become "so many objects"?
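
For the record, the full list of inconsistent pgs in that pool, and the
objects within each, can be pulled with something like this (where
<pgid> is a placeholder for each pg reported):

# rados list-inconsistent-pg .rgw.buckets.index
# rados list-inconsistent-obj <pgid> --format=json-pretty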

>
> I am wondering what could have gone horribly wrong with the above pool?

Is that pool 18? I notice it seems to be size 2; what is min_size on that pool?
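
You can check both with something like the following (assuming the pool
in question is .rgw.buckets.index):

# ceph osd pool ls detail | grep rgw.buckets.index
# ceph osd pool get .rgw.buckets.index min_size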

As to working out what went wrong: what event(s) coincided with or
preceded the problem? What history can you provide? What data can you
provide from the time leading up to when the issue was first seen?
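
The cluster log on the mons (/var/log/ceph/ceph.log) and the logs for
osd.21 and osd.28 around the time of the first scrub error would be a
good place to start, e.g. something along the lines of:

# grep -i 'scrub\|18\.2' /var/log/ceph/ceph.log
# grep '.dir.default.80018061.2' /var/log/ceph/ceph-osd.21.log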

>
> Cheers
>
> Andrei
> ----- Original Message -----
>> From: "Brad Hubbard" <bhubbard@xxxxxxxxxx>
>> To: "Andrei Mikhailovsky" <andrei@xxxxxxxxxx>
>> Cc: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
>> Sent: Thursday, 28 June, 2018 01:08:34
>> Subject: Re:  fixing unrepairable inconsistent PG
>
>> Try the following. You can do this with all osds up and running.
>>
>> # rados -p [name_of_pool_18] setomapval .dir.default.80018061.2 temporary-key anything
>> # ceph pg deep-scrub 18.2
>>
>> Once you are sure the scrub has completed and the pg is no longer
>> inconsistent you can remove the temporary key.
>>
>> # rados -p [name_of_pool_18] rmomapkey .dir.default.80018061.2 temporary-key
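>>
>> To check that the deep-scrub has finished and the pg is clean again,
>> something like the following should do (the inconsistency list should
>> come back empty):
>>
>> # rados list-inconsistent-obj 18.2
>> # ceph health detail | grep 18.2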
>>
>>
>> On Wed, Jun 27, 2018 at 9:42 PM, Andrei Mikhailovsky <andrei@xxxxxxxxxx> wrote:
>>> Here is one more thing:
>>>
>>> rados list-inconsistent-obj 18.2
>>> {
>>>    "inconsistents" : [
>>>       {
>>>          "object" : {
>>>             "locator" : "",
>>>             "version" : 632942,
>>>             "nspace" : "",
>>>             "name" : ".dir.default.80018061.2",
>>>             "snap" : "head"
>>>          },
>>>          "union_shard_errors" : [
>>>             "omap_digest_mismatch_info"
>>>          ],
>>>          "shards" : [
>>>             {
>>>                "osd" : 21,
>>>                "primary" : true,
>>>                "data_digest" : "0xffffffff",
>>>                "omap_digest" : "0x25e8a1da",
>>>                "errors" : [
>>>                   "omap_digest_mismatch_info"
>>>                ],
>>>                "size" : 0
>>>             },
>>>             {
>>>                "data_digest" : "0xffffffff",
>>>                "primary" : false,
>>>                "osd" : 28,
>>>                "errors" : [
>>>                   "omap_digest_mismatch_info"
>>>                ],
>>>                "omap_digest" : "0x25e8a1da",
>>>                "size" : 0
>>>             }
>>>          ],
>>>          "errors" : [],
>>>          "selected_object_info" : {
>>>             "mtime" : "2018-06-19 16:31:44.759717",
>>>             "alloc_hint_flags" : 0,
>>>             "size" : 0,
>>>             "last_reqid" : "client.410876514.0:1",
>>>             "local_mtime" : "2018-06-19 16:31:44.760139",
>>>             "data_digest" : "0xffffffff",
>>>             "truncate_seq" : 0,
>>>             "legacy_snaps" : [],
>>>             "expected_write_size" : 0,
>>>             "watchers" : {},
>>>             "flags" : [
>>>                "dirty",
>>>                "data_digest",
>>>                "omap_digest"
>>>             ],
>>>             "oid" : {
>>>                "pool" : 18,
>>>                "hash" : 1156456354,
>>>                "key" : "",
>>>                "oid" : ".dir.default.80018061.2",
>>>                "namespace" : "",
>>>                "snapid" : -2,
>>>                "max" : 0
>>>             },
>>>             "truncate_size" : 0,
>>>             "version" : "120985'632942",
>>>             "expected_object_size" : 0,
>>>             "omap_digest" : "0xffffffff",
>>>             "lost" : 0,
>>>             "manifest" : {
>>>                "redirect_target" : {
>>>                   "namespace" : "",
>>>                   "snapid" : 0,
>>>                   "max" : 0,
>>>                   "pool" : -9223372036854775808,
>>>                   "hash" : 0,
>>>                   "oid" : "",
>>>                   "key" : ""
>>>                },
>>>                "type" : 0
>>>             },
>>>             "prior_version" : "0'0",
>>>             "user_version" : 632942
>>>          }
>>>       }
>>>    ],
>>>    "epoch" : 121151
>>> }
>>>
>>> Cheers
>>>
>>> ----- Original Message -----
>>>> From: "Andrei Mikhailovsky" <andrei@xxxxxxxxxx>
>>>> To: "Brad Hubbard" <bhubbard@xxxxxxxxxx>
>>>> Cc: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
>>>> Sent: Wednesday, 27 June, 2018 09:10:07
>>>> Subject: Re:  fixing unrepairable inconsistent PG
>>>
>>>> Hi Brad,
>>>>
>>>> Thanks, that helped to get the query info on the inconsistent PG 18.2:
>>>>
>>>> {
>>>>    "state": "active+clean+inconsistent",
>>>>    "snap_trimq": "[]",
>>>>    "snap_trimq_len": 0,
>>>>    "epoch": 121293,
>>>>    "up": [
>>>>        21,
>>>>        28
>>>>    ],
>>>>    "acting": [
>>>>        21,
>>>>        28
>>>>    ],
>>>>    "actingbackfill": [
>>>>        "21",
>>>>        "28"
>>>>    ],
>>>>    "info": {
>>>>        "pgid": "18.2",
>>>>        "last_update": "121290'698339",
>>>>        "last_complete": "121290'698339",
>>>>        "log_tail": "121272'696825",
>>>>        "last_user_version": 698319,
>>>>        "last_backfill": "MAX",
>>>>        "last_backfill_bitwise": 0,
>>>>        "purged_snaps": [],
>>>>        "history": {
>>>>            "epoch_created": 24431,
>>>>            "epoch_pool_created": 24431,
>>>>            "last_epoch_started": 121152,
>>>>            "last_interval_started": 121151,
>>>>            "last_epoch_clean": 121152,
>>>>            "last_interval_clean": 121151,
>>>>            "last_epoch_split": 0,
>>>>            "last_epoch_marked_full": 106367,
>>>>            "same_up_since": 121148,
>>>>            "same_interval_since": 121151,
>>>>            "same_primary_since": 121020,
>>>>            "last_scrub": "121290'698339",
>>>>            "last_scrub_stamp": "2018-06-27 03:55:44.291060",
>>>>            "last_deep_scrub": "121290'698339",
>>>>            "last_deep_scrub_stamp": "2018-06-27 03:55:44.291060",
>>>>            "last_clean_scrub_stamp": "2018-06-11 15:28:20.335739"
>>>>        },
>>>>        "stats": {
>>>>            "version": "121290'698339",
>>>>            "reported_seq": "1055277",
>>>>            "reported_epoch": "121293",
>>>>            "state": "active+clean+inconsistent",
>>>>            "last_fresh": "2018-06-27 08:33:20.764603",
>>>>            "last_change": "2018-06-27 03:55:44.291146",
>>>>            "last_active": "2018-06-27 08:33:20.764603",
>>>>            "last_peered": "2018-06-27 08:33:20.764603",
>>>>            "last_clean": "2018-06-27 08:33:20.764603",
>>>>            "last_became_active": "2018-06-21 16:35:46.487783",
>>>>            "last_became_peered": "2018-06-21 16:35:46.487783",
>>>>            "last_unstale": "2018-06-27 08:33:20.764603",
>>>>            "last_undegraded": "2018-06-27 08:33:20.764603",
>>>>            "last_fullsized": "2018-06-27 08:33:20.764603",
>>>>            "mapping_epoch": 121151,
>>>>            "log_start": "121272'696825",
>>>>            "ondisk_log_start": "121272'696825",
>>>>            "created": 24431,
>>>>            "last_epoch_clean": 121152,
>>>>            "parent": "0.0",
>>>>            "parent_split_bits": 0,
>>>>            "last_scrub": "121290'698339",
>>>>            "last_scrub_stamp": "2018-06-27 03:55:44.291060",
>>>>            "last_deep_scrub": "121290'698339",
>>>>            "last_deep_scrub_stamp": "2018-06-27 03:55:44.291060",
>>>>            "last_clean_scrub_stamp": "2018-06-11 15:28:20.335739",
>>>>            "log_size": 1514,
>>>>            "ondisk_log_size": 1514,
>>>>            "stats_invalid": false,
>>>>            "dirty_stats_invalid": false,
>>>>            "omap_stats_invalid": false,
>>>>            "hitset_stats_invalid": false,
>>>>            "hitset_bytes_stats_invalid": false,
>>>>            "pin_stats_invalid": true,
>>>>            "snaptrimq_len": 0,
>>>>            "stat_sum": {
>>>>                "num_bytes": 0,
>>>>                "num_objects": 116,
>>>>                "num_object_clones": 0,
>>>>                "num_object_copies": 232,
>>>>                "num_objects_missing_on_primary": 0,
>>>>                "num_objects_missing": 0,
>>>>                "num_objects_degraded": 0,
>>>>                "num_objects_misplaced": 0,
>>>>                "num_objects_unfound": 0,
>>>>                "num_objects_dirty": 111,
>>>>                "num_whiteouts": 0,
>>>>                "num_read": 168436,
>>>>                "num_read_kb": 25417188,
>>>>                "num_write": 3370202,
>>>>                "num_write_kb": 0,
>>>>                "num_scrub_errors": 2,
>>>>                "num_shallow_scrub_errors": 0,
>>>>                "num_deep_scrub_errors": 2,
>>>>                "num_objects_recovered": 207,
>>>>                "num_bytes_recovered": 0,
>>>>                "num_keys_recovered": 9482826,
>>>>                "num_objects_omap": 107,
>>>>                "num_objects_hit_set_archive": 0,
>>>>                "num_bytes_hit_set_archive": 0,
>>>>                "num_flush": 0,
>>>>                "num_flush_kb": 0,
>>>>                "num_evict": 0,
>>>>                "num_evict_kb": 0,
>>>>                "num_promote": 0,
>>>>                "num_flush_mode_high": 0,
>>>>                "num_flush_mode_low": 0,
>>>>                "num_evict_mode_some": 0,
>>>>                "num_evict_mode_full": 0,
>>>>                "num_objects_pinned": 0,
>>>>                "num_legacy_snapsets": 0
>>>>            },
>>>>            "up": [
>>>>                21,
>>>>                28
>>>>            ],
>>>>            "acting": [
>>>>                21,
>>>>                28
>>>>            ],
>>>>            "blocked_by": [],
>>>>            "up_primary": 21,
>>>>            "acting_primary": 21
>>>>        },
>>>>        "empty": 0,
>>>>        "dne": 0,
>>>>        "incomplete": 0,
>>>>        "last_epoch_started": 121152,
>>>>        "hit_set_history": {
>>>>            "current_last_update": "0'0",
>>>>            "history": []
>>>>        }
>>>>    },
>>>>    "peer_info": [
>>>>        {
>>>>            "peer": "28",
>>>>            "pgid": "18.2",
>>>>            "last_update": "121290'698339",
>>>>            "last_complete": "121172'661331",
>>>>            "log_tail": "121127'652751",
>>>>            "last_user_version": 0,
>>>>            "last_backfill": "MAX",
>>>>            "last_backfill_bitwise": 1,
>>>>            "purged_snaps": [],
>>>>            "history": {
>>>>                "epoch_created": 24431,
>>>>                "epoch_pool_created": 24431,
>>>>                "last_epoch_started": 121152,
>>>>                "last_interval_started": 121151,
>>>>                "last_epoch_clean": 121152,
>>>>                "last_interval_clean": 121151,
>>>>                "last_epoch_split": 0,
>>>>                "last_epoch_marked_full": 106367,
>>>>                "same_up_since": 121148,
>>>>                "same_interval_since": 121151,
>>>>                "same_primary_since": 121020,
>>>>                "last_scrub": "121290'698339",
>>>>                "last_scrub_stamp": "2018-06-27 03:55:44.291060",
>>>>                "last_deep_scrub": "121290'698339",
>>>>                "last_deep_scrub_stamp": "2018-06-27 03:55:44.291060",
>>>>                "last_clean_scrub_stamp": "2018-06-11 15:28:20.335739"
>>>>            },
>>>>            "stats": {
>>>>                "version": "121131'654251",
>>>>                "reported_seq": "959540",
>>>>                "reported_epoch": "121150",
>>>>                "state": "active+undersized+degraded+remapped+inconsistent+backfilling",
>>>>                "last_fresh": "2018-06-21 16:35:44.468284",
>>>>                "last_change": "2018-06-21 16:34:12.447803",
>>>>                "last_active": "2018-06-21 16:35:44.468284",
>>>>                "last_peered": "2018-06-21 16:35:44.468284",
>>>>                "last_clean": "2018-06-21 16:27:07.835328",
>>>>                "last_became_active": "2018-06-21 16:33:24.246631",
>>>>                "last_became_peered": "2018-06-21 16:33:24.246631",
>>>>                "last_unstale": "2018-06-21 16:35:44.468284",
>>>>                "last_undegraded": "2018-06-21 16:33:23.997020",
>>>>                "last_fullsized": "2018-06-21 16:33:23.994195",
>>>>                "mapping_epoch": 121151,
>>>>                "log_start": "121127'652725",
>>>>                "created": 24431,
>>>>                "last_epoch_clean": 121145,
>>>>                "parent": "0.0",
>>>>                "parent_split_bits": 0,
>>>>                "last_scrub": "121131'654251",
>>>>                "last_scrub_stamp": "2018-06-21 16:27:07.835266",
>>>>                "last_deep_scrub": "121131'654251",
>>>>                "last_deep_scrub_stamp": "2018-06-21 16:27:07.835266",
>>>>                "last_clean_scrub_stamp": "2018-06-11 15:28:20.335739",
>>>>                "log_size": 1526,
>>>>                "ondisk_log_size": 1526,
>>>>                "stats_invalid": false,
>>>>                "dirty_stats_invalid": false,
>>>>                "omap_stats_invalid": false,
>>>>                "hitset_stats_invalid": false,
>>>>                "hitset_bytes_stats_invalid": false,
>>>>                "pin_stats_invalid": true,
>>>>                "snaptrimq_len": 0,
>>>>                "stat_sum": {
>>>>                    "num_bytes": 0,
>>>>                    "num_objects": 69,
>>>>                    "num_object_clones": 0,
>>>>                    "num_object_copies": 138,
>>>>                    "num_objects_missing_on_primary": 0,
>>>>                    "num_objects_missing": 0,
>>>>                    "num_objects_degraded": 1,
>>>>                    "num_objects_misplaced": 0,
>>>>                    "num_objects_unfound": 0,
>>>>                    "num_objects_dirty": 64,
>>>>                    "num_whiteouts": 0,
>>>>                    "num_read": 14057,
>>>>                    "num_read_kb": 454200,
>>>>                    "num_write": 797911,
>>>>                    "num_write_kb": 0,
>>>>                    "num_scrub_errors": 0,
>>>>                    "num_shallow_scrub_errors": 0,
>>>>                    "num_deep_scrub_errors": 0,
>>>>                    "num_objects_recovered": 207,
>>>>                    "num_bytes_recovered": 0,
>>>>                    "num_keys_recovered": 9482826,
>>>>                    "num_objects_omap": 60,
>>>>                    "num_objects_hit_set_archive": 0,
>>>>                    "num_bytes_hit_set_archive": 0,
>>>>                    "num_flush": 0,
>>>>                    "num_flush_kb": 0,
>>>>                    "num_evict": 0,
>>>>                    "num_evict_kb": 0,
>>>>                    "num_promote": 0,
>>>>                    "num_flush_mode_high": 0,
>>>>                    "num_flush_mode_low": 0,
>>>>                    "num_evict_mode_some": 0,
>>>>                    "num_evict_mode_full": 0,
>>>>                    "num_objects_pinned": 0,
>>>>                    "num_legacy_snapsets": 0
>>>>                },
>>>>                "up": [
>>>>                    21,
>>>>                    28
>>>>                ],
>>>>                "acting": [
>>>>                    21,
>>>>                    28
>>>>                ],
>>>>                "blocked_by": [],
>>>>                "up_primary": 21,
>>>>                "acting_primary": 21
>>>>            },
>>>>            "empty": 0,
>>>>            "dne": 0,
>>>>            "incomplete": 0,
>>>>            "last_epoch_started": 121152,
>>>>            "hit_set_history": {
>>>>                "current_last_update": "0'0",
>>>>                "history": []
>>>>            }
>>>>        }
>>>>    ],
>>>>    "recovery_state": [
>>>>        {
>>>>            "name": "Started/Primary/Active",
>>>>            "enter_time": "2018-06-21 16:35:46.478007",
>>>>            "might_have_unfound": [],
>>>>            "recovery_progress": {
>>>>                "backfill_targets": [],
>>>>                "waiting_on_backfill": [],
>>>>                "last_backfill_started": "MIN",
>>>>                "backfill_info": {
>>>>                    "begin": "MIN",
>>>>                    "end": "MIN",
>>>>                    "objects": []
>>>>                },
>>>>                "peer_backfill_info": [],
>>>>                "backfills_in_flight": [],
>>>>                "recovering": [],
>>>>                "pg_backend": {
>>>>                    "pull_from_peer": [],
>>>>                    "pushing": []
>>>>                }
>>>>            },
>>>>            "scrub": {
>>>>                "scrubber.epoch_start": "121151",
>>>>                "scrubber.active": false,
>>>>                "scrubber.state": "INACTIVE",
>>>>                "scrubber.start": "MIN",
>>>>                "scrubber.end": "MIN",
>>>>                "scrubber.subset_last_update": "0'0",
>>>>                "scrubber.deep": false,
>>>>                "scrubber.seed": 0,
>>>>                "scrubber.waiting_on": 0,
>>>>                "scrubber.waiting_on_whom": []
>>>>            }
>>>>        },
>>>>        {
>>>>            "name": "Started",
>>>>            "enter_time": "2018-06-21 16:35:45.052939"
>>>>        }
>>>>    ],
>>>>    "agent_state": {}
>>>> }
>>>>
>>>>
>>>>
>>>>
>>>> Thanks for trying to help out.
>>>>
>>>> Cheers
>>>>
>>>>
>>>>
>>>> ----- Original Message -----
>>>>> From: "Brad Hubbard" <bhubbard@xxxxxxxxxx>
>>>>> To: "Andrei Mikhailovsky" <andrei@xxxxxxxxxx>
>>>>> Cc: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
>>>>> Sent: Wednesday, 27 June, 2018 00:18:19
>>>>> Subject: Re:  fixing unrepairable inconsistent PG
>>>>
>>>>> Try setting the osd caps to 'allow *' for client.admin or running the
>>>>> command using an id that has that access such as
>>>>> mgr.arh-ibstorage1-ib.
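>>>>>
>>>>> Something along these lines should do it (note that 'ceph auth caps'
>>>>> replaces the whole cap set, so the existing mds/mgr/mon caps need to
>>>>> be restated):
>>>>>
>>>>> # ceph auth caps client.admin mds 'allow rwx' mgr 'allow *' mon 'allow rwx' osd 'allow *'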
>>>>>
>>>>> On Wed, Jun 27, 2018 at 1:32 AM, Andrei Mikhailovsky <andrei@xxxxxxxxxx> wrote:
>>>>>> Hi Brad,
>>>>>>
>>>>>> Here is the output of the "ceph auth list" command (I have removed the key: line
>>>>>> which was present in every single entry, including the osd.21):
>>>>>>
>>>>>> # ceph auth list
>>>>>> installed auth entries:
>>>>>>
>>>>>> mds.arh-ibstorage1-ib
>>>>>>         caps: [mds] allow
>>>>>>         caps: [mgr] allow profile mds
>>>>>>         caps: [mon] allow profile mds
>>>>>>         caps: [osd] allow *
>>>>>> mds.arh-ibstorage2-ib
>>>>>>         caps: [mds] allow
>>>>>>         caps: [mgr] allow profile mds
>>>>>>         caps: [mon] allow profile mds
>>>>>>         caps: [osd] allow *
>>>>>> osd.0
>>>>>>         caps: [mgr] allow profile osd
>>>>>>         caps: [mon] allow profile osd
>>>>>>         caps: [osd] allow *
>>>>>> osd.1
>>>>>>         caps: [mgr] allow profile osd
>>>>>>         caps: [mon] allow profile osd
>>>>>>         caps: [osd] allow *
>>>>>> osd.10
>>>>>>         caps: [mgr] allow profile osd
>>>>>>         caps: [mon] allow profile osd
>>>>>>         caps: [osd] allow *
>>>>>> osd.11
>>>>>>         caps: [mgr] allow profile osd
>>>>>>         caps: [mon] allow profile osd
>>>>>>         caps: [osd] allow *
>>>>>> osd.12
>>>>>>         caps: [mgr] allow profile osd
>>>>>>         caps: [mon] allow profile osd
>>>>>>         caps: [osd] allow *
>>>>>> osd.13
>>>>>>         caps: [mgr] allow profile osd
>>>>>>         caps: [mon] allow profile osd
>>>>>>         caps: [osd] allow *
>>>>>> osd.14
>>>>>>         caps: [mgr] allow profile osd
>>>>>>         caps: [mon] allow profile osd
>>>>>>         caps: [osd] allow *
>>>>>> osd.15
>>>>>>         caps: [mgr] allow profile osd
>>>>>>         caps: [mon] allow profile osd
>>>>>>         caps: [osd] allow *
>>>>>> osd.16
>>>>>>         caps: [mgr] allow profile osd
>>>>>>         caps: [mon] allow profile osd
>>>>>>         caps: [osd] allow *
>>>>>> osd.17
>>>>>>         caps: [mgr] allow profile osd
>>>>>>         caps: [mon] allow profile osd
>>>>>>         caps: [osd] allow *
>>>>>> osd.18
>>>>>>         caps: [mgr] allow profile osd
>>>>>>         caps: [mon] allow profile osd
>>>>>>         caps: [osd] allow *
>>>>>> osd.19
>>>>>>         caps: [mgr] allow profile osd
>>>>>>         caps: [mon] allow profile osd
>>>>>>         caps: [osd] allow *
>>>>>> osd.2
>>>>>>         caps: [mgr] allow profile osd
>>>>>>         caps: [mon] allow profile osd
>>>>>>         caps: [osd] allow *
>>>>>> osd.20
>>>>>>         caps: [mgr] allow profile osd
>>>>>>         caps: [mon] allow profile osd
>>>>>>         caps: [osd] allow *
>>>>>> osd.21
>>>>>>         caps: [mgr] allow profile osd
>>>>>>         caps: [mon] allow profile osd
>>>>>>         caps: [osd] allow *
>>>>>> osd.22
>>>>>>         caps: [mgr] allow profile osd
>>>>>>         caps: [mon] allow profile osd
>>>>>>         caps: [osd] allow *
>>>>>> osd.23
>>>>>>         caps: [mgr] allow profile osd
>>>>>>         caps: [mon] allow profile osd
>>>>>>         caps: [osd] allow *
>>>>>> osd.24
>>>>>>         caps: [mgr] allow profile osd
>>>>>>         caps: [mon] allow profile osd
>>>>>>         caps: [osd] allow *
>>>>>> osd.25
>>>>>>         caps: [mgr] allow profile osd
>>>>>>         caps: [mon] allow profile osd
>>>>>>         caps: [osd] allow *
>>>>>> osd.27
>>>>>>         caps: [mgr] allow profile osd
>>>>>>         caps: [mon] allow profile osd
>>>>>>         caps: [osd] allow *
>>>>>> osd.28
>>>>>>         caps: [mgr] allow profile osd
>>>>>>         caps: [mon] allow profile osd
>>>>>>         caps: [osd] allow *
>>>>>> osd.29
>>>>>>         caps: [mgr] allow profile osd
>>>>>>         caps: [mon] allow profile osd
>>>>>>         caps: [osd] allow *
>>>>>> osd.3
>>>>>>         caps: [mgr] allow profile osd
>>>>>>         caps: [mon] allow profile osd
>>>>>>         caps: [osd] allow *
>>>>>> osd.30
>>>>>>         caps: [mgr] allow profile osd
>>>>>>         caps: [mon] allow profile osd
>>>>>>         caps: [osd] allow *
>>>>>> osd.4
>>>>>>         caps: [mgr] allow profile osd
>>>>>>         caps: [mon] allow profile osd
>>>>>>         caps: [osd] allow *
>>>>>> osd.5
>>>>>>         caps: [mgr] allow profile osd
>>>>>>         caps: [mon] allow profile osd
>>>>>>         caps: [osd] allow *
>>>>>> osd.6
>>>>>>         caps: [mgr] allow profile osd
>>>>>>         caps: [mon] allow profile osd
>>>>>>         caps: [osd] allow *
>>>>>> osd.7
>>>>>>         caps: [mgr] allow profile osd
>>>>>>         caps: [mon] allow profile osd
>>>>>>         caps: [osd] allow *
>>>>>> osd.8
>>>>>>         caps: [mgr] allow profile osd
>>>>>>         caps: [mon] allow profile osd
>>>>>>         caps: [osd] allow *
>>>>>> osd.9
>>>>>>         caps: [mgr] allow profile osd
>>>>>>         caps: [mon] allow profile osd
>>>>>>         caps: [osd] allow *
>>>>>> client.admin
>>>>>>         caps: [mds] allow rwx
>>>>>>         caps: [mgr] allow *
>>>>>>         caps: [mon] allow rwx
>>>>>>         caps: [osd] allow rwx
>>>>>> client.arh-ibstorage1-ib.csprdc.arhont.com
>>>>>>         caps: [mgr] allow r
>>>>>>         caps: [mon] allow rw
>>>>>>         caps: [osd] allow rwx
>>>>>> client.bootstrap-mds
>>>>>>         caps: [mgr] allow r
>>>>>>         caps: [mon] allow profile bootstrap-mds
>>>>>> client.bootstrap-mgr
>>>>>>         caps: [mon] allow profile bootstrap-mgr
>>>>>> client.bootstrap-osd
>>>>>>         caps: [mgr] allow r
>>>>>>         caps: [mon] allow profile bootstrap-osd
>>>>>> client.bootstrap-rgw
>>>>>>         caps: [mgr] allow r
>>>>>>         caps: [mon] allow profile bootstrap-rgw
>>>>>> client.ceph-monitors
>>>>>>         caps: [mgr] allow r
>>>>>>         caps: [mon] allow r
>>>>>> client.libvirt
>>>>>>         caps: [mgr] allow r
>>>>>>         caps: [mon] allow r
>>>>>>         caps: [osd] allow class-read object_prefix rbd_children, allow rwx
>>>>>>         pool=libvirt-pool
>>>>>> client.primary-ubuntu-1
>>>>>>         caps: [mgr] allow r
>>>>>>         caps: [mon] allow r
>>>>>>         caps: [osd] allow rwx pool=Primary-ubuntu-1
>>>>>> client.radosgw1.gateway
>>>>>>         caps: [mgr] allow r
>>>>>>         caps: [mon] allow rwx
>>>>>>         caps: [osd] allow rwx
>>>>>> client.radosgw2.gateway
>>>>>>         caps: [mgr] allow r
>>>>>>         caps: [mon] allow rw
>>>>>>         caps: [osd] allow rwx
>>>>>> client.ssdcs
>>>>>>         caps: [mgr] allow r
>>>>>>         caps: [mon] allow r
>>>>>>         caps: [osd] allow class-read object_prefix rbd_children, allow rwx pool=ssdcs
>>>>>>
>>>>>> mgr.arh-ibstorage1-ib
>>>>>>         caps: [mds] allow *
>>>>>>         caps: [mon] allow profile mgr
>>>>>>         caps: [osd] allow *
>>>>>> mgr.arh-ibstorage2-ib
>>>>>>         caps: [mds] allow *
>>>>>>         caps: [mon] allow profile mgr
>>>>>>         caps: [osd] allow *
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> I have run this command on all pgs in the cluster and it shows the same error
>>>>>> message for all of them. For example:
>>>>>>
>>>>>> Error EPERM: problem getting command descriptions from pg.5.1c9
>>>>>>
>>>>>> Andrei
>>>>>>
>>>>>>
>>>>>> ----- Original Message -----
>>>>>>> From: "Brad Hubbard" <bhubbard@xxxxxxxxxx>
>>>>>>> To: "Andrei Mikhailovsky" <andrei@xxxxxxxxxx>
>>>>>>> Cc: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
>>>>>>> Sent: Tuesday, 26 June, 2018 01:10:34
>>>>>>> Subject: Re:  fixing unrepairable inconsistent PG
>>>>>>
>>>>>>> Interesting...
>>>>>>>
>>>>>>> Can I see the output of "ceph auth list" and can you test whether you
>>>>>>> can query any other pg that has osd.21 as its primary?
>>>>>>>
>>>>>>> On Mon, Jun 25, 2018 at 8:04 PM, Andrei Mikhailovsky <andrei@xxxxxxxxxx> wrote:
>>>>>>>> Hi Brad,
>>>>>>>>
>>>>>>>> here is the output:
>>>>>>>>
>>>>>>>> --------------
>>>>>>>>
>>>>>>>> root@arh-ibstorage1-ib:/home/andrei# ceph --debug_ms 5 --debug_auth 20 pg 18.2
>>>>>>>> query
>>>>>>>> 2018-06-25 10:59:12.100302 7fe23eaa1700  2 Event(0x7fe2400e0140 nevent=5000
>>>>>>>> time_id=1).set_owner idx=0 owner=140609690670848
>>>>>>>> 2018-06-25 10:59:12.100398 7fe23e2a0700  2 Event(0x7fe24010d030 nevent=5000
>>>>>>>> time_id=1).set_owner idx=1 owner=140609682278144
>>>>>>>> 2018-06-25 10:59:12.100445 7fe23da9f700  2 Event(0x7fe240139ec0 nevent=5000
>>>>>>>> time_id=1).set_owner idx=2 owner=140609673885440
>>>>>>>> 2018-06-25 10:59:12.100793 7fe244b28700  1  Processor -- start
>>>>>>>> 2018-06-25 10:59:12.100869 7fe244b28700  1 -- - start start
>>>>>>>> 2018-06-25 10:59:12.100882 7fe244b28700  5 adding auth protocol: cephx
>>>>>>>> 2018-06-25 10:59:12.101046 7fe244b28700  2 auth: KeyRing::load: loaded key file
>>>>>>>> /etc/ceph/ceph.client.admin.keyring
>>>>>>>> 2018-06-25 10:59:12.101244 7fe244b28700  1 -- - --> 192.168.168.201:6789/0 --
>>>>>>>> auth(proto 0 30 bytes epoch 0) v1 -- 0x7fe240174b80 con 0
>>>>>>>> 2018-06-25 10:59:12.101264 7fe244b28700  1 -- - --> 192.168.168.202:6789/0 --
>>>>>>>> auth(proto 0 30 bytes epoch 0) v1 -- 0x7fe240175010 con 0
>>>>>>>> 2018-06-25 10:59:12.101690 7fe23e2a0700  1 -- 192.168.168.201:0/3046734987
>>>>>>>> learned_addr learned my addr 192.168.168.201:0/3046734987
>>>>>>>> 2018-06-25 10:59:12.101890 7fe23e2a0700  2 -- 192.168.168.201:0/3046734987 >>
>>>>>>>> 192.168.168.202:6789/0 conn(0x7fe240176dc0 :-1 s=STATE_CONNECTING_WAIT_ACK_SEQ
>>>>>>>> pgs=0 cs=0 l=1)._process_connection got newly_acked_seq 0 vs out_seq 0
>>>>>>>> 2018-06-25 10:59:12.102030 7fe23da9f700  2 -- 192.168.168.201:0/3046734987 >>
>>>>>>>> 192.168.168.201:6789/0 conn(0x7fe24017a420 :-1 s=STATE_CONNECTING_WAIT_ACK_SEQ
>>>>>>>> pgs=0 cs=0 l=1)._process_connection got newly_acked_seq 0 vs out_seq 0
>>>>>>>> 2018-06-25 10:59:12.102450 7fe23e2a0700  5 -- 192.168.168.201:0/3046734987 >>
>>>>>>>> 192.168.168.202:6789/0 conn(0x7fe240176dc0 :-1
>>>>>>>> s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=472363 cs=1 l=1). rx mon.1
>>>>>>>> seq 1 0x7fe234002670 mon_map magic: 0 v1
>>>>>>>> 2018-06-25 10:59:12.102494 7fe23e2a0700  5 -- 192.168.168.201:0/3046734987 >>
>>>>>>>> 192.168.168.202:6789/0 conn(0x7fe240176dc0 :-1
>>>>>>>> s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=472363 cs=1 l=1). rx mon.1
>>>>>>>> seq 2 0x7fe234002b70 auth_reply(proto 2 0 (0) Success) v1
>>>>>>>> 2018-06-25 10:59:12.102542 7fe23ca9d700  1 -- 192.168.168.201:0/3046734987 <==
>>>>>>>> mon.1 192.168.168.202:6789/0 1 ==== mon_map magic: 0 v1 ==== 505+0+0
>>>>>>>> (2386987630 0 0) 0x7fe234002670 con 0x7fe240176dc0
>>>>>>>> 2018-06-25 10:59:12.102629 7fe23ca9d700  1 -- 192.168.168.201:0/3046734987 <==
>>>>>>>> mon.1 192.168.168.202:6789/0 2 ==== auth_reply(proto 2 0 (0) Success) v1 ====
>>>>>>>> 33+0+0 (1469975654 0 0) 0x7fe234002b70 con 0x7fe240176dc0
>>>>>>>> 2018-06-25 10:59:12.102655 7fe23ca9d700 10 cephx: set_have_need_key no handler
>>>>>>>> for service mon
>>>>>>>> 2018-06-25 10:59:12.102657 7fe23ca9d700 10 cephx: set_have_need_key no handler
>>>>>>>> for service osd
>>>>>>>> 2018-06-25 10:59:12.102658 7fe23ca9d700 10 cephx: set_have_need_key no handler
>>>>>>>> for service mgr
>>>>>>>> 2018-06-25 10:59:12.102661 7fe23ca9d700 10 cephx: set_have_need_key no handler
>>>>>>>> for service auth
>>>>>>>> 2018-06-25 10:59:12.102662 7fe23ca9d700 10 cephx: validate_tickets want 53 have
>>>>>>>> 0 need 53
>>>>>>>> 2018-06-25 10:59:12.102666 7fe23ca9d700 10 cephx client: handle_response ret = 0
>>>>>>>> 2018-06-25 10:59:12.102671 7fe23ca9d700 10 cephx client:  got initial server
>>>>>>>> challenge 6522ec95fb2eb487
>>>>>>>> 2018-06-25 10:59:12.102673 7fe23ca9d700 10 cephx client: validate_tickets:
>>>>>>>> want=53 need=53 have=0
>>>>>>>> 2018-06-25 10:59:12.102674 7fe23ca9d700 10 cephx: set_have_need_key no handler
>>>>>>>> for service mon
>>>>>>>> 2018-06-25 10:59:12.102675 7fe23ca9d700 10 cephx: set_have_need_key no handler
>>>>>>>> for service osd
>>>>>>>> 2018-06-25 10:59:12.102676 7fe23ca9d700 10 cephx: set_have_need_key no handler
>>>>>>>> for service mgr
>>>>>>>> 2018-06-25 10:59:12.102676 7fe23ca9d700 10 cephx: set_have_need_key no handler
>>>>>>>> for service auth
>>>>>>>> 2018-06-25 10:59:12.102677 7fe23ca9d700 10 cephx: validate_tickets want 53 have
>>>>>>>> 0 need 53
>>>>>>>> 2018-06-25 10:59:12.102678 7fe23ca9d700 10 cephx client: want=53 need=53 have=0
>>>>>>>> 2018-06-25 10:59:12.102680 7fe23ca9d700 10 cephx client: build_request
>>>>>>>> 2018-06-25 10:59:12.102702 7fe23da9f700  5 -- 192.168.168.201:0/3046734987 >>
>>>>>>>> 192.168.168.201:6789/0 conn(0x7fe24017a420 :-1
>>>>>>>> s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=333625 cs=1 l=1). rx mon.0
>>>>>>>> seq 1 0x7fe228001490 mon_map magic: 0 v1
>>>>>>>> 2018-06-25 10:59:12.102739 7fe23ca9d700 10 cephx client: get auth session key:
>>>>>>>> client_challenge 80f2a24093f783c5
>>>>>>>> 2018-06-25 10:59:12.102743 7fe23ca9d700  1 -- 192.168.168.201:0/3046734987 -->
>>>>>>>> 192.168.168.202:6789/0 -- auth(proto 2 32 bytes epoch 0) v1 -- 0x7fe224002080
>>>>>>>> con 0
>>>>>>>> 2018-06-25 10:59:12.102737 7fe23da9f700  5 -- 192.168.168.201:0/3046734987 >>
>>>>>>>> 192.168.168.201:6789/0 conn(0x7fe24017a420 :-1
>>>>>>>> s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=333625 cs=1 l=1). rx mon.0
>>>>>>>> seq 2 0x7fe2280019c0 auth_reply(proto 2 0 (0) Success) v1
>>>>>>>> 2018-06-25 10:59:12.102776 7fe23ca9d700  1 -- 192.168.168.201:0/3046734987 <==
>>>>>>>> mon.0 192.168.168.201:6789/0 1 ==== mon_map magic: 0 v1 ==== 505+0+0
>>>>>>>> (2386987630 0 0) 0x7fe228001490 con 0x7fe24017a420
>>>>>>>> 2018-06-25 10:59:12.102821 7fe23ca9d700  1 -- 192.168.168.201:0/3046734987 <==
>>>>>>>> mon.0 192.168.168.201:6789/0 2 ==== auth_reply(proto 2 0 (0) Success) v1 ====
>>>>>>>> 33+0+0 (3800394028 0 0) 0x7fe2280019c0 con 0x7fe24017a420
>>>>>>>> 2018-06-25 10:59:12.102833 7fe23ca9d700 10 cephx: set_have_need_key no handler
>>>>>>>> for service mon
>>>>>>>> 2018-06-25 10:59:12.102834 7fe23ca9d700 10 cephx: set_have_need_key no handler
>>>>>>>> for service osd
>>>>>>>> 2018-06-25 10:59:12.102835 7fe23ca9d700 10 cephx: set_have_need_key no handler
>>>>>>>> for service mgr
>>>>>>>> 2018-06-25 10:59:12.102836 7fe23ca9d700 10 cephx: set_have_need_key no handler
>>>>>>>> for service auth
>>>>>>>> 2018-06-25 10:59:12.102837 7fe23ca9d700 10 cephx: validate_tickets want 53 have
>>>>>>>> 0 need 53
>>>>>>>> 2018-06-25 10:59:12.102839 7fe23ca9d700 10 cephx client: handle_response ret = 0
>>>>>>>> 2018-06-25 10:59:12.102841 7fe23ca9d700 10 cephx client:  got initial server
>>>>>>>> challenge ccd69ce967642f7
>>>>>>>> 2018-06-25 10:59:12.102842 7fe23ca9d700 10 cephx client: validate_tickets:
>>>>>>>> want=53 need=53 have=0
>>>>>>>> 2018-06-25 10:59:12.102843 7fe23ca9d700 10 cephx: set_have_need_key no handler
>>>>>>>> for service mon
>>>>>>>> 2018-06-25 10:59:12.102843 7fe23ca9d700 10 cephx: set_have_need_key no handler
>>>>>>>> for service osd
>>>>>>>> 2018-06-25 10:59:12.102844 7fe23ca9d700 10 cephx: set_have_need_key no handler
>>>>>>>> for service mgr
>>>>>>>> 2018-06-25 10:59:12.102845 7fe23ca9d700 10 cephx: set_have_need_key no handler
>>>>>>>> for service auth
>>>>>>>> 2018-06-25 10:59:12.102845 7fe23ca9d700 10 cephx: validate_tickets want 53 have
>>>>>>>> 0 need 53
>>>>>>>> 2018-06-25 10:59:12.102846 7fe23ca9d700 10 cephx client: want=53 need=53 have=0
>>>>>>>> 2018-06-25 10:59:12.102848 7fe23ca9d700 10 cephx client: build_request
>>>>>>>> 2018-06-25 10:59:12.102881 7fe23ca9d700 10 cephx client: get auth session key:
>>>>>>>> client_challenge 6ddb6fdc4176ea6a
>>>>>>>> 2018-06-25 10:59:12.102884 7fe23ca9d700  1 -- 192.168.168.201:0/3046734987 -->
>>>>>>>> 192.168.168.201:6789/0 -- auth(proto 2 32 bytes epoch 0) v1 -- 0x7fe2240032d0
>>>>>>>> con 0
>>>>>>>> 2018-06-25 10:59:12.103402 7fe23e2a0700  5 -- 192.168.168.201:0/3046734987 >>
>>>>>>>> 192.168.168.202:6789/0 conn(0x7fe240176dc0 :-1
>>>>>>>> s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=472363 cs=1 l=1). rx mon.1
>>>>>>>> seq 3 0x7fe234002200 auth_reply(proto 2 0 (0) Success) v1
>>>>>>>> 2018-06-25 10:59:12.103449 7fe23ca9d700  1 -- 192.168.168.201:0/3046734987 <==
>>>>>>>> mon.1 192.168.168.202:6789/0 3 ==== auth_reply(proto 2 0 (0) Success) v1 ====
>>>>>>>> 206+0+0 (195487815 0 0) 0x7fe234002200 con 0x7fe240176dc0
>>>>>>>> 2018-06-25 10:59:12.103468 7fe23ca9d700 10 cephx client: handle_response ret = 0
>>>>>>>> 2018-06-25 10:59:12.103469 7fe23ca9d700 10 cephx client:  get_auth_session_key
>>>>>>>> 2018-06-25 10:59:12.103471 7fe23ca9d700 10 cephx: verify_service_ticket_reply
>>>>>>>> got 1 keys
>>>>>>>> 2018-06-25 10:59:12.103472 7fe23ca9d700 10 cephx: got key for service_id auth
>>>>>>>> 2018-06-25 10:59:12.103508 7fe23ca9d700 10 cephx:  ticket.secret_id=3687
>>>>>>>> 2018-06-25 10:59:12.103510 7fe23ca9d700 10 cephx: verify_service_ticket_reply
>>>>>>>> service auth secret_id 3687 session_key [KEY] validity=43200.000000
>>>>>>>> 2018-06-25 10:59:12.103527 7fe23ca9d700 10 cephx: ticket expires=2018-06-25
>>>>>>>> 22:59:12.103526 renew_after=2018-06-25 19:59:12.103526
>>>>>>>> 2018-06-25 10:59:12.103533 7fe23ca9d700 10 cephx client:  want=53 need=53 have=0
>>>>>>>> 2018-06-25 10:59:12.103534 7fe23ca9d700 10 cephx: set_have_need_key no handler
>>>>>>>> for service mon
>>>>>>>> 2018-06-25 10:59:12.103535 7fe23ca9d700 10 cephx: set_have_need_key no handler
>>>>>>>> for service osd
>>>>>>>> 2018-06-25 10:59:12.103536 7fe23ca9d700 10 cephx: set_have_need_key no handler
>>>>>>>> for service mgr
>>>>>>>> 2018-06-25 10:59:12.103537 7fe23ca9d700 10 cephx: validate_tickets want 53 have
>>>>>>>> 32 need 21
>>>>>>>> 2018-06-25 10:59:12.103539 7fe23ca9d700 10 cephx client: validate_tickets:
>>>>>>>> want=53 need=21 have=32
>>>>>>>> 2018-06-25 10:59:12.103540 7fe23ca9d700 10 cephx: set_have_need_key no handler
>>>>>>>> for service mon
>>>>>>>> 2018-06-25 10:59:12.103541 7fe23ca9d700 10 cephx: set_have_need_key no handler
>>>>>>>> for service osd
>>>>>>>> 2018-06-25 10:59:12.103542 7fe23ca9d700 10 cephx: set_have_need_key no handler
>>>>>>>> for service mgr
>>>>>>>> 2018-06-25 10:59:12.103542 7fe23ca9d700 10 cephx: validate_tickets want 53 have
>>>>>>>> 32 need 21
>>>>>>>> 2018-06-25 10:59:12.103543 7fe23ca9d700 10 cephx client: want=53 need=21 have=32
>>>>>>>> 2018-06-25 10:59:12.103544 7fe23ca9d700 10 cephx client: build_request
>>>>>>>> 2018-06-25 10:59:12.103545 7fe23ca9d700 10 cephx client: get service keys:
>>>>>>>> want=53 need=21 have=32
>>>>>>>> 2018-06-25 10:59:12.103570 7fe23ca9d700  1 -- 192.168.168.201:0/3046734987 -->
>>>>>>>> 192.168.168.202:6789/0 -- auth(proto 2 165 bytes epoch 0) v1 -- 0x7fe224007010
>>>>>>>> con 0
>>>>>>>> 2018-06-25 10:59:12.103657 7fe23da9f700  5 -- 192.168.168.201:0/3046734987 >>
>>>>>>>> 192.168.168.201:6789/0 conn(0x7fe24017a420 :-1
>>>>>>>> s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=333625 cs=1 l=1). rx mon.0
>>>>>>>> seq 3 0x7fe228001020 auth_reply(proto 2 0 (0) Success) v1
>>>>>>>> 2018-06-25 10:59:12.103709 7fe23ca9d700  1 -- 192.168.168.201:0/3046734987 <==
>>>>>>>> mon.0 192.168.168.201:6789/0 3 ==== auth_reply(proto 2 0 (0) Success) v1 ====
>>>>>>>> 206+0+0 (2366624548 0 0) 0x7fe228001020 con 0x7fe24017a420
>>>>>>>> 2018-06-25 10:59:12.103729 7fe23ca9d700 10 cephx client: handle_response ret = 0
>>>>>>>> 2018-06-25 10:59:12.103731 7fe23ca9d700 10 cephx client:  get_auth_session_key
>>>>>>>> 2018-06-25 10:59:12.103733 7fe23ca9d700 10 cephx: verify_service_ticket_reply
>>>>>>>> got 1 keys
>>>>>>>> 2018-06-25 10:59:12.103734 7fe23ca9d700 10 cephx: got key for service_id auth
>>>>>>>> 2018-06-25 10:59:12.103774 7fe23ca9d700 10 cephx:  ticket.secret_id=3687
>>>>>>>> 2018-06-25 10:59:12.103776 7fe23ca9d700 10 cephx: verify_service_ticket_reply
>>>>>>>> service auth secret_id 3687 session_key [KEY] validity=43200.000000
>>>>>>>> 2018-06-25 10:59:12.103792 7fe23ca9d700 10 cephx: ticket expires=2018-06-25
>>>>>>>> 22:59:12.103791 renew_after=2018-06-25 19:59:12.103791
>>>>>>>> 2018-06-25 10:59:12.103798 7fe23ca9d700 10 cephx client:  want=53 need=53 have=0
>>>>>>>> 2018-06-25 10:59:12.103799 7fe23ca9d700 10 cephx: set_have_need_key no handler
>>>>>>>> for service mon
>>>>>>>> 2018-06-25 10:59:12.103800 7fe23ca9d700 10 cephx: set_have_need_key no handler
>>>>>>>> for service osd
>>>>>>>> 2018-06-25 10:59:12.103801 7fe23ca9d700 10 cephx: set_have_need_key no handler
>>>>>>>> for service mgr
>>>>>>>> 2018-06-25 10:59:12.103802 7fe23ca9d700 10 cephx: validate_tickets want 53 have
>>>>>>>> 32 need 21
>>>>>>>> 2018-06-25 10:59:12.103804 7fe23ca9d700 10 cephx client: validate_tickets:
>>>>>>>> want=53 need=21 have=32
>>>>>>>> 2018-06-25 10:59:12.103806 7fe23ca9d700 10 cephx: set_have_need_key no handler
>>>>>>>> for service mon
>>>>>>>> 2018-06-25 10:59:12.103806 7fe23ca9d700 10 cephx: set_have_need_key no handler
>>>>>>>> for service osd
>>>>>>>> 2018-06-25 10:59:12.103807 7fe23ca9d700 10 cephx: set_have_need_key no handler
>>>>>>>> for service mgr
>>>>>>>> 2018-06-25 10:59:12.103808 7fe23ca9d700 10 cephx: validate_tickets want 53 have
>>>>>>>> 32 need 21
>>>>>>>> 2018-06-25 10:59:12.103808 7fe23ca9d700 10 cephx client: want=53 need=21 have=32
>>>>>>>> 2018-06-25 10:59:12.103812 7fe23ca9d700 10 cephx client: build_request
>>>>>>>> 2018-06-25 10:59:12.103813 7fe23ca9d700 10 cephx client: get service keys:
>>>>>>>> want=53 need=21 have=32
>>>>>>>> 2018-06-25 10:59:12.103834 7fe23ca9d700  1 -- 192.168.168.201:0/3046734987 -->
>>>>>>>> 192.168.168.201:6789/0 -- auth(proto 2 165 bytes epoch 0) v1 -- 0x7fe224009dd0
>>>>>>>> con 0
>>>>>>>> 2018-06-25 10:59:12.104168 7fe23e2a0700  5 -- 192.168.168.201:0/3046734987 >>
>>>>>>>> 192.168.168.202:6789/0 conn(0x7fe240176dc0 :-1
>>>>>>>> s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=472363 cs=1 l=1). rx mon.1
>>>>>>>> seq 4 0x7fe234002200 auth_reply(proto 2 0 (0) Success) v1
>>>>>>>> 2018-06-25 10:59:12.104201 7fe23ca9d700  1 -- 192.168.168.201:0/3046734987 <==
>>>>>>>> mon.1 192.168.168.202:6789/0 4 ==== auth_reply(proto 2 0 (0) Success) v1 ====
>>>>>>>> 580+0+0 (56981162 0 0) 0x7fe234002200 con 0x7fe240176dc0
>>>>>>>> 2018-06-25 10:59:12.104223 7fe23ca9d700 10 cephx client: handle_response ret = 0
>>>>>>>> 2018-06-25 10:59:12.104226 7fe23ca9d700 10 cephx client:
>>>>>>>> get_principal_session_key session_key [KEY]
>>>>>>>> 2018-06-25 10:59:12.104238 7fe23ca9d700 10 cephx: verify_service_ticket_reply
>>>>>>>> got 3 keys
>>>>>>>> 2018-06-25 10:59:12.104240 7fe23ca9d700 10 cephx: got key for service_id mon
>>>>>>>> 2018-06-25 10:59:12.104276 7fe23ca9d700 10 cephx:  ticket.secret_id=44205
>>>>>>>> 2018-06-25 10:59:12.104277 7fe23ca9d700 10 cephx: verify_service_ticket_reply
>>>>>>>> service mon secret_id 44205 session_key [KEY] validity=3600.000000
>>>>>>>> 2018-06-25 10:59:12.104285 7fe23ca9d700 10 cephx: ticket expires=2018-06-25
>>>>>>>> 11:59:12.104284 renew_after=2018-06-25 11:44:12.104284
>>>>>>>> 2018-06-25 10:59:12.104290 7fe23ca9d700 10 cephx: got key for service_id osd
>>>>>>>> 2018-06-25 10:59:12.104313 7fe23ca9d700 10 cephx:  ticket.secret_id=44205
>>>>>>>> 2018-06-25 10:59:12.104314 7fe23ca9d700 10 cephx: verify_service_ticket_reply
>>>>>>>> service osd secret_id 44205 session_key [KEY] validity=3600.000000
>>>>>>>> 2018-06-25 10:59:12.104329 7fe23ca9d700 10 cephx: ticket expires=2018-06-25
>>>>>>>> 11:59:12.104329 renew_after=2018-06-25 11:44:12.104329
>>>>>>>> 2018-06-25 10:59:12.104333 7fe23ca9d700 10 cephx: got key for service_id mgr
>>>>>>>> 2018-06-25 10:59:12.104355 7fe23ca9d700 10 cephx:  ticket.secret_id=204
>>>>>>>> 2018-06-25 10:59:12.104356 7fe23ca9d700 10 cephx: verify_service_ticket_reply
>>>>>>>> service mgr secret_id 204 session_key [KEY] validity=3600.000000
>>>>>>>> 2018-06-25 10:59:12.104368 7fe23ca9d700 10 cephx: ticket expires=2018-06-25
>>>>>>>> 11:59:12.104368 renew_after=2018-06-25 11:44:12.104368
>>>>>>>> 2018-06-25 10:59:12.104373 7fe23ca9d700 10 cephx: validate_tickets want 53 have
>>>>>>>> 53 need 0
>>>>>>>> 2018-06-25 10:59:12.104376 7fe23ca9d700  1 -- 192.168.168.201:0/3046734987 >>
>>>>>>>> 192.168.168.201:6789/0 conn(0x7fe24017a420 :-1 s=STATE_OPEN pgs=333625 cs=1
>>>>>>>> l=1).mark_down
>>>>>>>> 2018-06-25 10:59:12.104384 7fe23ca9d700  2 -- 192.168.168.201:0/3046734987 >>
>>>>>>>> 192.168.168.201:6789/0 conn(0x7fe24017a420 :-1 s=STATE_OPEN pgs=333625 cs=1
>>>>>>>> l=1)._stop
>>>>>>>> 2018-06-25 10:59:12.104426 7fe23ca9d700  1 -- 192.168.168.201:0/3046734987 -->
>>>>>>>> 192.168.168.202:6789/0 -- mon_subscribe({monmap=0+}) v2 -- 0x7fe240180bb0 con 0
>>>>>>>> 2018-06-25 10:59:12.104442 7fe23ca9d700 10 cephx: validate_tickets want 53 have
>>>>>>>> 53 need 0
>>>>>>>> 2018-06-25 10:59:12.104444 7fe23ca9d700 20 cephx client: need_tickets: want=53
>>>>>>>> have=53 need=0
>>>>>>>> 2018-06-25 10:59:12.104481 7fe244b28700  1 -- 192.168.168.201:0/3046734987 -->
>>>>>>>> 192.168.168.202:6789/0 -- mon_subscribe({mgrmap=0+}) v2 -- 0x7fe240175010 con 0
>>>>>>>> 2018-06-25 10:59:12.104573 7fe244b28700  1 -- 192.168.168.201:0/3046734987 -->
>>>>>>>> 192.168.168.202:6789/0 -- mon_subscribe({osdmap=0}) v2 -- 0x7fe24017ea90 con 0
>>>>>>>> 2018-06-25 10:59:12.104979 7fe23e2a0700  5 -- 192.168.168.201:0/3046734987 >>
>>>>>>>> 192.168.168.202:6789/0 conn(0x7fe240176dc0 :-1
>>>>>>>> s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=472363 cs=1 l=1). rx mon.1
>>>>>>>> seq 5 0x7fe234002b90 mon_map magic: 0 v1
>>>>>>>> 2018-06-25 10:59:12.105008 7fe23ca9d700  1 -- 192.168.168.201:0/3046734987 <==
>>>>>>>> mon.1 192.168.168.202:6789/0 5 ==== mon_map magic: 0 v1 ==== 505+0+0
>>>>>>>> (2386987630 0 0) 0x7fe234002b90 con 0x7fe240176dc0
>>>>>>>> 2018-06-25 10:59:12.105022 7fe23e2a0700  5 -- 192.168.168.201:0/3046734987 >>
>>>>>>>> 192.168.168.202:6789/0 conn(0x7fe240176dc0 :-1
>>>>>>>> s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=472363 cs=1 l=1). rx mon.1
>>>>>>>> seq 6 0x7fe234001a60 mgrmap(e 139) v1
>>>>>>>> 2018-06-25 10:59:12.105058 7fe23ca9d700  1 -- 192.168.168.201:0/3046734987 <==
>>>>>>>> mon.1 192.168.168.202:6789/0 6 ==== mgrmap(e 139) v1 ==== 381+0+0 (56579516 0
>>>>>>>> 0) 0x7fe234001a60 con 0x7fe240176dc0
>>>>>>>> 2018-06-25 10:59:12.105066 7fe23e2a0700  5 -- 192.168.168.201:0/3046734987 >>
>>>>>>>> 192.168.168.202:6789/0 conn(0x7fe240176dc0 :-1
>>>>>>>> s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=472363 cs=1 l=1). rx mon.1
>>>>>>>> seq 7 0x7fe234002610 osd_map(121251..121251 src has 120729..121251) v3
>>>>>>>> 2018-06-25 10:59:12.105110 7fe23ca9d700  1 -- 192.168.168.201:0/3046734987 <==
>>>>>>>> mon.1 192.168.168.202:6789/0 7 ==== osd_map(121251..121251 src has
>>>>>>>> 120729..121251) v3 ==== 18118+0+0 (421862548 0 0) 0x7fe234002610 con
>>>>>>>> 0x7fe240176dc0
>>>>>>>> 2018-06-25 10:59:12.105405 7fe23da9f700 10 cephx client: build_authorizer for
>>>>>>>> service mgr
>>>>>>>> 2018-06-25 10:59:12.105685 7fe23da9f700  2 -- 192.168.168.201:0/3046734987 >>
>>>>>>>> 192.168.168.201:6840/32624 conn(0x7fe2240127b0 :-1
>>>>>>>> s=STATE_CONNECTING_WAIT_ACK_SEQ pgs=0 cs=0 l=1)._process_connection got
>>>>>>>> newly_acked_seq 0 vs out_seq 0
>>>>>>>> 2018-06-25 10:59:12.105720 7fe23da9f700 10 In get_auth_session_handler for
>>>>>>>> protocol 2
>>>>>>>> 2018-06-25 10:59:12.108653 7fe244b28700  1 -- 192.168.168.201:0/3046734987 -->
>>>>>>>> 192.168.168.203:6828/43673 -- command(tid 1: {"prefix":
>>>>>>>> "get_command_descriptions", "pgid": "18.2"}) v1 -- 0x7fe240184580 con 0
>>>>>>>> 2018-06-25 10:59:12.109327 7fe23eaa1700 10 cephx client: build_authorizer for
>>>>>>>> service osd
>>>>>>>> 2018-06-25 10:59:12.109828 7fe23eaa1700  2 -- 192.168.168.201:0/3046734987 >>
>>>>>>>> 192.168.168.203:6828/43673 conn(0x7fe240180f20 :-1
>>>>>>>> s=STATE_CONNECTING_WAIT_ACK_SEQ pgs=0 cs=0 l=1)._process_connection got
>>>>>>>> newly_acked_seq 0 vs out_seq 0
>>>>>>>> 2018-06-25 10:59:12.109875 7fe23eaa1700 10 In get_auth_session_handler for
>>>>>>>> protocol 2
>>>>>>>> 2018-06-25 10:59:12.109921 7fe23eaa1700 10 _calc_signature seq 1 front_crc_ =
>>>>>>>> 2696387361 middle_crc = 0 data_crc = 0 sig = 10077981589201542762
>>>>>>>> 2018-06-25 10:59:12.109930 7fe23eaa1700 20 Putting signature in client
>>>>>>>> message(seq # 1): sig = 10077981589201542762
>>>>>>>> 2018-06-25 10:59:12.110382 7fe23eaa1700 10 _calc_signature seq 1 front_crc_ =
>>>>>>>> 1943489909 middle_crc = 0 data_crc = 0 sig = 6955259887491975287
>>>>>>>> 2018-06-25 10:59:12.110394 7fe23eaa1700  5 -- 192.168.168.201:0/3046734987 >>
>>>>>>>> 192.168.168.203:6828/43673 conn(0x7fe240180f20 :-1
>>>>>>>> s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=26353 cs=1 l=1). rx osd.21
>>>>>>>> seq 1 0x7fe238000f20 command_reply(tid 1: -1 ) v1
>>>>>>>> 2018-06-25 10:59:12.110436 7fe23ca9d700  1 -- 192.168.168.201:0/3046734987 <==
>>>>>>>> osd.21 192.168.168.203:6828/43673 1 ==== command_reply(tid 1: -1 ) v1 ====
>>>>>>>> 8+0+0 (1943489909 0 0) 0x7fe238000f20 con 0x7fe240180f20
>>>>>>>> Error EPERM: problem getting command descriptions from pg.18.2
>>>>>>>> 2018-06-25 10:59:12.112168 7fe244b28700  1 -- 192.168.168.201:0/3046734987 >>
>>>>>>>> 192.168.168.203:6828/43673 conn(0x7fe240180f20 :-1 s=STATE_OPEN pgs=26353 cs=1
>>>>>>>> l=1).mark_down
>>>>>>>> 2018-06-25 10:59:12.112190 7fe244b28700  2 -- 192.168.168.201:0/3046734987 >>
>>>>>>>> 192.168.168.203:6828/43673 conn(0x7fe240180f20 :-1 s=STATE_OPEN pgs=26353 cs=1
>>>>>>>> l=1)._stop
>>>>>>>> 2018-06-25 10:59:12.112337 7fe244b28700  1 -- 192.168.168.201:0/3046734987 >>
>>>>>>>> 192.168.168.201:6840/32624 conn(0x7fe2240127b0 :-1 s=STATE_OPEN pgs=575947 cs=1
>>>>>>>> l=1).mark_down
>>>>>>>> 2018-06-25 10:59:12.112348 7fe244b28700  2 -- 192.168.168.201:0/3046734987 >>
>>>>>>>> 192.168.168.201:6840/32624 conn(0x7fe2240127b0 :-1 s=STATE_OPEN pgs=575947 cs=1
>>>>>>>> l=1)._stop
>>>>>>>> 2018-06-25 10:59:12.112367 7fe244b28700  1 -- 192.168.168.201:0/3046734987 >>
>>>>>>>> 192.168.168.202:6789/0 conn(0x7fe240176dc0 :-1 s=STATE_OPEN pgs=472363 cs=1
>>>>>>>> l=1).mark_down
>>>>>>>> 2018-06-25 10:59:12.112372 7fe244b28700  2 -- 192.168.168.201:0/3046734987 >>
>>>>>>>> 192.168.168.202:6789/0 conn(0x7fe240176dc0 :-1 s=STATE_OPEN pgs=472363 cs=1
>>>>>>>> l=1)._stop
>>>>>>>> 2018-06-25 10:59:12.112519 7fe244b28700  1 -- 192.168.168.201:0/3046734987
>>>>>>>> shutdown_connections
>>>>>>>> 2018-06-25 10:59:12.112530 7fe244b28700  5 -- 192.168.168.201:0/3046734987
>>>>>>>> shutdown_connections mark down 192.168.168.201:6840/32624 0x7fe2240127b0
>>>>>>>> 2018-06-25 10:59:12.112538 7fe244b28700  5 -- 192.168.168.201:0/3046734987
>>>>>>>> shutdown_connections mark down 192.168.168.201:6789/0 0x7fe24017a420
>>>>>>>> 2018-06-25 10:59:12.112543 7fe244b28700  5 -- 192.168.168.201:0/3046734987
>>>>>>>> shutdown_connections mark down 192.168.168.203:6828/43673 0x7fe240180f20
>>>>>>>> 2018-06-25 10:59:12.112549 7fe244b28700  5 -- 192.168.168.201:0/3046734987
>>>>>>>> shutdown_connections mark down 192.168.168.202:6789/0 0x7fe240176dc0
>>>>>>>> 2018-06-25 10:59:12.112554 7fe244b28700  5 -- 192.168.168.201:0/3046734987
>>>>>>>> shutdown_connections delete 0x7fe2240127b0
>>>>>>>> 2018-06-25 10:59:12.112570 7fe244b28700  5 -- 192.168.168.201:0/3046734987
>>>>>>>> shutdown_connections delete 0x7fe240176dc0
>>>>>>>> 2018-06-25 10:59:12.112577 7fe244b28700  5 -- 192.168.168.201:0/3046734987
>>>>>>>> shutdown_connections delete 0x7fe24017a420
>>>>>>>> 2018-06-25 10:59:12.112582 7fe244b28700  5 -- 192.168.168.201:0/3046734987
>>>>>>>> shutdown_connections delete 0x7fe240180f20
>>>>>>>> 2018-06-25 10:59:12.112701 7fe244b28700  1 -- 192.168.168.201:0/3046734987
>>>>>>>> shutdown_connections
>>>>>>>> 2018-06-25 10:59:12.112752 7fe244b28700  1 -- 192.168.168.201:0/3046734987 wait
>>>>>>>> complete.
>>>>>>>> 2018-06-25 10:59:12.112764 7fe244b28700  1 -- 192.168.168.201:0/3046734987 >>
>>>>>>>> 192.168.168.201:0/3046734987 conn(0x7fe240167220 :-1 s=STATE_NONE pgs=0 cs=0
>>>>>>>> l=0).mark_down
>>>>>>>> 2018-06-25 10:59:12.112770 7fe244b28700  2 -- 192.168.168.201:0/3046734987 >>
>>>>>>>> 192.168.168.201:0/3046734987 conn(0x7fe240167220 :-1 s=STATE_NONE pgs=0 cs=0
>>>>>>>> l=0)._stop
>>>>>>>>
>>>>>>>>
>>>>>>>> ----------
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>> ----- Original Message -----
>>>>>>>>> From: "Brad Hubbard" <bhubbard@xxxxxxxxxx>
>>>>>>>>> To: "Andrei Mikhailovsky" <andrei@xxxxxxxxxx>
>>>>>>>>> Cc: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
>>>>>>>>> Sent: Monday, 25 June, 2018 02:28:55
>>>>>>>>> Subject: Re:  fixing unrepairable inconsistent PG
>>>>>>>>
>>>>>>>>> Can you try the following?
>>>>>>>>>
>>>>>>>>> $ ceph --debug_ms 5 --debug_auth 20 pg 18.2 query
>>>>>>>>>
>>>>>>>>> On Fri, Jun 22, 2018 at 7:54 PM, Andrei Mikhailovsky <andrei@xxxxxxxxxx> wrote:
>>>>>>>>>> Hi Brad,
>>>>>>>>>>
>>>>>>>>>> here is the output of the command (replaced the real auth key with [KEY]):
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> ----------------
>>>>>>>>>>
>>>>>>>>>> 2018-06-22 10:47:27.659895 7f70ef9e6700 10 monclient: build_initial_monmap
>>>>>>>>>> 2018-06-22 10:47:27.661995 7f70ef9e6700 10 monclient: init
>>>>>>>>>> 2018-06-22 10:47:27.662002 7f70ef9e6700  5 adding auth protocol: cephx
>>>>>>>>>> 2018-06-22 10:47:27.662004 7f70ef9e6700 10 monclient: auth_supported 2 method
>>>>>>>>>> cephx
>>>>>>>>>> 2018-06-22 10:47:27.662221 7f70ef9e6700  2 auth: KeyRing::load: loaded key file
>>>>>>>>>> /etc/ceph/ceph.client.admin.keyring
>>>>>>>>>> 2018-06-22 10:47:27.662338 7f70ef9e6700 10 monclient: _reopen_session rank -1
>>>>>>>>>> 2018-06-22 10:47:27.662425 7f70ef9e6700 10 monclient(hunting): picked
>>>>>>>>>> mon.noname-b con 0x7f70e8176c80 addr 192.168.168.202:6789/0
>>>>>>>>>> 2018-06-22 10:47:27.662484 7f70ef9e6700 10 monclient(hunting): picked
>>>>>>>>>> mon.noname-a con 0x7f70e817a2e0 addr 192.168.168.201:6789/0
>>>>>>>>>> 2018-06-22 10:47:27.662534 7f70ef9e6700 10 monclient(hunting): _renew_subs
>>>>>>>>>> 2018-06-22 10:47:27.662544 7f70ef9e6700 10 monclient(hunting): authenticate will
>>>>>>>>>> time out at 2018-06-22 10:52:27.662543
>>>>>>>>>> 2018-06-22 10:47:27.663831 7f70d77fe700 10 monclient(hunting): handle_monmap
>>>>>>>>>> mon_map magic: 0 v1
>>>>>>>>>> 2018-06-22 10:47:27.663885 7f70d77fe700 10 monclient(hunting):  got monmap 20,
>>>>>>>>>> mon.noname-b is now rank -1
>>>>>>>>>> 2018-06-22 10:47:27.663889 7f70d77fe700 10 monclient(hunting): dump:
>>>>>>>>>> epoch 20
>>>>>>>>>> fsid 51e9f641-372e-44ec-92a4-b9fe55cbf9fe
>>>>>>>>>> last_changed 2018-06-16 23:14:48.936175
>>>>>>>>>> created 0.000000
>>>>>>>>>> 0: 192.168.168.201:6789/0 mon.arh-ibstorage1-ib
>>>>>>>>>> 1: 192.168.168.202:6789/0 mon.arh-ibstorage2-ib
>>>>>>>>>> 2: 192.168.168.203:6789/0 mon.arh-ibstorage3-ib
>>>>>>>>>>
>>>>>>>>>> 2018-06-22 10:47:27.664005 7f70d77fe700 10 cephx: set_have_need_key no handler
>>>>>>>>>> for service mon
>>>>>>>>>> 2018-06-22 10:47:27.664020 7f70d77fe700 10 cephx: set_have_need_key no handler
>>>>>>>>>> for service osd
>>>>>>>>>> 2018-06-22 10:47:27.664021 7f70d77fe700 10 cephx: set_have_need_key no handler
>>>>>>>>>> for service mgr
>>>>>>>>>> 2018-06-22 10:47:27.664025 7f70d77fe700 10 cephx: set_have_need_key no handler
>>>>>>>>>> for service auth
>>>>>>>>>> 2018-06-22 10:47:27.664026 7f70d77fe700 10 cephx: validate_tickets want 53 have
>>>>>>>>>> 0 need 53
>>>>>>>>>> 2018-06-22 10:47:27.664032 7f70d77fe700 10 monclient(hunting): my global_id is
>>>>>>>>>> 411322261
>>>>>>>>>> 2018-06-22 10:47:27.664035 7f70d77fe700 10 cephx client: handle_response ret = 0
>>>>>>>>>> 2018-06-22 10:47:27.664046 7f70d77fe700 10 cephx client:  got initial server
>>>>>>>>>> challenge d66f2dffc2113d43
>>>>>>>>>> 2018-06-22 10:47:27.664049 7f70d77fe700 10 cephx client: validate_tickets:
>>>>>>>>>> want=53 need=53 have=0
>>>>>>>>>>
>>>>>>>>>> 2018-06-22 10:47:27.664052 7f70d77fe700 10 cephx: set_have_need_key no handler
>>>>>>>>>> for service mon
>>>>>>>>>> 2018-06-22 10:47:27.664053 7f70d77fe700 10 cephx: set_have_need_key no handler
>>>>>>>>>> for service osd
>>>>>>>>>> 2018-06-22 10:47:27.664054 7f70d77fe700 10 cephx: set_have_need_key no handler
>>>>>>>>>> for service mgr
>>>>>>>>>> 2018-06-22 10:47:27.664055 7f70d77fe700 10 cephx: set_have_need_key no handler
>>>>>>>>>> for service auth
>>>>>>>>>> 2018-06-22 10:47:27.664056 7f70d77fe700 10 cephx: validate_tickets want 53 have
>>>>>>>>>> 0 need 53
>>>>>>>>>> 2018-06-22 10:47:27.664057 7f70d77fe700 10 cephx client: want=53 need=53 have=0
>>>>>>>>>> 2018-06-22 10:47:27.664061 7f70d77fe700 10 cephx client: build_request
>>>>>>>>>> 2018-06-22 10:47:27.664145 7f70d77fe700 10 cephx client: get auth session key:
>>>>>>>>>> client_challenge d4c95f637e641b55
>>>>>>>>>> 2018-06-22 10:47:27.664175 7f70d77fe700 10 monclient(hunting): handle_monmap
>>>>>>>>>> mon_map magic: 0 v1
>>>>>>>>>> 2018-06-22 10:47:27.664208 7f70d77fe700 10 monclient(hunting):  got monmap 20,
>>>>>>>>>> mon.arh-ibstorage1-ib is now rank 0
>>>>>>>>>> 2018-06-22 10:47:27.664211 7f70d77fe700 10 monclient(hunting): dump:
>>>>>>>>>> epoch 20
>>>>>>>>>> fsid 51e9f641-372e-44ec-92a4-b9fe55cbf9fe
>>>>>>>>>> last_changed 2018-06-16 23:14:48.936175
>>>>>>>>>> created 0.000000
>>>>>>>>>> 0: 192.168.168.201:6789/0 mon.arh-ibstorage1-ib
>>>>>>>>>> 1: 192.168.168.202:6789/0 mon.arh-ibstorage2-ib
>>>>>>>>>> 2: 192.168.168.203:6789/0 mon.arh-ibstorage3-ib
>>>>>>>>>>
>>>>>>>>>> 2018-06-22 10:47:27.664241 7f70d77fe700 10 cephx: set_have_need_key no handler
>>>>>>>>>> for service mon
>>>>>>>>>> 2018-06-22 10:47:27.664244 7f70d77fe700 10 cephx: set_have_need_key no handler
>>>>>>>>>> for service osd
>>>>>>>>>> 2018-06-22 10:47:27.664245 7f70d77fe700 10 cephx: set_have_need_key no handler
>>>>>>>>>> for service mgr
>>>>>>>>>> 2018-06-22 10:47:27.664246 7f70d77fe700 10 cephx: set_have_need_key no handler
>>>>>>>>>> for service auth
>>>>>>>>>> 2018-06-22 10:47:27.664247 7f70d77fe700 10 cephx: validate_tickets want 53 have
>>>>>>>>>> 0 need 53
>>>>>>>>>> 2018-06-22 10:47:27.664251 7f70d77fe700 10 monclient(hunting): my global_id is
>>>>>>>>>> 411323061
>>>>>>>>>> 2018-06-22 10:47:27.664253 7f70d77fe700 10 cephx client: handle_response ret = 0
>>>>>>>>>> 2018-06-22 10:47:27.664256 7f70d77fe700 10 cephx client:  got initial server
>>>>>>>>>> challenge d5d3c1e5bcf3c0b8
>>>>>>>>>> 2018-06-22 10:47:27.664258 7f70d77fe700 10 cephx client: validate_tickets:
>>>>>>>>>> want=53 need=53 have=0
>>>>>>>>>> 2018-06-22 10:47:27.664260 7f70d77fe700 10 cephx: set_have_need_key no handler
>>>>>>>>>> for service mon
>>>>>>>>>> 2018-06-22 10:47:27.664261 7f70d77fe700 10 cephx: set_have_need_key no handler
>>>>>>>>>> for service osd
>>>>>>>>>> 2018-06-22 10:47:27.664262 7f70d77fe700 10 cephx: set_have_need_key no handler
>>>>>>>>>> for service mgr
>>>>>>>>>> 2018-06-22 10:47:27.664263 7f70d77fe700 10 cephx: set_have_need_key no handler
>>>>>>>>>> for service auth
>>>>>>>>>> 2018-06-22 10:47:27.664264 7f70d77fe700 10 cephx: validate_tickets want 53 have
>>>>>>>>>> 0 need 53
>>>>>>>>>> 2018-06-22 10:47:27.664265 7f70d77fe700 10 cephx client: want=53 need=53 have=0
>>>>>>>>>> 2018-06-22 10:47:27.664268 7f70d77fe700 10 cephx client: build_request
>>>>>>>>>> 2018-06-22 10:47:27.664328 7f70d77fe700 10 cephx client: get auth session key:
>>>>>>>>>> client_challenge d31821a6437d4974
>>>>>>>>>> 2018-06-22 10:47:27.664651 7f70d77fe700 10 cephx client: handle_response ret = 0
>>>>>>>>>> 2018-06-22 10:47:27.664667 7f70d77fe700 10 cephx client:  get_auth_session_key
>>>>>>>>>> 2018-06-22 10:47:27.664673 7f70d77fe700 10 cephx: verify_service_ticket_reply
>>>>>>>>>> got 1 keys
>>>>>>>>>> 2018-06-22 10:47:27.664676 7f70d77fe700 10 cephx: got key for service_id auth
>>>>>>>>>> 2018-06-22 10:47:27.664766 7f70d77fe700 10 cephx:  ticket.secret_id=3681
>>>>>>>>>> 2018-06-22 10:47:27.664774 7f70d77fe700 10 cephx: verify_service_ticket_reply
>>>>>>>>>> service auth secret_id 3681 session_key [KEY] validity=43200.000000
>>>>>>>>>> 2018-06-22 10:47:27.664806 7f70d77fe700 10 cephx: ticket expires=2018-06-22
>>>>>>>>>> 22:47:27.664805 renew_after=2018-06-22 19:47:27.664805
>>>>>>>>>> 2018-06-22 10:47:27.664825 7f70d77fe700 10 cephx client:  want=53 need=53 have=0
>>>>>>>>>> 2018-06-22 10:47:27.664827 7f70d77fe700 10 cephx: set_have_need_key no handler
>>>>>>>>>> for service mon
>>>>>>>>>> 2018-06-22 10:47:27.664829 7f70d77fe700 10 cephx: set_have_need_key no handler
>>>>>>>>>> for service osd
>>>>>>>>>> 2018-06-22 10:47:27.664830 7f70d77fe700 10 cephx: set_have_need_key no handler
>>>>>>>>>> for service mgr
>>>>>>>>>> 2018-06-22 10:47:27.664832 7f70d77fe700 10 cephx: validate_tickets want 53 have
>>>>>>>>>> 32 need 21
>>>>>>>>>> 2018-06-22 10:47:27.664836 7f70d77fe700 10 cephx client: validate_tickets:
>>>>>>>>>> want=53 need=21 have=32
>>>>>>>>>> 2018-06-22 10:47:27.664837 7f70d77fe700 10 cephx: set_have_need_key no handler
>>>>>>>>>> for service mon
>>>>>>>>>> 2018-06-22 10:47:27.664839 7f70d77fe700 10 cephx: set_have_need_key no handler
>>>>>>>>>> for service osd
>>>>>>>>>> 2018-06-22 10:47:27.664840 7f70d77fe700 10 cephx: set_have_need_key no handler
>>>>>>>>>> for service mgr
>>>>>>>>>> 2018-06-22 10:47:27.664841 7f70d77fe700 10 cephx: validate_tickets want 53 have
>>>>>>>>>> 32 need 21
>>>>>>>>>> 2018-06-22 10:47:27.664842 7f70d77fe700 10 cephx client: want=53 need=21 have=32
>>>>>>>>>> 2018-06-22 10:47:27.664844 7f70d77fe700 10 cephx client: build_request
>>>>>>>>>> 2018-06-22 10:47:27.664846 7f70d77fe700 10 cephx client: get service keys:
>>>>>>>>>> want=53 need=21 have=32
>>>>>>>>>> 2018-06-22 10:47:27.664928 7f70d77fe700 10 cephx client: handle_response ret = 0
>>>>>>>>>> 2018-06-22 10:47:27.664933 7f70d77fe700 10 cephx client:  get_auth_session_key
>>>>>>>>>> 2018-06-22 10:47:27.664935 7f70d77fe700 10 cephx: verify_service_ticket_reply
>>>>>>>>>> got 1 keys
>>>>>>>>>> 2018-06-22 10:47:27.664937 7f70d77fe700 10 cephx: got key for service_id auth
>>>>>>>>>> 2018-06-22 10:47:27.664985 7f70d77fe700 10 cephx:  ticket.secret_id=3681
>>>>>>>>>> 2018-06-22 10:47:27.664987 7f70d77fe700 10 cephx: verify_service_ticket_reply
>>>>>>>>>> service auth secret_id 3681 session_key [KEY] validity=43200.000000
>>>>>>>>>> 2018-06-22 10:47:27.665009 7f70d77fe700 10 cephx: ticket expires=2018-06-22
>>>>>>>>>> 22:47:27.665008 renew_after=2018-06-22 19:47:27.665008
>>>>>>>>>> 2018-06-22 10:47:27.665017 7f70d77fe700 10 cephx client:  want=53 need=53 have=0
>>>>>>>>>> 2018-06-22 10:47:27.665019 7f70d77fe700 10 cephx: set_have_need_key no handler
>>>>>>>>>> for service mon
>>>>>>>>>> 2018-06-22 10:47:27.665020 7f70d77fe700 10 cephx: set_have_need_key no handler
>>>>>>>>>> for service osd
>>>>>>>>>> 2018-06-22 10:47:27.665024 7f70d77fe700 10 cephx: set_have_need_key no handler
>>>>>>>>>> for service mgr
>>>>>>>>>> 2018-06-22 10:47:27.665026 7f70d77fe700 10 cephx: validate_tickets want 53 have
>>>>>>>>>> 32 need 21
>>>>>>>>>> 2018-06-22 10:47:27.665029 7f70d77fe700 10 cephx client: validate_tickets:
>>>>>>>>>> want=53 need=21 have=32
>>>>>>>>>> 2018-06-22 10:47:27.665031 7f70d77fe700 10 cephx: set_have_need_key no handler
>>>>>>>>>> for service mon
>>>>>>>>>> 2018-06-22 10:47:27.665032 7f70d77fe700 10 cephx: set_have_need_key no handler
>>>>>>>>>> for service osd
>>>>>>>>>> 2018-06-22 10:47:27.665033 7f70d77fe700 10 cephx: set_have_need_key no handler
>>>>>>>>>> for service mgr
>>>>>>>>>> 2018-06-22 10:47:27.665034 7f70d77fe700 10 cephx: validate_tickets want 53 have
>>>>>>>>>> 32 need 21
>>>>>>>>>> 2018-06-22 10:47:27.665035 7f70d77fe700 10 cephx client: want=53 need=21 have=32
>>>>>>>>>> 2018-06-22 10:47:27.665037 7f70d77fe700 10 cephx client: build_request
>>>>>>>>>> 2018-06-22 10:47:27.665039 7f70d77fe700 10 cephx client: get service keys:
>>>>>>>>>> want=53 need=21 have=32
>>>>>>>>>> 2018-06-22 10:47:27.665354 7f70d77fe700 10 cephx client: handle_response ret = 0
>>>>>>>>>> 2018-06-22 10:47:27.665365 7f70d77fe700 10 cephx client:
>>>>>>>>>> get_principal_session_key session_key [KEY]
>>>>>>>>>> 2018-06-22 10:47:27.665377 7f70d77fe700 10 cephx: verify_service_ticket_reply
>>>>>>>>>> got 3 keys
>>>>>>>>>> 2018-06-22 10:47:27.665379 7f70d77fe700 10 cephx: got key for service_id mon
>>>>>>>>>> 2018-06-22 10:47:27.665419 7f70d77fe700 10 cephx:  ticket.secret_id=44133
>>>>>>>>>> 2018-06-22 10:47:27.665425 7f70d77fe700 10 cephx: verify_service_ticket_reply
>>>>>>>>>> service mon secret_id 44133 session_key [KEY] validity=3600.000000
>>>>>>>>>> 2018-06-22 10:47:27.665437 7f70d77fe700 10 cephx: ticket expires=2018-06-22
>>>>>>>>>> 11:47:27.665436 renew_after=2018-06-22 11:32:27.665436
>>>>>>>>>> 2018-06-22 10:47:27.665443 7f70d77fe700 10 cephx: got key for service_id osd
>>>>>>>>>> 2018-06-22 10:47:27.665476 7f70d77fe700 10 cephx:  ticket.secret_id=44133
>>>>>>>>>> 2018-06-22 10:47:27.665478 7f70d77fe700 10 cephx: verify_service_ticket_reply
>>>>>>>>>> service osd secret_id 44133 session_key [KEY] validity=3600.000000
>>>>>>>>>> 2018-06-22 10:47:27.665497 7f70d77fe700 10 cephx: ticket expires=2018-06-22
>>>>>>>>>> 11:47:27.665496 renew_after=2018-06-22 11:32:27.665496
>>>>>>>>>> 2018-06-22 10:47:27.665506 7f70d77fe700 10 cephx: got key for service_id mgr
>>>>>>>>>> 2018-06-22 10:47:27.665539 7f70d77fe700 10 cephx:  ticket.secret_id=132
>>>>>>>>>> 2018-06-22 10:47:27.665546 7f70d77fe700 10 cephx: verify_service_ticket_reply
>>>>>>>>>> service mgr secret_id 132 session_key [KEY] validity=3600.000000
>>>>>>>>>> 2018-06-22 10:47:27.665564 7f70d77fe700 10 cephx: ticket expires=2018-06-22
>>>>>>>>>> 11:47:27.665564 renew_after=2018-06-22 11:32:27.665564
>>>>>>>>>> 2018-06-22 10:47:27.665573 7f70d77fe700 10 cephx: validate_tickets want 53 have
>>>>>>>>>> 53 need 0
>>>>>>>>>> 2018-06-22 10:47:27.665602 7f70d77fe700  1 monclient: found
>>>>>>>>>> mon.arh-ibstorage2-ib
>>>>>>>>>> 2018-06-22 10:47:27.665617 7f70d77fe700 20 monclient: _un_backoff
>>>>>>>>>> reopen_interval_multipler now 1
>>>>>>>>>> 2018-06-22 10:47:27.665636 7f70d77fe700 10 monclient: _send_mon_message to
>>>>>>>>>> mon.arh-ibstorage2-ib at 192.168.168.202:6789/0
>>>>>>>>>> 2018-06-22 10:47:27.665656 7f70d77fe700 10 cephx: validate_tickets want 53 have
>>>>>>>>>> 53 need 0
>>>>>>>>>> 2018-06-22 10:47:27.665658 7f70d77fe700 20 cephx client: need_tickets: want=53
>>>>>>>>>> have=53 need=0
>>>>>>>>>> 2018-06-22 10:47:27.665661 7f70d77fe700 20 monclient: _check_auth_rotating not
>>>>>>>>>> needed by client.admin
>>>>>>>>>> 2018-06-22 10:47:27.665678 7f70ef9e6700  5 monclient: authenticate success,
>>>>>>>>>> global_id 411322261
>>>>>>>>>> 2018-06-22 10:47:27.665694 7f70ef9e6700 10 monclient: _renew_subs
>>>>>>>>>> 2018-06-22 10:47:27.665698 7f70ef9e6700 10 monclient: _send_mon_message to
>>>>>>>>>> mon.arh-ibstorage2-ib at 192.168.168.202:6789/0
>>>>>>>>>> 2018-06-22 10:47:27.665817 7f70ef9e6700 10 monclient: _renew_subs
>>>>>>>>>> 2018-06-22 10:47:27.665828 7f70ef9e6700 10 monclient: _send_mon_message to
>>>>>>>>>> mon.arh-ibstorage2-ib at 192.168.168.202:6789/0
>>>>>>>>>> 2018-06-22 10:47:27.666069 7f70d77fe700 10 monclient: handle_monmap mon_map
>>>>>>>>>> magic: 0 v1
>>>>>>>>>> 2018-06-22 10:47:27.666102 7f70d77fe700 10 monclient:  got monmap 20,
>>>>>>>>>> mon.arh-ibstorage2-ib is now rank 1
>>>>>>>>>> 2018-06-22 10:47:27.666110 7f70d77fe700 10 monclient: dump:
>>>>>>>>>>
>>>>>>>>>> epoch 20
>>>>>>>>>> fsid 51e9f641-372e-44ec-92a4-b9fe55cbf9fe
>>>>>>>>>> last_changed 2018-06-16 23:14:48.936175
>>>>>>>>>> created 0.000000
>>>>>>>>>> 0: 192.168.168.201:6789/0 mon.arh-ibstorage1-ib
>>>>>>>>>> 1: 192.168.168.202:6789/0 mon.arh-ibstorage2-ib
>>>>>>>>>> 2: 192.168.168.203:6789/0 mon.arh-ibstorage3-ib
>>>>>>>>>>
>>>>>>>>>> 2018-06-22 10:47:27.666617 7f70eca43700 10 cephx client: build_authorizer for
>>>>>>>>>> service mgr
>>>>>>>>>> 2018-06-22 10:47:27.667043 7f70eca43700 10 In get_auth_session_handler for
>>>>>>>>>> protocol 2
>>>>>>>>>> 2018-06-22 10:47:27.678417 7f70eda45700 10 cephx client: build_authorizer for
>>>>>>>>>> service osd
>>>>>>>>>> 2018-06-22 10:47:27.678914 7f70eda45700 10 In get_auth_session_handler for
>>>>>>>>>> protocol 2
>>>>>>>>>> 2018-06-22 10:47:27.679003 7f70eda45700 10 _calc_signature seq 1 front_crc_ =
>>>>>>>>>> 2696387361 middle_crc = 0 data_crc = 0 sig = 929021353460216573
>>>>>>>>>> 2018-06-22 10:47:27.679026 7f70eda45700 20 Putting signature in client
>>>>>>>>>> message(seq # 1): sig = 929021353460216573
>>>>>>>>>> 2018-06-22 10:47:27.679520 7f70eda45700 10 _calc_signature seq 1 front_crc_ =
>>>>>>>>>> 1943489909 middle_crc = 0 data_crc = 0 sig = 10026640535487722288
>>>>>>>>>> Error EPERM: problem getting command descriptions from pg.18.2
>>>>>>>>>> 2018-06-22 10:47:27.681798 7f70ef9e6700 10 monclient: shutdown
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> -----------------
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> From what I can see the auth works:
>>>>>>>>>>
>>>>>>>>>> 2018-06-22 10:47:27.665678 7f70ef9e6700  5 monclient: authenticate success,
>>>>>>>>>> global_id 411322261
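>>>>>>>>>>
>>>>>>>>>> (A quick way to confirm that the key in use actually carries the mon/osd caps it needs,
>>>>>>>>>> assuming the default client.admin identity here, is:
>>>>>>>>>>
>>>>>>>>>> # ceph auth get client.admin
>>>>>>>>>>
>>>>>>>>>> and checking the caps lines it prints.)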
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> ----- Original Message -----
>>>>>>>>>>> From: "Brad Hubbard" <bhubbard@xxxxxxxxxx>
>>>>>>>>>>> To: "Andrei" <andrei@xxxxxxxxxx>
>>>>>>>>>>> Cc: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
>>>>>>>>>>> Sent: Friday, 22 June, 2018 02:05:51
>>>>>>>>>>> Subject: Re:  fixing unrepairable inconsistent PG
>>>>>>>>>>
>>>>>>>>>>> That seems like an authentication issue?
>>>>>>>>>>>
>>>>>>>>>>> Try running it like so...
>>>>>>>>>>>
>>>>>>>>>>> $ ceph --debug_monc 20 --debug_auth 20 pg 18.2 query
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Jun 21, 2018 at 12:18 AM, Andrei Mikhailovsky <andrei@xxxxxxxxxx> wrote:
>>>>>>>>>>>> Hi Brad,
>>>>>>>>>>>>
>>>>>>>>>>>> Yes, but it doesn't show much:
>>>>>>>>>>>>
>>>>>>>>>>>> ceph pg 18.2 query
>>>>>>>>>>>> Error EPERM: problem getting command descriptions from pg.18.2
>>>>>>>>>>>>
>>>>>>>>>>>> Cheers
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> ----- Original Message -----
>>>>>>>>>>>>> From: "Brad Hubbard" <bhubbard@xxxxxxxxxx>
>>>>>>>>>>>>> To: "andrei" <andrei@xxxxxxxxxx>
>>>>>>>>>>>>> Cc: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
>>>>>>>>>>>>> Sent: Wednesday, 20 June, 2018 00:02:07
>>>>>>>>>>>>> Subject: Re:  fixing unrepairable inconsistent PG
>>>>>>>>>>>>
>>>>>>>>>>>>> Can you post the output of a pg query?
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Jun 19, 2018 at 11:44 PM, Andrei Mikhailovsky <andrei@xxxxxxxxxx> wrote:
>>>>>>>>>>>>>> A quick update on my issue. I have noticed that while I was trying to move
>>>>>>>>>>>>>> the problem object on the osds, the file attributes got lost on one of the osds,
>>>>>>>>>>>>>> which I guess is why the error messages complained about the missing attributes.
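>>>>>>>>>>>>>>
>>>>>>>>>>>>>> (For reference, on filestore those attributes are plain xattrs on the object file, so they
>>>>>>>>>>>>>> can be dumped from the replica that still has them and re-applied to the copy that lost
>>>>>>>>>>>>>> them; a rough sketch, with the on-disk object paths left as placeholders:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> getfattr -d -m - -e hex <path to intact object file> > xattrs.dump
>>>>>>>>>>>>>> setfattr --restore=xattrs.dump
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> after editing the "# file:" line inside xattrs.dump so it names the copy that is missing
>>>>>>>>>>>>>> the attributes, and running the restore on that osd's host.)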
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I then copied the attributes metadata to the problematic object and
>>>>>>>>>>>>>> restarted the osds in question. Following a pg repair I got a different
>>>>>>>>>>>>>> error:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2018-06-19 13:51:05.846033 osd.21 osd.21 192.168.168.203:6828/24339 2 :
>>>>>>>>>>>>>> cluster [ERR] 18.2 shard 21: soid 18:45f87722:::.dir.default.80018061.2:head
>>>>>>>>>>>>>> omap_digest 0x25e8a1da != omap_digest 0x21c7f871 from auth oi
>>>>>>>>>>>>>> 18:45f87722:::.dir.default.80018061.2:head(106137'603495 osd.21.0:41403910
>>>>>>>>>>>>>> dirty|omap|data_digest|omap_digest s 0 uv 603494 dd ffffffff od 21c7f871
>>>>>>>>>>>>>> alloc_hint [0 0 0])
>>>>>>>>>>>>>> 2018-06-19 13:51:05.846042 osd.21 osd.21 192.168.168.203:6828/24339 3 :
>>>>>>>>>>>>>> cluster [ERR] 18.2 shard 28: soid 18:45f87722:::.dir.default.80018061.2:head
>>>>>>>>>>>>>> omap_digest 0x25e8a1da != omap_digest 0x21c7f871 from auth oi
>>>>>>>>>>>>>> 18:45f87722:::.dir.default.80018061.2:head(106137'603495 osd.21.0:41403910
>>>>>>>>>>>>>> dirty|omap|data_digest|omap_digest s 0 uv 603494 dd ffffffff od 21c7f871
>>>>>>>>>>>>>> alloc_hint [0 0 0])
>>>>>>>>>>>>>> 2018-06-19 13:51:05.846046 osd.21 osd.21 192.168.168.203:6828/24339 4 :
>>>>>>>>>>>>>> cluster [ERR] 18.2 soid 18:45f87722:::.dir.default.80018061.2:head: failed
>>>>>>>>>>>>>> to pick suitable auth object
>>>>>>>>>>>>>> 2018-06-19 13:51:05.846118 osd.21 osd.21 192.168.168.203:6828/24339 5 :
>>>>>>>>>>>>>> cluster [ERR] repair 18.2 18:45f87722:::.dir.default.80018061.2:head no '_'
>>>>>>>>>>>>>> attr
>>>>>>>>>>>>>> 2018-06-19 13:51:05.846129 osd.21 osd.21 192.168.168.203:6828/24339 6 :
>>>>>>>>>>>>>> cluster [ERR] repair 18.2 18:45f87722:::.dir.default.80018061.2:head no
>>>>>>>>>>>>>> 'snapset' attr
>>>>>>>>>>>>>> 2018-06-19 13:51:09.810878 osd.21 osd.21 192.168.168.203:6828/24339 7 :
>>>>>>>>>>>>>> cluster [ERR] 18.2 repair 4 errors, 0 fixed
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> It mentions that there is an incorrect omap_digest. How do I go about
>>>>>>>>>>>>>> fixing this?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Cheers
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ________________________________
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> From: "andrei" <andrei@xxxxxxxxxx>
>>>>>>>>>>>>>> To: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
>>>>>>>>>>>>>> Sent: Tuesday, 19 June, 2018 11:16:22
>>>>>>>>>>>>>> Subject:  fixing unrepairable inconsistent PG
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hello everyone
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I am having trouble repairing one inconsistent and stubborn PG. I get the
>>>>>>>>>>>>>> following error in ceph.log:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2018-06-19 11:00:00.000225 mon.arh-ibstorage1-ib mon.0
>>>>>>>>>>>>>> 192.168.168.201:6789/0 675 : cluster [ERR] overall HEALTH_ERR noout flag(s)
>>>>>>>>>>>>>> set; 4 scrub errors; Possible data damage: 1 pg inconsistent; application
>>>>>>>>>>>>>> not enabled on 4 pool(s)
>>>>>>>>>>>>>> 2018-06-19 11:09:24.586392 mon.arh-ibstorage1-ib mon.0
>>>>>>>>>>>>>> 192.168.168.201:6789/0 841 : cluster [ERR] Health check update: Possible
>>>>>>>>>>>>>> data damage: 1 pg inconsistent, 1 pg repair (PG_DAMAGED)
>>>>>>>>>>>>>> 2018-06-19 11:09:27.139504 osd.21 osd.21 192.168.168.203:6828/4003 2 :
>>>>>>>>>>>>>> cluster [ERR] 18.2 soid 18:45f87722:::.dir.default.80018061.2:head: failed
>>>>>>>>>>>>>> to pick suitable object info
>>>>>>>>>>>>>> 2018-06-19 11:09:27.139545 osd.21 osd.21 192.168.168.203:6828/4003 3 :
>>>>>>>>>>>>>> cluster [ERR] repair 18.2 18:45f87722:::.dir.default.80018061.2:head no '_'
>>>>>>>>>>>>>> attr
>>>>>>>>>>>>>> 2018-06-19 11:09:27.139550 osd.21 osd.21 192.168.168.203:6828/4003 4 :
>>>>>>>>>>>>>> cluster [ERR] repair 18.2 18:45f87722:::.dir.default.80018061.2:head no
>>>>>>>>>>>>>> 'snapset' attr
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2018-06-19 11:09:35.484402 osd.21 osd.21 192.168.168.203:6828/4003 5 :
>>>>>>>>>>>>>> cluster [ERR] 18.2 repair 4 errors, 0 fixed
>>>>>>>>>>>>>> 2018-06-19 11:09:40.601657 mon.arh-ibstorage1-ib mon.0
>>>>>>>>>>>>>> 192.168.168.201:6789/0 844 : cluster [ERR] Health check update: Possible
>>>>>>>>>>>>>> data damage: 1 pg inconsistent (PG_DAMAGED)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I have tried to follow a few instructions on the PG repair, including
>>>>>>>>>>>>>> removal of the 'broken' object .dir.default.80018061.2
>>>>>>>>>>>>>> from the primary osd, followed by a pg repair. After that didn't work, I've
>>>>>>>>>>>>>> done the same for the secondary osd. Still the same issue.
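>>>>>>>>>>>>>>
>>>>>>>>>>>>>> (Only a sketch of how such a removal is commonly done, assuming filestore with the default
>>>>>>>>>>>>>> data and journal paths, noout set, and the osd stopped for the duration:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> systemctl stop ceph-osd@21
>>>>>>>>>>>>>> ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-21 \
>>>>>>>>>>>>>>     --journal-path /var/lib/ceph/osd/ceph-21/journal \
>>>>>>>>>>>>>>     --pgid 18.2 '.dir.default.80018061.2' remove
>>>>>>>>>>>>>> systemctl start ceph-osd@21
>>>>>>>>>>>>>> ceph pg repair 18.2
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> repeating the same on the secondary osd if needed.)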
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Looking at the actual object on the file system, the file size is 0 for both
>>>>>>>>>>>>>> primary and secondary objects. The md5sum is the same too. The broken PG
>>>>>>>>>>>>>> belongs to the radosgw bucket index pool, .rgw.buckets.index
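>>>>>>>>>>>>>>
>>>>>>>>>>>>>> (A check of that kind can be run directly against the filestore data path, for example:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> find /var/lib/ceph/osd/ceph-21/current/18.2_head -name '*80018061.2*'
>>>>>>>>>>>>>> stat -c '%s %n' <file found above>
>>>>>>>>>>>>>> md5sum <file found above>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> with /var/lib/ceph/osd/ceph-21 assumed to be the default osd data path.)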
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> What else can I try to get the thing fixed?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Cheers
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>> ceph-users mailing list
>>>>>>>>>>>>>> ceph-users@xxxxxxxxxxxxxx
>>>>>>>>>>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>> ceph-users mailing list
>>>>>>>>>>>>>> ceph-users@xxxxxxxxxxxxxx
>>>>>>>>>>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>> Brad
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Cheers,
>>>>>>>>>>> Brad
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Cheers,
>>>>>>>>> Brad
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Cheers,
>>>>>>> Brad
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Cheers,
>>>>> Brad
>>>> _______________________________________________
>>>> ceph-users mailing list
>>>> ceph-users@xxxxxxxxxxxxxx
>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>>
>> --
>> Cheers,
>> Brad



-- 
Cheers,
Brad
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



