Re: fixing unrepairable inconsistent PG

Try the following. You can do this with all osds up and running.

# rados -p [name_of_pool_18] setomapval .dir.default.80018061.2 temporary-key anything
# ceph pg deep-scrub 18.2

Once you are sure the deep scrub has completed and the pg is no longer
inconsistent, you can remove the temporary key.

# rados -p [name_of_pool_18] rmomapkey .dir.default.80018061.2 temporary-key
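
The steps above can be wrapped in a small script. This is only a sketch: the
function names and the DRY_RUN switch are made up for illustration, and the
pool name, object name and pg id are passed in as arguments. With DRY_RUN left
at its default of 1 the commands are printed rather than run, so you can check
them before touching the cluster.

```shell
#!/bin/sh
# Sketch of the temporary-omap-key workaround. Helper names and DRY_RUN are
# illustrative assumptions; the rados/ceph commands are the ones from this thread.

# run CMD...: print the command when DRY_RUN=1 (the default), else execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# nudge_start POOL OBJECT PGID: write a throwaway omap key, then deep-scrub.
nudge_start() {
    run rados -p "$1" setomapval "$2" temporary-key anything &&
    run ceph pg deep-scrub "$3"
}

# nudge_check PGID: list what the last scrub still reports as inconsistent.
nudge_check() {
    run rados list-inconsistent-obj "$1"
}

# nudge_cleanup POOL OBJECT: remove the throwaway key again.
nudge_cleanup() {
    run rados -p "$1" rmomapkey "$2" temporary-key
}
```

For example, source the file and run "nudge_start [name_of_pool_18]
.dir.default.80018061.2 18.2" to see the commands, then repeat with DRY_RUN=0
to execute them. Only call nudge_cleanup after nudge_check (or "ceph pg 18.2
query") shows the pg is no longer inconsistent.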


On Wed, Jun 27, 2018 at 9:42 PM, Andrei Mikhailovsky <andrei@xxxxxxxxxx> wrote:
> Here is one more thing:
>
> rados list-inconsistent-obj 18.2
> {
>    "inconsistents" : [
>       {
>          "object" : {
>             "locator" : "",
>             "version" : 632942,
>             "nspace" : "",
>             "name" : ".dir.default.80018061.2",
>             "snap" : "head"
>          },
>          "union_shard_errors" : [
>             "omap_digest_mismatch_info"
>          ],
>          "shards" : [
>             {
>                "osd" : 21,
>                "primary" : true,
>                "data_digest" : "0xffffffff",
>                "omap_digest" : "0x25e8a1da",
>                "errors" : [
>                   "omap_digest_mismatch_info"
>                ],
>                "size" : 0
>             },
>             {
>                "data_digest" : "0xffffffff",
>                "primary" : false,
>                "osd" : 28,
>                "errors" : [
>                   "omap_digest_mismatch_info"
>                ],
>                "omap_digest" : "0x25e8a1da",
>                "size" : 0
>             }
>          ],
>          "errors" : [],
>          "selected_object_info" : {
>             "mtime" : "2018-06-19 16:31:44.759717",
>             "alloc_hint_flags" : 0,
>             "size" : 0,
>             "last_reqid" : "client.410876514.0:1",
>             "local_mtime" : "2018-06-19 16:31:44.760139",
>             "data_digest" : "0xffffffff",
>             "truncate_seq" : 0,
>             "legacy_snaps" : [],
>             "expected_write_size" : 0,
>             "watchers" : {},
>             "flags" : [
>                "dirty",
>                "data_digest",
>                "omap_digest"
>             ],
>             "oid" : {
>                "pool" : 18,
>                "hash" : 1156456354,
>                "key" : "",
>                "oid" : ".dir.default.80018061.2",
>                "namespace" : "",
>                "snapid" : -2,
>                "max" : 0
>             },
>             "truncate_size" : 0,
>             "version" : "120985'632942",
>             "expected_object_size" : 0,
>             "omap_digest" : "0xffffffff",
>             "lost" : 0,
>             "manifest" : {
>                "redirect_target" : {
>                   "namespace" : "",
>                   "snapid" : 0,
>                   "max" : 0,
>                   "pool" : -9223372036854775808,
>                   "hash" : 0,
>                   "oid" : "",
>                   "key" : ""
>                },
>                "type" : 0
>             },
>             "prior_version" : "0'0",
>             "user_version" : 632942
>          }
>       }
>    ],
>    "epoch" : 121151
> }
>
> Cheers
>
> ----- Original Message -----
>> From: "Andrei Mikhailovsky" <andrei@xxxxxxxxxx>
>> To: "Brad Hubbard" <bhubbard@xxxxxxxxxx>
>> Cc: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
>> Sent: Wednesday, 27 June, 2018 09:10:07
>> Subject: Re:  fixing unrepairable inconsistent PG
>
>> Hi Brad,
>>
>> Thanks, that helped to get the query info on the inconsistent PG 18.2:
>>
>> {
>>    "state": "active+clean+inconsistent",
>>    "snap_trimq": "[]",
>>    "snap_trimq_len": 0,
>>    "epoch": 121293,
>>    "up": [
>>        21,
>>        28
>>    ],
>>    "acting": [
>>        21,
>>        28
>>    ],
>>    "actingbackfill": [
>>        "21",
>>        "28"
>>    ],
>>    "info": {
>>        "pgid": "18.2",
>>        "last_update": "121290'698339",
>>        "last_complete": "121290'698339",
>>        "log_tail": "121272'696825",
>>        "last_user_version": 698319,
>>        "last_backfill": "MAX",
>>        "last_backfill_bitwise": 0,
>>        "purged_snaps": [],
>>        "history": {
>>            "epoch_created": 24431,
>>            "epoch_pool_created": 24431,
>>            "last_epoch_started": 121152,
>>            "last_interval_started": 121151,
>>            "last_epoch_clean": 121152,
>>            "last_interval_clean": 121151,
>>            "last_epoch_split": 0,
>>            "last_epoch_marked_full": 106367,
>>            "same_up_since": 121148,
>>            "same_interval_since": 121151,
>>            "same_primary_since": 121020,
>>            "last_scrub": "121290'698339",
>>            "last_scrub_stamp": "2018-06-27 03:55:44.291060",
>>            "last_deep_scrub": "121290'698339",
>>            "last_deep_scrub_stamp": "2018-06-27 03:55:44.291060",
>>            "last_clean_scrub_stamp": "2018-06-11 15:28:20.335739"
>>        },
>>        "stats": {
>>            "version": "121290'698339",
>>            "reported_seq": "1055277",
>>            "reported_epoch": "121293",
>>            "state": "active+clean+inconsistent",
>>            "last_fresh": "2018-06-27 08:33:20.764603",
>>            "last_change": "2018-06-27 03:55:44.291146",
>>            "last_active": "2018-06-27 08:33:20.764603",
>>            "last_peered": "2018-06-27 08:33:20.764603",
>>            "last_clean": "2018-06-27 08:33:20.764603",
>>            "last_became_active": "2018-06-21 16:35:46.487783",
>>            "last_became_peered": "2018-06-21 16:35:46.487783",
>>            "last_unstale": "2018-06-27 08:33:20.764603",
>>            "last_undegraded": "2018-06-27 08:33:20.764603",
>>            "last_fullsized": "2018-06-27 08:33:20.764603",
>>            "mapping_epoch": 121151,
>>            "log_start": "121272'696825",
>>            "ondisk_log_start": "121272'696825",
>>            "created": 24431,
>>            "last_epoch_clean": 121152,
>>            "parent": "0.0",
>>            "parent_split_bits": 0,
>>            "last_scrub": "121290'698339",
>>            "last_scrub_stamp": "2018-06-27 03:55:44.291060",
>>            "last_deep_scrub": "121290'698339",
>>            "last_deep_scrub_stamp": "2018-06-27 03:55:44.291060",
>>            "last_clean_scrub_stamp": "2018-06-11 15:28:20.335739",
>>            "log_size": 1514,
>>            "ondisk_log_size": 1514,
>>            "stats_invalid": false,
>>            "dirty_stats_invalid": false,
>>            "omap_stats_invalid": false,
>>            "hitset_stats_invalid": false,
>>            "hitset_bytes_stats_invalid": false,
>>            "pin_stats_invalid": true,
>>            "snaptrimq_len": 0,
>>            "stat_sum": {
>>                "num_bytes": 0,
>>                "num_objects": 116,
>>                "num_object_clones": 0,
>>                "num_object_copies": 232,
>>                "num_objects_missing_on_primary": 0,
>>                "num_objects_missing": 0,
>>                "num_objects_degraded": 0,
>>                "num_objects_misplaced": 0,
>>                "num_objects_unfound": 0,
>>                "num_objects_dirty": 111,
>>                "num_whiteouts": 0,
>>                "num_read": 168436,
>>                "num_read_kb": 25417188,
>>                "num_write": 3370202,
>>                "num_write_kb": 0,
>>                "num_scrub_errors": 2,
>>                "num_shallow_scrub_errors": 0,
>>                "num_deep_scrub_errors": 2,
>>                "num_objects_recovered": 207,
>>                "num_bytes_recovered": 0,
>>                "num_keys_recovered": 9482826,
>>                "num_objects_omap": 107,
>>                "num_objects_hit_set_archive": 0,
>>                "num_bytes_hit_set_archive": 0,
>>                "num_flush": 0,
>>                "num_flush_kb": 0,
>>                "num_evict": 0,
>>                "num_evict_kb": 0,
>>                "num_promote": 0,
>>                "num_flush_mode_high": 0,
>>                "num_flush_mode_low": 0,
>>                "num_evict_mode_some": 0,
>>                "num_evict_mode_full": 0,
>>                "num_objects_pinned": 0,
>>                "num_legacy_snapsets": 0
>>            },
>>            "up": [
>>                21,
>>                28
>>            ],
>>            "acting": [
>>                21,
>>                28
>>            ],
>>            "blocked_by": [],
>>            "up_primary": 21,
>>            "acting_primary": 21
>>        },
>>        "empty": 0,
>>        "dne": 0,
>>        "incomplete": 0,
>>        "last_epoch_started": 121152,
>>        "hit_set_history": {
>>            "current_last_update": "0'0",
>>            "history": []
>>        }
>>    },
>>    "peer_info": [
>>        {
>>            "peer": "28",
>>            "pgid": "18.2",
>>            "last_update": "121290'698339",
>>            "last_complete": "121172'661331",
>>            "log_tail": "121127'652751",
>>            "last_user_version": 0,
>>            "last_backfill": "MAX",
>>            "last_backfill_bitwise": 1,
>>            "purged_snaps": [],
>>            "history": {
>>                "epoch_created": 24431,
>>                "epoch_pool_created": 24431,
>>                "last_epoch_started": 121152,
>>                "last_interval_started": 121151,
>>                "last_epoch_clean": 121152,
>>                "last_interval_clean": 121151,
>>                "last_epoch_split": 0,
>>                "last_epoch_marked_full": 106367,
>>                "same_up_since": 121148,
>>                "same_interval_since": 121151,
>>                "same_primary_since": 121020,
>>                "last_scrub": "121290'698339",
>>                "last_scrub_stamp": "2018-06-27 03:55:44.291060",
>>                "last_deep_scrub": "121290'698339",
>>                "last_deep_scrub_stamp": "2018-06-27 03:55:44.291060",
>>                "last_clean_scrub_stamp": "2018-06-11 15:28:20.335739"
>>            },
>>            "stats": {
>>                "version": "121131'654251",
>>                "reported_seq": "959540",
>>                "reported_epoch": "121150",
>>                "state": "active+undersized+degraded+remapped+inconsistent+backfilling",
>>                "last_fresh": "2018-06-21 16:35:44.468284",
>>                "last_change": "2018-06-21 16:34:12.447803",
>>                "last_active": "2018-06-21 16:35:44.468284",
>>                "last_peered": "2018-06-21 16:35:44.468284",
>>                "last_clean": "2018-06-21 16:27:07.835328",
>>                "last_became_active": "2018-06-21 16:33:24.246631",
>>                "last_became_peered": "2018-06-21 16:33:24.246631",
>>                "last_unstale": "2018-06-21 16:35:44.468284",
>>                "last_undegraded": "2018-06-21 16:33:23.997020",
>>                "last_fullsized": "2018-06-21 16:33:23.994195",
>>                "mapping_epoch": 121151,
>>                "log_start": "121127'652725",
>>                "created": 24431,
>>                "last_epoch_clean": 121145,
>>                "parent": "0.0",
>>                "parent_split_bits": 0,
>>                "last_scrub": "121131'654251",
>>                "last_scrub_stamp": "2018-06-21 16:27:07.835266",
>>                "last_deep_scrub": "121131'654251",
>>                "last_deep_scrub_stamp": "2018-06-21 16:27:07.835266",
>>                "last_clean_scrub_stamp": "2018-06-11 15:28:20.335739",
>>                "log_size": 1526,
>>                "ondisk_log_size": 1526,
>>                "stats_invalid": false,
>>                "dirty_stats_invalid": false,
>>                "omap_stats_invalid": false,
>>                "hitset_stats_invalid": false,
>>                "hitset_bytes_stats_invalid": false,
>>                "pin_stats_invalid": true,
>>                "snaptrimq_len": 0,
>>                "stat_sum": {
>>                    "num_bytes": 0,
>>                    "num_objects": 69,
>>                    "num_object_clones": 0,
>>                    "num_object_copies": 138,
>>                    "num_objects_missing_on_primary": 0,
>>                    "num_objects_missing": 0,
>>                    "num_objects_degraded": 1,
>>                    "num_objects_misplaced": 0,
>>                    "num_objects_unfound": 0,
>>                    "num_objects_dirty": 64,
>>                    "num_whiteouts": 0,
>>                    "num_read": 14057,
>>                    "num_read_kb": 454200,
>>                    "num_write": 797911,
>>                    "num_write_kb": 0,
>>                    "num_scrub_errors": 0,
>>                    "num_shallow_scrub_errors": 0,
>>                    "num_deep_scrub_errors": 0,
>>                    "num_objects_recovered": 207,
>>                    "num_bytes_recovered": 0,
>>                    "num_keys_recovered": 9482826,
>>                    "num_objects_omap": 60,
>>                    "num_objects_hit_set_archive": 0,
>>                    "num_bytes_hit_set_archive": 0,
>>                    "num_flush": 0,
>>                    "num_flush_kb": 0,
>>                    "num_evict": 0,
>>                    "num_evict_kb": 0,
>>                    "num_promote": 0,
>>                    "num_flush_mode_high": 0,
>>                    "num_flush_mode_low": 0,
>>                    "num_evict_mode_some": 0,
>>                    "num_evict_mode_full": 0,
>>                    "num_objects_pinned": 0,
>>                    "num_legacy_snapsets": 0
>>                },
>>                "up": [
>>                    21,
>>                    28
>>                ],
>>                "acting": [
>>                    21,
>>                    28
>>                ],
>>                "blocked_by": [],
>>                "up_primary": 21,
>>                "acting_primary": 21
>>            },
>>            "empty": 0,
>>            "dne": 0,
>>            "incomplete": 0,
>>            "last_epoch_started": 121152,
>>            "hit_set_history": {
>>                "current_last_update": "0'0",
>>                "history": []
>>            }
>>        }
>>    ],
>>    "recovery_state": [
>>        {
>>            "name": "Started/Primary/Active",
>>            "enter_time": "2018-06-21 16:35:46.478007",
>>            "might_have_unfound": [],
>>            "recovery_progress": {
>>                "backfill_targets": [],
>>                "waiting_on_backfill": [],
>>                "last_backfill_started": "MIN",
>>                "backfill_info": {
>>                    "begin": "MIN",
>>                    "end": "MIN",
>>                    "objects": []
>>                },
>>                "peer_backfill_info": [],
>>                "backfills_in_flight": [],
>>                "recovering": [],
>>                "pg_backend": {
>>                    "pull_from_peer": [],
>>                    "pushing": []
>>                }
>>            },
>>            "scrub": {
>>                "scrubber.epoch_start": "121151",
>>                "scrubber.active": false,
>>                "scrubber.state": "INACTIVE",
>>                "scrubber.start": "MIN",
>>                "scrubber.end": "MIN",
>>                "scrubber.subset_last_update": "0'0",
>>                "scrubber.deep": false,
>>                "scrubber.seed": 0,
>>                "scrubber.waiting_on": 0,
>>                "scrubber.waiting_on_whom": []
>>            }
>>        },
>>        {
>>            "name": "Started",
>>            "enter_time": "2018-06-21 16:35:45.052939"
>>        }
>>    ],
>>    "agent_state": {}
>> }
>>
>>
>>
>>
>> Thanks for trying to help out.
>>
>> Cheers
>>
>>
>>
>> ----- Original Message -----
>>> From: "Brad Hubbard" <bhubbard@xxxxxxxxxx>
>>> To: "Andrei Mikhailovsky" <andrei@xxxxxxxxxx>
>>> Cc: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
>>> Sent: Wednesday, 27 June, 2018 00:18:19
>>> Subject: Re:  fixing unrepairable inconsistent PG
>>
>>> Try setting the osd caps to 'allow *' for client.admin or running the
>>> command using an id that has that access such as
>>> mgr.arh-ibstorage1-ib.
>>>
>>> On Wed, Jun 27, 2018 at 1:32 AM, Andrei Mikhailovsky <andrei@xxxxxxxxxx> wrote:
>>>> Hi Brad,
>>>>
>>>> Here is the output of the "ceph auth list" command (I have removed the key: line
>>>> which was present in every single entry, including the osd.21):
>>>>
>>>> # ceph auth list
>>>> installed auth entries:
>>>>
>>>> mds.arh-ibstorage1-ib
>>>>         caps: [mds] allow
>>>>         caps: [mgr] allow profile mds
>>>>         caps: [mon] allow profile mds
>>>>         caps: [osd] allow *
>>>> mds.arh-ibstorage2-ib
>>>>         caps: [mds] allow
>>>>         caps: [mgr] allow profile mds
>>>>         caps: [mon] allow profile mds
>>>>         caps: [osd] allow *
>>>> osd.0
>>>>         caps: [mgr] allow profile osd
>>>>         caps: [mon] allow profile osd
>>>>         caps: [osd] allow *
>>>> osd.1
>>>>         caps: [mgr] allow profile osd
>>>>         caps: [mon] allow profile osd
>>>>         caps: [osd] allow *
>>>> osd.10
>>>>         caps: [mgr] allow profile osd
>>>>         caps: [mon] allow profile osd
>>>>         caps: [osd] allow *
>>>> osd.11
>>>>         caps: [mgr] allow profile osd
>>>>         caps: [mon] allow profile osd
>>>>         caps: [osd] allow *
>>>> osd.12
>>>>         caps: [mgr] allow profile osd
>>>>         caps: [mon] allow profile osd
>>>>         caps: [osd] allow *
>>>> osd.13
>>>>         caps: [mgr] allow profile osd
>>>>         caps: [mon] allow profile osd
>>>>         caps: [osd] allow *
>>>> osd.14
>>>>         caps: [mgr] allow profile osd
>>>>         caps: [mon] allow profile osd
>>>>         caps: [osd] allow *
>>>> osd.15
>>>>         caps: [mgr] allow profile osd
>>>>         caps: [mon] allow profile osd
>>>>         caps: [osd] allow *
>>>> osd.16
>>>>         caps: [mgr] allow profile osd
>>>>         caps: [mon] allow profile osd
>>>>         caps: [osd] allow *
>>>> osd.17
>>>>         caps: [mgr] allow profile osd
>>>>         caps: [mon] allow profile osd
>>>>         caps: [osd] allow *
>>>> osd.18
>>>>         caps: [mgr] allow profile osd
>>>>         caps: [mon] allow profile osd
>>>>         caps: [osd] allow *
>>>> osd.19
>>>>         caps: [mgr] allow profile osd
>>>>         caps: [mon] allow profile osd
>>>>         caps: [osd] allow *
>>>> osd.2
>>>>         caps: [mgr] allow profile osd
>>>>         caps: [mon] allow profile osd
>>>>         caps: [osd] allow *
>>>> osd.20
>>>>         caps: [mgr] allow profile osd
>>>>         caps: [mon] allow profile osd
>>>>         caps: [osd] allow *
>>>> osd.21
>>>>         caps: [mgr] allow profile osd
>>>>         caps: [mon] allow profile osd
>>>>         caps: [osd] allow *
>>>> osd.22
>>>>         caps: [mgr] allow profile osd
>>>>         caps: [mon] allow profile osd
>>>>         caps: [osd] allow *
>>>> osd.23
>>>>         caps: [mgr] allow profile osd
>>>>         caps: [mon] allow profile osd
>>>>         caps: [osd] allow *
>>>> osd.24
>>>>         caps: [mgr] allow profile osd
>>>>         caps: [mon] allow profile osd
>>>>         caps: [osd] allow *
>>>> osd.25
>>>>         caps: [mgr] allow profile osd
>>>>         caps: [mon] allow profile osd
>>>>         caps: [osd] allow *
>>>> osd.27
>>>>         caps: [mgr] allow profile osd
>>>>         caps: [mon] allow profile osd
>>>>         caps: [osd] allow *
>>>> osd.28
>>>>         caps: [mgr] allow profile osd
>>>>         caps: [mon] allow profile osd
>>>>         caps: [osd] allow *
>>>> osd.29
>>>>         caps: [mgr] allow profile osd
>>>>         caps: [mon] allow profile osd
>>>>         caps: [osd] allow *
>>>> osd.3
>>>>         caps: [mgr] allow profile osd
>>>>         caps: [mon] allow profile osd
>>>>         caps: [osd] allow *
>>>> osd.30
>>>>         caps: [mgr] allow profile osd
>>>>         caps: [mon] allow profile osd
>>>>         caps: [osd] allow *
>>>> osd.4
>>>>         caps: [mgr] allow profile osd
>>>>         caps: [mon] allow profile osd
>>>>         caps: [osd] allow *
>>>> osd.5
>>>>         caps: [mgr] allow profile osd
>>>>         caps: [mon] allow profile osd
>>>>         caps: [osd] allow *
>>>> osd.6
>>>>         caps: [mgr] allow profile osd
>>>>         caps: [mon] allow profile osd
>>>>         caps: [osd] allow *
>>>> osd.7
>>>>         caps: [mgr] allow profile osd
>>>>         caps: [mon] allow profile osd
>>>>         caps: [osd] allow *
>>>> osd.8
>>>>         caps: [mgr] allow profile osd
>>>>         caps: [mon] allow profile osd
>>>>         caps: [osd] allow *
>>>> osd.9
>>>>         caps: [mgr] allow profile osd
>>>>         caps: [mon] allow profile osd
>>>>         caps: [osd] allow *
>>>> client.admin
>>>>         caps: [mds] allow rwx
>>>>         caps: [mgr] allow *
>>>>         caps: [mon] allow rwx
>>>>         caps: [osd] allow rwx
>>>> client.arh-ibstorage1-ib.csprdc.arhont.com
>>>>         caps: [mgr] allow r
>>>>         caps: [mon] allow rw
>>>>         caps: [osd] allow rwx
>>>> client.bootstrap-mds
>>>>         caps: [mgr] allow r
>>>>         caps: [mon] allow profile bootstrap-mds
>>>> client.bootstrap-mgr
>>>>         caps: [mon] allow profile bootstrap-mgr
>>>> client.bootstrap-osd
>>>>         caps: [mgr] allow r
>>>>         caps: [mon] allow profile bootstrap-osd
>>>> client.bootstrap-rgw
>>>>         caps: [mgr] allow r
>>>>         caps: [mon] allow profile bootstrap-rgw
>>>> client.ceph-monitors
>>>>         caps: [mgr] allow r
>>>>         caps: [mon] allow r
>>>> client.libvirt
>>>>         caps: [mgr] allow r
>>>>         caps: [mon] allow r
>>>>         caps: [osd] allow class-read object_prefix rbd_children, allow rwx pool=libvirt-pool
>>>> client.primary-ubuntu-1
>>>>         caps: [mgr] allow r
>>>>         caps: [mon] allow r
>>>>         caps: [osd] allow rwx pool=Primary-ubuntu-1
>>>> client.radosgw1.gateway
>>>>         caps: [mgr] allow r
>>>>         caps: [mon] allow rwx
>>>>         caps: [osd] allow rwx
>>>> client.radosgw2.gateway
>>>>         caps: [mgr] allow r
>>>>         caps: [mon] allow rw
>>>>         caps: [osd] allow rwx
>>>> client.ssdcs
>>>>         caps: [mgr] allow r
>>>>         caps: [mon] allow r
>>>>         caps: [osd] allow class-read object_prefix rbd_children, allow rwx pool=ssdcs
>>>>
>>>> mgr.arh-ibstorage1-ib
>>>>         caps: [mds] allow *
>>>>         caps: [mon] allow profile mgr
>>>>         caps: [osd] allow *
>>>> mgr.arh-ibstorage2-ib
>>>>         caps: [mds] allow *
>>>>         caps: [mon] allow profile mgr
>>>>         caps: [osd] allow *
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> I have run this command on all pgs in the cluster and it shows the same error
>>>> message for all of them. For example:
>>>>
>>>> Error EPERM: problem getting command descriptions from pg.5.1c9
>>>>
>>>> Andrei
>>>>
>>>>
>>>> ----- Original Message -----
>>>>> From: "Brad Hubbard" <bhubbard@xxxxxxxxxx>
>>>>> To: "Andrei Mikhailovsky" <andrei@xxxxxxxxxx>
>>>>> Cc: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
>>>>> Sent: Tuesday, 26 June, 2018 01:10:34
>>>>> Subject: Re:  fixing unrepairable inconsistent PG
>>>>
>>>>> Interesting...
>>>>>
>>>>> Can I see the output of "ceph auth list" and can you test whether you
>>>>> can query any other pg that has osd.21 as its primary?
>>>>>
>>>>> On Mon, Jun 25, 2018 at 8:04 PM, Andrei Mikhailovsky <andrei@xxxxxxxxxx> wrote:
>>>>>> Hi Brad,
>>>>>>
>>>>>> here is the output:
>>>>>>
>>>>>> --------------
>>>>>>
>>>>>> root@arh-ibstorage1-ib:/home/andrei# ceph --debug_ms 5 --debug_auth 20 pg 18.2 query
>>>>>> 2018-06-25 10:59:12.100302 7fe23eaa1700  2 Event(0x7fe2400e0140 nevent=5000
>>>>>> time_id=1).set_owner idx=0 owner=140609690670848
>>>>>> 2018-06-25 10:59:12.100398 7fe23e2a0700  2 Event(0x7fe24010d030 nevent=5000
>>>>>> time_id=1).set_owner idx=1 owner=140609682278144
>>>>>> 2018-06-25 10:59:12.100445 7fe23da9f700  2 Event(0x7fe240139ec0 nevent=5000
>>>>>> time_id=1).set_owner idx=2 owner=140609673885440
>>>>>> 2018-06-25 10:59:12.100793 7fe244b28700  1  Processor -- start
>>>>>> 2018-06-25 10:59:12.100869 7fe244b28700  1 -- - start start
>>>>>> 2018-06-25 10:59:12.100882 7fe244b28700  5 adding auth protocol: cephx
>>>>>> 2018-06-25 10:59:12.101046 7fe244b28700  2 auth: KeyRing::load: loaded key file
>>>>>> /etc/ceph/ceph.client.admin.keyring
>>>>>> 2018-06-25 10:59:12.101244 7fe244b28700  1 -- - --> 192.168.168.201:6789/0 --
>>>>>> auth(proto 0 30 bytes epoch 0) v1 -- 0x7fe240174b80 con 0
>>>>>> 2018-06-25 10:59:12.101264 7fe244b28700  1 -- - --> 192.168.168.202:6789/0 --
>>>>>> auth(proto 0 30 bytes epoch 0) v1 -- 0x7fe240175010 con 0
>>>>>> 2018-06-25 10:59:12.101690 7fe23e2a0700  1 -- 192.168.168.201:0/3046734987
>>>>>> learned_addr learned my addr 192.168.168.201:0/3046734987
>>>>>> 2018-06-25 10:59:12.101890 7fe23e2a0700  2 -- 192.168.168.201:0/3046734987 >>
>>>>>> 192.168.168.202:6789/0 conn(0x7fe240176dc0 :-1 s=STATE_CONNECTING_WAIT_ACK_SEQ
>>>>>> pgs=0 cs=0 l=1)._process_connection got newly_acked_seq 0 vs out_seq 0
>>>>>> 2018-06-25 10:59:12.102030 7fe23da9f700  2 -- 192.168.168.201:0/3046734987 >>
>>>>>> 192.168.168.201:6789/0 conn(0x7fe24017a420 :-1 s=STATE_CONNECTING_WAIT_ACK_SEQ
>>>>>> pgs=0 cs=0 l=1)._process_connection got newly_acked_seq 0 vs out_seq 0
>>>>>> 2018-06-25 10:59:12.102450 7fe23e2a0700  5 -- 192.168.168.201:0/3046734987 >>
>>>>>> 192.168.168.202:6789/0 conn(0x7fe240176dc0 :-1
>>>>>> s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=472363 cs=1 l=1). rx mon.1
>>>>>> seq 1 0x7fe234002670 mon_map magic: 0 v1
>>>>>> 2018-06-25 10:59:12.102494 7fe23e2a0700  5 -- 192.168.168.201:0/3046734987 >>
>>>>>> 192.168.168.202:6789/0 conn(0x7fe240176dc0 :-1
>>>>>> s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=472363 cs=1 l=1). rx mon.1
>>>>>> seq 2 0x7fe234002b70 auth_reply(proto 2 0 (0) Success) v1
>>>>>> 2018-06-25 10:59:12.102542 7fe23ca9d700  1 -- 192.168.168.201:0/3046734987 <==
>>>>>> mon.1 192.168.168.202:6789/0 1 ==== mon_map magic: 0 v1 ==== 505+0+0
>>>>>> (2386987630 0 0) 0x7fe234002670 con 0x7fe240176dc0
>>>>>> 2018-06-25 10:59:12.102629 7fe23ca9d700  1 -- 192.168.168.201:0/3046734987 <==
>>>>>> mon.1 192.168.168.202:6789/0 2 ==== auth_reply(proto 2 0 (0) Success) v1 ====
>>>>>> 33+0+0 (1469975654 0 0) 0x7fe234002b70 con 0x7fe240176dc0
>>>>>> 2018-06-25 10:59:12.102655 7fe23ca9d700 10 cephx: set_have_need_key no handler
>>>>>> for service mon
>>>>>> 2018-06-25 10:59:12.102657 7fe23ca9d700 10 cephx: set_have_need_key no handler
>>>>>> for service osd
>>>>>> 2018-06-25 10:59:12.102658 7fe23ca9d700 10 cephx: set_have_need_key no handler
>>>>>> for service mgr
>>>>>> 2018-06-25 10:59:12.102661 7fe23ca9d700 10 cephx: set_have_need_key no handler
>>>>>> for service auth
>>>>>> 2018-06-25 10:59:12.102662 7fe23ca9d700 10 cephx: validate_tickets want 53 have
>>>>>> 0 need 53
>>>>>> 2018-06-25 10:59:12.102666 7fe23ca9d700 10 cephx client: handle_response ret = 0
>>>>>> 2018-06-25 10:59:12.102671 7fe23ca9d700 10 cephx client:  got initial server
>>>>>> challenge 6522ec95fb2eb487
>>>>>> 2018-06-25 10:59:12.102673 7fe23ca9d700 10 cephx client: validate_tickets:
>>>>>> want=53 need=53 have=0
>>>>>> 2018-06-25 10:59:12.102674 7fe23ca9d700 10 cephx: set_have_need_key no handler
>>>>>> for service mon
>>>>>> 2018-06-25 10:59:12.102675 7fe23ca9d700 10 cephx: set_have_need_key no handler
>>>>>> for service osd
>>>>>> 2018-06-25 10:59:12.102676 7fe23ca9d700 10 cephx: set_have_need_key no handler
>>>>>> for service mgr
>>>>>> 2018-06-25 10:59:12.102676 7fe23ca9d700 10 cephx: set_have_need_key no handler
>>>>>> for service auth
>>>>>> 2018-06-25 10:59:12.102677 7fe23ca9d700 10 cephx: validate_tickets want 53 have
>>>>>> 0 need 53
>>>>>> 2018-06-25 10:59:12.102678 7fe23ca9d700 10 cephx client: want=53 need=53 have=0
>>>>>> 2018-06-25 10:59:12.102680 7fe23ca9d700 10 cephx client: build_request
>>>>>> 2018-06-25 10:59:12.102702 7fe23da9f700  5 -- 192.168.168.201:0/3046734987 >>
>>>>>> 192.168.168.201:6789/0 conn(0x7fe24017a420 :-1
>>>>>> s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=333625 cs=1 l=1). rx mon.0
>>>>>> seq 1 0x7fe228001490 mon_map magic: 0 v1
>>>>>> 2018-06-25 10:59:12.102739 7fe23ca9d700 10 cephx client: get auth session key:
>>>>>> client_challenge 80f2a24093f783c5
>>>>>> 2018-06-25 10:59:12.102743 7fe23ca9d700  1 -- 192.168.168.201:0/3046734987 -->
>>>>>> 192.168.168.202:6789/0 -- auth(proto 2 32 bytes epoch 0) v1 -- 0x7fe224002080
>>>>>> con 0
>>>>>> 2018-06-25 10:59:12.102737 7fe23da9f700  5 -- 192.168.168.201:0/3046734987 >>
>>>>>> 192.168.168.201:6789/0 conn(0x7fe24017a420 :-1
>>>>>> s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=333625 cs=1 l=1). rx mon.0
>>>>>> seq 2 0x7fe2280019c0 auth_reply(proto 2 0 (0) Success) v1
>>>>>> 2018-06-25 10:59:12.102776 7fe23ca9d700  1 -- 192.168.168.201:0/3046734987 <==
>>>>>> mon.0 192.168.168.201:6789/0 1 ==== mon_map magic: 0 v1 ==== 505+0+0
>>>>>> (2386987630 0 0) 0x7fe228001490 con 0x7fe24017a420
>>>>>> 2018-06-25 10:59:12.102821 7fe23ca9d700  1 -- 192.168.168.201:0/3046734987 <==
>>>>>> mon.0 192.168.168.201:6789/0 2 ==== auth_reply(proto 2 0 (0) Success) v1 ====
>>>>>> 33+0+0 (3800394028 0 0) 0x7fe2280019c0 con 0x7fe24017a420
>>>>>> 2018-06-25 10:59:12.102833 7fe23ca9d700 10 cephx: set_have_need_key no handler
>>>>>> for service mon
>>>>>> 2018-06-25 10:59:12.102834 7fe23ca9d700 10 cephx: set_have_need_key no handler
>>>>>> for service osd
>>>>>> 2018-06-25 10:59:12.102835 7fe23ca9d700 10 cephx: set_have_need_key no handler
>>>>>> for service mgr
>>>>>> 2018-06-25 10:59:12.102836 7fe23ca9d700 10 cephx: set_have_need_key no handler
>>>>>> for service auth
>>>>>> 2018-06-25 10:59:12.102837 7fe23ca9d700 10 cephx: validate_tickets want 53 have
>>>>>> 0 need 53
>>>>>> 2018-06-25 10:59:12.102839 7fe23ca9d700 10 cephx client: handle_response ret = 0
>>>>>> 2018-06-25 10:59:12.102841 7fe23ca9d700 10 cephx client:  got initial server
>>>>>> challenge ccd69ce967642f7
>>>>>> 2018-06-25 10:59:12.102842 7fe23ca9d700 10 cephx client: validate_tickets:
>>>>>> want=53 need=53 have=0
>>>>>> 2018-06-25 10:59:12.102843 7fe23ca9d700 10 cephx: set_have_need_key no handler
>>>>>> for service mon
>>>>>> 2018-06-25 10:59:12.102843 7fe23ca9d700 10 cephx: set_have_need_key no handler
>>>>>> for service osd
>>>>>> 2018-06-25 10:59:12.102844 7fe23ca9d700 10 cephx: set_have_need_key no handler
>>>>>> for service mgr
>>>>>> 2018-06-25 10:59:12.102845 7fe23ca9d700 10 cephx: set_have_need_key no handler
>>>>>> for service auth
>>>>>> 2018-06-25 10:59:12.102845 7fe23ca9d700 10 cephx: validate_tickets want 53 have
>>>>>> 0 need 53
>>>>>> 2018-06-25 10:59:12.102846 7fe23ca9d700 10 cephx client: want=53 need=53 have=0
>>>>>> 2018-06-25 10:59:12.102848 7fe23ca9d700 10 cephx client: build_request
>>>>>> 2018-06-25 10:59:12.102881 7fe23ca9d700 10 cephx client: get auth session key:
>>>>>> client_challenge 6ddb6fdc4176ea6a
>>>>>> 2018-06-25 10:59:12.102884 7fe23ca9d700  1 -- 192.168.168.201:0/3046734987 -->
>>>>>> 192.168.168.201:6789/0 -- auth(proto 2 32 bytes epoch 0) v1 -- 0x7fe2240032d0
>>>>>> con 0
>>>>>> 2018-06-25 10:59:12.103402 7fe23e2a0700  5 -- 192.168.168.201:0/3046734987 >>
>>>>>> 192.168.168.202:6789/0 conn(0x7fe240176dc0 :-1
>>>>>> s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=472363 cs=1 l=1). rx mon.1
>>>>>> seq 3 0x7fe234002200 auth_reply(proto 2 0 (0) Success) v1
>>>>>> 2018-06-25 10:59:12.103449 7fe23ca9d700  1 -- 192.168.168.201:0/3046734987 <==
>>>>>> mon.1 192.168.168.202:6789/0 3 ==== auth_reply(proto 2 0 (0) Success) v1 ====
>>>>>> 206+0+0 (195487815 0 0) 0x7fe234002200 con 0x7fe240176dc0
>>>>>> 2018-06-25 10:59:12.103468 7fe23ca9d700 10 cephx client: handle_response ret = 0
>>>>>> 2018-06-25 10:59:12.103469 7fe23ca9d700 10 cephx client:  get_auth_session_key
>>>>>> 2018-06-25 10:59:12.103471 7fe23ca9d700 10 cephx: verify_service_ticket_reply
>>>>>> got 1 keys
>>>>>> 2018-06-25 10:59:12.103472 7fe23ca9d700 10 cephx: got key for service_id auth
>>>>>> 2018-06-25 10:59:12.103508 7fe23ca9d700 10 cephx:  ticket.secret_id=3687
>>>>>> 2018-06-25 10:59:12.103510 7fe23ca9d700 10 cephx: verify_service_ticket_reply
>>>>>> service auth secret_id 3687 session_key [KEY] validity=43200.000000
>>>>>> 2018-06-25 10:59:12.103527 7fe23ca9d700 10 cephx: ticket expires=2018-06-25
>>>>>> 22:59:12.103526 renew_after=2018-06-25 19:59:12.103526
>>>>>> 2018-06-25 10:59:12.103533 7fe23ca9d700 10 cephx client:  want=53 need=53 have=0
>>>>>> 2018-06-25 10:59:12.103534 7fe23ca9d700 10 cephx: set_have_need_key no handler
>>>>>> for service mon
>>>>>> 2018-06-25 10:59:12.103535 7fe23ca9d700 10 cephx: set_have_need_key no handler
>>>>>> for service osd
>>>>>> 2018-06-25 10:59:12.103536 7fe23ca9d700 10 cephx: set_have_need_key no handler
>>>>>> for service mgr
>>>>>> 2018-06-25 10:59:12.103537 7fe23ca9d700 10 cephx: validate_tickets want 53 have
>>>>>> 32 need 21
>>>>>> 2018-06-25 10:59:12.103539 7fe23ca9d700 10 cephx client: validate_tickets:
>>>>>> want=53 need=21 have=32
>>>>>> 2018-06-25 10:59:12.103540 7fe23ca9d700 10 cephx: set_have_need_key no handler
>>>>>> for service mon
>>>>>> 2018-06-25 10:59:12.103541 7fe23ca9d700 10 cephx: set_have_need_key no handler
>>>>>> for service osd
>>>>>> 2018-06-25 10:59:12.103542 7fe23ca9d700 10 cephx: set_have_need_key no handler
>>>>>> for service mgr
>>>>>> 2018-06-25 10:59:12.103542 7fe23ca9d700 10 cephx: validate_tickets want 53 have
>>>>>> 32 need 21
>>>>>> 2018-06-25 10:59:12.103543 7fe23ca9d700 10 cephx client: want=53 need=21 have=32
>>>>>> 2018-06-25 10:59:12.103544 7fe23ca9d700 10 cephx client: build_request
>>>>>> 2018-06-25 10:59:12.103545 7fe23ca9d700 10 cephx client: get service keys:
>>>>>> want=53 need=21 have=32
>>>>>> 2018-06-25 10:59:12.103570 7fe23ca9d700  1 -- 192.168.168.201:0/3046734987 -->
>>>>>> 192.168.168.202:6789/0 -- auth(proto 2 165 bytes epoch 0) v1 -- 0x7fe224007010
>>>>>> con 0
>>>>>> 2018-06-25 10:59:12.103657 7fe23da9f700  5 -- 192.168.168.201:0/3046734987 >>
>>>>>> 192.168.168.201:6789/0 conn(0x7fe24017a420 :-1
>>>>>> s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=333625 cs=1 l=1). rx mon.0
>>>>>> seq 3 0x7fe228001020 auth_reply(proto 2 0 (0) Success) v1
>>>>>> 2018-06-25 10:59:12.103709 7fe23ca9d700  1 -- 192.168.168.201:0/3046734987 <==
>>>>>> mon.0 192.168.168.201:6789/0 3 ==== auth_reply(proto 2 0 (0) Success) v1 ====
>>>>>> 206+0+0 (2366624548 0 0) 0x7fe228001020 con 0x7fe24017a420
>>>>>> 2018-06-25 10:59:12.103729 7fe23ca9d700 10 cephx client: handle_response ret = 0
>>>>>> 2018-06-25 10:59:12.103731 7fe23ca9d700 10 cephx client:  get_auth_session_key
>>>>>> 2018-06-25 10:59:12.103733 7fe23ca9d700 10 cephx: verify_service_ticket_reply
>>>>>> got 1 keys
>>>>>> 2018-06-25 10:59:12.103734 7fe23ca9d700 10 cephx: got key for service_id auth
>>>>>> 2018-06-25 10:59:12.103774 7fe23ca9d700 10 cephx:  ticket.secret_id=3687
>>>>>> 2018-06-25 10:59:12.103776 7fe23ca9d700 10 cephx: verify_service_ticket_reply
>>>>>> service auth secret_id 3687 session_key [KEY] validity=43200.000000
>>>>>> 2018-06-25 10:59:12.103792 7fe23ca9d700 10 cephx: ticket expires=2018-06-25
>>>>>> 22:59:12.103791 renew_after=2018-06-25 19:59:12.103791
>>>>>> 2018-06-25 10:59:12.103798 7fe23ca9d700 10 cephx client:  want=53 need=53 have=0
>>>>>> 2018-06-25 10:59:12.103799 7fe23ca9d700 10 cephx: set_have_need_key no handler
>>>>>> for service mon
>>>>>> 2018-06-25 10:59:12.103800 7fe23ca9d700 10 cephx: set_have_need_key no handler
>>>>>> for service osd
>>>>>> 2018-06-25 10:59:12.103801 7fe23ca9d700 10 cephx: set_have_need_key no handler
>>>>>> for service mgr
>>>>>> 2018-06-25 10:59:12.103802 7fe23ca9d700 10 cephx: validate_tickets want 53 have
>>>>>> 32 need 21
>>>>>> 2018-06-25 10:59:12.103804 7fe23ca9d700 10 cephx client: validate_tickets:
>>>>>> want=53 need=21 have=32
>>>>>> 2018-06-25 10:59:12.103806 7fe23ca9d700 10 cephx: set_have_need_key no handler
>>>>>> for service mon
>>>>>> 2018-06-25 10:59:12.103806 7fe23ca9d700 10 cephx: set_have_need_key no handler
>>>>>> for service osd
>>>>>> 2018-06-25 10:59:12.103807 7fe23ca9d700 10 cephx: set_have_need_key no handler
>>>>>> for service mgr
>>>>>> 2018-06-25 10:59:12.103808 7fe23ca9d700 10 cephx: validate_tickets want 53 have
>>>>>> 32 need 21
>>>>>> 2018-06-25 10:59:12.103808 7fe23ca9d700 10 cephx client: want=53 need=21 have=32
>>>>>> 2018-06-25 10:59:12.103812 7fe23ca9d700 10 cephx client: build_request
>>>>>> 2018-06-25 10:59:12.103813 7fe23ca9d700 10 cephx client: get service keys:
>>>>>> want=53 need=21 have=32
>>>>>> 2018-06-25 10:59:12.103834 7fe23ca9d700  1 -- 192.168.168.201:0/3046734987 -->
>>>>>> 192.168.168.201:6789/0 -- auth(proto 2 165 bytes epoch 0) v1 -- 0x7fe224009dd0
>>>>>> con 0
>>>>>> 2018-06-25 10:59:12.104168 7fe23e2a0700  5 -- 192.168.168.201:0/3046734987 >>
>>>>>> 192.168.168.202:6789/0 conn(0x7fe240176dc0 :-1
>>>>>> s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=472363 cs=1 l=1). rx mon.1
>>>>>> seq 4 0x7fe234002200 auth_reply(proto 2 0 (0) Success) v1
>>>>>> 2018-06-25 10:59:12.104201 7fe23ca9d700  1 -- 192.168.168.201:0/3046734987 <==
>>>>>> mon.1 192.168.168.202:6789/0 4 ==== auth_reply(proto 2 0 (0) Success) v1 ====
>>>>>> 580+0+0 (56981162 0 0) 0x7fe234002200 con 0x7fe240176dc0
>>>>>> 2018-06-25 10:59:12.104223 7fe23ca9d700 10 cephx client: handle_response ret = 0
>>>>>> 2018-06-25 10:59:12.104226 7fe23ca9d700 10 cephx client:
>>>>>> get_principal_session_key session_key [KEY]
>>>>>> 2018-06-25 10:59:12.104238 7fe23ca9d700 10 cephx: verify_service_ticket_reply
>>>>>> got 3 keys
>>>>>> 2018-06-25 10:59:12.104240 7fe23ca9d700 10 cephx: got key for service_id mon
>>>>>> 2018-06-25 10:59:12.104276 7fe23ca9d700 10 cephx:  ticket.secret_id=44205
>>>>>> 2018-06-25 10:59:12.104277 7fe23ca9d700 10 cephx: verify_service_ticket_reply
>>>>>> service mon secret_id 44205 session_key [KEY] validity=3600.000000
>>>>>> 2018-06-25 10:59:12.104285 7fe23ca9d700 10 cephx: ticket expires=2018-06-25
>>>>>> 11:59:12.104284 renew_after=2018-06-25 11:44:12.104284
>>>>>> 2018-06-25 10:59:12.104290 7fe23ca9d700 10 cephx: got key for service_id osd
>>>>>> 2018-06-25 10:59:12.104313 7fe23ca9d700 10 cephx:  ticket.secret_id=44205
>>>>>> 2018-06-25 10:59:12.104314 7fe23ca9d700 10 cephx: verify_service_ticket_reply
>>>>>> service osd secret_id 44205 session_key [KEY] validity=3600.000000
>>>>>> 2018-06-25 10:59:12.104329 7fe23ca9d700 10 cephx: ticket expires=2018-06-25
>>>>>> 11:59:12.104329 renew_after=2018-06-25 11:44:12.104329
>>>>>> 2018-06-25 10:59:12.104333 7fe23ca9d700 10 cephx: got key for service_id mgr
>>>>>> 2018-06-25 10:59:12.104355 7fe23ca9d700 10 cephx:  ticket.secret_id=204
>>>>>> 2018-06-25 10:59:12.104356 7fe23ca9d700 10 cephx: verify_service_ticket_reply
>>>>>> service mgr secret_id 204 session_key [KEY] validity=3600.000000
>>>>>> 2018-06-25 10:59:12.104368 7fe23ca9d700 10 cephx: ticket expires=2018-06-25
>>>>>> 11:59:12.104368 renew_after=2018-06-25 11:44:12.104368
>>>>>> 2018-06-25 10:59:12.104373 7fe23ca9d700 10 cephx: validate_tickets want 53 have
>>>>>> 53 need 0
>>>>>> 2018-06-25 10:59:12.104376 7fe23ca9d700  1 -- 192.168.168.201:0/3046734987 >>
>>>>>> 192.168.168.201:6789/0 conn(0x7fe24017a420 :-1 s=STATE_OPEN pgs=333625 cs=1
>>>>>> l=1).mark_down
>>>>>> 2018-06-25 10:59:12.104384 7fe23ca9d700  2 -- 192.168.168.201:0/3046734987 >>
>>>>>> 192.168.168.201:6789/0 conn(0x7fe24017a420 :-1 s=STATE_OPEN pgs=333625 cs=1
>>>>>> l=1)._stop
>>>>>> 2018-06-25 10:59:12.104426 7fe23ca9d700  1 -- 192.168.168.201:0/3046734987 -->
>>>>>> 192.168.168.202:6789/0 -- mon_subscribe({monmap=0+}) v2 -- 0x7fe240180bb0 con 0
>>>>>> 2018-06-25 10:59:12.104442 7fe23ca9d700 10 cephx: validate_tickets want 53 have
>>>>>> 53 need 0
>>>>>> 2018-06-25 10:59:12.104444 7fe23ca9d700 20 cephx client: need_tickets: want=53
>>>>>> have=53 need=0
>>>>>> 2018-06-25 10:59:12.104481 7fe244b28700  1 -- 192.168.168.201:0/3046734987 -->
>>>>>> 192.168.168.202:6789/0 -- mon_subscribe({mgrmap=0+}) v2 -- 0x7fe240175010 con 0
>>>>>> 2018-06-25 10:59:12.104573 7fe244b28700  1 -- 192.168.168.201:0/3046734987 -->
>>>>>> 192.168.168.202:6789/0 -- mon_subscribe({osdmap=0}) v2 -- 0x7fe24017ea90 con 0
>>>>>> 2018-06-25 10:59:12.104979 7fe23e2a0700  5 -- 192.168.168.201:0/3046734987 >>
>>>>>> 192.168.168.202:6789/0 conn(0x7fe240176dc0 :-1
>>>>>> s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=472363 cs=1 l=1). rx mon.1
>>>>>> seq 5 0x7fe234002b90 mon_map magic: 0 v1
>>>>>> 2018-06-25 10:59:12.105008 7fe23ca9d700  1 -- 192.168.168.201:0/3046734987 <==
>>>>>> mon.1 192.168.168.202:6789/0 5 ==== mon_map magic: 0 v1 ==== 505+0+0
>>>>>> (2386987630 0 0) 0x7fe234002b90 con 0x7fe240176dc0
>>>>>> 2018-06-25 10:59:12.105022 7fe23e2a0700  5 -- 192.168.168.201:0/3046734987 >>
>>>>>> 192.168.168.202:6789/0 conn(0x7fe240176dc0 :-1
>>>>>> s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=472363 cs=1 l=1). rx mon.1
>>>>>> seq 6 0x7fe234001a60 mgrmap(e 139) v1
>>>>>> 2018-06-25 10:59:12.105058 7fe23ca9d700  1 -- 192.168.168.201:0/3046734987 <==
>>>>>> mon.1 192.168.168.202:6789/0 6 ==== mgrmap(e 139) v1 ==== 381+0+0 (56579516 0
>>>>>> 0) 0x7fe234001a60 con 0x7fe240176dc0
>>>>>> 2018-06-25 10:59:12.105066 7fe23e2a0700  5 -- 192.168.168.201:0/3046734987 >>
>>>>>> 192.168.168.202:6789/0 conn(0x7fe240176dc0 :-1
>>>>>> s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=472363 cs=1 l=1). rx mon.1
>>>>>> seq 7 0x7fe234002610 osd_map(121251..121251 src has 120729..121251) v3
>>>>>> 2018-06-25 10:59:12.105110 7fe23ca9d700  1 -- 192.168.168.201:0/3046734987 <==
>>>>>> mon.1 192.168.168.202:6789/0 7 ==== osd_map(121251..121251 src has
>>>>>> 120729..121251) v3 ==== 18118+0+0 (421862548 0 0) 0x7fe234002610 con
>>>>>> 0x7fe240176dc0
>>>>>> 2018-06-25 10:59:12.105405 7fe23da9f700 10 cephx client: build_authorizer for
>>>>>> service mgr
>>>>>> 2018-06-25 10:59:12.105685 7fe23da9f700  2 -- 192.168.168.201:0/3046734987 >>
>>>>>> 192.168.168.201:6840/32624 conn(0x7fe2240127b0 :-1
>>>>>> s=STATE_CONNECTING_WAIT_ACK_SEQ pgs=0 cs=0 l=1)._process_connection got
>>>>>> newly_acked_seq 0 vs out_seq 0
>>>>>> 2018-06-25 10:59:12.105720 7fe23da9f700 10 In get_auth_session_handler for
>>>>>> protocol 2
>>>>>> 2018-06-25 10:59:12.108653 7fe244b28700  1 -- 192.168.168.201:0/3046734987 -->
>>>>>> 192.168.168.203:6828/43673 -- command(tid 1: {"prefix":
>>>>>> "get_command_descriptions", "pgid": "18.2"}) v1 -- 0x7fe240184580 con 0
>>>>>> 2018-06-25 10:59:12.109327 7fe23eaa1700 10 cephx client: build_authorizer for
>>>>>> service osd
>>>>>> 2018-06-25 10:59:12.109828 7fe23eaa1700  2 -- 192.168.168.201:0/3046734987 >>
>>>>>> 192.168.168.203:6828/43673 conn(0x7fe240180f20 :-1
>>>>>> s=STATE_CONNECTING_WAIT_ACK_SEQ pgs=0 cs=0 l=1)._process_connection got
>>>>>> newly_acked_seq 0 vs out_seq 0
>>>>>> 2018-06-25 10:59:12.109875 7fe23eaa1700 10 In get_auth_session_handler for
>>>>>> protocol 2
>>>>>> 2018-06-25 10:59:12.109921 7fe23eaa1700 10 _calc_signature seq 1 front_crc_ =
>>>>>> 2696387361 middle_crc = 0 data_crc = 0 sig = 10077981589201542762
>>>>>> 2018-06-25 10:59:12.109930 7fe23eaa1700 20 Putting signature in client
>>>>>> message(seq # 1): sig = 10077981589201542762
>>>>>> 2018-06-25 10:59:12.110382 7fe23eaa1700 10 _calc_signature seq 1 front_crc_ =
>>>>>> 1943489909 middle_crc = 0 data_crc = 0 sig = 6955259887491975287
>>>>>> 2018-06-25 10:59:12.110394 7fe23eaa1700  5 -- 192.168.168.201:0/3046734987 >>
>>>>>> 192.168.168.203:6828/43673 conn(0x7fe240180f20 :-1
>>>>>> s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=26353 cs=1 l=1). rx osd.21
>>>>>> seq 1 0x7fe238000f20 command_reply(tid 1: -1 ) v1
>>>>>> 2018-06-25 10:59:12.110436 7fe23ca9d700  1 -- 192.168.168.201:0/3046734987 <==
>>>>>> osd.21 192.168.168.203:6828/43673 1 ==== command_reply(tid 1: -1 ) v1 ====
>>>>>> 8+0+0 (1943489909 0 0) 0x7fe238000f20 con 0x7fe240180f20
>>>>>> Error EPERM: problem getting command descriptions from pg.18.2
>>>>>> 2018-06-25 10:59:12.112168 7fe244b28700  1 -- 192.168.168.201:0/3046734987 >>
>>>>>> 192.168.168.203:6828/43673 conn(0x7fe240180f20 :-1 s=STATE_OPEN pgs=26353 cs=1
>>>>>> l=1).mark_down
>>>>>> 2018-06-25 10:59:12.112190 7fe244b28700  2 -- 192.168.168.201:0/3046734987 >>
>>>>>> 192.168.168.203:6828/43673 conn(0x7fe240180f20 :-1 s=STATE_OPEN pgs=26353 cs=1
>>>>>> l=1)._stop
>>>>>> 2018-06-25 10:59:12.112337 7fe244b28700  1 -- 192.168.168.201:0/3046734987 >>
>>>>>> 192.168.168.201:6840/32624 conn(0x7fe2240127b0 :-1 s=STATE_OPEN pgs=575947 cs=1
>>>>>> l=1).mark_down
>>>>>> 2018-06-25 10:59:12.112348 7fe244b28700  2 -- 192.168.168.201:0/3046734987 >>
>>>>>> 192.168.168.201:6840/32624 conn(0x7fe2240127b0 :-1 s=STATE_OPEN pgs=575947 cs=1
>>>>>> l=1)._stop
>>>>>> 2018-06-25 10:59:12.112367 7fe244b28700  1 -- 192.168.168.201:0/3046734987 >>
>>>>>> 192.168.168.202:6789/0 conn(0x7fe240176dc0 :-1 s=STATE_OPEN pgs=472363 cs=1
>>>>>> l=1).mark_down
>>>>>> 2018-06-25 10:59:12.112372 7fe244b28700  2 -- 192.168.168.201:0/3046734987 >>
>>>>>> 192.168.168.202:6789/0 conn(0x7fe240176dc0 :-1 s=STATE_OPEN pgs=472363 cs=1
>>>>>> l=1)._stop
>>>>>> 2018-06-25 10:59:12.112519 7fe244b28700  1 -- 192.168.168.201:0/3046734987
>>>>>> shutdown_connections
>>>>>> 2018-06-25 10:59:12.112530 7fe244b28700  5 -- 192.168.168.201:0/3046734987
>>>>>> shutdown_connections mark down 192.168.168.201:6840/32624 0x7fe2240127b0
>>>>>> 2018-06-25 10:59:12.112538 7fe244b28700  5 -- 192.168.168.201:0/3046734987
>>>>>> shutdown_connections mark down 192.168.168.201:6789/0 0x7fe24017a420
>>>>>> 2018-06-25 10:59:12.112543 7fe244b28700  5 -- 192.168.168.201:0/3046734987
>>>>>> shutdown_connections mark down 192.168.168.203:6828/43673 0x7fe240180f20
>>>>>> 2018-06-25 10:59:12.112549 7fe244b28700  5 -- 192.168.168.201:0/3046734987
>>>>>> shutdown_connections mark down 192.168.168.202:6789/0 0x7fe240176dc0
>>>>>> 2018-06-25 10:59:12.112554 7fe244b28700  5 -- 192.168.168.201:0/3046734987
>>>>>> shutdown_connections delete 0x7fe2240127b0
>>>>>> 2018-06-25 10:59:12.112570 7fe244b28700  5 -- 192.168.168.201:0/3046734987
>>>>>> shutdown_connections delete 0x7fe240176dc0
>>>>>> 2018-06-25 10:59:12.112577 7fe244b28700  5 -- 192.168.168.201:0/3046734987
>>>>>> shutdown_connections delete 0x7fe24017a420
>>>>>> 2018-06-25 10:59:12.112582 7fe244b28700  5 -- 192.168.168.201:0/3046734987
>>>>>> shutdown_connections delete 0x7fe240180f20
>>>>>> 2018-06-25 10:59:12.112701 7fe244b28700  1 -- 192.168.168.201:0/3046734987
>>>>>> shutdown_connections
>>>>>> 2018-06-25 10:59:12.112752 7fe244b28700  1 -- 192.168.168.201:0/3046734987 wait
>>>>>> complete.
>>>>>> 2018-06-25 10:59:12.112764 7fe244b28700  1 -- 192.168.168.201:0/3046734987 >>
>>>>>> 192.168.168.201:0/3046734987 conn(0x7fe240167220 :-1 s=STATE_NONE pgs=0 cs=0
>>>>>> l=0).mark_down
>>>>>> 2018-06-25 10:59:12.112770 7fe244b28700  2 -- 192.168.168.201:0/3046734987 >>
>>>>>> 192.168.168.201:0/3046734987 conn(0x7fe240167220 :-1 s=STATE_NONE pgs=0 cs=0
>>>>>> l=0)._stop
>>>>>>
>>>>>>
>>>>>> ----------
>>>>>>
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> ----- Original Message -----
>>>>>>> From: "Brad Hubbard" <bhubbard@xxxxxxxxxx>
>>>>>>> To: "Andrei Mikhailovsky" <andrei@xxxxxxxxxx>
>>>>>>> Cc: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
>>>>>>> Sent: Monday, 25 June, 2018 02:28:55
>>>>>>> Subject: Re:  fixing unrepairable inconsistent PG
>>>>>>
>>>>>>> Can you try the following?
>>>>>>>
>>>>>>> $ ceph --debug_ms 5 --debug_auth 20 pg 18.2 query
>>>>>>>
>>>>>>> On Fri, Jun 22, 2018 at 7:54 PM, Andrei Mikhailovsky <andrei@xxxxxxxxxx> wrote:
>>>>>>>> Hi Brad,
>>>>>>>>
>>>>>>>> Here is the output of the command (I replaced the real auth key with [KEY]):
>>>>>>>>
>>>>>>>>
>>>>>>>> ----------------
>>>>>>>>
>>>>>>>> 2018-06-22 10:47:27.659895 7f70ef9e6700 10 monclient: build_initial_monmap
>>>>>>>> 2018-06-22 10:47:27.661995 7f70ef9e6700 10 monclient: init
>>>>>>>> 2018-06-22 10:47:27.662002 7f70ef9e6700  5 adding auth protocol: cephx
>>>>>>>> 2018-06-22 10:47:27.662004 7f70ef9e6700 10 monclient: auth_supported 2 method
>>>>>>>> cephx
>>>>>>>> 2018-06-22 10:47:27.662221 7f70ef9e6700  2 auth: KeyRing::load: loaded key file
>>>>>>>> /etc/ceph/ceph.client.admin.keyring
>>>>>>>> 2018-06-22 10:47:27.662338 7f70ef9e6700 10 monclient: _reopen_session rank -1
>>>>>>>> 2018-06-22 10:47:27.662425 7f70ef9e6700 10 monclient(hunting): picked
>>>>>>>> mon.noname-b con 0x7f70e8176c80 addr 192.168.168.202:6789/0
>>>>>>>> 2018-06-22 10:47:27.662484 7f70ef9e6700 10 monclient(hunting): picked
>>>>>>>> mon.noname-a con 0x7f70e817a2e0 addr 192.168.168.201:6789/0
>>>>>>>> 2018-06-22 10:47:27.662534 7f70ef9e6700 10 monclient(hunting): _renew_subs
>>>>>>>> 2018-06-22 10:47:27.662544 7f70ef9e6700 10 monclient(hunting): authenticate will
>>>>>>>> time out at 2018-06-22 10:52:27.662543
>>>>>>>> 2018-06-22 10:47:27.663831 7f70d77fe700 10 monclient(hunting): handle_monmap
>>>>>>>> mon_map magic: 0 v1
>>>>>>>> 2018-06-22 10:47:27.663885 7f70d77fe700 10 monclient(hunting):  got monmap 20,
>>>>>>>> mon.noname-b is now rank -1
>>>>>>>> 2018-06-22 10:47:27.663889 7f70d77fe700 10 monclient(hunting): dump:
>>>>>>>> epoch 20
>>>>>>>> fsid 51e9f641-372e-44ec-92a4-b9fe55cbf9fe
>>>>>>>> last_changed 2018-06-16 23:14:48.936175
>>>>>>>> created 0.000000
>>>>>>>> 0: 192.168.168.201:6789/0 mon.arh-ibstorage1-ib
>>>>>>>> 1: 192.168.168.202:6789/0 mon.arh-ibstorage2-ib
>>>>>>>> 2: 192.168.168.203:6789/0 mon.arh-ibstorage3-ib
>>>>>>>>
>>>>>>>> 2018-06-22 10:47:27.664005 7f70d77fe700 10 cephx: set_have_need_key no handler
>>>>>>>> for service mon
>>>>>>>> 2018-06-22 10:47:27.664020 7f70d77fe700 10 cephx: set_have_need_key no handler
>>>>>>>> for service osd
>>>>>>>> 2018-06-22 10:47:27.664021 7f70d77fe700 10 cephx: set_have_need_key no handler
>>>>>>>> for service mgr
>>>>>>>> 2018-06-22 10:47:27.664025 7f70d77fe700 10 cephx: set_have_need_key no handler
>>>>>>>> for service auth
>>>>>>>> 2018-06-22 10:47:27.664026 7f70d77fe700 10 cephx: validate_tickets want 53 have
>>>>>>>> 0 need 53
>>>>>>>> 2018-06-22 10:47:27.664032 7f70d77fe700 10 monclient(hunting): my global_id is
>>>>>>>> 411322261
>>>>>>>> 2018-06-22 10:47:27.664035 7f70d77fe700 10 cephx client: handle_response ret = 0
>>>>>>>> 2018-06-22 10:47:27.664046 7f70d77fe700 10 cephx client:  got initial server
>>>>>>>> challenge d66f2dffc2113d43
>>>>>>>> 2018-06-22 10:47:27.664049 7f70d77fe700 10 cephx client: validate_tickets:
>>>>>>>> want=53 need=53 have=0
>>>>>>>>
>>>>>>>> 2018-06-22 10:47:27.664052 7f70d77fe700 10 cephx: set_have_need_key no handler
>>>>>>>> for service mon
>>>>>>>> 2018-06-22 10:47:27.664053 7f70d77fe700 10 cephx: set_have_need_key no handler
>>>>>>>> for service osd
>>>>>>>> 2018-06-22 10:47:27.664054 7f70d77fe700 10 cephx: set_have_need_key no handler
>>>>>>>> for service mgr
>>>>>>>> 2018-06-22 10:47:27.664055 7f70d77fe700 10 cephx: set_have_need_key no handler
>>>>>>>> for service auth
>>>>>>>> 2018-06-22 10:47:27.664056 7f70d77fe700 10 cephx: validate_tickets want 53 have
>>>>>>>> 0 need 53
>>>>>>>> 2018-06-22 10:47:27.664057 7f70d77fe700 10 cephx client: want=53 need=53 have=0
>>>>>>>> 2018-06-22 10:47:27.664061 7f70d77fe700 10 cephx client: build_request
>>>>>>>> 2018-06-22 10:47:27.664145 7f70d77fe700 10 cephx client: get auth session key:
>>>>>>>> client_challenge d4c95f637e641b55
>>>>>>>> 2018-06-22 10:47:27.664175 7f70d77fe700 10 monclient(hunting): handle_monmap
>>>>>>>> mon_map magic: 0 v1
>>>>>>>> 2018-06-22 10:47:27.664208 7f70d77fe700 10 monclient(hunting):  got monmap 20,
>>>>>>>> mon.arh-ibstorage1-ib is now rank 0
>>>>>>>> 2018-06-22 10:47:27.664211 7f70d77fe700 10 monclient(hunting): dump:
>>>>>>>> epoch 20
>>>>>>>> fsid 51e9f641-372e-44ec-92a4-b9fe55cbf9fe
>>>>>>>> last_changed 2018-06-16 23:14:48.936175
>>>>>>>> created 0.000000
>>>>>>>> 0: 192.168.168.201:6789/0 mon.arh-ibstorage1-ib
>>>>>>>> 1: 192.168.168.202:6789/0 mon.arh-ibstorage2-ib
>>>>>>>> 2: 192.168.168.203:6789/0 mon.arh-ibstorage3-ib
>>>>>>>>
>>>>>>>> 2018-06-22 10:47:27.664241 7f70d77fe700 10 cephx: set_have_need_key no handler
>>>>>>>> for service mon
>>>>>>>> 2018-06-22 10:47:27.664244 7f70d77fe700 10 cephx: set_have_need_key no handler
>>>>>>>> for service osd
>>>>>>>> 2018-06-22 10:47:27.664245 7f70d77fe700 10 cephx: set_have_need_key no handler
>>>>>>>> for service mgr
>>>>>>>> 2018-06-22 10:47:27.664246 7f70d77fe700 10 cephx: set_have_need_key no handler
>>>>>>>> for service auth
>>>>>>>> 2018-06-22 10:47:27.664247 7f70d77fe700 10 cephx: validate_tickets want 53 have
>>>>>>>> 0 need 53
>>>>>>>> 2018-06-22 10:47:27.664251 7f70d77fe700 10 monclient(hunting): my global_id is
>>>>>>>> 411323061
>>>>>>>> 2018-06-22 10:47:27.664253 7f70d77fe700 10 cephx client: handle_response ret = 0
>>>>>>>> 2018-06-22 10:47:27.664256 7f70d77fe700 10 cephx client:  got initial server
>>>>>>>> challenge d5d3c1e5bcf3c0b8
>>>>>>>> 2018-06-22 10:47:27.664258 7f70d77fe700 10 cephx client: validate_tickets:
>>>>>>>> want=53 need=53 have=0
>>>>>>>> 2018-06-22 10:47:27.664260 7f70d77fe700 10 cephx: set_have_need_key no handler
>>>>>>>> for service mon
>>>>>>>> 2018-06-22 10:47:27.664261 7f70d77fe700 10 cephx: set_have_need_key no handler
>>>>>>>> for service osd
>>>>>>>> 2018-06-22 10:47:27.664262 7f70d77fe700 10 cephx: set_have_need_key no handler
>>>>>>>> for service mgr
>>>>>>>> 2018-06-22 10:47:27.664263 7f70d77fe700 10 cephx: set_have_need_key no handler
>>>>>>>> for service auth
>>>>>>>> 2018-06-22 10:47:27.664264 7f70d77fe700 10 cephx: validate_tickets want 53 have
>>>>>>>> 0 need 53
>>>>>>>> 2018-06-22 10:47:27.664265 7f70d77fe700 10 cephx client: want=53 need=53 have=0
>>>>>>>> 2018-06-22 10:47:27.664268 7f70d77fe700 10 cephx client: build_request
>>>>>>>> 2018-06-22 10:47:27.664328 7f70d77fe700 10 cephx client: get auth session key:
>>>>>>>> client_challenge d31821a6437d4974
>>>>>>>> 2018-06-22 10:47:27.664651 7f70d77fe700 10 cephx client: handle_response ret = 0
>>>>>>>> 2018-06-22 10:47:27.664667 7f70d77fe700 10 cephx client:  get_auth_session_key
>>>>>>>> 2018-06-22 10:47:27.664673 7f70d77fe700 10 cephx: verify_service_ticket_reply
>>>>>>>> got 1 keys
>>>>>>>> 2018-06-22 10:47:27.664676 7f70d77fe700 10 cephx: got key for service_id auth
>>>>>>>> 2018-06-22 10:47:27.664766 7f70d77fe700 10 cephx:  ticket.secret_id=3681
>>>>>>>> 2018-06-22 10:47:27.664774 7f70d77fe700 10 cephx: verify_service_ticket_reply
>>>>>>>> service auth secret_id 3681 session_key [KEY] validity=43200.000000
>>>>>>>> 2018-06-22 10:47:27.664806 7f70d77fe700 10 cephx: ticket expires=2018-06-22
>>>>>>>> 22:47:27.664805 renew_after=2018-06-22 19:47:27.664805
>>>>>>>> 2018-06-22 10:47:27.664825 7f70d77fe700 10 cephx client:  want=53 need=53 have=0
>>>>>>>> 2018-06-22 10:47:27.664827 7f70d77fe700 10 cephx: set_have_need_key no handler
>>>>>>>> for service mon
>>>>>>>> 2018-06-22 10:47:27.664829 7f70d77fe700 10 cephx: set_have_need_key no handler
>>>>>>>> for service osd
>>>>>>>> 2018-06-22 10:47:27.664830 7f70d77fe700 10 cephx: set_have_need_key no handler
>>>>>>>> for service mgr
>>>>>>>> 2018-06-22 10:47:27.664832 7f70d77fe700 10 cephx: validate_tickets want 53 have
>>>>>>>> 32 need 21
>>>>>>>> 2018-06-22 10:47:27.664836 7f70d77fe700 10 cephx client: validate_tickets:
>>>>>>>> want=53 need=21 have=32
>>>>>>>> 2018-06-22 10:47:27.664837 7f70d77fe700 10 cephx: set_have_need_key no handler
>>>>>>>> for service mon
>>>>>>>> 2018-06-22 10:47:27.664839 7f70d77fe700 10 cephx: set_have_need_key no handler
>>>>>>>> for service osd
>>>>>>>> 2018-06-22 10:47:27.664840 7f70d77fe700 10 cephx: set_have_need_key no handler
>>>>>>>> for service mgr
>>>>>>>> 2018-06-22 10:47:27.664841 7f70d77fe700 10 cephx: validate_tickets want 53 have
>>>>>>>> 32 need 21
>>>>>>>> 2018-06-22 10:47:27.664842 7f70d77fe700 10 cephx client: want=53 need=21 have=32
>>>>>>>> 2018-06-22 10:47:27.664844 7f70d77fe700 10 cephx client: build_request
>>>>>>>> 2018-06-22 10:47:27.664846 7f70d77fe700 10 cephx client: get service keys:
>>>>>>>> want=53 need=21 have=32
>>>>>>>> 2018-06-22 10:47:27.664928 7f70d77fe700 10 cephx client: handle_response ret = 0
>>>>>>>> 2018-06-22 10:47:27.664933 7f70d77fe700 10 cephx client:  get_auth_session_key
>>>>>>>> 2018-06-22 10:47:27.664935 7f70d77fe700 10 cephx: verify_service_ticket_reply
>>>>>>>> got 1 keys
>>>>>>>> 2018-06-22 10:47:27.664937 7f70d77fe700 10 cephx: got key for service_id auth
>>>>>>>> 2018-06-22 10:47:27.664985 7f70d77fe700 10 cephx:  ticket.secret_id=3681
>>>>>>>> 2018-06-22 10:47:27.664987 7f70d77fe700 10 cephx: verify_service_ticket_reply
>>>>>>>> service auth secret_id 3681 session_key [KEY] validity=43200.000000
>>>>>>>> 2018-06-22 10:47:27.665009 7f70d77fe700 10 cephx: ticket expires=2018-06-22
>>>>>>>> 22:47:27.665008 renew_after=2018-06-22 19:47:27.665008
>>>>>>>> 2018-06-22 10:47:27.665017 7f70d77fe700 10 cephx client:  want=53 need=53 have=0
>>>>>>>> 2018-06-22 10:47:27.665019 7f70d77fe700 10 cephx: set_have_need_key no handler
>>>>>>>> for service mon
>>>>>>>> 2018-06-22 10:47:27.665020 7f70d77fe700 10 cephx: set_have_need_key no handler
>>>>>>>> for service osd
>>>>>>>> 2018-06-22 10:47:27.665024 7f70d77fe700 10 cephx: set_have_need_key no handler
>>>>>>>> for service mgr
>>>>>>>> 2018-06-22 10:47:27.665026 7f70d77fe700 10 cephx: validate_tickets want 53 have
>>>>>>>> 32 need 21
>>>>>>>> 2018-06-22 10:47:27.665029 7f70d77fe700 10 cephx client: validate_tickets:
>>>>>>>> want=53 need=21 have=32
>>>>>>>> 2018-06-22 10:47:27.665031 7f70d77fe700 10 cephx: set_have_need_key no handler
>>>>>>>> for service mon
>>>>>>>> 2018-06-22 10:47:27.665032 7f70d77fe700 10 cephx: set_have_need_key no handler
>>>>>>>> for service osd
>>>>>>>> 2018-06-22 10:47:27.665033 7f70d77fe700 10 cephx: set_have_need_key no handler
>>>>>>>> for service mgr
>>>>>>>> 2018-06-22 10:47:27.665034 7f70d77fe700 10 cephx: validate_tickets want 53 have
>>>>>>>> 32 need 21
>>>>>>>> 2018-06-22 10:47:27.665035 7f70d77fe700 10 cephx client: want=53 need=21 have=32
>>>>>>>> 2018-06-22 10:47:27.665037 7f70d77fe700 10 cephx client: build_request
>>>>>>>> 2018-06-22 10:47:27.665039 7f70d77fe700 10 cephx client: get service keys:
>>>>>>>> want=53 need=21 have=32
>>>>>>>> 2018-06-22 10:47:27.665354 7f70d77fe700 10 cephx client: handle_response ret = 0
>>>>>>>> 2018-06-22 10:47:27.665365 7f70d77fe700 10 cephx client:
>>>>>>>> get_principal_session_key session_key [KEY]
>>>>>>>> 2018-06-22 10:47:27.665377 7f70d77fe700 10 cephx: verify_service_ticket_reply
>>>>>>>> got 3 keys
>>>>>>>> 2018-06-22 10:47:27.665379 7f70d77fe700 10 cephx: got key for service_id mon
>>>>>>>> 2018-06-22 10:47:27.665419 7f70d77fe700 10 cephx:  ticket.secret_id=44133
>>>>>>>> 2018-06-22 10:47:27.665425 7f70d77fe700 10 cephx: verify_service_ticket_reply
>>>>>>>> service mon secret_id 44133 session_key [KEY] validity=3600.000000
>>>>>>>> 2018-06-22 10:47:27.665437 7f70d77fe700 10 cephx: ticket expires=2018-06-22
>>>>>>>> 11:47:27.665436 renew_after=2018-06-22 11:32:27.665436
>>>>>>>> 2018-06-22 10:47:27.665443 7f70d77fe700 10 cephx: got key for service_id osd
>>>>>>>> 2018-06-22 10:47:27.665476 7f70d77fe700 10 cephx:  ticket.secret_id=44133
>>>>>>>> 2018-06-22 10:47:27.665478 7f70d77fe700 10 cephx: verify_service_ticket_reply
>>>>>>>> service osd secret_id 44133 session_key [KEY] validity=3600.000000
>>>>>>>> 2018-06-22 10:47:27.665497 7f70d77fe700 10 cephx: ticket expires=2018-06-22
>>>>>>>> 11:47:27.665496 renew_after=2018-06-22 11:32:27.665496
>>>>>>>> 2018-06-22 10:47:27.665506 7f70d77fe700 10 cephx: got key for service_id mgr
>>>>>>>> 2018-06-22 10:47:27.665539 7f70d77fe700 10 cephx:  ticket.secret_id=132
>>>>>>>> 2018-06-22 10:47:27.665546 7f70d77fe700 10 cephx: verify_service_ticket_reply
>>>>>>>> service mgr secret_id 132 session_key [KEY] validity=3600.000000
>>>>>>>> 2018-06-22 10:47:27.665564 7f70d77fe700 10 cephx: ticket expires=2018-06-22
>>>>>>>> 11:47:27.665564 renew_after=2018-06-22 11:32:27.665564
>>>>>>>> 2018-06-22 10:47:27.665573 7f70d77fe700 10 cephx: validate_tickets want 53 have
>>>>>>>> 53 need 0
>>>>>>>> 2018-06-22 10:47:27.665602 7f70d77fe700  1 monclient: found
>>>>>>>> mon.arh-ibstorage2-ib
>>>>>>>> 2018-06-22 10:47:27.665617 7f70d77fe700 20 monclient: _un_backoff
>>>>>>>> reopen_interval_multipler now 1
>>>>>>>> 2018-06-22 10:47:27.665636 7f70d77fe700 10 monclient: _send_mon_message to
>>>>>>>> mon.arh-ibstorage2-ib at 192.168.168.202:6789/0
>>>>>>>> 2018-06-22 10:47:27.665656 7f70d77fe700 10 cephx: validate_tickets want 53 have
>>>>>>>> 53 need 0
>>>>>>>> 2018-06-22 10:47:27.665658 7f70d77fe700 20 cephx client: need_tickets: want=53
>>>>>>>> have=53 need=0
>>>>>>>> 2018-06-22 10:47:27.665661 7f70d77fe700 20 monclient: _check_auth_rotating not
>>>>>>>> needed by client.admin
>>>>>>>> 2018-06-22 10:47:27.665678 7f70ef9e6700  5 monclient: authenticate success,
>>>>>>>> global_id 411322261
>>>>>>>> 2018-06-22 10:47:27.665694 7f70ef9e6700 10 monclient: _renew_subs
>>>>>>>> 2018-06-22 10:47:27.665698 7f70ef9e6700 10 monclient: _send_mon_message to
>>>>>>>> mon.arh-ibstorage2-ib at 192.168.168.202:6789/0
>>>>>>>> 2018-06-22 10:47:27.665817 7f70ef9e6700 10 monclient: _renew_subs
>>>>>>>> 2018-06-22 10:47:27.665828 7f70ef9e6700 10 monclient: _send_mon_message to
>>>>>>>> mon.arh-ibstorage2-ib at 192.168.168.202:6789/0
>>>>>>>> 2018-06-22 10:47:27.666069 7f70d77fe700 10 monclient: handle_monmap mon_map
>>>>>>>> magic: 0 v1
>>>>>>>> 2018-06-22 10:47:27.666102 7f70d77fe700 10 monclient:  got monmap 20,
>>>>>>>> mon.arh-ibstorage2-ib is now rank 1
>>>>>>>> 2018-06-22 10:47:27.666110 7f70d77fe700 10 monclient: dump:
>>>>>>>>
>>>>>>>> epoch 20
>>>>>>>> fsid 51e9f641-372e-44ec-92a4-b9fe55cbf9fe
>>>>>>>> last_changed 2018-06-16 23:14:48.936175
>>>>>>>> created 0.000000
>>>>>>>> 0: 192.168.168.201:6789/0 mon.arh-ibstorage1-ib
>>>>>>>> 1: 192.168.168.202:6789/0 mon.arh-ibstorage2-ib
>>>>>>>> 2: 192.168.168.203:6789/0 mon.arh-ibstorage3-ib
>>>>>>>>
>>>>>>>> 2018-06-22 10:47:27.666617 7f70eca43700 10 cephx client: build_authorizer for
>>>>>>>> service mgr
>>>>>>>> 2018-06-22 10:47:27.667043 7f70eca43700 10 In get_auth_session_handler for
>>>>>>>> protocol 2
>>>>>>>> 2018-06-22 10:47:27.678417 7f70eda45700 10 cephx client: build_authorizer for
>>>>>>>> service osd
>>>>>>>> 2018-06-22 10:47:27.678914 7f70eda45700 10 In get_auth_session_handler for
>>>>>>>> protocol 2
>>>>>>>> 2018-06-22 10:47:27.679003 7f70eda45700 10 _calc_signature seq 1 front_crc_ =
>>>>>>>> 2696387361 middle_crc = 0 data_crc = 0 sig = 929021353460216573
>>>>>>>> 2018-06-22 10:47:27.679026 7f70eda45700 20 Putting signature in client
>>>>>>>> message(seq # 1): sig = 929021353460216573
>>>>>>>> 2018-06-22 10:47:27.679520 7f70eda45700 10 _calc_signature seq 1 front_crc_ =
>>>>>>>> 1943489909 middle_crc = 0 data_crc = 0 sig = 10026640535487722288
>>>>>>>> Error EPERM: problem getting command descriptions from pg.18.2
>>>>>>>> 2018-06-22 10:47:27.681798 7f70ef9e6700 10 monclient: shutdown
>>>>>>>>
>>>>>>>>
>>>>>>>> -----------------
>>>>>>>>
>>>>>>>>
>>>>>>>> From what I can see the auth works:
>>>>>>>>
>>>>>>>> 2018-06-22 10:47:27.665678 7f70ef9e6700  5 monclient: authenticate success,
>>>>>>>> global_id 411322261
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
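A note on reading the cephx lines above: the want/have/need values in the
validate_tickets messages appear to be bitmasks of Ceph service types. The
sketch below decodes them. It assumes the entity-type bit values from Ceph's
msgr.h (mon=0x01, mds=0x02, osd=0x04, client=0x08, mgr=0x10, auth=0x20);
the helper name decode_services is made up for illustration and is not part
of any Ceph tool.

```python
# Assumed Ceph entity-type bits (from Ceph's include/msgr.h).
SERVICE_BITS = {
    0x01: "mon",
    0x02: "mds",
    0x04: "osd",
    0x08: "client",
    0x10: "mgr",
    0x20: "auth",
}


def decode_services(mask):
    """Return the service names whose bit is set in mask, lowest bit first."""
    return [name for bit, name in sorted(SERVICE_BITS.items()) if mask & bit]


if __name__ == "__main__":
    # Values taken from the validate_tickets lines in the logs above.
    for label, mask in (("want", 53), ("have", 32), ("need", 21)):
        print(label, mask, decode_services(mask))
```

Read this way, "want 53 have 32 need 21" means the client already holds the
auth ticket and still needs the mon, osd and mgr tickets, which matches the
later "got 3 keys" lines for service_id mon, osd and mgr. "want 53 have 53
need 0" then means every ticket was obtained, which is consistent with the
observation that authentication itself succeeds before the EPERM is returned.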
>>>>>>>> ----- Original Message -----
>>>>>>>>> From: "Brad Hubbard" <bhubbard@xxxxxxxxxx>
>>>>>>>>> To: "Andrei" <andrei@xxxxxxxxxx>
>>>>>>>>> Cc: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
>>>>>>>>> Sent: Friday, 22 June, 2018 02:05:51
>>>>>>>>> Subject: Re:  fixing unrepairable inconsistent PG
>>>>>>>>
>>>>>>>>> That seems like an authentication issue?
>>>>>>>>>
>>>>>>>>> Try running it like so...
>>>>>>>>>
>>>>>>>>> $ ceph --debug_monc 20 --debug_auth 20 pg 18.2 query
>>>>>>>>>
>>>>>>>>> On Thu, Jun 21, 2018 at 12:18 AM, Andrei Mikhailovsky <andrei@xxxxxxxxxx> wrote:
>>>>>>>>>> Hi Brad,
>>>>>>>>>>
>>>>>>>>>> Yes, but it doesn't show much:
>>>>>>>>>>
>>>>>>>>>> ceph pg 18.2 query
>>>>>>>>>> Error EPERM: problem getting command descriptions from pg.18.2
>>>>>>>>>>
>>>>>>>>>> Cheers
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> ----- Original Message -----
>>>>>>>>>>> From: "Brad Hubbard" <bhubbard@xxxxxxxxxx>
>>>>>>>>>>> To: "andrei" <andrei@xxxxxxxxxx>
>>>>>>>>>>> Cc: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
>>>>>>>>>>> Sent: Wednesday, 20 June, 2018 00:02:07
>>>>>>>>>>> Subject: Re:  fixing unrepairable inconsistent PG
>>>>>>>>>>
>>>>>>>>>>> Can you post the output of a pg query?
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Jun 19, 2018 at 11:44 PM, Andrei Mikhailovsky <andrei@xxxxxxxxxx> wrote:
>>>>>>>>>>>> A quick update on my issue. I noticed that while I was trying to move
>>>>>>>>>>>> the problem object on the osds, the file attributes got lost on one of the
>>>>>>>>>>>> osds, which I guess is why the error messages mention the missing attributes.
>>>>>>>>>>>>
>>>>>>>>>>>> I then copied the attributes metadata to the problematic object and
>>>>>>>>>>>> restarted the osds in question. Following a pg repair I got a different
>>>>>>>>>>>> error:
>>>>>>>>>>>>
>>>>>>>>>>>> 2018-06-19 13:51:05.846033 osd.21 osd.21 192.168.168.203:6828/24339 2 :
>>>>>>>>>>>> cluster [ERR] 18.2 shard 21: soid 18:45f87722:::.dir.default.80018061.2:head
>>>>>>>>>>>> omap_digest 0x25e8a1da != omap_digest 0x21c7f871 from auth oi
>>>>>>>>>>>> 18:45f87722:::.dir.default.80018061.2:head(106137'603495 osd.21.0:41403910
>>>>>>>>>>>> dirty|omap|data_digest|omap_digest s 0 uv 603494 dd ffffffff od 21c7f871
>>>>>>>>>>>> alloc_hint [0 0 0])
>>>>>>>>>>>> 2018-06-19 13:51:05.846042 osd.21 osd.21 192.168.168.203:6828/24339 3 :
>>>>>>>>>>>> cluster [ERR] 18.2 shard 28: soid 18:45f87722:::.dir.default.80018061.2:head
>>>>>>>>>>>> omap_digest 0x25e8a1da != omap_digest 0x21c7f871 from auth oi
>>>>>>>>>>>> 18:45f87722:::.dir.default.80018061.2:head(106137'603495 osd.21.0:41403910
>>>>>>>>>>>> dirty|omap|data_digest|omap_digest s 0 uv 603494 dd ffffffff od 21c7f871
>>>>>>>>>>>> alloc_hint [0 0 0])
>>>>>>>>>>>> 2018-06-19 13:51:05.846046 osd.21 osd.21 192.168.168.203:6828/24339 4 :
>>>>>>>>>>>> cluster [ERR] 18.2 soid 18:45f87722:::.dir.default.80018061.2:head: failed
>>>>>>>>>>>> to pick suitable auth object
>>>>>>>>>>>> 2018-06-19 13:51:05.846118 osd.21 osd.21 192.168.168.203:6828/24339 5 :
>>>>>>>>>>>> cluster [ERR] repair 18.2 18:45f87722:::.dir.default.80018061.2:head no '_'
>>>>>>>>>>>> attr
>>>>>>>>>>>> 2018-06-19 13:51:05.846129 osd.21 osd.21 192.168.168.203:6828/24339 6 :
>>>>>>>>>>>> cluster [ERR] repair 18.2 18:45f87722:::.dir.default.80018061.2:head no
>>>>>>>>>>>> 'snapset' attr
>>>>>>>>>>>> 2018-06-19 13:51:09.810878 osd.21 osd.21 192.168.168.203:6828/24339 7 :
>>>>>>>>>>>> cluster [ERR] 18.2 repair 4 errors, 0 fixed
>>>>>>>>>>>>
>>>>>>>>>>>> It now reports an omap_digest mismatch. How do I go about fixing this?
>>>>>>>>>>>>
>>>>>>>>>>>> Cheers
>>>>>>>>>>>>
>>>>>>>>>>>> ________________________________
>>>>>>>>>>>>
>>>>>>>>>>>> From: "andrei" <andrei@xxxxxxxxxx>
>>>>>>>>>>>> To: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
>>>>>>>>>>>> Sent: Tuesday, 19 June, 2018 11:16:22
>>>>>>>>>>>> Subject:  fixing unrepairable inconsistent PG
>>>>>>>>>>>>
>>>>>>>>>>>> Hello everyone
>>>>>>>>>>>>
>>>>>>>>>>>> I am having trouble repairing one inconsistent and stubborn PG. I get the
>>>>>>>>>>>> following error in ceph.log:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> 2018-06-19 11:00:00.000225 mon.arh-ibstorage1-ib mon.0
>>>>>>>>>>>> 192.168.168.201:6789/0 675 : cluster [ERR] overall HEALTH_ERR noout flag(s)
>>>>>>>>>>>> set; 4 scrub errors; Possible data damage: 1 pg inconsistent; application
>>>>>>>>>>>> not enabled on 4 pool(s)
>>>>>>>>>>>> 2018-06-19 11:09:24.586392 mon.arh-ibstorage1-ib mon.0
>>>>>>>>>>>> 192.168.168.201:6789/0 841 : cluster [ERR] Health check update: Possible
>>>>>>>>>>>> data damage: 1 pg inconsistent, 1 pg repair (PG_DAMAGED)
>>>>>>>>>>>> 2018-06-19 11:09:27.139504 osd.21 osd.21 192.168.168.203:6828/4003 2 :
>>>>>>>>>>>> cluster [ERR] 18.2 soid 18:45f87722:::.dir.default.80018061.2:head: failed
>>>>>>>>>>>> to pick suitable object info
>>>>>>>>>>>> 2018-06-19 11:09:27.139545 osd.21 osd.21 192.168.168.203:6828/4003 3 :
>>>>>>>>>>>> cluster [ERR] repair 18.2 18:45f87722:::.dir.default.80018061.2:head no '_'
>>>>>>>>>>>> attr
>>>>>>>>>>>> 2018-06-19 11:09:27.139550 osd.21 osd.21 192.168.168.203:6828/4003 4 :
>>>>>>>>>>>> cluster [ERR] repair 18.2 18:45f87722:::.dir.default.80018061.2:head no
>>>>>>>>>>>> 'snapset' attr
>>>>>>>>>>>>
>>>>>>>>>>>> 2018-06-19 11:09:35.484402 osd.21 osd.21 192.168.168.203:6828/4003 5 :
>>>>>>>>>>>> cluster [ERR] 18.2 repair 4 errors, 0 fixed
>>>>>>>>>>>> 2018-06-19 11:09:40.601657 mon.arh-ibstorage1-ib mon.0
>>>>>>>>>>>> 192.168.168.201:6789/0 844 : cluster [ERR] Health check update: Possible
>>>>>>>>>>>> data damage: 1 pg inconsistent (PG_DAMAGED)
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> I have tried to follow a few instructions on PG repair, including
>>>>>>>>>>>> removing the 'broken' object .dir.default.80018061.2 from the primary osd,
>>>>>>>>>>>> followed by a pg repair. After that didn't work, I did the same for the
>>>>>>>>>>>> secondary osd. Still the same issue.
>>>>>>>>>>>>
>>>>>>>>>>>> Looking at the actual object on the file system, the file size is 0 for both
>>>>>>>>>>>> the primary and secondary objects. The md5sum is the same too. The broken PG
>>>>>>>>>>>> belongs to the radosgw pool called .rgw.buckets.index
>>>>>>>>>>>>
>>>>>>>>>>>> What else can I try to get the thing fixed?
>>>>>>>>>>>>
>>>>>>>>>>>> Cheers
>>>>>>>>>>>>
>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>> ceph-users mailing list
>>>>>>>>>>>> ceph-users@xxxxxxxxxxxxxx
>>>>>>>>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>



-- 
Cheers,
Brad
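
A side note on reading the scrub errors quoted earlier in this thread: both
shards report the same computed omap_digest (0x25e8a1da), and both differ from
the digest recorded in the object info (0x21c7f871). The sketch below is only an
illustration of how that comparison could be pulled out of the log text. The
function names and the regular expression are hypothetical and not part of any
Ceph tool, and the sample assumes the mail quote prefixes have already been
stripped from the log lines.

```python
import re

# Hypothetical pattern for the fragment
#   "shard N: ... omap_digest 0xA != omap_digest 0xB from auth oi"
# as it appears in the scrub errors quoted in this thread. DOTALL lets the
# match span the line wraps introduced by the mail client.
MISMATCH_RE = re.compile(
    r"shard\s+(?P<shard>\d+):.*?"
    r"omap_digest\s+(?P<seen>0x[0-9a-fA-F]+)\s+!=\s+"
    r"omap_digest\s+(?P<recorded>0x[0-9a-fA-F]+)\s+from\s+auth\s+oi",
    re.DOTALL,
)


def omap_mismatches(log_text):
    """Return {shard: (digest computed on shard, digest recorded in object info)}."""
    return {
        int(m.group("shard")): (m.group("seen"), m.group("recorded"))
        for m in MISMATCH_RE.finditer(log_text)
    }


def shards_agree(mismatches):
    """True when every reporting shard computed the same omap digest."""
    return len({seen for seen, _ in mismatches.values()}) == 1


# Two of the error lines from this thread, with quote prefixes removed.
SAMPLE = """\
cluster [ERR] 18.2 shard 21: soid 18:45f87722:::.dir.default.80018061.2:head
omap_digest 0x25e8a1da != omap_digest 0x21c7f871 from auth oi
cluster [ERR] 18.2 shard 28: soid 18:45f87722:::.dir.default.80018061.2:head
omap_digest 0x25e8a1da != omap_digest 0x21c7f871 from auth oi
"""

if __name__ == "__main__":
    found = omap_mismatches(SAMPLE)
    for shard, (seen, recorded) in sorted(found.items()):
        print(f"shard {shard}: computed {seen}, object info records {recorded}")
    print("shards agree with each other:", shards_agree(found))
```

When the replicas agree with each other and only the recorded digest differs,
as here, the mismatch is between the stored object info and the omap contents
rather than between the copies themselves.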



