Hammer upgrade stuck, all OSDs down

Hi to all, my cluster got stuck after an upgrade from hammer 0.94.5 to luminous.
It seems the OSDs are somehow still stuck at the hammer version, despite:

$ ceph-osd --version
ceph version 12.0.1 (5456408827a1a31690514342624a4ff9b66be1d5)

All OSDs are down in the preboot state, and every OSD log says "osdmap SORTBITWISE OSDMap flag is NOT set; please set it".
When I try "ceph osd set sortbitwise" I get "Error EPERM: not all up OSDs have OSD_BITWISE_HOBJ_SORT feature".
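
For what it's worth, my assumption is that the flag has to appear in the osdmap before the OSDs will finish booting. A quick way to check both sides of this (the flags the monitor has in the osdmap, and which binary each daemon actually runs, via the admin socket on the OSD host) would be something like:

$ ceph osd dump | grep flags
$ ceph daemon osd.1 version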

$ ceph osd dump -f json-pretty | grep features
            "features": 37154696925806591,
            "features": 37154696925806591,
            "features": 37154696925806591,
            "features": 37154696925806591,
            "features": 37154696925806591,
            "features": 37154696925806591,
            "features": 37154696925806591,
            "features": 0,
            "features": 37154696925806591,
            "features": 0,

$ ceph osd metadata | egrep "id|ceph_version"
        "id": 0,
        "ceph_version": "ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)",
        "id": 1,
        "ceph_version": "ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)",
        "id": 2,
        "ceph_version": "ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)",
        "id": 3,
        "ceph_version": "ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)",
        "id": 4,
        "ceph_version": "ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)",
        "id": 5,
        "ceph_version": "ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)",
        "id": 6,
        "ceph_version": "ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)",
        "id": 7
        "id": 8,
        "ceph_version": "ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)",
        "id": 9

On one of the stuck OSDs, I guess the superblock has already been upgraded (judging by features 14 and 15), so I can't downgrade back to hammer:
$ ceph-objectstore-tool --data-path=/var/lib/ceph/osd/ceph-1 --op dump-super
{
    "cluster_fsid": "630c11ff-ff8d-4bf0-8217-860eb684e78c",
    "osd_fsid": "69a974e1-fdfa-434e-8279-8411196a127f",
    "whoami": 1,
    "current_epoch": 9126,
    "oldest_map": 8591,
    "newest_map": 9126,
    "weight": 0.000000,
    "compat": {
        "compat": {},
        "ro_compat": {},
        "incompat": {
            "feature_1": "initial feature set(~v.18)",
            "feature_2": "pginfo object",
            "feature_3": "object locator",
            "feature_4": "last_epoch_clean",
            "feature_5": "categories",
            "feature_6": "hobjectpool",
            "feature_7": "biginfo",
            "feature_8": "leveldbinfo",
            "feature_9": "leveldblog",
            "feature_10": "snapmapper",
            "feature_12": "transaction hints",
            "feature_13": "pg meta object",
            "feature_14": "explicit missing set",
            "feature_15": "fastinfo pg attr"
        }
    },
    "clean_thru": 9126,
    "last_epoch_mounted": 0
}
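
To see how far the on-disk conversion got on the other OSDs, the same dump-super check could be looped over all of them (ceph-objectstore-tool needs the OSD daemon on that host stopped first):

root@ceph-node03:~# for osd in /var/lib/ceph/osd/ceph-*; do echo "== $osd =="; ceph-objectstore-tool --data-path "$osd" --op dump-super | egrep "feature_14|feature_15"; done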

root@ceph-node03:~# export ms="/home/ceph/monstore";for osd in /var/lib/ceph/osd/ceph-*; do ceph-objectstore-tool --data-path $osd --op update-mon-db --mon-store-path "$ms";done
mismatched full crc: 3120238035 != 1569055237
mismatched full crc: 3120238035 != 1569055237
mismatched full crc: 3120238035 != 1569055237

root@ceph-node02:~# export ms="/home/ceph/monstore"; for osd in /var/lib/ceph/osd/ceph-*; do ceph-objectstore-tool --data-path $osd --op update-mon-db --mon-store-path "$ms";done
mismatched full crc: 2310723283 != 1012422761
mismatched full crc: 2310723283 != 1012422761
missing #-1:edbd1965:::inc_osdmap.8591:0#
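
For context, the update-mon-db runs above are from the "recovery using OSDs" procedure in the monitor troubleshooting docs. I have not gotten past the crc mismatches, but for completeness the remaining steps would be roughly (keyring path and mon directory are placeholders):

$ ceph-monstore-tool "$ms" rebuild -- --keyring /path/to/admin.keyring
$ mv /var/lib/ceph/mon/<mon-dir>/store.db /var/lib/ceph/mon/<mon-dir>/store.db.bak
$ cp -r "$ms/store.db" /var/lib/ceph/mon/<mon-dir>/store.db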

Some more info:

root@ceph-node01:/var/lib/ceph# ceph -s
    cluster 630c11ff-ff8d-4bf0-8217-860eb684e78c
     health HEALTH_ERR
            384 pgs are stuck inactive for more than 300 seconds
            236 pgs degraded
            384 pgs stale
            236 pgs stuck degraded
            384 pgs stuck stale
            277 pgs stuck unclean
            236 pgs stuck undersized
            236 pgs undersized
            recovery 761841/2108609 objects degraded (36.130%)
            recovery 17768/2108609 objects misplaced (0.843%)
     monmap e9: 1 mons at {ceph-node01=192.168.137.68:6789/0}
            election epoch 12729, quorum 0 ceph-node01
        mgr active: ceph-node01
     osdmap e9103: 10 osds: 0 up, 0 in
      pgmap v13292951: 384 pgs, 3 pools, 722 GB data, 1008 kobjects
            0 kB used, 0 kB / 0 kB avail
            761841/2108609 objects degraded (36.130%)
            17768/2108609 objects misplaced (0.843%)
                 236 stale+active+undersized+degraded
                 107 stale+active+clean
                  41 stale+active+remapped

root@ceph-node03:~# ceph daemon osd.5 status
{
    "cluster_fsid": "630c11ff-ff8d-4bf0-8217-860eb684e78c",
    "osd_fsid": "7fb02cdb-8e15-4d41-8c0f-5a59301933ec",
    "whoami": 5,
    "state": "preboot",
    "oldest_map": 8701,
    "newest_map": 9126,
    "num_pgs": 128
}
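
To check whether the other daemons on a host are in the same preboot state, a loop over their admin sockets (default socket path assumed) would be something like:

root@ceph-node03:~# for s in /var/run/ceph/ceph-osd.*.asok; do echo "== $s =="; ceph daemon "$s" status; done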

Can I somehow overcome this situation, and what could have happened during the upgrade?
I performed the upgrade from hammer with ceph-deploy install --release luminous.
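
In full, that step was a single ceph-deploy run against the nodes, after which the OSD daemons were restarted and came back in the preboot state shown above. Roughly (systemctl is just an example of how the restarts were done):

$ ceph-deploy install --release luminous ceph-node01 ceph-node02 ceph-node03
$ systemctl restart ceph-osd.target    # on each OSD host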

Thank you, best regards.

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
