Re: Upgrade from Jewel to Luminous. REQUIRE_JEWEL OSDMap

I get this error when I try to start the OSD that has been downgraded
to 10.2.3-r2.

2017-11-28 03:42:35.989754 7fa5e6429940  1 filestore(/var/lib/ceph/osd/ceph-3) upgrade
2017-11-28 03:42:35.989788 7fa5e6429940  2 osd.3 0 boot
2017-11-28 03:42:35.990132 7fa5e6429940 -1 osd.3 0 The disk uses features unsupported by the executable.
2017-11-28 03:42:35.990142 7fa5e6429940 -1 osd.3 0  ondisk features compat={},rocompat={},incompat={1=initial feature set(~v.18),2=pginfo object,3=object locator,4=last_epoch_clean,5=categories,6=hobjectpool,7=biginfo,8=leveldbinfo,9=leveldblog,10=snapmapper,12=transaction hints,13=pg meta object,14=explicit missing set,15=fastinfo pg attr,16=deletes in missing set}
2017-11-28 03:42:35.990150 7fa5e6429940 -1 osd.3 0  daemon features compat={},rocompat={},incompat={1=initial feature set(~v.18),2=pginfo object,3=object locator,4=last_epoch_clean,5=categories,6=hobjectpool,7=biginfo,8=leveldbinfo,9=leveldblog,10=snapmapper,11=sharded objects,12=transaction hints,13=pg meta object}
2017-11-28 03:42:35.990160 7fa5e6429940 -1 osd.3 0 Cannot write to disk! Missing features: compat={},rocompat={},incompat={14=explicit missing set,15=fastinfo pg attr,16=deletes in missing set}
2017-11-28 03:42:35.990775 7fa5e6429940  1 journal close /dev/disk/by-partlabel/ceph-3
2017-11-28 03:42:35.992960 7fa5e6429940 -1  ** ERROR: osd init failed: (95) Operation not supported
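
If I am reading that last line right, the 12.2.1 OSD has already rewritten
the superblock with incompat features 14 (explicit missing set), 15
(fastinfo pg attr) and 16 (deletes in missing set), none of which 10.2.x
understands, so an OSD that has ever booted under Luminous can no longer
be started by a Jewel binary. That would mean the jewel OSD for the
workaround has to be a brand-new one that jewel itself creates. An
untested sketch, run on the downgraded host (the id and uuid are whatever
ceph hands back; paths are the defaults):

 uuid=$(uuidgen)
 id=$(ceph osd create $uuid)         # allocate a fresh osd id
 mkdir -p /var/lib/ceph/osd/ceph-$id
 ceph-osd -i $id --mkfs --mkkey --osd-uuid $uuid   # jewel writes a jewel superblock
 ceph auth add osd.$id osd 'allow *' mon 'allow profile osd' \
     -i /var/lib/ceph/osd/ceph-$id/keyring
 ceph-osd -i $id                     # start it; it should register as up

Once that osd reports up, the require_jewel_osds flag should be settable,
and the throwaway OSD can be removed again afterwards.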

Cary

On Tue, Nov 28, 2017 at 3:09 AM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> On Tue, 28 Nov 2017, Cary wrote:
>> Hello,
>>
>>  Could someone please help me complete my botched upgrade from Jewel
>> 10.2.3-r1 to Luminous 12.2.1? I have 9 Gentoo servers, 4 of which have
>> 2 OSDs each.
>>
>>  My OSD servers were accidentally rebooted before the monitor servers,
>> leaving the OSDs running Luminous before the monitors. All services
>> have been restarted, and "ceph versions" gives the following:
>>
>> # ceph versions
>> 2017-11-27 21:27:24.356940 7fed67efe700 -1 WARNING: the following
>> dangerous and experimental features are enabled: btrfs
>> 2017-11-27 21:27:24.368469 7fed67efe700 -1 WARNING: the following
>> dangerous and experimental features are enabled: btrfs
>>
>>     "mon": {
>>         "ceph version 12.2.1
>> (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)": 4
>>     },
>>     "mgr": {
>>         "ceph version 12.2.1
>> (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)": 3
>>     },
>>     "osd": {},
>>     "mds": {
>>         "ceph version 12.2.1
>> (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)": 1
>>     },
>>     "overall": {
>>         "ceph version 12.2.1
>> (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)": 8
>>
>>
>>
>> For some reason the OSDs do not show what version they are running,
>> and "ceph osd tree" shows all of the OSDs as down.
>>
>>  # ceph osd tree
>> 2017-11-27 21:32:51.969335 7f483d9c2700 -1 WARNING: the following
>> dangerous and experimental features are enabled: btrfs
>> 2017-11-27 21:32:51.980976 7f483d9c2700 -1 WARNING: the following
>> dangerous and experimental features are enabled: btrfs
>> ID CLASS WEIGHT   TYPE NAME              STATUS REWEIGHT PRI-AFF
>> -1       27.77998 root default
>> -3       27.77998     datacenter DC1
>> -6       27.77998         rack 1B06
>> -5        6.48000             host ceph3
>>  1        1.84000                 osd.1    down        0 1.00000
>>  3        4.64000                 osd.3    down        0 1.00000
>> -2        5.53999             host ceph4
>>  5        4.64000                 osd.5    down        0 1.00000
>>  8        0.89999                 osd.8    down        0 1.00000
>> -4        9.28000             host ceph6
>>  0        4.64000                 osd.0    down        0 1.00000
>>  2        4.64000                 osd.2    down        0 1.00000
>> -7        6.48000             host ceph7
>>  6        4.64000                 osd.6    down        0 1.00000
>>  7        1.84000                 osd.7    down        0 1.00000
>>
>> The OSD logs all have this message:
>>
>> 20235 osdmap REQUIRE_JEWEL OSDMap flag is NOT set; please set it.
>
> This is an annoying corner condition.  12.2.2 (out soon!) will have a
> --force option to set the flag even though no OSDs are up.  Until then,
> the workaround is to downgrade one host to jewel, start one jewel osd,
> then set the flag.  Then upgrade to luminous again and restart all osds.
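>
> Roughly, once a jewel osd is up (a sketch; adjust how you start and
> stop daemons for your init system):
>
>   ceph osd tree | grep up          # confirm at least one osd shows up
>   ceph osd set require_jewel_osds
>   ceph osd dump | grep flags       # should now include require_jewel_osds
>
> (The 12.2.2 form will be something like "ceph osd set
> require_jewel_osds --force"; exact spelling may change.)  After the
> flag is set, reinstall 12.2.1 on that host and restart all osds; they
> should move past preboot.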
>
> sage
>
>
>>
>> When I try to set it with "ceph osd set require_jewel_osds" I get this error:
>>
>> Error EPERM: not all up OSDs have CEPH_FEATURE_SERVER_JEWEL feature
>>
>>
>>
>> A "ceph features" returns:
>>
>>     "mon": {
>>         "group": {
>>             "features": "0x1ffddff8eea4fffb",
>>             "release": "luminous",
>>             "num": 4
>>         }
>>     },
>>     "mds": {
>>         "group": {
>>             "features": "0x1ffddff8eea4fffb",
>>             "release": "luminous",
>>             "num": 1
>>         }
>>     },
>>     "osd": {
>>         "group": {
>>             "features": "0x1ffddff8eea4fffb",
>>             "release": "luminous",
>>             "num": 8
>>         }
>>     },
>>     "client": {
>>         "group": {
>>             "features": "0x1ffddff8eea4fffb",
>>             "release": "luminous",
>>             "num": 3
>>
>>  # ceph tell osd.* versions
>> 2017-11-28 02:29:28.565943 7f99c6aee700 -1 WARNING: the following
>> dangerous and experimental features are enabled: btrfs
>> 2017-11-28 02:29:28.578956 7f99c6aee700 -1 WARNING: the following
>> dangerous and experimental features are enabled: btrfs
>> Error ENXIO: problem getting command descriptions from osd.0
>> osd.0: problem getting command descriptions from osd.0
>> Error ENXIO: problem getting command descriptions from osd.1
>> osd.1: problem getting command descriptions from osd.1
>> Error ENXIO: problem getting command descriptions from osd.2
>> osd.2: problem getting command descriptions from osd.2
>> Error ENXIO: problem getting command descriptions from osd.3
>> osd.3: problem getting command descriptions from osd.3
>> Error ENXIO: problem getting command descriptions from osd.5
>> osd.5: problem getting command descriptions from osd.5
>> Error ENXIO: problem getting command descriptions from osd.6
>> osd.6: problem getting command descriptions from osd.6
>> Error ENXIO: problem getting command descriptions from osd.7
>> osd.7: problem getting command descriptions from osd.7
>> Error ENXIO: problem getting command descriptions from osd.8
>> osd.8: problem getting command descriptions from osd.8
>>
>>  # ceph daemon osd.1 status
>>
>>     "cluster_fsid": "CENSORED",
>>     "osd_fsid": "CENSORED",
>>     "whoami": 1,
>>     "state": "preboot",
>>     "oldest_map": 19482,
>>     "newest_map": 20235,
>>     "num_pgs": 141
>>
>>  # ceph -s
>> 2017-11-27 22:04:10.372471 7f89a3935700 -1 WARNING: the following
>> dangerous and experimental features are enabled: btrfs
>> 2017-11-27 22:04:10.375709 7f89a3935700 -1 WARNING: the following
>> dangerous and experimental features are enabled: btrfs
>>   cluster:
>>     id:     CENSORED
>>     health: HEALTH_ERR
>>             513 pgs are stuck inactive for more than 60 seconds
>>             126 pgs backfill_wait
>>             52 pgs backfilling
>>             435 pgs degraded
>>             513 pgs stale
>>             435 pgs stuck degraded
>>             513 pgs stuck stale
>>             435 pgs stuck unclean
>>             435 pgs stuck undersized
>>             435 pgs undersized
>>             recovery 854719/3688140 objects degraded (23.175%)
>>             recovery 838607/3688140 objects misplaced (22.738%)
>>             mds cluster is degraded
>>             crush map has straw_calc_version=0
>>
>>   services:
>>     mon: 4 daemons, quorum 0,1,3,2
>>     mgr: 0(active), standbys: 1, 5
>>     mds: cephfs-1/1/1 up  {0=a=up:replay}, 1 up:standby
>>     osd: 8 osds: 0 up, 0 in
>>
>>   data:
>>     pools:   7 pools, 513 pgs
>>     objects: 1199k objects, 4510 GB
>>     usage:   13669 GB used, 15150 GB / 28876 GB avail
>>     pgs:     854719/3688140 objects degraded (23.175%)
>>              838607/3688140 objects misplaced (22.738%)
>>              257 stale+active+undersized+degraded
>>              126 stale+active+undersized+degraded+remapped+backfill_wait
>>              78  stale+active+clean
>>              52  stale+active+undersized+degraded+remapped+backfilling
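>>
>> As an aside, the "crush map has straw_calc_version=0" warning above is
>> separate from the upgrade problem. My understanding is it can be
>> cleared with the tunable below, though that may move a small amount of
>> data, so it is probably best left until the OSDs are back up:
>>
>>  # ceph osd crush set-tunable straw_calc_version 1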
>>
>>
>> I ran "ceph auth list", and client.admin has the following permissions.
>> auid: 0
>> caps: [mds] allow
>> caps: [mgr] allow *
>> caps: [mon] allow *
>> caps: [osd] allow *
>>
>> Is there any way I can get these OSDs to join the cluster now, or
>> recover my data?
>>
>> Thank you for your time.
>>
>> Cary
>> -Dynamic