I believe I see what I was doing wrong. I had to run "ceph-osd set require_jewel_osds --yes-i-really-mean-it". This is the error I am getting now:

2017-11-30 01:11:19.691 7fc171dbd5c0 -1 unrecognized arg set
src/tcmalloc.cc:284] Attempt to free invalid pointer 0x56262bacf4c0
*** Caught signal (Aborted) **
 in thread 7fc171dbd5c0 thread_name:ceph-osd
 ceph version 13.0.0-3574-gb1378b343a (b1378b343add5134ab881b38a93f47f3f9cb40bb) mimic (dev)
 1: (()+0xa6be0e) [0x56262159be0e]
 2: (()+0x13a40) [0x7fc16f5aca40]
 3: (gsignal()+0x145) [0x7fc16e8ede95]
 4: (abort()+0x17a) [0x7fc16e8efb9a]
 5: (tcmalloc::Log(tcmalloc::LogMode, char const*, int, tcmalloc::LogItem, tcmalloc::LogItem, tcmalloc::LogItem, tcmalloc::LogItem)+0x234) [0x7fc170cf3084]
 6: (()+0x1784b) [0x7fc170ce784b]
 7: (rocksdb::LRUCache::~LRUCache()+0x65) [0x562621909795]
 8: (std::_Sp_counted_ptr<rocksdb::BlockBasedTableFactory*, (__gnu_cxx::_Lock_policy)2>::_M_dispose()+0x1ca) [0x5626219fdb7a]
 9: (rocksdb::ColumnFamilyOptions::~ColumnFamilyOptions()+0x385) [0x5626214d6685]
 10: (()+0x382f0) [0x7fc16e8f12f0]
 11: (()+0x3835a) [0x7fc16e8f135a]
 12: (()+0xbad9c8) [0x5626216dd9c8]
 13: (main()+0x3b5) [0x562620ece9f5]
 14: (__libc_start_main()+0xf0) [0x7fc16e8d94f0]
 15: (_start()+0x2a) [0x562620faa3fa]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Cary
-Dynamic

On Thu, Nov 30, 2017 at 12:50 AM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> On Thu, 30 Nov 2017, Cary wrote:
>> Hello,
>>
>> I have emerged a 9999 build of Luminous 12.2.1 on one of my monitor
>> nodes. I made sure only one Jewel OSD was being started.
>
> The latest luminous mon will allow you to do the
>
>   ceph osd set require_jewel_osds --yes-i-really-mean-it
>
> command without starting old osds. Once the flag is set, the luminous osds
> will start normally.
>
> s
>
>> The log for the OSD:
>>
>> 2017-11-30 00:30:27.786793 7f9200a598c0  1 filestore(/var/lib/ceph/osd/ceph-1) upgrade
>> 2017-11-30 00:30:27.786821 7f9200a598c0  2 osd.1 0 boot
>> 2017-11-30 00:30:27.787101 7f9200a598c0 -1 osd.1 0 The disk uses features unsupported by the executable.
>> 2017-11-30 00:30:27.787110 7f9200a598c0 -1 osd.1 0 ondisk features compat={},rocompat={},incompat={1=initial feature set(~v.18),2=pginfo object,3=object locator,4=last_epoch_clean,5=categories,6=hobjectpool,7=biginfo,8=leveldbinfo,9=leveldblog,10=snapmapper,12=transaction hints,13=pg meta object,14=explicit missing set,15=fastinfo pg attr,16=deletes in missing set}
>> 2017-11-30 00:30:27.787120 7f9200a598c0 -1 osd.1 0 daemon features compat={},rocompat={},incompat={1=initial feature set(~v.18),2=pginfo object,3=object locator,4=last_epoch_clean,5=categories,6=hobjectpool,7=biginfo,8=leveldbinfo,9=leveldblog,10=snapmapper,11=sharded objects,12=transaction hints,13=pg meta object}
>> 2017-11-30 00:30:27.787129 7f9200a598c0 -1 osd.1 0 Cannot write to disk!
>> Missing features: compat={},rocompat={},incompat={14=explicit missing set,15=fastinfo pg attr,16=deletes in missing set}
>> 2017-11-30 00:30:27.787355 7f9200a598c0  1 journal close /dev/disk/by-partlabel/ceph-1
>> 2017-11-30 00:30:27.795077 7f9200a598c0 -1 ** ERROR: osd init failed: (95) Operation not supported
>>
>> The OSD is not starting because of missing features, so the next command still fails.
>>
>> "ceph osd set require_jewel_osds --yes-i-really-really-mean-it" returns the error:
>>
>>   Invalid command: unused arguments: [u'--yes-i-really-really-mean-it']
>>
>> I guess ceph-dencoder may be needed to change disk features. Does anyone
>> know what may need to be done here? Thank you,
>>
>> Cary
>> -Dynamic
>>
>> On Tue, Nov 28, 2017 at 6:45 PM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
>> > On Tue, 28 Nov 2017, Cary wrote:
>> >> Hello,
>> >>
>> >> I am getting an error when I run "ceph osd set require_jewel_osds
>> >> --yes-i-really-mean-it".
>> >>
>> >>   Error ENOENT: unknown feature '--yes-i-really-mean-it'
>> >
>> > I just tested on the latest luminous branch and this works. Did you
>> > upgrade the mons to the latest luminous build and restart them?
>> > (ceph-deploy install --dev luminous HOST, then restart mon daemon(s)).
>> >
>> > sage
>> >
>> >> So I ran "ceph osd set require_jewel_osds", and got this error:
>> >>
>> >>   Error EPERM: not all up OSDs have CEPH_FEATURE_SERVER_JEWEL feature
>> >>
>> >> I verified all OSDs were stopped with "/etc/init.d/ceph-osd.N stop".
>> >> Then verified each was down with "ceph osd down N". When setting them
>> >> down, each replied "osd.N is already down". I started one of the OSDs
>> >> on a host that was downgraded to 10.2.3-r2. I then attempted to set
>> >> "ceph osd set require_jewel_osds", and got the same error.
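[Pulling Sage's instructions together: require_jewel_osds is a monitor command ("ceph osd set ..."), not an argument to the ceph-osd daemon binary, and it needs a new-enough luminous mon. A rough sketch of the sequence — the hostname and init-script names are illustrative placeholders, not taken from the thread:]

```shell
# Sketch only -- MON_HOST and the init-script paths are placeholders.
# 1. Install the latest luminous build on the monitor(s) and restart them.
ceph-deploy install --dev luminous MON_HOST
/etc/init.d/ceph-mon.0 restart

# 2. From the upgraded mon, set the flag. Note it is "ceph osd set ...",
#    the monitor CLI -- not "ceph-osd set ...", which invokes the OSD
#    daemon binary and fails with "unrecognized arg set" as seen above.
ceph osd set require_jewel_osds --yes-i-really-mean-it

# 3. Restart the luminous OSDs; once the flag is set they should boot normally.
for n in 0 1 2 3 5 6 7 8; do /etc/init.d/ceph-osd.$n restart; done
```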
>> >> The log for the OSD is showing this error:
>> >>
>> >> 2017-11-28 17:40:08.928446 7f47b082f940  1 filestore(/var/lib/ceph/osd/ceph-1) upgrade
>> >> 2017-11-28 17:40:08.928475 7f47b082f940  2 osd.1 0 boot
>> >> 2017-11-28 17:40:08.928788 7f47b082f940 -1 osd.1 0 The disk uses features unsupported by the executable.
>> >> 2017-11-28 17:40:08.928810 7f47b082f940 -1 osd.1 0 ondisk features compat={},rocompat={},incompat={1=initial feature set(~v.18),2=pginfo object,3=object locator,4=last_epoch_clean,5=categories,6=hobjectpool,7=biginfo,8=leveldbinfo,9=leveldblog,10=snapmapper,12=transaction hints,13=pg meta object,14=explicit missing set,15=fastinfo pg attr,16=deletes in missing set}
>> >> 2017-11-28 17:40:08.928818 7f47b082f940 -1 osd.1 0 daemon features compat={},rocompat={},incompat={1=initial feature set(~v.18),2=pginfo object,3=object locator,4=last_epoch_clean,5=categories,6=hobjectpool,7=biginfo,8=leveldbinfo,9=leveldblog,10=snapmapper,11=sharded objects,12=transaction hints,13=pg meta object}
>> >> 2017-11-28 17:40:08.928827 7f47b082f940 -1 osd.1 0 Cannot write to disk! Missing features: compat={},rocompat={},incompat={14=explicit missing set,15=fastinfo pg attr,16=deletes in missing set}
>> >> 2017-11-28 17:40:08.929353 7f47b082f940  1 journal close /dev/disk/by-partlabel/ceph-1
>> >> 2017-11-28 17:40:08.930488 7f47b082f940 -1 ** ERROR: osd init failed: (95) Operation not supported
>> >>
>> >> So the OSD is not starting because of missing features. It does not
>> >> show up in "ceph features" output.
>> >>
>> >> Ceph features output:
>> >>
>> >> # ceph features
>> >> 2017-11-28 17:51:31.213636 7f6a2140a700 -1 WARNING: the following dangerous and experimental features are enabled: btrfs
>> >> 2017-11-28 17:51:31.223068 7f6a2140a700 -1 WARNING: the following dangerous and experimental features are enabled: btrfs
>> >>     "mon": {
>> >>         "group": {
>> >>             "features": "0x1ffddff8eea4fffb",
>> >>             "release": "luminous",
>> >>             "num": 4
>> >>         }
>> >>     },
>> >>     "mds": {
>> >>         "group": {
>> >>             "features": "0x7fddff8ee84bffb",
>> >>             "release": "jewel",
>> >>             "num": 1
>> >>         }
>> >>     },
>> >>     "client": {
>> >>         "group": {
>> >>             "features": "0x1ffddff8eea4fffb",
>> >>             "release": "luminous",
>> >>             "num": 4
>> >>
>> >> I attempted to set require_jewel_osds with the MGRs stopped, and had
>> >> the same results.
>> >>
>> >> Output from "ceph tell osd.1 versions"; I get the same error from all OSDs:
>> >>
>> >> # ceph tell osd.1 versions
>> >> Error ENXIO: problem getting command descriptions from osd.1
>> >>
>> >> Any thoughts?
>> >>
>> >> Cary
>> >> -Dynamic
>> >>
>> >> On Tue, Nov 28, 2017 at 1:09 PM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
>> >> > On Tue, 28 Nov 2017, Cary wrote:
>> >> >> I get this error when I try to start the OSD that has been
>> >> >> downgraded to 10.2.3-r2.
>> >> >>
>> >> >> 2017-11-28 03:42:35.989754 7fa5e6429940  1 filestore(/var/lib/ceph/osd/ceph-3) upgrade
>> >> >> 2017-11-28 03:42:35.989788 7fa5e6429940  2 osd.3 0 boot
>> >> >> 2017-11-28 03:42:35.990132 7fa5e6429940 -1 osd.3 0 The disk uses features unsupported by the executable.
>> >> >> 2017-11-28 03:42:35.990142 7fa5e6429940 -1 osd.3 0 ondisk features compat={},rocompat={},incompat={1=initial feature set(~v.18),2=pginfo object,3=object locator,4=last_epoch_clean,5=categories,6=hobjectpool,7=biginfo,8=leveldbinfo,9=leveldblog,10=snapmapper,12=transaction hints,13=pg meta object,14=explicit missing set,15=fastinfo pg attr,16=deletes in missing set}
>> >> >> 2017-11-28 03:42:35.990150 7fa5e6429940 -1 osd.3 0 daemon features compat={},rocompat={},incompat={1=initial feature set(~v.18),2=pginfo object,3=object locator,4=last_epoch_clean,5=categories,6=hobjectpool,7=biginfo,8=leveldbinfo,9=leveldblog,10=snapmapper,11=sharded objects,12=transaction hints,13=pg meta object}
>> >> >> 2017-11-28 03:42:35.990160 7fa5e6429940 -1 osd.3 0 Cannot write to disk! Missing features: compat={},rocompat={},incompat={14=explicit missing set,15=fastinfo pg attr,16=deletes in missing set}
>> >> >> 2017-11-28 03:42:35.990775 7fa5e6429940  1 journal close /dev/disk/by-partlabel/ceph-3
>> >> >> 2017-11-28 03:42:35.992960 7fa5e6429940 -1 ** ERROR: osd init failed: (95) Operation not supported
>> >> >
>> >> > Oh, right. In that case, install the 'luminous' branch[1] on the monitors
>> >> > (or just the primary monitor if you're being conservative), restart it,
>> >> > and you'll be able to do
>> >> >
>> >> >   ceph osd set require_jewel_osds --yes-i-really-mean-it
>> >> >
>> >> > sage
>> >> >
>> >> > [1] ceph-deploy install --dev luminous HOST
>> >> >
>> >> >> Cary
>> >> >>
>> >> >> On Tue, Nov 28, 2017 at 3:09 AM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
>> >> >> > On Tue, 28 Nov 2017, Cary wrote:
>> >> >> >> Hello,
>> >> >> >>
>> >> >> >> Could someone please help me complete my botched upgrade from Jewel
>> >> >> >> 10.2.3-r1 to Luminous 12.2.1. I have 9 Gentoo servers, 4 of which
>> >> >> >> have 2 OSDs each.
>> >> >> >> My OSD servers were accidentally rebooted before the monitor servers,
>> >> >> >> causing them to be running Luminous before the monitors. All services
>> >> >> >> have been restarted, and running "ceph versions" gives the following:
>> >> >> >>
>> >> >> >> # ceph versions
>> >> >> >> 2017-11-27 21:27:24.356940 7fed67efe700 -1 WARNING: the following dangerous and experimental features are enabled: btrfs
>> >> >> >> 2017-11-27 21:27:24.368469 7fed67efe700 -1 WARNING: the following dangerous and experimental features are enabled: btrfs
>> >> >> >>     "mon": {
>> >> >> >>         "ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)": 4
>> >> >> >>     },
>> >> >> >>     "mgr": {
>> >> >> >>         "ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)": 3
>> >> >> >>     },
>> >> >> >>     "osd": {},
>> >> >> >>     "mds": {
>> >> >> >>         "ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)": 1
>> >> >> >>     },
>> >> >> >>     "overall": {
>> >> >> >>         "ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)": 8
>> >> >> >>
>> >> >> >> For some reason the OSDs do not show what version they are running,
>> >> >> >> and "ceph osd tree" shows all of the OSDs as being down.
>> >> >> >> # ceph osd tree
>> >> >> >> 2017-11-27 21:32:51.969335 7f483d9c2700 -1 WARNING: the following dangerous and experimental features are enabled: btrfs
>> >> >> >> 2017-11-27 21:32:51.980976 7f483d9c2700 -1 WARNING: the following dangerous and experimental features are enabled: btrfs
>> >> >> >> ID CLASS WEIGHT   TYPE NAME              STATUS REWEIGHT PRI-AFF
>> >> >> >> -1       27.77998 root default
>> >> >> >> -3       27.77998     datacenter DC1
>> >> >> >> -6       27.77998         rack 1B06
>> >> >> >> -5        6.48000             host ceph3
>> >> >> >>  1        1.84000                 osd.1    down        0 1.00000
>> >> >> >>  3        4.64000                 osd.3    down        0 1.00000
>> >> >> >> -2        5.53999             host ceph4
>> >> >> >>  5        4.64000                 osd.5    down        0 1.00000
>> >> >> >>  8        0.89999                 osd.8    down        0 1.00000
>> >> >> >> -4        9.28000             host ceph6
>> >> >> >>  0        4.64000                 osd.0    down        0 1.00000
>> >> >> >>  2        4.64000                 osd.2    down        0 1.00000
>> >> >> >> -7        6.48000             host ceph7
>> >> >> >>  6        4.64000                 osd.6    down        0 1.00000
>> >> >> >>  7        1.84000                 osd.7    down        0 1.00000
>> >> >> >>
>> >> >> >> The OSD logs all have this message:
>> >> >> >>
>> >> >> >>   20235 osdmap REQUIRE_JEWEL OSDMap flag is NOT set; please set it.
>> >> >> >
>> >> >> > This is an annoying corner condition. 12.2.2 (out soon!) will have a
>> >> >> > --force option to set the flag even though no osds are up. Until then,
>> >> >> > the workaround is to downgrade one host to jewel, start one jewel osd,
>> >> >> > then set the flag. Then upgrade to luminous again and restart all osds.
>> >> >> >
>> >> >> > sage
>> >> >> >
>> >> >> >> When I try to set it with "ceph osd set require_jewel_osds" I get this error:
>> >> >> >>
>> >> >> >>   Error EPERM: not all up OSDs have CEPH_FEATURE_SERVER_JEWEL feature
>> >> >> >>
>> >> >> >> A "ceph features" returns:
>> >> >> >>
>> >> >> >>     "mon": {
>> >> >> >>         "group": {
>> >> >> >>             "features": "0x1ffddff8eea4fffb",
>> >> >> >>             "release": "luminous",
>> >> >> >>             "num": 4
>> >> >> >>         }
>> >> >> >>     },
>> >> >> >>     "mds": {
>> >> >> >>         "group": {
>> >> >> >>             "features": "0x1ffddff8eea4fffb",
>> >> >> >>             "release": "luminous",
>> >> >> >>             "num": 1
>> >> >> >>         }
>> >> >> >>     },
>> >> >> >>     "osd": {
>> >> >> >>         "group": {
>> >> >> >>             "features": "0x1ffddff8eea4fffb",
>> >> >> >>             "release": "luminous",
>> >> >> >>             "num": 8
>> >> >> >>         }
>> >> >> >>     },
>> >> >> >>     "client": {
>> >> >> >>         "group": {
>> >> >> >>             "features": "0x1ffddff8eea4fffb",
>> >> >> >>             "release": "luminous",
>> >> >> >>             "num": 3
>> >> >> >>
>> >> >> >> # ceph tell osd.* versions
>> >> >> >> 2017-11-28 02:29:28.565943 7f99c6aee700 -1 WARNING: the following dangerous and experimental features are enabled: btrfs
>> >> >> >> 2017-11-28 02:29:28.578956 7f99c6aee700 -1 WARNING: the following dangerous and experimental features are enabled: btrfs
>> >> >> >> Error ENXIO: problem getting command descriptions from osd.0
>> >> >> >> osd.0: problem getting command descriptions from osd.0
>> >> >> >> Error ENXIO: problem getting command descriptions from osd.1
>> >> >> >> osd.1: problem getting command descriptions from osd.1
>> >> >> >> Error ENXIO: problem getting command descriptions from osd.2
>> >> >> >> osd.2: problem getting command descriptions from osd.2
>> >> >> >> Error ENXIO: problem getting command descriptions from osd.3
>> >> >> >> osd.3: problem getting command descriptions from osd.3
>> >> >> >> Error ENXIO: problem getting command descriptions from osd.5
>> >> >> >> osd.5: problem getting command descriptions from osd.5
>> >> >> >> Error ENXIO: problem getting command descriptions from osd.6
>> >> >> >> osd.6: problem getting command descriptions from osd.6
>> >> >> >> Error ENXIO: problem getting command descriptions from osd.7
>> >> >> >> osd.7: problem getting command descriptions from osd.7
>> >> >> >> Error ENXIO: problem getting command descriptions from osd.8
>> >> >> >> osd.8: problem getting command descriptions from osd.8
>> >> >> >>
>> >> >> >> # ceph daemon osd.1 status
>> >> >> >>     "cluster_fsid": "CENSORED",
>> >> >> >>     "osd_fsid": "CENSORED",
>> >> >> >>     "whoami": 1,
>> >> >> >>     "state": "preboot",
>> >> >> >>     "oldest_map": 19482,
>> >> >> >>     "newest_map": 20235,
>> >> >> >>     "num_pgs": 141
>> >> >> >>
>> >> >> >> # ceph -s
>> >> >> >> 2017-11-27 22:04:10.372471 7f89a3935700 -1 WARNING: the following dangerous and experimental features are enabled: btrfs
>> >> >> >> 2017-11-27 22:04:10.375709 7f89a3935700 -1 WARNING: the following dangerous and experimental features are enabled: btrfs
>> >> >> >>   cluster:
>> >> >> >>     id:     CENSORED
>> >> >> >>     health: HEALTH_ERR
>> >> >> >>             513 pgs are stuck inactive for more than 60 seconds
>> >> >> >>             126 pgs backfill_wait
>> >> >> >>             52 pgs backfilling
>> >> >> >>             435 pgs degraded
>> >> >> >>             513 pgs stale
>> >> >> >>             435 pgs stuck degraded
>> >> >> >>             513 pgs stuck stale
>> >> >> >>             435 pgs stuck unclean
>> >> >> >>             435 pgs stuck undersized
>> >> >> >>             435 pgs undersized
>> >> >> >>             recovery 854719/3688140 objects degraded (23.175%)
>> >> >> >>             recovery 838607/3688140 objects misplaced (22.738%)
>> >> >> >>             mds cluster is degraded
>> >> >> >>             crush map has straw_calc_version=0
>> >> >> >>
>> >> >> >>   services:
>> >> >> >>     mon: 4 daemons, quorum 0,1,3,2
>> >> >> >>     mgr: 0(active), standbys: 1, 5
>> >> >> >>     mds: cephfs-1/1/1 up {0=a=up:replay}, 1 up:standby
>> >> >> >>     osd: 8 osds: 0 up, 0 in
>> >> >> >>
>> >> >> >>   data:
>> >> >> >>     pools:   7 pools, 513 pgs
>> >> >> >>     objects: 1199k objects, 4510 GB
>> >> >> >>     usage:   13669 GB used, 15150 GB / 28876 GB avail
>> >> >> >>     pgs:     854719/3688140 objects degraded (23.175%)
>> >> >> >>              838607/3688140 objects misplaced (22.738%)
>> >> >> >>              257 stale+active+undersized+degraded
>> >> >> >>              126 stale+active+undersized+degraded+remapped+backfill_wait
>> >> >> >>              78  stale+active+clean
>> >> >> >>              52  stale+active+undersized+degraded+remapped+backfilling
>> >> >> >>
>> >> >> >> I ran "ceph auth list", and client.admin has the following permissions:
>> >> >> >>     auid: 0
>> >> >> >>     caps: [mds] allow
>> >> >> >>     caps: [mgr] allow *
>> >> >> >>     caps: [mon] allow *
>> >> >> >>     caps: [osd] allow *
>> >> >> >>
>> >> >> >> Thank you for your time.
>> >> >> >>
>> >> >> >> Is there any way I can get these OSDs to join the cluster now, or
>> >> >> >> recover my data?
>> >> >> >>
>> >> >> >> Cary
>> >> >> >> -Dynamic
>> >> >> >> --
>> >> >> >> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> >> >> >> the body of a message to majordomo@xxxxxxxxxxxxxxx
>> >> >> >> More majordomo info at http://vger.kernel.org/majordomo-info.html
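[For anyone who lands on this thread with the same corner case before 12.2.2: the pre-12.2.2 workaround Sage first suggested — downgrade one host, start one jewel OSD, set the flag — is sketched below. Note that in this thread it did not work, because the OSD's filestore had already been rewritten with post-jewel on-disk features, which is why Sage then pointed at setting the flag from an upgraded luminous mon instead. Package atoms, versions, and init-script names are Gentoo-style placeholders:]

```shell
# Pre-12.2.2 workaround, sketched; placeholders only, adapt to your setup.
# It assumes the jewel OSD can still start -- which was NOT the case here,
# since the filestore already carried post-jewel on-disk features.
emerge -av "=sys-cluster/ceph-10.2.3-r2"    # downgrade one OSD host to jewel
/etc/init.d/ceph-osd.1 start                # start a single jewel OSD
ceph osd set require_jewel_osds             # succeeds once one jewel OSD is up
emerge -av "=sys-cluster/ceph-12.2.1"       # upgrade the host back to luminous
for n in 0 1 2 3 5 6 7 8; do /etc/init.d/ceph-osd.$n restart; done
```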