Re: Broken upgrade from Hammer to Luminous

I thought somebody else was going to contact you about this, but in case it didn't happen off-list:

This appears to be an embarrassing issue on our end where we alter the disk state despite not being able to start up all the way, and rely on our users to read release notes carefully. ;) :/

At this point, you're going to need to manually manipulate the OSDs. That will involve identifying exactly what the Luminous daemons did; *hopefully* they have only set new feature flags on disk. If that's so, you can probably use ceph-dencoder on whatever the feature flags file is and strip out everything added after Hammer (in your log, that is incompat features 14, 15 and 16).

But I'm not sure if that's the only thing that happened. You may need to get some consulting from somebody who has experience doing Ceph cluster recovery.
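
As a starting point for working out which flags were added, the two feature lists in the quoted OSD log can be diffed mechanically. The following is only a rough Python sketch: parse_incompat is a made-up helper name, it works on the log text rather than on the on-disk superblock, and it says nothing about what else the Luminous daemons may have changed.

```python
import re


def parse_incompat(text):
    """Turn '...incompat={1=foo,2=bar}' log text into {1: 'foo', 2: 'bar'}."""
    match = re.search(r"incompat=\{([^}]*)\}", text)
    if match is None:
        return {}
    features = {}
    for item in match.group(1).split(","):
        item = " ".join(item.split())  # undo the mail client's line wrapping
        if not item:
            continue
        number, _, name = item.partition("=")
        features[int(number)] = name
    return features


# Copied from the "ondisk features" and "daemon features" lines of the log.
ondisk_text = """compat={},rocompat={},incompat={1=initial feature set(~v.18),2=pginfo
object,3=object
locator,4=last_epoch_clean,5=categories,6=hobjectpool,7=biginfo,8=leveldbinfo,9=leveldblog,10=snapmapper,12=transaction
hints,13=pg meta object,14=explicit missing set,15=fastinfo pg
attr,16=deletes in missing set}"""

daemon_text = """compat={},rocompat={},incompat={1=initial feature set(~v.18),2=pginfo
object,3=object
locator,4=last_epoch_clean,5=categories,6=hobjectpool,7=biginfo,8=leveldbinfo,9=leveldblog,10=snapmapper,11=sharded
objects,12=transaction hints,13=pg meta object}"""

ondisk = parse_incompat(ondisk_text)
daemon = parse_incompat(daemon_text)

# Features recorded on disk that the running daemon does not understand.
missing = {n: name for n, name in ondisk.items() if n not in daemon}

for number in sorted(missing):
    print(number, missing[number])
```

For the log below this reports features 14, 15 and 16, matching the "Missing features" line the OSD printed itself.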
-Greg

On Thu, Nov 16, 2017 at 7:58 PM Gianfilippo <gianfi@xxxxxxxxx> wrote:
Hi all,
I made a pretty big mistake during our upgrade from Hammer to Luminous:
I skipped the Jewel release.
When I realized this and tried to switch back to Jewel, it was too late:
the cluster now won't start, complaining that "The disk uses features
unsupported by the executable.":

2017-11-17 01:27:26.190971 7fb446ab58c0  0 ceph version 0.94.10
(b1e0532418e4631af01acbc0cedd426f1905f4af), process ceph-osd, pid 19638
2017-11-17 01:27:26.209600 7fb446ab58c0  0
filestore(/var/lib/ceph/osd/ceph-2) backend xfs (magic 0x58465342)
2017-11-17 01:27:26.277323 7fb446ab58c0  0
genericfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_features:
FIEMAP ioctl is supported and appears to work
2017-11-17 01:27:26.277353 7fb446ab58c0  0
genericfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_features:
FIEMAP ioctl is disabled via 'filestore fiemap' config option
2017-11-17 01:27:26.302508 7fb446ab58c0  0
genericfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_features:
syncfs(2) syscall fully supported (by glibc and kernel)
2017-11-17 01:27:26.302668 7fb446ab58c0  0
xfsfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_feature: extsize is
disabled by conf
2017-11-17 01:27:26.325121 7fb446ab58c0  0
filestore(/var/lib/ceph/osd/ceph-2) mount: enabling WRITEAHEAD journal
mode: checkpoint is not enabled
2017-11-17 01:27:26.343360 7fb446ab58c0  1 journal _open
/var/lib/ceph/osd/ceph-2/journal fd 20: 5368709120 bytes, block size
4096 bytes, directio = 1, aio = 1
2017-11-17 01:27:26.393876 7fb446ab58c0  1 journal _open
/var/lib/ceph/osd/ceph-2/journal fd 20: 5368709120 bytes, block size
4096 bytes, directio = 1, aio = 1
2017-11-17 01:27:26.394746 7fb446ab58c0 -1 osd.2 0 The disk uses
features unsupported by the executable.
2017-11-17 01:27:26.394758 7fb446ab58c0 -1 osd.2 0  ondisk features
compat={},rocompat={},incompat={1=initial feature set(~v.18),2=pginfo
object,3=object
locator,4=last_epoch_clean,5=categories,6=hobjectpool,7=biginfo,8=leveldbinfo,9=leveldblog,10=snapmapper,12=transaction
hints,13=pg meta object,14=explicit missing set,15=fastinfo pg
attr,16=deletes in missing set}
2017-11-17 01:27:26.394780 7fb446ab58c0 -1 osd.2 0  daemon features
compat={},rocompat={},incompat={1=initial feature set(~v.18),2=pginfo
object,3=object
locator,4=last_epoch_clean,5=categories,6=hobjectpool,7=biginfo,8=leveldbinfo,9=leveldblog,10=snapmapper,11=sharded
objects,12=transaction hints,13=pg meta object}
2017-11-17 01:27:26.394794 7fb446ab58c0 -1 osd.2 0 Cannot write to disk!
Missing features: compat={},rocompat={},incompat={14=explicit missing
set,15=fastinfo pg attr,16=deletes in missing set}
2017-11-17 01:27:26.419854 7fb446ab58c0  1 journal close
/var/lib/ceph/osd/ceph-2/journal
2017-11-17 01:27:26.422687 7fb446ab58c0 -1 ** ERROR: osd init
failed: (95) Operation not supported
2017-11-17 01:27:26.863514 7fcc5f1428c0  0 ceph version 0.94.10
(b1e0532418e4631af01acbc0cedd426f1905f4af), process ceph-osd, pid 19731
2017-11-17 01:27:26.878617 7fcc5f1428c0  0
filestore(/var/lib/ceph/osd/ceph-2) backend xfs (magic 0x58465342)
2017-11-17 01:27:26.880689 7fcc5f1428c0  0
genericfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_features:
FIEMAP ioctl is supported and appears to work
2017-11-17 01:27:26.880703 7fcc5f1428c0  0
genericfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_features:
FIEMAP ioctl is disabled via 'filestore fiemap' config option
2017-11-17 01:27:26.898681 7fcc5f1428c0  0
genericfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_features:
syncfs(2) syscall fully supported (by glibc and kernel)
2017-11-17 01:27:26.898829 7fcc5f1428c0  0
xfsfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_feature: extsize is
disabled by conf
2017-11-17 01:27:26.906300 7fcc5f1428c0  0
filestore(/var/lib/ceph/osd/ceph-2) mount: enabling WRITEAHEAD journal
mode: checkpoint is not enabled
2017-11-17 01:27:26.917013 7fcc5f1428c0  1 journal _open
/var/lib/ceph/osd/ceph-2/journal fd 20: 5368709120 bytes, block size
4096 bytes, directio = 1, aio = 1
2017-11-17 01:27:26.925628 7fcc5f1428c0  1 journal _open
/var/lib/ceph/osd/ceph-2/journal fd 20: 5368709120 bytes, block size
4096 bytes, directio = 1, aio = 1
2017-11-17 01:27:26.926496 7fcc5f1428c0 -1 osd.2 0 The disk uses
features unsupported by the executable.
2017-11-17 01:27:26.926509 7fcc5f1428c0 -1 osd.2 0  ondisk features
compat={},rocompat={},incompat={1=initial feature set(~v.18),2=pginfo
object,3=object
locator,4=last_epoch_clean,5=categories,6=hobjectpool,7=biginfo,8=leveldbinfo,9=leveldblog,10=snapmapper,12=transaction
hints,13=pg meta object,14=explicit missing set,15=fastinfo pg
attr,16=deletes in missing set}
2017-11-17 01:27:26.926533 7fcc5f1428c0 -1 osd.2 0  daemon features
compat={},rocompat={},incompat={1=initial feature set(~v.18),2=pginfo
object,3=object
locator,4=last_epoch_clean,5=categories,6=hobjectpool,7=biginfo,8=leveldbinfo,9=leveldblog,10=snapmapper,11=sharded
objects,12=transaction hints,13=pg meta object}
2017-11-17 01:27:26.926553 7fcc5f1428c0 -1 osd.2 0 Cannot write to disk!
Missing features: compat={},rocompat={},incompat={14=explicit missing
set,15=fastinfo pg attr,16=deletes in missing set}
2017-11-17 01:27:26.927159 7fcc5f1428c0  1 journal close
/var/lib/ceph/osd/ceph-2/journal
2017-11-17 01:27:26.929073 7fcc5f1428c0 -1 ** ERROR: osd init
failed: (95) Operation not supported
2017-11-17 01:27:27.364931 7f16ccdc78c0  0 ceph version 0.94.10
(b1e0532418e4631af01acbc0cedd426f1905f4af), process ceph-osd, pid 19821
2017-11-17 01:27:27.379962 7f16ccdc78c0  0
filestore(/var/lib/ceph/osd/ceph-2) backend xfs (magic 0x58465342)
2017-11-17 01:27:27.381509 7f16ccdc78c0  0
genericfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_features:
FIEMAP ioctl is supported and appears to work
2017-11-17 01:27:27.381524 7f16ccdc78c0  0
genericfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_features:
FIEMAP ioctl is disabled via 'filestore fiemap' config option
2017-11-17 01:27:27.397192 7f16ccdc78c0  0
genericfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_features:
syncfs(2) syscall fully supported (by glibc and kernel)
2017-11-17 01:27:27.397324 7f16ccdc78c0  0
xfsfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_feature: extsize is
disabled by conf
2017-11-17 01:27:27.402018 7f16ccdc78c0  0
filestore(/var/lib/ceph/osd/ceph-2) mount: enabling WRITEAHEAD journal
mode: checkpoint is not enabled
2017-11-17 01:27:27.412815 7f16ccdc78c0  1 journal _open
/var/lib/ceph/osd/ceph-2/journal fd 19: 5368709120 bytes, block size
4096 bytes, directio = 1, aio = 1
2017-11-17 01:27:27.421621 7f16ccdc78c0  1 journal _open
/var/lib/ceph/osd/ceph-2/journal fd 19: 5368709120 bytes, block size
4096 bytes, directio = 1, aio = 1
2017-11-17 01:27:27.422471 7f16ccdc78c0 -1 osd.2 0 The disk uses
features unsupported by the executable.
2017-11-17 01:27:27.422482 7f16ccdc78c0 -1 osd.2 0  ondisk features
compat={},rocompat={},incompat={1=initial feature set(~v.18),2=pginfo
object,3=object
locator,4=last_epoch_clean,5=categories,6=hobjectpool,7=biginfo,8=leveldbinfo,9=leveldblog,10=snapmapper,12=transaction
hints,13=pg meta object,14=explicit missing set,15=fastinfo pg
attr,16=deletes in missing set}
2017-11-17 01:27:27.422495 7f16ccdc78c0 -1 osd.2 0  daemon features
compat={},rocompat={},incompat={1=initial feature set(~v.18),2=pginfo
object,3=object
locator,4=last_epoch_clean,5=categories,6=hobjectpool,7=biginfo,8=leveldbinfo,9=leveldblog,10=snapmapper,11=sharded
objects,12=transaction hints,13=pg meta object}
2017-11-17 01:27:27.422515 7f16ccdc78c0 -1 osd.2 0 Cannot write to disk!
Missing features: compat={},rocompat={},incompat={14=explicit missing
set,15=fastinfo pg attr,16=deletes in missing set}
2017-11-17 01:27:27.424247 7f16ccdc78c0  1 journal close
/var/lib/ceph/osd/ceph-2/journal
2017-11-17 01:27:27.426533 7f16ccdc78c0 -1 ** ERROR: osd init
failed: (95) Operation not supported




As the cluster won't start on Jewel, I can't run "ceph osd set
sortbitwise" to complete a successful upgrade to Luminous. When I try to
start on Luminous instead, I get the following errors from the OSD:

2017-11-17 04:57:37.779635 7f04c28ead00  0 set uid:gid to 64045:64045
(ceph:ceph)
2017-11-17 04:57:37.779682 7f04c28ead00  0 ceph version 12.2.1
(3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable), process
(unknown), pid 8531
2017-11-17 04:57:37.783538 7f04c28ead00 -1 Public network was set, but
cluster network was not set
2017-11-17 04:57:37.783562 7f04c28ead00 -1     Using public network also
for cluster network
2017-11-17 04:57:37.790066 7f04c28ead00  0 pidfile_write: ignore empty
--pid-file
2017-11-17 04:57:37.800558 7f04c28ead00  0 load: jerasure load: lrc
load: isa
2017-11-17 04:57:37.801160 7f04c28ead00  0
filestore(/var/lib/ceph/osd/ceph-2) backend xfs (magic 0x58465342)
2017-11-17 04:57:37.802397 7f04c28ead00  0
filestore(/var/lib/ceph/osd/ceph-2) backend xfs (magic 0x58465342)
2017-11-17 04:57:37.802951 7f04c28ead00  0
genericfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_features:
FIEMAP ioctl is disabled via 'filestore fiemap' config option
2017-11-17 04:57:37.802965 7f04c28ead00  0
genericfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_features:
SEEK_DATA/SEEK_HOLE is disabled via 'filestore seek data hole' config option
2017-11-17 04:57:37.802970 7f04c28ead00  0
genericfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_features:
splice() is disabled via 'filestore splice' config option
2017-11-17 04:57:37.817358 7f04c28ead00  0
genericfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_features:
syncfs(2) syscall fully supported (by glibc and kernel)
2017-11-17 04:57:37.817510 7f04c28ead00  0
xfsfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_feature: extsize is
disabled by conf
2017-11-17 04:57:37.818348 7f04c28ead00  0
filestore(/var/lib/ceph/osd/ceph-2) start omap initiation
2017-11-17 04:57:37.818791 7f04c28ead00  1 leveldb: Recovering log #275311
2017-11-17 04:57:37.820333 7f04c28ead00  1 leveldb: Delete type=0 #275311

2017-11-17 04:57:37.820397 7f04c28ead00  1 leveldb: Delete type=3 #275310

2017-11-17 04:57:38.555491 7f04c28ead00  0
filestore(/var/lib/ceph/osd/ceph-2) mount(1758): enabling WRITEAHEAD
journal mode: checkpoint is not enabled
2017-11-17 04:57:38.559310 7f04c28ead00  1 journal _open
/var/lib/ceph/osd/ceph-2/journal fd 28: 5368709120 bytes, block size
4096 bytes, directio = 1, aio = 1
2017-11-17 04:57:38.561591 7f04c28ead00  1 journal _open
/var/lib/ceph/osd/ceph-2/journal fd 28: 5368709120 bytes, block size
4096 bytes, directio = 1, aio = 1
2017-11-17 04:57:38.563244 7f04c28ead00  1
filestore(/var/lib/ceph/osd/ceph-2) upgrade(1365)
2017-11-17 04:57:38.564625 7f04c28ead00  0 <cls>
/build/ceph-12.2.1/src/cls/hello/cls_hello.cc:296: loading cls_hello
2017-11-17 04:57:38.564658 7f04c28ead00  0 _get_class not permitted to
load lua
2017-11-17 04:57:38.567611 7f04c28ead00  0 <cls>
/build/ceph-12.2.1/src/cls/cephfs/cls_cephfs.cc:197: loading cephfs
2017-11-17 04:57:38.568554 7f04c28ead00  0 _get_class not permitted to
load sdk
2017-11-17 04:57:38.568774 7f04c28ead00  0 _get_class not permitted to
load kvs
2017-11-17 04:57:38.568793 7f04c28ead00  1 osd.2 0 warning: got an error
loading one or more classes: (1) Operation not permitted
2017-11-17 04:57:38.569061 7f04c28ead00  0 osd.2 1531 crush map has
features 1107558400, adjusting msgr requires for clients
2017-11-17 04:57:38.569077 7f04c28ead00  0 osd.2 1531 crush map has
features 1107558400 was 8705, adjusting msgr requires for mons
2017-11-17 04:57:38.569085 7f04c28ead00  0 osd.2 1531 crush map has
features 1107558400, adjusting msgr requires for osds
2017-11-17 04:57:38.735392 7f04c28ead00  0 osd.2 1531 load_pgs
2017-11-17 04:57:38.740951 7f04c28ead00 -1 *** Caught signal (Aborted) **
  in thread 7f04c28ead00 thread_name:ceph-osd

  ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e)
luminous (stable)
  1: (()+0xa088b9) [0x5618d594b8b9]
  2: (()+0x10330) [0x7f04c0e0d330]
  3: (gsignal()+0x37) [0x7f04bfe2dc37]
  4: (abort()+0x148) [0x7f04bfe31028]
  5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7f04c073c535]
  6: (()+0x5e6d6) [0x7f04c073a6d6]
  7: (()+0x5e703) [0x7f04c073a703]
  8: (()+0x5e922) [0x7f04c073a922]
  9: (object_stat_sum_t::decode(ceph::buffer::list::iterator&)+0x5cc)
[0x5618d563ec1c]
  10:
(object_stat_collection_t::decode(ceph::buffer::list::iterator&)+0x54)
[0x5618d5659e14]
  11: (pg_stat_t::decode(ceph::buffer::list::iterator&)+0x1d5)
[0x5618d565a455]
  12: (pg_info_t::decode(ceph::buffer::list::iterator&)+0x12a)
[0x5618d566059a]
  13: (PG::read_info(ObjectStore*, spg_t, coll_t const&,
ceph::buffer::list&, pg_info_t&, PastIntervals&, unsigned char&)+0x231)
[0x5618d54d1231]
  14: (PG::read_state(ObjectStore*, ceph::buffer::list&)+0x7b)
[0x5618d54d8c6b]
  15: (OSD::load_pgs()+0x994) [0x5618d5435e74]
  16: (OSD::init()+0x2127) [0x5618d544ddc7]
  17: (main()+0x2ba8) [0x5618d5355798]
  18: (__libc_start_main()+0xf5) [0x7f04bfe18f45]
  19: (()+0x4b0826) [0x5618d53f3826]
  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
needed to interpret this.



Thank you for any help; the storage cluster is hosting crucial data.




_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
