I ran into a similar problem partway through upgrading from Hammer (0.94.5) to Infernalis (9.2.0). I tried rebuilding one of the OSDs with 'ceph-disk prepare /dev/sdb', and it never comes up; it stays in the booting state:
root@b3:~# ceph daemon osd.10 status
{
"cluster_fsid": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
"osd_fsid": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
"whoami": 10,
"state": "booting",
"oldest_map": 25804,
"newest_map": 25904,
"num_pgs": 0
}
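For anyone who wants to script this check, here is a minimal Python sketch that summarizes the JSON printed by 'ceph daemon osd.N status'. The function name and the returned keys are illustrative assumptions, not part of Ceph; the sample values are copied from the output above. A single sample only shows the current state, so the command would need to be polled over time to confirm the OSD is really stuck.

```python
import json

# Sample copied from the status output above (fsids are redacted in the post).
SAMPLE = """
{
  "cluster_fsid": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
  "osd_fsid": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
  "whoami": 10,
  "state": "booting",
  "oldest_map": 25804,
  "newest_map": 25904,
  "num_pgs": 0
}
"""

def summarize_osd_status(status_json):
    """Return a small summary of one 'ceph daemon osd.N status' JSON document.

    Hypothetical helper: the key names in the result are made up for
    illustration; only the input field names come from the output above.
    """
    status = json.loads(status_json)
    return {
        "osd": status["whoami"],
        "state": status["state"],
        "booting": status["state"] == "booting",
        # Number of osdmap epochs between the oldest and newest map held.
        "map_span": status["newest_map"] - status["oldest_map"],
        "num_pgs": status["num_pgs"],
    }

summary = summarize_osd_status(SAMPLE)
print(summary)
```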
Here's what is written to /var/log/ceph/osd/ceph-osd.10.log:
2015-12-18 16:09:48.928462 7fd5e2bec940 0 ceph version 9.2.0 (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299), process ceph-osd, pid 6866
2015-12-18 16:09:48.931387 7fd5e2bec940 1 filestore(/var/lib/ceph/tmp/mnt.IOnlxY) mkfs in /var/lib/ceph/tmp/mnt.IOnlxY
2015-12-18 16:09:48.931417 7fd5e2bec940 1 filestore(/var/lib/ceph/tmp/mnt.IOnlxY) mkfs fsid is already set to xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
2015-12-18 16:09:48.931422 7fd5e2bec940 1 filestore(/var/lib/ceph/tmp/mnt.IOnlxY) write_version_stamp 4
2015-12-18 16:09:48.932671 7fd5e2bec940 0 filestore(/var/lib/ceph/tmp/mnt.IOnlxY) backend xfs (magic 0x58465342)
2015-12-18 16:09:48.934953 7fd5e2bec940 1 filestore(/var/lib/ceph/tmp/mnt.IOnlxY) leveldb db exists/created
2015-12-18 16:09:48.935082 7fd5e2bec940 1 journal _open /var/lib/ceph/tmp/mnt.IOnlxY/journal fd 11: 1072693248 bytes, block size 4096 bytes, directio = 1, aio = 1
2015-12-18 16:09:48.935218 7fd5e2bec940 -1 journal check: ondisk fsid 00000000-0000-0000-0000-000000000000 doesn't match expected xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx, invalid (someone else's?) journal
2015-12-18 16:09:48.935227 7fd5e2bec940 1 journal close /var/lib/ceph/tmp/mnt.IOnlxY/journal
2015-12-18 16:09:48.935452 7fd5e2bec940 1 journal _open /var/lib/ceph/tmp/mnt.IOnlxY/journal fd 11: 1072693248 bytes, block size 4096 bytes, directio = 1, aio = 1
2015-12-18 16:09:48.935771 7fd5e2bec940 0 filestore(/var/lib/ceph/tmp/mnt.IOnlxY) mkjournal created journal on /var/lib/ceph/tmp/mnt.IOnlxY/journal
2015-12-18 16:09:48.935803 7fd5e2bec940 1 filestore(/var/lib/ceph/tmp/mnt.IOnlxY) mkfs done in /var/lib/ceph/tmp/mnt.IOnlxY
2015-12-18 16:09:48.935919 7fd5e2bec940 0 filestore(/var/lib/ceph/tmp/mnt.IOnlxY) backend xfs (magic 0x58465342)
2015-12-18 16:09:48.936548 7fd5e2bec940 0 genericfilestorebackend(/var/lib/ceph/tmp/mnt.IOnlxY) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
2015-12-18 16:09:48.936559 7fd5e2bec940 0 genericfilestorebackend(/var/lib/ceph/tmp/mnt.IOnlxY) detect_features: SEEK_DATA/SEEK_HOLE is disabled via 'filestore seek data hole' config option
2015-12-18 16:09:48.936588 7fd5e2bec940 0 genericfilestorebackend(/var/lib/ceph/tmp/mnt.IOnlxY) detect_features: splice is supported
2015-12-18 16:09:48.938319 7fd5e2bec940 0 genericfilestorebackend(/var/lib/ceph/tmp/mnt.IOnlxY) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
2015-12-18 16:09:48.938394 7fd5e2bec940 0 xfsfilestorebackend(/var/lib/ceph/tmp/mnt.IOnlxY) detect_features: extsize is supported and your kernel >= 3.5
2015-12-18 16:09:48.940420 7fd5e2bec940 0 filestore(/var/lib/ceph/tmp/mnt.IOnlxY) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
2015-12-18 16:09:48.940646 7fd5e2bec940 1 journal _open /var/lib/ceph/tmp/mnt.IOnlxY/journal fd 17: 1072693248 bytes, block size 4096 bytes, directio = 1, aio = 1
2015-12-18 16:09:48.940865 7fd5e2bec940 1 journal _open /var/lib/ceph/tmp/mnt.IOnlxY/journal fd 17: 1072693248 bytes, block size 4096 bytes, directio = 1, aio = 1
2015-12-18 16:09:48.941270 7fd5e2bec940 1 filestore(/var/lib/ceph/tmp/mnt.IOnlxY) upgrade
2015-12-18 16:09:48.941389 7fd5e2bec940 -1 filestore(/var/lib/ceph/tmp/mnt.IOnlxY) could not find -1/23c2fcde/osd_superblock/0 in index: (2) No such file or directory
2015-12-18 16:09:48.945392 7fd5e2bec940 1 journal close /var/lib/ceph/tmp/mnt.IOnlxY/journal
2015-12-18 16:09:48.946175 7fd5e2bec940 -1 created object store /var/lib/ceph/tmp/mnt.IOnlxY journal /var/lib/ceph/tmp/mnt.IOnlxY/journal for osd.10 fsid xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
2015-12-18 16:09:48.946269 7fd5e2bec940 -1 auth: error reading file: /var/lib/ceph/tmp/mnt.IOnlxY/keyring: can't open /var/lib/ceph/tmp/mnt.IOnlxY/keyring: (2) No such file or directory
2015-12-18 16:09:48.946623 7fd5e2bec940 -1 created new key in keyring /var/lib/ceph/tmp/mnt.IOnlxY/keyring
2015-12-18 16:09:50.698753 7fb5db130940 0 ceph version 9.2.0 (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299), process ceph-osd, pid 7045
2015-12-18 16:09:50.745427 7fb5db130940 0 filestore(/var/lib/ceph/osd/ceph-10) backend xfs (magic 0x58465342)
2015-12-18 16:09:50.745978 7fb5db130940 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-10) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
2015-12-18 16:09:50.745987 7fb5db130940 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-10) detect_features: SEEK_DATA/SEEK_HOLE is disabled via 'filestore seek data hole' config option
2015-12-18 16:09:50.746012 7fb5db130940 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-10) detect_features: splice is supported
2015-12-18 16:09:50.746517 7fb5db130940 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-10) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
2015-12-18 16:09:50.746616 7fb5db130940 0 xfsfilestorebackend(/var/lib/ceph/osd/ceph-10) detect_features: extsize is supported and your kernel >= 3.5
2015-12-18 16:09:50.748775 7fb5db130940 0 filestore(/var/lib/ceph/osd/ceph-10) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
2015-12-18 16:09:50.749005 7fb5db130940 1 journal _open /var/lib/ceph/osd/ceph-10/journal fd 19: 1072693248 bytes, block size 4096 bytes, directio = 1, aio = 1
2015-12-18 16:09:50.749256 7fb5db130940 1 journal _open /var/lib/ceph/osd/ceph-10/journal fd 19: 1072693248 bytes, block size 4096 bytes, directio = 1, aio = 1
2015-12-18 16:09:50.749632 7fb5db130940 1 filestore(/var/lib/ceph/osd/ceph-10) upgrade
2015-12-18 16:09:50.783188 7fb5db130940 0 <cls> cls/cephfs/cls_cephfs.cc:136: loading cephfs_size_scan
2015-12-18 16:09:50.851735 7fb5db130940 0 <cls> cls/hello/cls_hello.cc:305: loading cls_hello
2015-12-18 16:09:50.851807 7fb5db130940 0 osd.10 0 crush map has features 33816576, adjusting msgr requires for clients
2015-12-18 16:09:50.851818 7fb5db130940 0 osd.10 0 crush map has features 33816576 was 8705, adjusting msgr requires for mons
2015-12-18 16:09:50.851821 7fb5db130940 0 osd.10 0 crush map has features 33816576, adjusting msgr requires for osds
2015-12-18 16:09:50.851965 7fb5db130940 0 osd.10 0 load_pgs
2015-12-18 16:09:50.851988 7fb5db130940 0 osd.10 0 load_pgs opened 0 pgs
2015-12-18 16:09:50.852822 7fb5db130940 -1 osd.10 0 log_to_monitors {default=true}
2015-12-18 16:09:50.870133 7fb5c7f39700 0 osd.10 0 ignoring osdmap until we have initialized
2015-12-18 16:09:50.870409 7fb5db130940 0 osd.10 0 done with init, starting boot process
2015-12-18 16:09:50.873357 7fb5c7f39700 0 osd.10 25804 crush map has features 104186773504, adjusting msgr requires for clients
2015-12-18 16:09:50.873368 7fb5c7f39700 0 osd.10 25804 crush map has features 379064680448 was 33825281, adjusting msgr requires for mons
2015-12-18 16:09:50.873374 7fb5c7f39700 0 osd.10 25804 crush map has features 379064680448, adjusting msgr requires for osds
2015-12-18 16:09:50.873377 7fb5c7f39700 0 osd.10 25804 check_osdmap_features enabling on-disk ERASURE CODES compat feature
2015-12-18 16:09:50.876187 7fb5c7f39700 0 log_channel(cluster) log [WRN] : failed to encode map e25805 with expected crc
2015-12-18 16:09:50.879534 7fb5c7f39700 0 log_channel(cluster) log [WRN] : failed to encode map e25805 with expected crc
2015-12-18 16:09:50.950405 7fb5c7f39700 0 log_channel(cluster) log [WRN] : failed to encode map e25905 with expected crc
2015-12-18 16:09:50.983355 7fb5c7f39700 0 log_channel(cluster) log [WRN] : failed to encode map e25905 with expected crc
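To pick the interesting lines out of a log like this, a small Python sketch can keep only the entries logged at level -1. It assumes the layout seen above (date, time, thread id, level, then the message); the function name is hypothetical. Not every -1 line is a failure: in this log, "created object store" and "created new key in keyring" are also written at -1.

```python
# Three lines copied from the log above, used as sample input.
SAMPLE_LOG = """\
2015-12-18 16:09:48.935218 7fd5e2bec940 -1 journal check: ondisk fsid 00000000-0000-0000-0000-000000000000 doesn't match expected xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx, invalid (someone else's?) journal
2015-12-18 16:09:48.935227 7fd5e2bec940 1 journal close /var/lib/ceph/tmp/mnt.IOnlxY/journal
2015-12-18 16:09:48.941389 7fd5e2bec940 -1 filestore(/var/lib/ceph/tmp/mnt.IOnlxY) could not find -1/23c2fcde/osd_superblock/0 in index: (2) No such file or directory
"""

def level_minus_one_messages(log_text):
    """Return the message part of every log line whose level field is -1.

    Assumes whitespace-separated fields in the order:
    date, time, thread id, level, message.
    """
    messages = []
    for line in log_text.splitlines():
        # Split into at most five parts so the message keeps its spaces.
        fields = line.split(None, 4)
        if len(fields) == 5 and fields[3] == "-1":
            messages.append(fields[4])
    return messages

flagged = level_minus_one_messages(SAMPLE_LOG)
for message in flagged:
    print(message)
```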
I'm running this on Ubuntu 14.04.3 with the linux-image-generic-lts-wily kernel (4.2.0-21.25~14.04.1).
Are you running a mixed-version cluster right now too? For example, this is what mine looks like at the moment:
root@b1:~# ceph tell osd.* version | grep version | uniq -c
osd.10: Error ENXIO: problem getting command descriptions from osd.10
osd.10: problem getting command descriptions from osd.10
11 "version": "ceph version 9.2.0 (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299)"
15 "version": "ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)"
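One caveat with the pipeline above: 'uniq -c' only merges adjacent identical lines, so the counts are per version only when OSDs running the same version are listed next to each other. As a sketch of an order-independent tally, the following Python counts version strings wherever they appear. The function name is hypothetical, and the sample input is an abbreviated, made-up ordering of the two version strings shown above.

```python
import re
from collections import Counter

# Matches lines such as:  "version": "ceph version 9.2.0 (...)"
VERSION_RE = re.compile(r'"version":\s*"([^"]+)"')

# Abbreviated sample: the two version strings from the output above,
# deliberately interleaved to show that ordering does not matter.
SAMPLE_OUTPUT = '''\
    "version": "ceph version 9.2.0 (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299)"
    "version": "ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)"
    "version": "ceph version 9.2.0 (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299)"
'''

def tally_versions(output):
    """Count each distinct version string in 'ceph tell osd.* version' output."""
    return Counter(VERSION_RE.findall(output))

counts = tally_versions(SAMPLE_OUTPUT)
for version, count in counts.most_common():
    print(count, version)
```

In the shell, inserting 'sort' before 'uniq -c' has the same effect.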
Bryan
From: ceph-users <ceph-users-bounces@xxxxxxxxxxxxxx> on behalf of Bob R <bobr@xxxxxxxxxxxxxx>
Date: Wednesday, December 16, 2015 at 11:45 AM
To: ceph-users <ceph-users@xxxxxxxxxxxxxx>
Subject: OSDs stuck in booting state on CentOS 7.2.1511 and ceph infernalis 9.2.0