Bryan,
Once the rest of the cluster was updated to v0.94.5, the OSDs on the one host running Infernalis v9.2.0 now appear to boot successfully.
Bob
On Fri, Dec 18, 2015 at 3:44 PM, Bob R <bobr@xxxxxxxxxxxxxx> wrote:
Bryan,

I rebooted another host which wasn't updated to CentOS 7.2, and those OSDs also failed to come out of the booting state. I thought I'd restarted each OSD host after upgrading them to Infernalis, but I must have been mistaken: after running 'ceph tell osd.* version' I saw we were on a mix of v0.94.1, v0.94.2, v0.94.4, and v0.94.5. I've downgraded the two hosts we were having problems with to hammer v0.94.5, and once the cluster is happy again we will try upgrading again.

Good luck.
Bob

On Fri, Dec 18, 2015 at 3:21 PM, Stillwell, Bryan <bryan.stillwell@xxxxxxxxxxx> wrote:

I ran into a similar problem while in the middle of upgrading from Hammer (0.94.5) to Infernalis (9.2.0). I decided to try rebuilding one of the OSDs by using 'ceph-disk prepare /dev/sdb', and it never comes up:
root@b3:~# ceph daemon osd.10 status
{
    "cluster_fsid": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
    "osd_fsid": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
    "whoami": 10,
    "state": "booting",
    "oldest_map": 25804,
    "newest_map": 25904,
    "num_pgs": 0
}
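An OSD stuck in "booting" has initialized but has not yet been marked up, and the oldest_map/newest_map fields show which osdmap epochs it holds. A minimal Python sketch of pulling those fields out of such a status dump (the summarize helper and the redacted sample blob are my own illustration, not Ceph tooling):

```python
import json

# Redacted sample of 'ceph daemon osd.10 status' output, as shown above.
status_json = """
{"cluster_fsid": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
 "osd_fsid": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
 "whoami": 10,
 "state": "booting",
 "oldest_map": 25804,
 "newest_map": 25904,
 "num_pgs": 0}
"""

def summarize(raw):
    """Return (osd id, state, osdmap epoch span) from a daemon status blob."""
    status = json.loads(raw)
    return (status["whoami"], status["state"],
            status["newest_map"] - status["oldest_map"])

osd, state, span = summarize(status_json)
print("osd.%d state=%s, holds %d osdmap epochs" % (osd, state, span))
```

In practice you would feed it the output of 'ceph daemon osd.<id> status' on the OSD host.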
Here's what is written to /var/log/ceph/osd/ceph-osd.10.log:
2015-12-18 16:09:48.928462 7fd5e2bec940  0 ceph version 9.2.0 (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299), process ceph-osd, pid 6866
2015-12-18 16:09:48.931387 7fd5e2bec940  1 filestore(/var/lib/ceph/tmp/mnt.IOnlxY) mkfs in /var/lib/ceph/tmp/mnt.IOnlxY
2015-12-18 16:09:48.931417 7fd5e2bec940  1 filestore(/var/lib/ceph/tmp/mnt.IOnlxY) mkfs fsid is already set to xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
2015-12-18 16:09:48.931422 7fd5e2bec940  1 filestore(/var/lib/ceph/tmp/mnt.IOnlxY) write_version_stamp 4
2015-12-18 16:09:48.932671 7fd5e2bec940  0 filestore(/var/lib/ceph/tmp/mnt.IOnlxY) backend xfs (magic 0x58465342)
2015-12-18 16:09:48.934953 7fd5e2bec940  1 filestore(/var/lib/ceph/tmp/mnt.IOnlxY) leveldb db exists/created
2015-12-18 16:09:48.935082 7fd5e2bec940  1 journal _open /var/lib/ceph/tmp/mnt.IOnlxY/journal fd 11: 1072693248 bytes, block size 4096 bytes, directio = 1, aio = 1
2015-12-18 16:09:48.935218 7fd5e2bec940 -1 journal check: ondisk fsid 00000000-0000-0000-0000-000000000000 doesn't match expected xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx, invalid (someone else's?) journal
2015-12-18 16:09:48.935227 7fd5e2bec940  1 journal close /var/lib/ceph/tmp/mnt.IOnlxY/journal
2015-12-18 16:09:48.935452 7fd5e2bec940  1 journal _open /var/lib/ceph/tmp/mnt.IOnlxY/journal fd 11: 1072693248 bytes, block size 4096 bytes, directio = 1, aio = 1
2015-12-18 16:09:48.935771 7fd5e2bec940  0 filestore(/var/lib/ceph/tmp/mnt.IOnlxY) mkjournal created journal on /var/lib/ceph/tmp/mnt.IOnlxY/journal
2015-12-18 16:09:48.935803 7fd5e2bec940  1 filestore(/var/lib/ceph/tmp/mnt.IOnlxY) mkfs done in /var/lib/ceph/tmp/mnt.IOnlxY
2015-12-18 16:09:48.935919 7fd5e2bec940  0 filestore(/var/lib/ceph/tmp/mnt.IOnlxY) backend xfs (magic 0x58465342)
2015-12-18 16:09:48.936548 7fd5e2bec940  0 genericfilestorebackend(/var/lib/ceph/tmp/mnt.IOnlxY) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
2015-12-18 16:09:48.936559 7fd5e2bec940  0 genericfilestorebackend(/var/lib/ceph/tmp/mnt.IOnlxY) detect_features: SEEK_DATA/SEEK_HOLE is disabled via 'filestore seek data hole' config option
2015-12-18 16:09:48.936588 7fd5e2bec940  0 genericfilestorebackend(/var/lib/ceph/tmp/mnt.IOnlxY) detect_features: splice is supported
2015-12-18 16:09:48.938319 7fd5e2bec940  0 genericfilestorebackend(/var/lib/ceph/tmp/mnt.IOnlxY) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
2015-12-18 16:09:48.938394 7fd5e2bec940  0 xfsfilestorebackend(/var/lib/ceph/tmp/mnt.IOnlxY) detect_features: extsize is supported and your kernel >= 3.5
2015-12-18 16:09:48.940420 7fd5e2bec940  0 filestore(/var/lib/ceph/tmp/mnt.IOnlxY) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
2015-12-18 16:09:48.940646 7fd5e2bec940  1 journal _open /var/lib/ceph/tmp/mnt.IOnlxY/journal fd 17: 1072693248 bytes, block size 4096 bytes, directio = 1, aio = 1
2015-12-18 16:09:48.940865 7fd5e2bec940  1 journal _open /var/lib/ceph/tmp/mnt.IOnlxY/journal fd 17: 1072693248 bytes, block size 4096 bytes, directio = 1, aio = 1
2015-12-18 16:09:48.941270 7fd5e2bec940  1 filestore(/var/lib/ceph/tmp/mnt.IOnlxY) upgrade
2015-12-18 16:09:48.941389 7fd5e2bec940 -1 filestore(/var/lib/ceph/tmp/mnt.IOnlxY) could not find -1/23c2fcde/osd_superblock/0 in index: (2) No such file or directory
2015-12-18 16:09:48.945392 7fd5e2bec940  1 journal close /var/lib/ceph/tmp/mnt.IOnlxY/journal
2015-12-18 16:09:48.946175 7fd5e2bec940 -1 created object store /var/lib/ceph/tmp/mnt.IOnlxY journal /var/lib/ceph/tmp/mnt.IOnlxY/journal for osd.10 fsid xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
2015-12-18 16:09:48.946269 7fd5e2bec940 -1 auth: error reading file: /var/lib/ceph/tmp/mnt.IOnlxY/keyring: can't open /var/lib/ceph/tmp/mnt.IOnlxY/keyring: (2) No such file or directory
2015-12-18 16:09:48.946623 7fd5e2bec940 -1 created new key in keyring /var/lib/ceph/tmp/mnt.IOnlxY/keyring
2015-12-18 16:09:50.698753 7fb5db130940  0 ceph version 9.2.0 (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299), process ceph-osd, pid 7045
2015-12-18 16:09:50.745427 7fb5db130940  0 filestore(/var/lib/ceph/osd/ceph-10) backend xfs (magic 0x58465342)
2015-12-18 16:09:50.745978 7fb5db130940  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-10) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
2015-12-18 16:09:50.745987 7fb5db130940  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-10) detect_features: SEEK_DATA/SEEK_HOLE is disabled via 'filestore seek data hole' config option
2015-12-18 16:09:50.746012 7fb5db130940  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-10) detect_features: splice is supported
2015-12-18 16:09:50.746517 7fb5db130940  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-10) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
2015-12-18 16:09:50.746616 7fb5db130940  0 xfsfilestorebackend(/var/lib/ceph/osd/ceph-10) detect_features: extsize is supported and your kernel >= 3.5
2015-12-18 16:09:50.748775 7fb5db130940  0 filestore(/var/lib/ceph/osd/ceph-10) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
2015-12-18 16:09:50.749005 7fb5db130940  1 journal _open /var/lib/ceph/osd/ceph-10/journal fd 19: 1072693248 bytes, block size 4096 bytes, directio = 1, aio = 1
2015-12-18 16:09:50.749256 7fb5db130940  1 journal _open /var/lib/ceph/osd/ceph-10/journal fd 19: 1072693248 bytes, block size 4096 bytes, directio = 1, aio = 1
2015-12-18 16:09:50.749632 7fb5db130940  1 filestore(/var/lib/ceph/osd/ceph-10) upgrade
2015-12-18 16:09:50.783188 7fb5db130940  0 <cls> cls/cephfs/cls_cephfs.cc:136: loading cephfs_size_scan
2015-12-18 16:09:50.851735 7fb5db130940  0 <cls> cls/hello/cls_hello.cc:305: loading cls_hello
2015-12-18 16:09:50.851807 7fb5db130940  0 osd.10 0 crush map has features 33816576, adjusting msgr requires for clients
2015-12-18 16:09:50.851818 7fb5db130940  0 osd.10 0 crush map has features 33816576 was 8705, adjusting msgr requires for mons
2015-12-18 16:09:50.851821 7fb5db130940  0 osd.10 0 crush map has features 33816576, adjusting msgr requires for osds
2015-12-18 16:09:50.851965 7fb5db130940  0 osd.10 0 load_pgs
2015-12-18 16:09:50.851988 7fb5db130940  0 osd.10 0 load_pgs opened 0 pgs
2015-12-18 16:09:50.852822 7fb5db130940 -1 osd.10 0 log_to_monitors {default=true}
2015-12-18 16:09:50.870133 7fb5c7f39700  0 osd.10 0 ignoring osdmap until we have initialized
2015-12-18 16:09:50.870409 7fb5db130940  0 osd.10 0 done with init, starting boot process
2015-12-18 16:09:50.873357 7fb5c7f39700  0 osd.10 25804 crush map has features 104186773504, adjusting msgr requires for clients
2015-12-18 16:09:50.873368 7fb5c7f39700  0 osd.10 25804 crush map has features 379064680448 was 33825281, adjusting msgr requires for mons
2015-12-18 16:09:50.873374 7fb5c7f39700  0 osd.10 25804 crush map has features 379064680448, adjusting msgr requires for osds
2015-12-18 16:09:50.873377 7fb5c7f39700  0 osd.10 25804 check_osdmap_features enabling on-disk ERASURE CODES compat feature
2015-12-18 16:09:50.876187 7fb5c7f39700  0 log_channel(cluster) log [WRN] : failed to encode map e25805 with expected crc
2015-12-18 16:09:50.879534 7fb5c7f39700  0 log_channel(cluster) log [WRN] : failed to encode map e25805 with expected crc
2015-12-18 16:09:50.950405 7fb5c7f39700  0 log_channel(cluster) log [WRN] : failed to encode map e25905 with expected crc
2015-12-18 16:09:50.983355 7fb5c7f39700  0 log_channel(cluster) log [WRN] : failed to encode map e25905 with expected crc
I'm running this on Ubuntu 14.04.3 with the linux-image-generic-lts-wily kernel (4.2.0-21.25~14.04.1).
Are you running a mixed cluster too? For example, this is my cluster right now:
root@b1:~# ceph tell osd.* version | grep version | uniq -c
osd.10: Error ENXIO: problem getting command descriptions from osd.10
osd.10: problem getting command descriptions from osd.10
     11     "version": "ceph version 9.2.0 (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299)"
     15     "version": "ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)"
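Tallying the distinct version strings in that output is a quick way to confirm whether the cluster is mixed. A small sketch (the count_versions helper and the abbreviated sample are my own illustration, not Ceph tooling):

```python
import re
from collections import Counter

def count_versions(lines):
    """Tally distinct ceph versions; a fully upgraded cluster shows one."""
    counts = Counter()
    for line in lines:
        m = re.search(r'ceph version (\S+)', line)
        if m:
            counts[m.group(1)] += 1
    return counts

# Abbreviated sample modeled on the 'ceph tell osd.* version' output above;
# lines without a version string (e.g. the ENXIO error) are skipped.
sample = [
    'osd.10: Error ENXIO: problem getting command descriptions from osd.10',
    '"version": "ceph version 9.2.0 (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299)"',
    '"version": "ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)"',
    '"version": "ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)"',
]
counts = count_versions(sample)
if len(counts) > 1:
    print("mixed cluster:", dict(counts))
```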
Bryan
From: ceph-users <ceph-users-bounces@xxxxxxxxxxxxxx> on behalf of Bob R <bobr@xxxxxxxxxxxxxx>
Date: Wednesday, December 16, 2015 at 11:45 AM
To: ceph-users <ceph-users@xxxxxxxxxxxxxx>
Subject: OSDs stuck in booting state on CentOS 7.2.1511 and ceph infernalis 9.2.0
We've been operating a cluster relatively incident-free since 0.86. On Monday I did a yum update on one node, ceph00, and after rebooting we're seeing every OSD stuck in the 'booting' state. I've tried removing all of the OSDs and recreating them with ceph-deploy (ceph-disk required a modification to use 'partx -a' rather than 'partprobe'), but we see the same status. I'm not sure how to troubleshoot this further. The OSDs on this host are now running as the ceph user, which may be related to the issue, as the other three hosts are running as root (although I followed the steps listed to upgrade from hammer to infernalis and did 'chown -R ceph:ceph /var/lib/ceph' on each node).
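Since Infernalis daemons run as the ceph user while Hammer ran as root, leftover root-owned paths under /var/lib/ceph are one plausible reason OSDs fail to start. Before re-running 'chown -R', a dry-run audit can show whether anything was missed; this is a hypothetical helper of my own, not Ceph tooling:

```python
import os

def find_wrong_owner(root, uid, gid):
    """Dry-run check before 'chown -R ceph:ceph': list paths under root
    whose owner or group differ from the expected uid/gid."""
    bad = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # lstat so symlinks are checked, not followed
            if st.st_uid != uid or st.st_gid != gid:
                bad.append(path)
    return bad

# Example usage (assumes a 'ceph' account exists, as on an Infernalis host):
# import pwd, grp
# ceph_uid = pwd.getpwnam("ceph").pw_uid
# ceph_gid = grp.getgrnam("ceph").gr_gid
# for path in find_wrong_owner("/var/lib/ceph", ceph_uid, ceph_gid):
#     print("wrong owner:", path)
```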
[root@ceph00 ceph]# lsb_release -idrc
Distributor ID: CentOS
Description:    CentOS Linux release 7.2.1511 (Core)
Release:        7.2.1511
Codename:       Core
[root@ceph00 ceph]# ceph --version
ceph version 9.2.0 (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299)
[root@ceph00 ceph]# ceph daemon osd.0 status
{
    "cluster_fsid": "2e4ea2c0-fb62-41fa-b7b7-e34d759b851e",
    "osd_fsid": "ddf659ad-a3db-4094-b4d0-7d50f34b8f75",
    "whoami": 0,
    "state": "booting",
    "oldest_map": 25243,
    "newest_map": 26610,
    "num_pgs": 0
}
[root@ceph00 ceph]# ceph daemon osd.3 status
{
    "cluster_fsid": "2e4ea2c0-fb62-41fa-b7b7-e34d759b851e",
    "osd_fsid": "8b1acd8a-645d-4dc2-8c1d-6dbb1715265f",
    "whoami": 3,
    "state": "booting",
    "oldest_map": 25243,
    "newest_map": 26612,
    "num_pgs": 0
}
[root@ceph00 ceph]# ceph osd tree
ID  WEIGHT    TYPE NAME            UP/DOWN REWEIGHT PRIMARY-AFFINITY
-23   1.43999 root ssd
-19         0     host ceph00_ssd
-20   0.48000     host ceph01_ssd
 40   0.48000         osd.40            up  1.00000          1.00000
-21   0.48000     host ceph02_ssd
 43   0.48000         osd.43            up  1.00000          1.00000
-22   0.48000     host ceph03_ssd
 41   0.48000         osd.41            up  1.00000          1.00000
 -1 120.00000 root default
-17  80.00000     room b1
-14  40.00000         host ceph01
  1   4.00000             osd.1         up  1.00000          1.00000
  4   4.00000             osd.4         up  1.00000          1.00000
 18   4.00000             osd.18        up  1.00000          1.00000
 19   4.00000             osd.19        up  1.00000          1.00000
 20   4.00000             osd.20        up  1.00000          1.00000
 21   4.00000             osd.21        up  1.00000          1.00000
 22   4.00000             osd.22        up  1.00000          1.00000
 23   4.00000             osd.23        up  1.00000          1.00000
 24   4.00000             osd.24        up  1.00000          1.00000
 25   4.00000             osd.25        up  1.00000          1.00000
-16  40.00000         host ceph03
 30   4.00000             osd.30        up  1.00000          1.00000
 31   4.00000             osd.31        up  1.00000          1.00000
 32   4.00000             osd.32        up  1.00000          1.00000
 33   4.00000             osd.33        up  1.00000          1.00000
 34   4.00000             osd.34        up  1.00000          1.00000
 35   4.00000             osd.35        up  1.00000          1.00000
 36   4.00000             osd.36        up  1.00000          1.00000
 37   4.00000             osd.37        up  1.00000          1.00000
 38   4.00000             osd.38        up  1.00000          1.00000
 39   4.00000             osd.39        up  1.00000          1.00000
-18  40.00000     room b2
-13         0         host ceph00
-15  40.00000         host ceph02
  2   4.00000             osd.2         up  1.00000          1.00000
  5   4.00000             osd.5         up  1.00000          1.00000
 14   4.00000             osd.14        up  1.00000          1.00000
 15   4.00000             osd.15        up  1.00000          1.00000
 16   4.00000             osd.16        up  1.00000          1.00000
 17   4.00000             osd.17        up  1.00000          1.00000
 26   4.00000             osd.26        up  1.00000          1.00000
 27   4.00000             osd.27        up  1.00000          1.00000
 28   4.00000             osd.28        up  1.00000          1.00000
 29   4.00000             osd.29        up  1.00000          1.00000
  0         0 osd.0                   down        0          1.00000
  3         0 osd.3                   down        0          1.00000
  6         0 osd.6                   down        0          1.00000
  7         0 osd.7                   down        0          1.00000
  8         0 osd.8                   down        0          1.00000
  9         0 osd.9                   down        0          1.00000
 10         0 osd.10                  down        0          1.00000
 11         0 osd.11                  down        0          1.00000
 12         0 osd.12                  down        0          1.00000
 13         0 osd.13                  down        0          1.00000
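For a tree this size it can help to extract just the down OSDs programmatically. A minimal sketch (the down_osds helper and the trimmed sample are my own illustration):

```python
def down_osds(tree_output):
    """Collect the osd.N names from 'ceph osd tree' lines marked down."""
    down = []
    for line in tree_output.splitlines():
        fields = line.split()
        if "down" in fields:
            down.extend(f for f in fields if f.startswith("osd."))
    return down

# Trimmed sample modeled on the tree output above.
sample = """\
 29   4.00000             osd.29        up  1.00000          1.00000
  0         0 osd.0                   down        0          1.00000
  3         0 osd.3                   down        0          1.00000
"""
print(down_osds(sample))  # prints ['osd.0', 'osd.3']
```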
Any assistance is greatly appreciated.
Bob
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com