I've been successfully running CephFS on my Debian Jessie hosts for a while, but after a power outage the MDS was no longer happy: it would crash right after it finished loading, with memory utilization climbing considerably first. I was running Infernalis 9.2.0 (after a successful upgrade from Hammer), so I thought I might have hit a bug and decided to try 9.2.1.
In 9.2.1, the OSDs complained that my journal didn't have the right permissions for the ceph user, so I corrected that. After that, none of my OSDs would start; they all fail with messages like the ones below. I then upgraded to Jewel, since the upgrade from Infernalis didn't look much more complex, and I'm still seeing these errors.
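For reference, the permission fix was roughly the following (a sketch only; osd.3 and the default paths are assumed, and the commands need root). Since Infernalis the Ceph daemons run as user ceph rather than root, so the OSD data directory and the journal, including the target of a symlinked journal, must be owned by ceph:

```shell
# Give the ceph user ownership of the OSD data directory (osd.3 assumed).
chown -R ceph:ceph /var/lib/ceph/osd/ceph-3

# If the journal is a symlink to a raw device or partition, chown -R does not
# follow it, so fix the symlink's target as well:
chown ceph:ceph "$(readlink -f /var/lib/ceph/osd/ceph-3/journal)"
```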
2016-04-15 22:47:04.897500 7f65fbbb0800 0 set uid:gid to 1001:1001 (ceph:ceph)
2016-04-15 22:47:04.897635 7f65fbbb0800 0 ceph version 10.1.2 (4a2a6f72640d6b74a3bbd92798bb913ed380dcd4), process ceph-osd, pid 1284
2016-04-15 22:47:04.900585 7f65fbbb0800 0 pidfile_write: ignore empty --pid-file
2016-04-15 22:47:05.467530 7f65fbbb0800 0 filestore(/var/lib/ceph/osd/ceph-3) backend xfs (magic 0x58465342)
2016-04-15 22:47:05.477912 7f65fbbb0800 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-3) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
2016-04-15 22:47:05.477999 7f65fbbb0800 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-3) detect_features: SEEK_DATA/SEEK_HOLE is disabled via 'filestore seek data hole' config option
2016-04-15 22:47:05.478091 7f65fbbb0800 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-3) detect_features: splice is supported
2016-04-15 22:47:05.494593 7f65fbbb0800 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-3) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
2016-04-15 22:47:05.494785 7f65fbbb0800 0 xfsfilestorebackend(/var/lib/ceph/osd/ceph-3) detect_feature: extsize is disabled by conf
2016-04-15 22:47:05.596738 7f65fbbb0800 1 leveldb: Recovering log #20899
2016-04-15 22:47:05.825914 7f65fbbb0800 1 leveldb: Delete type=0 #20899
2016-04-15 22:47:05.826089 7f65fbbb0800 1 leveldb: Delete type=3 #20898
2016-04-15 22:47:05.900058 7f65fbbb0800 0 filestore(/var/lib/ceph/osd/ceph-3) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
2016-04-15 22:47:06.377878 7f65fbbb0800 1 journal _open /var/lib/ceph/osd/ceph-3/journal fd 18: 14998831104 bytes, block size 4096 bytes, directio = 1, aio = 1
2016-04-15 22:47:06.381738 7f65fbbb0800 1 journal _open /var/lib/ceph/osd/ceph-3/journal fd 18: 14998831104 bytes, block size 4096 bytes, directio = 1, aio = 1
2016-04-15 22:47:06.384954 7f65fbbb0800 1 filestore(/var/lib/ceph/osd/ceph-3) upgrade
2016-04-15 22:47:06.415851 7f65fbbb0800 0 <cls> cls/cephfs/cls_cephfs.cc:202: loading cephfs_size_scan
2016-04-15 22:47:06.419654 7f65fbbb0800 0 <cls> cls/hello/cls_hello.cc:305: loading cls_hello
2016-04-15 22:47:06.498512 7f65fbbb0800 -1 osd/OSD.h: In function 'OSDMapRef OSDService::get_map(epoch_t)' thread 7f65fbbb0800 time 2016-04-15 22:47:06.494680
osd/OSD.h: 885: FAILED assert(ret)
ceph version 10.1.2 (4a2a6f72640d6b74a3bbd92798bb913ed380dcd4)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x82) [0x7f65fb6364f2]
2: (OSDService::get_map(unsigned int)+0x3d) [0x7f65fafbd83d]
3: (OSD::init()+0x1862) [0x7f65faf6ba52]
4: (main()+0x2b05) [0x7f65faed1735]
5: (__libc_start_main()+0xf5) [0x7f65f7a67b45]
6: (()+0x337197) [0x7f65faf1c197]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- begin dump of recent events ---
-78> 2016-04-15 22:47:04.873688 7f65fbbb0800 5 asok(0x7f660689a000) register_command perfcounters_dump hook 0x7f66067e2030
-77> 2016-04-15 22:47:04.873771 7f65fbbb0800 5 asok(0x7f660689a000) register_command 1 hook 0x7f66067e2030
-76> 2016-04-15 22:47:04.873804 7f65fbbb0800 5 asok(0x7f660689a000) register_command perf dump hook 0x7f66067e2030
-75> 2016-04-15 22:47:04.873834 7f65fbbb0800 5 asok(0x7f660689a000) register_command perfcounters_schema hook 0x7f66067e2030
-74> 2016-04-15 22:47:04.873860 7f65fbbb0800 5 asok(0x7f660689a000) register_command 2 hook 0x7f66067e2030
-73> 2016-04-15 22:47:04.873886 7f65fbbb0800 5 asok(0x7f660689a000) register_command perf schema hook 0x7f66067e2030
-72> 2016-04-15 22:47:04.873916 7f65fbbb0800 5 asok(0x7f660689a000) register_command perf reset hook 0x7f66067e2030
-71> 2016-04-15 22:47:04.873943 7f65fbbb0800 5 asok(0x7f660689a000) register_command config show hook 0x7f66067e2030
-70> 2016-04-15 22:47:04.873974 7f65fbbb0800 5 asok(0x7f660689a000) register_command config set hook 0x7f66067e2030
-69> 2016-04-15 22:47:04.874000 7f65fbbb0800 5 asok(0x7f660689a000) register_command config get hook 0x7f66067e2030
-68> 2016-04-15 22:47:04.874029 7f65fbbb0800 5 asok(0x7f660689a000) register_command config diff hook 0x7f66067e2030
-67> 2016-04-15 22:47:04.874055 7f65fbbb0800 5 asok(0x7f660689a000) register_command log flush hook 0x7f66067e2030
-66> 2016-04-15 22:47:04.874082 7f65fbbb0800 5 asok(0x7f660689a000) register_command log dump hook 0x7f66067e2030
-65> 2016-04-15 22:47:04.874109 7f65fbbb0800 5 asok(0x7f660689a000) register_command log reopen hook 0x7f66067e2030
-64> 2016-04-15 22:47:04.897500 7f65fbbb0800 0 set uid:gid to 1001:1001 (ceph:ceph)
-63> 2016-04-15 22:47:04.897635 7f65fbbb0800 0 ceph version 10.1.2 (4a2a6f72640d6b74a3bbd92798bb913ed380dcd4), process ceph-osd, pid 1284
-62> 2016-04-15 22:47:04.900224 7f65fbbb0800 1 -- 192.168.1.31:0/0 learned my addr 192.168.1.31:0/0
-61> 2016-04-15 22:47:04.900286 7f65fbbb0800 1 accepter.accepter.bind my_inst.addr is 192.168.1.31:6802/1284 need_addr=0
-60> 2016-04-15 22:47:04.900350 7f65fbbb0800 1 -- 192.168.2.31:0/0 learned my addr 192.168.2.31:0/0
-59> 2016-04-15 22:47:04.900375 7f65fbbb0800 1 accepter.accepter.bind my_inst.addr is 192.168.2.31:6802/1284 need_addr=0
-58> 2016-04-15 22:47:04.900443 7f65fbbb0800 1 -- 192.168.2.31:0/0 learned my addr 192.168.2.31:0/0
-57> 2016-04-15 22:47:04.900475 7f65fbbb0800 1 accepter.accepter.bind my_inst.addr is 192.168.2.31:6803/1284 need_addr=0
-56> 2016-04-15 22:47:04.900538 7f65fbbb0800 1 -- 192.168.1.31:0/0 learned my addr 192.168.1.31:0/0
-55> 2016-04-15 22:47:04.900562 7f65fbbb0800 1 accepter.accepter.bind my_inst.addr is 192.168.1.31:6803/1284 need_addr=0
-54> 2016-04-15 22:47:04.900585 7f65fbbb0800 0 pidfile_write: ignore empty --pid-file
-53> 2016-04-15 22:47:04.909743 7f65fbbb0800 5 asok(0x7f660689a000) init /var/run/ceph/ceph-osd.3.asok
-52> 2016-04-15 22:47:04.909792 7f65fbbb0800 5 asok(0x7f660689a000) bind_and_listen /var/run/ceph/ceph-osd.3.asok
-51> 2016-04-15 22:47:04.909891 7f65fbbb0800 5 asok(0x7f660689a000) register_command 0 hook 0x7f66067de0d8
-50> 2016-04-15 22:47:04.909928 7f65fbbb0800 5 asok(0x7f660689a000) register_command version hook 0x7f66067de0d8
-49> 2016-04-15 22:47:04.909955 7f65fbbb0800 5 asok(0x7f660689a000) register_command git_version hook 0x7f66067de0d8
-48> 2016-04-15 22:47:04.909988 7f65fbbb0800 5 asok(0x7f660689a000) register_command help hook 0x7f66067e21e0
-47> 2016-04-15 22:47:04.910015 7f65fbbb0800 5 asok(0x7f660689a000) register_command get_command_descriptions hook 0x7f66067e21f0
-46> 2016-04-15 22:47:04.910205 7f65f43c9700 5 asok(0x7f660689a000) entry start
-45> 2016-04-15 22:47:04.910330 7f65fbbb0800 10 monclient(hunting): build_initial_monmap
-44> 2016-04-15 22:47:04.939070 7f65fbbb0800 5 adding auth protocol: cephx
-43> 2016-04-15 22:47:04.939118 7f65fbbb0800 5 adding auth protocol: cephx
-42> 2016-04-15 22:47:04.939986 7f65fbbb0800 5 asok(0x7f660689a000) register_command objecter_requests hook 0x7f66067e22b0
-41> 2016-04-15 22:47:04.940256 7f65fbbb0800 1 -- 192.168.1.31:6802/1284 messenger.start
-40> 2016-04-15 22:47:04.940413 7f65fbbb0800 1 -- :/0 messenger.start
-39> 2016-04-15 22:47:04.940557 7f65fbbb0800 1 -- 192.168.1.31:6803/1284 messenger.start
-38> 2016-04-15 22:47:04.940686 7f65fbbb0800 1 -- 192.168.2.31:6803/1284 messenger.start
-37> 2016-04-15 22:47:04.940798 7f65fbbb0800 1 -- 192.168.2.31:6802/1284 messenger.start
-36> 2016-04-15 22:47:04.940899 7f65fbbb0800 1 -- :/0 messenger.start
-35> 2016-04-15 22:47:04.941223 7f65fbbb0800 2 osd.3 0 mounting /var/lib/ceph/osd/ceph-3 /var/lib/ceph/osd/ceph-3/journal
-34> 2016-04-15 22:47:05.467530 7f65fbbb0800 0 filestore(/var/lib/ceph/osd/ceph-3) backend xfs (magic 0x58465342)
-33> 2016-04-15 22:47:05.477912 7f65fbbb0800 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-3) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
-32> 2016-04-15 22:47:05.477999 7f65fbbb0800 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-3) detect_features: SEEK_DATA/SEEK_HOLE is disabled via 'filestore seek data hole' config option
-31> 2016-04-15 22:47:05.478091 7f65fbbb0800 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-3) detect_features: splice is supported
-30> 2016-04-15 22:47:05.494593 7f65fbbb0800 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-3) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
-29> 2016-04-15 22:47:05.494785 7f65fbbb0800 0 xfsfilestorebackend(/var/lib/ceph/osd/ceph-3) detect_feature: extsize is disabled by conf
-28> 2016-04-15 22:47:05.596738 7f65fbbb0800 1 leveldb: Recovering log #20899
-27> 2016-04-15 22:47:05.825914 7f65fbbb0800 1 leveldb: Delete type=0 #20899
-26> 2016-04-15 22:47:05.826089 7f65fbbb0800 1 leveldb: Delete type=3 #20898
-25> 2016-04-15 22:47:05.900058 7f65fbbb0800 0 filestore(/var/lib/ceph/osd/ceph-3) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
-24> 2016-04-15 22:47:06.377715 7f65fbbb0800 2 journal open /var/lib/ceph/osd/ceph-3/journal fsid 4f86a418-6c67-4cb4-83a1-6c123c890036 fs_op_seq 9829589
-23> 2016-04-15 22:47:06.377878 7f65fbbb0800 1 journal _open /var/lib/ceph/osd/ceph-3/journal fd 18: 14998831104 bytes, block size 4096 bytes, directio = 1, aio = 1
-22> 2016-04-15 22:47:06.379811 7f65fbbb0800 2 journal open advancing committed_seq 9829584 to fs op_seq 9829589
-21> 2016-04-15 22:47:06.380757 7f65fbbb0800 2 journal read_entry 2537717760 : seq 9829585 29509 bytes
-20> 2016-04-15 22:47:06.380996 7f65fbbb0800 2 journal read_entry 2537750528 : seq 9829586 8134 bytes
-19> 2016-04-15 22:47:06.381091 7f65fbbb0800 2 journal read_entry 2537762816 : seq 9829587 3064 bytes
-18> 2016-04-15 22:47:06.381155 7f65fbbb0800 2 journal read_entry 2537766912 : seq 9829588 7647 bytes
-17> 2016-04-15 22:47:06.381219 7f65fbbb0800 2 journal read_entry 2537775104 : seq 9829589 4737 bytes
-16> 2016-04-15 22:47:06.381257 7f65fbbb0800 2 journal No further valid entries found, journal is most likely valid
-15> 2016-04-15 22:47:06.381287 7f65fbbb0800 2 journal No further valid entries found, journal is most likely valid
-14> 2016-04-15 22:47:06.381302 7f65fbbb0800 3 journal journal_replay: end of journal, done.
-13> 2016-04-15 22:47:06.381738 7f65fbbb0800 1 journal _open /var/lib/ceph/osd/ceph-3/journal fd 18: 14998831104 bytes, block size 4096 bytes, directio = 1, aio = 1
-12> 2016-04-15 22:47:06.384954 7f65fbbb0800 1 filestore(/var/lib/ceph/osd/ceph-3) upgrade
-11> 2016-04-15 22:47:06.385071 7f65fbbb0800 2 osd.3 0 boot
-10> 2016-04-15 22:47:06.415253 7f65fbbb0800 1 <cls> cls/statelog/cls_statelog.cc:306: Loaded log class!
-9> 2016-04-15 22:47:06.415851 7f65fbbb0800 0 <cls> cls/cephfs/cls_cephfs.cc:202: loading cephfs_size_scan
-8> 2016-04-15 22:47:06.418172 7f65fbbb0800 1 <cls> cls/version/cls_version.cc:228: Loaded version class!
-7> 2016-04-15 22:47:06.419654 7f65fbbb0800 0 <cls> cls/hello/cls_hello.cc:305: loading cls_hello
-6> 2016-04-15 22:47:06.426520 7f65fbbb0800 1 <cls> cls/refcount/cls_refcount.cc:232: Loaded refcount class!
-5> 2016-04-15 22:47:06.427217 7f65fbbb0800 1 <cls> cls/user/cls_user.cc:375: Loaded user class!
-4> 2016-04-15 22:47:06.428364 7f65fbbb0800 1 <cls> cls/replica_log/cls_replica_log.cc:141: Loaded replica log class!
-3> 2016-04-15 22:47:06.428970 7f65fbbb0800 1 <cls> cls/timeindex/cls_timeindex.cc:259: Loaded timeindex class!
-2> 2016-04-15 22:47:06.430177 7f65fbbb0800 1 <cls> cls/log/cls_log.cc:317: Loaded log class!
-1> 2016-04-15 22:47:06.438063 7f65fbbb0800 1 <cls> cls/rgw/cls_rgw.cc:3206: Loaded rgw class!
0> 2016-04-15 22:47:06.498512 7f65fbbb0800 -1 osd/OSD.h: In function 'OSDMapRef OSDService::get_map(epoch_t)' thread 7f65fbbb0800 time 2016-04-15 22:47:06.494680
osd/OSD.h: 885: FAILED assert(ret)
ceph version 10.1.2 (4a2a6f72640d6b74a3bbd92798bb913ed380dcd4)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x82) [0x7f65fb6364f2]
2: (OSDService::get_map(unsigned int)+0x3d) [0x7f65fafbd83d]
3: (OSD::init()+0x1862) [0x7f65faf6ba52]
4: (main()+0x2b05) [0x7f65faed1735]
5: (__libc_start_main()+0xf5) [0x7f65f7a67b45]
6: (()+0x337197) [0x7f65faf1c197]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- logging levels ---
0/ 5 none
0/ 1 lockdep
0/ 1 context
1/ 1 crush
1/ 5 mds
1/ 5 mds_balancer
1/ 5 mds_locker
1/ 5 mds_log
1/ 5 mds_log_expire
1/ 5 mds_migrator
0/ 1 buffer
0/ 1 timer
0/ 1 filer
0/ 1 striper
0/ 1 objecter
0/ 5 rados
0/ 5 rbd
0/ 5 rbd_mirror
0/ 5 rbd_replay
0/ 5 journaler
0/ 5 objectcacher
0/ 5 client
0/ 5 osd
0/ 5 optracker
0/ 5 objclass
1/ 3 filestore
1/ 3 journal
0/ 5 ms
1/ 5 mon
0/10 monc
1/ 5 paxos
0/ 5 tp
1/ 5 auth
1/ 5 crypto
1/ 1 finisher
1/ 5 heartbeatmap
1/ 5 perfcounter
1/ 5 rgw
1/10 civetweb
1/ 5 javaclient
1/ 5 asok
1/ 1 throttle
0/ 0 refs
1/ 5 xio
1/ 5 compressor
1/ 5 newstore
1/ 5 bluestore
1/ 5 bluefs
1/ 3 bdev
1/ 5 kstore
4/ 5 rocksdb
4/ 5 leveldb
1/ 5 kinetic
1/ 5 fuse
-2/-2 (syslog threshold)
-1/-1 (stderr threshold)
max_recent 10000
max_new 1000
log_file /var/log/ceph/ceph-osd.3.log
--- end dump of recent events ---
2016-04-15 22:47:06.509080 7f65fbbb0800 -1 *** Caught signal (Aborted) **
in thread 7f65fbbb0800 thread_name:ceph-osd
ceph version 10.1.2 (4a2a6f72640d6b74a3bbd92798bb913ed380dcd4)
1: (()+0x949117) [0x7f65fb52e117]
2: (()+0xf8d0) [0x7f65f9a318d0]
3: (gsignal()+0x37) [0x7f65f7a7b067]
4: (abort()+0x148) [0x7f65f7a7c448]
5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x256) [0x7f65fb6366c6]
6: (OSDService::get_map(unsigned int)+0x3d) [0x7f65fafbd83d]
7: (OSD::init()+0x1862) [0x7f65faf6ba52]
8: (main()+0x2b05) [0x7f65faed1735]
9: (__libc_start_main()+0xf5) [0x7f65f7a67b45]
10: (()+0x337197) [0x7f65faf1c197]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- begin dump of recent events ---
0> 2016-04-15 22:47:06.509080 7f65fbbb0800 -1 *** Caught signal (Aborted) **
in thread 7f65fbbb0800 thread_name:ceph-osd
ceph version 10.1.2 (4a2a6f72640d6b74a3bbd92798bb913ed380dcd4)
1: (()+0x949117) [0x7f65fb52e117]
2: (()+0xf8d0) [0x7f65f9a318d0]
3: (gsignal()+0x37) [0x7f65f7a7b067]
4: (abort()+0x148) [0x7f65f7a7c448]
5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x256) [0x7f65fb6366c6]
6: (OSDService::get_map(unsigned int)+0x3d) [0x7f65fafbd83d]
7: (OSD::init()+0x1862) [0x7f65faf6ba52]
8: (main()+0x2b05) [0x7f65faed1735]
9: (__libc_start_main()+0xf5) [0x7f65f7a67b45]
10: (()+0x337197) [0x7f65faf1c197]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- logging levels ---
0/ 5 none
0/ 1 lockdep
0/ 1 context
1/ 1 crush
1/ 5 mds
1/ 5 mds_balancer
1/ 5 mds_locker
1/ 5 mds_log
1/ 5 mds_log_expire
1/ 5 mds_migrator
0/ 1 buffer
0/ 1 timer
0/ 1 filer
0/ 1 striper
0/ 1 objecter
0/ 5 rados
0/ 5 rbd
0/ 5 rbd_mirror
0/ 5 rbd_replay
0/ 5 journaler
0/ 5 objectcacher
0/ 5 client
0/ 5 osd
0/ 5 optracker
0/ 5 objclass
1/ 3 filestore
1/ 3 journal
0/ 5 ms
1/ 5 mon
0/10 monc
1/ 5 paxos
0/ 5 tp
1/ 5 auth
1/ 5 crypto
1/ 1 finisher
1/ 5 heartbeatmap
1/ 5 perfcounter
1/ 5 rgw
1/10 civetweb
1/ 5 javaclient
1/ 5 asok
1/ 1 throttle
0/ 0 refs
1/ 5 xio
1/ 5 compressor
1/ 5 newstore
1/ 5 bluestore
1/ 5 bluefs
1/ 3 bdev
1/ 5 kstore
4/ 5 rocksdb
4/ 5 leveldb
1/ 5 kinetic
1/ 5 fuse
-2/-2 (syslog threshold)
-1/-1 (stderr threshold)
max_recent 10000
max_new 1000
log_file /var/log/ceph/ceph-osd.3.log
--- end dump of recent events ---
What can I try to get this OSD back online? I saw some similar issues via Google, but I wasn't sure whether they were actually the same issue.
If I run into the MDS issue again after resolving this, I'll send out another message. =) Thanks all!
Regards,
Hong
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com