Anyone? Can this PG be saved? If not, what are my options?
Regards,
Hong
On Saturday, September 16, 2017 1:55 AM, hjcho616 <hjcho616@xxxxxxxxx> wrote:
Looking better... working on scrubbing.
HEALTH_ERR 1 pgs are stuck inactive for more than 300 seconds; 1 pgs incomplete; 12 pgs inconsistent; 2 pgs repair; 1 pgs stuck inactive; 1 pgs stuck unclean; 109 scrub errors; too few PGs per OSD (29 < min 30); mds rank 0 has failed; mds cluster is degraded; noout flag(s) set; no legacy OSD present but 'sortbitwise' flag is not set
Now for PG 1.28. Looking through all the old OSDs, dead or alive, the only one with DIR_* directories for it is osd.4. This appears to be the metadata pool! 21M of metadata can be quite a bit of stuff, so I would like to rescue it. But I am not able to start this OSD, and exporting through ceph-objectstore-tool crashes as well, even with --skip-journal-replay and --skip-mount-omap (the latter fails differently). As I mentioned in an earlier email, that "exception thrown" message is bogus...
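For anyone repeating the search across old OSDs described above, a minimal POSIX shell sketch might look like this. It assumes the FileStore on-disk layout `<osd dir>/current/<pgid>_head/DIR_*` and the `/var/lib/ceph/osd/ceph-N` data paths used in the commands here; the function name `find_pg_dirs` is hypothetical, not part of any Ceph tool.

```shell
# find_pg_dirs PGID [BASE]  (hypothetical helper, not a Ceph command)
# For each FileStore OSD data dir under BASE (default /var/lib/ceph/osd),
# print "<osd dir name> <number of top-level DIR_* dirs>" for PGID.
# Assumes the FileStore layout <BASE>/ceph-N/current/<PGID>_head/DIR_*.
find_pg_dirs() {
  pgid="$1"
  base="${2:-/var/lib/ceph/osd}"
  for osd_dir in "$base"/ceph-*; do
    head_dir="$osd_dir/current/${pgid}_head"
    # Skip OSDs with no directory at all for this PG (also covers an
    # unmatched glob, which leaves the literal "ceph-*" pattern).
    [ -d "$head_dir" ] || continue
    # Count top-level DIR_* hash directories; an empty PG dir has none.
    n=$(find "$head_dir" -mindepth 1 -maxdepth 1 -type d -name 'DIR_*' | wc -l | tr -d ' ')
    echo "$(basename "$osd_dir") $n"
  done
}
```

Running `find_pg_dirs 1.28` on each host would list every OSD directory that holds something for PG 1.28; only entries with a non-zero count have hashed object directories.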
# ceph-objectstore-tool --op export --pgid 1.28 --data-path /var/lib/ceph/osd/ceph-4 --journal-path /var/lib/ceph/osd/ceph-4/journal --file ~/1.28.export
terminate called after throwing an instance of 'std::domain_error'
what(): coll_t::decode(): don't know how to decode version 1
*** Caught signal (Aborted) **
in thread 7f812e7fb940 thread_name:ceph-objectstor
ceph version 10.2.9 (2ee413f77150c0f375ff6f10edd6c8f9c7d060d0)
1: (()+0x996a57) [0x55dee175fa57]
2: (()+0x110c0) [0x7f812d0050c0]
3: (gsignal()+0xcf) [0x7f812b438fcf]
4: (abort()+0x16a) [0x7f812b43a3fa]
5: (__gnu_cxx::__verbose_terminate_handler()+0x15d) [0x7f812bd1fb3d]
6: (()+0x5ebb6) [0x7f812bd1dbb6]
7: (()+0x5ec01) [0x7f812bd1dc01]
8: (()+0x5ee19) [0x7f812bd1de19]
9: (coll_t::decode(ceph::buffer::list::iterator&)+0x21e) [0x55dee143001e]
10: (DBObjectMap::_Header::decode(ceph::buffer::list::iterator&)+0x125) [0x55dee156d5f5]
11: (DBObjectMap::check(std::ostream&, bool)+0x279) [0x55dee1562bb9]
12: (DBObjectMap::init(bool)+0x288) [0x55dee1561eb8]
13: (FileStore::mount()+0x2525) [0x55dee1498eb5]
14: (main()+0x28c0) [0x55dee10c9400]
15: (__libc_start_main()+0xf1) [0x7f812b4262b1]
16: (()+0x34f747) [0x55dee1118747]
Aborted
# ceph-objectstore-tool --op export --pgid 1.28 --data-path /var/lib/ceph/osd/ceph-4 --journal-path /var/lib/ceph/osd/ceph-4/journal --file ~/1.28.export --skip-journal-replay
terminate called after throwing an instance of 'std::domain_error'
what(): coll_t::decode(): don't know how to decode version 1
*** Caught signal (Aborted) **
in thread 7fa6d087b940 thread_name:ceph-objectstor
ceph version 10.2.9 (2ee413f77150c0f375ff6f10edd6c8f9c7d060d0)
1: (()+0x996a57) [0x55abd356aa57]
2: (()+0x110c0) [0x7fa6cf0850c0]
3: (gsignal()+0xcf) [0x7fa6cd4b8fcf]
4: (abort()+0x16a) [0x7fa6cd4ba3fa]
5: (__gnu_cxx::__verbose_terminate_handler()+0x15d) [0x7fa6cdd9fb3d]
6: (()+0x5ebb6) [0x7fa6cdd9dbb6]
7: (()+0x5ec01) [0x7fa6cdd9dc01]
8: (()+0x5ee19) [0x7fa6cdd9de19]
9: (coll_t::decode(ceph::buffer::list::iterator&)+0x21e) [0x55abd323b01e]
10: (DBObjectMap::_Header::decode(ceph::buffer::list::iterator&)+0x125) [0x55abd33785f5]
11: (DBObjectMap::check(std::ostream&, bool)+0x279) [0x55abd336dbb9]
12: (DBObjectMap::init(bool)+0x288) [0x55abd336ceb8]
13: (FileStore::mount()+0x2525) [0x55abd32a3eb5]
14: (main()+0x28c0) [0x55abd2ed4400]
15: (__libc_start_main()+0xf1) [0x7fa6cd4a62b1]
16: (()+0x34f747) [0x55abd2f23747]
Aborted
# ceph-objectstore-tool --op export --pgid 1.28 --data-path /var/lib/ceph/osd/ceph-4 --journal-path /var/lib/ceph/osd/ceph-4/journal --file ~/1.28.export --skip-mount-omap
ceph-objectstore-tool: /usr/include/boost/smart_ptr/scoped_ptr.hpp:99: T* boost::scoped_ptr<T>::operator->() const [with T = ObjectMap]: Assertion `px != 0' failed.
*** Caught signal (Aborted) **
in thread 7f14345c5940 thread_name:ceph-objectstor
ceph version 10.2.9 (2ee413f77150c0f375ff6f10edd6c8f9c7d060d0)
1: (()+0x996a57) [0x5575b50a9a57]
2: (()+0x110c0) [0x7f1432dcf0c0]
3: (gsignal()+0xcf) [0x7f1431202fcf]
4: (abort()+0x16a) [0x7f14312043fa]
5: (()+0x2be37) [0x7f14311fbe37]
6: (()+0x2bee2) [0x7f14311fbee2]
7: (()+0x2fa19c) [0x5575b4a0d19c]
8: (FileStore::omap_get_values(coll_t const&, ghobject_t const&, std::set<std::string, std::less<std::string>, std::allocator<std::string> > const&, std::map<std::string, ceph::buffer::list, std::less<std::string>, std::allocator<std::pair<std::string const, ceph::buffer::list> > >*)+0x6c2) [0x5575b4dc9322]
9: (PG::peek_map_epoch(ObjectStore*, spg_t, unsigned int*, ceph::buffer::list*)+0x235) [0x5575b4ab3135]
10: (main()+0x5bd6) [0x5575b4a16716]
11: (__libc_start_main()+0xf1) [0x7f14311f02b1]
12: (()+0x34f747) [0x5575b4a62747]
When I try to bring up osd.4, I get the crash below. It looks very similar to the crash in the first two export attempts above.
ceph version 10.2.9 (2ee413f77150c0f375ff6f10edd6c8f9c7d060d0)
1: (()+0x960e57) [0x5565e564ae57]
2: (()+0x110c0) [0x7f34aa17e0c0]
3: (gsignal()+0xcf) [0x7f34a81c4fcf]
4: (abort()+0x16a) [0x7f34a81c63fa]
5: (__gnu_cxx::__verbose_terminate_handler()+0x15d) [0x7f34a8aabb3d]
6: (()+0x5ebb6) [0x7f34a8aa9bb6]
7: (()+0x5ec01) [0x7f34a8aa9c01]
8: (()+0x5ee19) [0x7f34a8aa9e19]
9: (coll_t::decode(ceph::buffer::list::iterator&)+0x21e) [0x5565e531933e]
10: (DBObjectMap::_Header::decode(ceph::buffer::list::iterator&)+0x125) [0x5565e54c02f5]
11: (DBObjectMap::check(std::ostream&, bool)+0x279) [0x5565e54b58b9]
12: (DBObjectMap::init(bool)+0x288) [0x5565e54b4bb8]
13: (FileStore::mount()+0x2525) [0x5565e53e0185]
14: (OSD::init()+0x27d) [0x5565e50797ed]
15: (main()+0x2a64) [0x5565e4fe05d4]
16: (__libc_start_main()+0xf1) [0x7f34a81b22b1]
17: (()+0x341117) [0x5565e502b117]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- logging levels ---
0/ 5 none
0/ 1 lockdep
0/ 1 context
1/ 1 crush
1/ 5 mds
1/ 5 mds_balancer
1/ 5 mds_locker
1/ 5 mds_log
1/ 5 mds_log_expire
1/ 5 mds_migrator
0/ 1 buffer
0/ 1 timer
0/ 1 filer
0/ 1 striper
0/ 1 objecter
0/ 5 rados
0/ 5 rbd
0/ 5 rbd_mirror
0/ 5 rbd_replay
0/ 5 journaler
0/ 5 objectcacher
0/ 5 client
0/ 5 osd
0/ 5 optracker
0/ 5 objclass
1/ 3 filestore
1/ 3 journal
0/ 5 ms
1/ 5 mon
0/10 monc
1/ 5 paxos
0/ 5 tp
1/ 5 auth
1/ 5 crypto
1/ 1 finisher
1/ 5 heartbeatmap
1/ 5 perfcounter
1/ 5 rgw
1/10 civetweb
1/ 5 javaclient
1/ 5 asok
1/ 1 throttle
0/ 0 refs
1/ 5 xio
1/ 5 compressor
1/ 5 newstore
1/ 5 bluestore
1/ 5 bluefs
1/ 3 bdev
1/ 5 kstore
4/ 5 rocksdb
4/ 5 leveldb
1/ 5 kinetic
1/ 5 fuse
-2/-2 (syslog threshold)
-1/-1 (stderr threshold)
max_recent 10000
max_new 1000
log_file /var/log/ceph/ceph-osd.4.log
--- end dump of recent events ---
2017-09-15 18:54:46.429439 7fc5fbc867c0 0 set uid:gid to 1001:1001 (ceph:ceph)
2017-09-15 18:54:46.429451 7fc5fbc867c0 0 ceph version 10.2.9 (2ee413f77150c0f375ff6f10edd6c8f9c7d060d0), process ceph-osd, pid 22671
2017-09-15 18:54:46.430384 7fc5fbc867c0 0 pidfile_write: ignore empty --pid-file
2017-09-15 18:54:46.439836 7fc5fbc867c0 0 filestore(/var/lib/ceph/osd/ceph-4) backend xfs (magic 0x58465342)
2017-09-15 18:54:46.440234 7fc5fbc867c0 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-4) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
2017-09-15 18:54:46.440238 7fc5fbc867c0 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-4) detect_features: SEEK_DATA/SEEK_HOLE is disabled via 'filestore seek data hole' config option
2017-09-15 18:54:46.440250 7fc5fbc867c0 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-4) detect_features: splice is supported
2017-09-15 18:54:46.474005 7fc5fbc867c0 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-4) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
2017-09-15 18:54:46.474186 7fc5fbc867c0 0 xfsfilestorebackend(/var/lib/ceph/osd/ceph-4) detect_feature: extsize is disabled by conf
2017-09-15 18:54:46.475050 7fc5fbc867c0 1 leveldb: Recovering log #31539
2017-09-15 18:54:46.617922 7fc5fbc867c0 1 leveldb: Delete type=3 #31538
2017-09-15 18:54:46.617978 7fc5fbc867c0 1 leveldb: Delete type=0 #31539
2017-09-15 18:54:56.846756 7fc5fbc867c0 -1 *** Caught signal (Aborted) **
in thread 7fc5fbc867c0 thread_name:ceph-osd
ceph version 10.2.9 (2ee413f77150c0f375ff6f10edd6c8f9c7d060d0)
1: (()+0x960e57) [0x558935ed9e57]
2: (()+0x110c0) [0x7fc5faaf40c0]
3: (gsignal()+0xcf) [0x7fc5f8b3afcf]
4: (abort()+0x16a) [0x7fc5f8b3c3fa]
5: (__gnu_cxx::__verbose_terminate_handler()+0x15d) [0x7fc5f9421b3d]
6: (()+0x5ebb6) [0x7fc5f941fbb6]
7: (()+0x5ec01) [0x7fc5f941fc01]
8: (()+0x5ee19) [0x7fc5f941fe19]
9: (coll_t::decode(ceph::buffer::list::iterator&)+0x21e) [0x558935ba833e]
10: (DBObjectMap::_Header::decode(ceph::buffer::list::iterator&)+0x125) [0x558935d4f2f5]
11: (DBObjectMap::check(std::ostream&, bool)+0x279) [0x558935d448b9]
12: (DBObjectMap::init(bool)+0x288) [0x558935d43bb8]
13: (FileStore::mount()+0x2525) [0x558935c6f185]
14: (OSD::init()+0x27d) [0x5589359087ed]
15: (main()+0x2a64) [0x55893586f5d4]
16: (__libc_start_main()+0xf1) [0x7fc5f8b282b1]
17: (()+0x341117) [0x5589358ba117]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- begin dump of recent events ---
-53> 2017-09-15 18:54:46.423990 7fc5fbc867c0 5 asok(0x558941004000) register_command perfcounters_dump hook 0x558940f4c030
-52> 2017-09-15 18:54:46.424007 7fc5fbc867c0 5 asok(0x558941004000) register_command 1 hook 0x558940f4c030
-51> 2017-09-15 18:54:46.424011 7fc5fbc867c0 5 asok(0x558941004000) register_command perf dump hook 0x558940f4c030
-50> 2017-09-15 18:54:46.424015 7fc5fbc867c0 5 asok(0x558941004000) register_command perfcounters_schema hook 0x558940f4c030
-49> 2017-09-15 18:54:46.424018 7fc5fbc867c0 5 asok(0x558941004000) register_command 2 hook 0x558940f4c030
-48> 2017-09-15 18:54:46.424021 7fc5fbc867c0 5 asok(0x558941004000) register_command perf schema hook 0x558940f4c030
-47> 2017-09-15 18:54:46.424024 7fc5fbc867c0 5 asok(0x558941004000) register_command perf reset hook 0x558940f4c030
-46> 2017-09-15 18:54:46.424027 7fc5fbc867c0 5 asok(0x558941004000) register_command config show hook 0x558940f4c030
-45> 2017-09-15 18:54:46.424031 7fc5fbc867c0 5 asok(0x558941004000) register_command config set hook 0x558940f4c030
-44> 2017-09-15 18:54:46.424034 7fc5fbc867c0 5 asok(0x558941004000) register_command config get hook 0x558940f4c030
-43> 2017-09-15 18:54:46.424037 7fc5fbc867c0 5 asok(0x558941004000) register_command config diff hook 0x558940f4c030
-42> 2017-09-15 18:54:46.424040 7fc5fbc867c0 5 asok(0x558941004000) register_command log flush hook 0x558940f4c030
-41> 2017-09-15 18:54:46.424043 7fc5fbc867c0 5 asok(0x558941004000) register_command log dump hook 0x558940f4c030
-40> 2017-09-15 18:54:46.424047 7fc5fbc867c0 5 asok(0x558941004000) register_command log reopen hook 0x558940f4c030
-39> 2017-09-15 18:54:46.429439 7fc5fbc867c0 0 set uid:gid to 1001:1001 (ceph:ceph)
-38> 2017-09-15 18:54:46.429451 7fc5fbc867c0 0 ceph version 10.2.9 (2ee413f77150c0f375ff6f10edd6c8f9c7d060d0), process ceph-osd, pid 22671
-37> 2017-09-15 18:54:46.430326 7fc5fbc867c0 1 -- 192.168.1.31:0/0 learned my addr 192.168.1.31:0/0
-36> 2017-09-15 18:54:46.430333 7fc5fbc867c0 1 accepter.accepter.bind my_inst.addr is 192.168.1.31:6819/22671 need_addr=0
-35> 2017-09-15 18:54:46.430346 7fc5fbc867c0 1 -- 192.168.2.31:0/0 learned my addr 192.168.2.31:0/0
-34> 2017-09-15 18:54:46.430350 7fc5fbc867c0 1 accepter.accepter.bind my_inst.addr is 192.168.2.31:6818/22671 need_addr=0
-33> 2017-09-15 18:54:46.430360 7fc5fbc867c0 1 -- 192.168.2.31:0/0 learned my addr 192.168.2.31:0/0
-32> 2017-09-15 18:54:46.430363 7fc5fbc867c0 1 accepter.accepter.bind my_inst.addr is 192.168.2.31:6819/22671 need_addr=0
-31> 2017-09-15 18:54:46.430378 7fc5fbc867c0 1 -- 192.168.1.31:0/0 learned my addr 192.168.1.31:0/0
-30> 2017-09-15 18:54:46.430381 7fc5fbc867c0 1 accepter.accepter.bind my_inst.addr is 192.168.1.31:6823/22671 need_addr=0
-29> 2017-09-15 18:54:46.430384 7fc5fbc867c0 0 pidfile_write: ignore empty --pid-file
-28> 2017-09-15 18:54:46.432245 7fc5fbc867c0 5 asok(0x558941004000) init /var/run/ceph/ceph-osd.4.asok
-27> 2017-09-15 18:54:46.432253 7fc5fbc867c0 5 asok(0x558941004000) bind_and_listen /var/run/ceph/ceph-osd.4.asok
-26> 2017-09-15 18:54:46.432316 7fc5fbc867c0 5 asok(0x558941004000) register_command 0 hook 0x558940f480d0
-25> 2017-09-15 18:54:46.432322 7fc5fbc867c0 5 asok(0x558941004000) register_command version hook 0x558940f480d0
-24> 2017-09-15 18:54:46.432326 7fc5fbc867c0 5 asok(0x558941004000) register_command git_version hook 0x558940f480d0
-23> 2017-09-15 18:54:46.432329 7fc5fbc867c0 5 asok(0x558941004000) register_command help hook 0x558940f4c1e0
-22> 2017-09-15 18:54:46.432333 7fc5fbc867c0 5 asok(0x558941004000) register_command get_command_descriptions hook 0x558940f4c1f0
-21> 2017-09-15 18:54:46.432359 7fc5f543e700 5 asok(0x558941004000) entry start
-20> 2017-09-15 18:54:46.432381 7fc5fbc867c0 10 monclient(hunting): build_initial_monmap
-19> 2017-09-15 18:54:46.439452 7fc5fbc867c0 5 adding auth protocol: none
-18> 2017-09-15 18:54:46.439462 7fc5fbc867c0 5 adding auth protocol: none
-17> 2017-09-15 18:54:46.439608 7fc5fbc867c0 5 asok(0x558941004000) register_command objecter_requests hook 0x558940f4c2b0
-16> 2017-09-15 18:54:46.439678 7fc5fbc867c0 1 -- 192.168.1.31:6819/22671 messenger.start
-15> 2017-09-15 18:54:46.439700 7fc5fbc867c0 1 -- :/0 messenger.start
-14> 2017-09-15 18:54:46.439713 7fc5fbc867c0 1 -- 192.168.1.31:6823/22671 messenger.start
-13> 2017-09-15 18:54:46.439726 7fc5fbc867c0 1 -- 192.168.2.31:6819/22671 messenger.start
-12> 2017-09-15 18:54:46.439738 7fc5fbc867c0 1 -- 192.168.2.31:6818/22671 messenger.start
-11> 2017-09-15 18:54:46.439750 7fc5fbc867c0 1 -- :/0 messenger.start
-10> 2017-09-15 18:54:46.439791 7fc5fbc867c0 2 osd.4 0 mounting /var/lib/ceph/osd/ceph-4 /var/lib/ceph/osd/ceph-4/journal
-9> 2017-09-15 18:54:46.439836 7fc5fbc867c0 0 filestore(/var/lib/ceph/osd/ceph-4) backend xfs (magic 0x58465342)
-8> 2017-09-15 18:54:46.440234 7fc5fbc867c0 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-4) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
-7> 2017-09-15 18:54:46.440238 7fc5fbc867c0 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-4) detect_features: SEEK_DATA/SEEK_HOLE is disabled via 'filestore seek data hole' config option
-6> 2017-09-15 18:54:46.440250 7fc5fbc867c0 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-4) detect_features: splice is supported
-5> 2017-09-15 18:54:46.474005 7fc5fbc867c0 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-4) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
-4> 2017-09-15 18:54:46.474186 7fc5fbc867c0 0 xfsfilestorebackend(/var/lib/ceph/osd/ceph-4) detect_feature: extsize is disabled by conf
-3> 2017-09-15 18:54:46.475050 7fc5fbc867c0 1 leveldb: Recovering log #31539
-2> 2017-09-15 18:54:46.617922 7fc5fbc867c0 1 leveldb: Delete type=3 #31538
-1> 2017-09-15 18:54:46.617978 7fc5fbc867c0 1 leveldb: Delete type=0 #31539
0> 2017-09-15 18:54:56.846756 7fc5fbc867c0 -1 *** Caught signal (Aborted) **
in thread 7fc5fbc867c0 thread_name:ceph-osd
ceph version 10.2.9 (2ee413f77150c0f375ff6f10edd6c8f9c7d060d0)
1: (()+0x960e57) [0x558935ed9e57]
2: (()+0x110c0) [0x7fc5faaf40c0]
3: (gsignal()+0xcf) [0x7fc5f8b3afcf]
4: (abort()+0x16a) [0x7fc5f8b3c3fa]
5: (__gnu_cxx::__verbose_terminate_handler()+0x15d) [0x7fc5f9421b3d]
6: (()+0x5ebb6) [0x7fc5f941fbb6]
7: (()+0x5ec01) [0x7fc5f941fc01]
8: (()+0x5ee19) [0x7fc5f941fe19]
9: (coll_t::decode(ceph::buffer::list::iterator&)+0x21e) [0x558935ba833e]
10: (DBObjectMap::_Header::decode(ceph::buffer::list::iterator&)+0x125) [0x558935d4f2f5]
11: (DBObjectMap::check(std::ostream&, bool)+0x279) [0x558935d448b9]
12: (DBObjectMap::init(bool)+0x288) [0x558935d43bb8]
13: (FileStore::mount()+0x2525) [0x558935c6f185]
14: (OSD::init()+0x27d) [0x5589359087ed]
15: (main()+0x2a64) [0x55893586f5d4]
16: (__libc_start_main()+0xf1) [0x7fc5f8b282b1]
17: (()+0x341117) [0x5589358ba117]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- logging levels ---
0/ 5 none
0/ 1 lockdep
0/ 1 context
1/ 1 crush
1/ 5 mds
1/ 5 mds_balancer
1/ 5 mds_locker
1/ 5 mds_log
1/ 5 mds_log_expire
1/ 5 mds_migrator
0/ 1 buffer
0/ 1 timer
0/ 1 filer
0/ 1 striper
0/ 1 objecter
0/ 5 rados
0/ 5 rbd
0/ 5 rbd_mirror
0/ 5 rbd_replay
0/ 5 journaler
0/ 5 objectcacher
0/ 5 client
0/ 5 osd
0/ 5 optracker
0/ 5 objclass
1/ 3 filestore
1/ 3 journal
0/ 5 ms
1/ 5 mon
0/10 monc
1/ 5 paxos
0/ 5 tp
1/ 5 auth
1/ 5 crypto
1/ 1 finisher
1/ 5 heartbeatmap
1/ 5 perfcounter
1/ 5 rgw
1/10 civetweb
1/ 5 javaclient
1/ 5 asok
1/ 1 throttle
0/ 0 refs
1/ 5 xio
1/ 5 compressor
1/ 5 newstore
1/ 5 bluestore
1/ 5 bluefs
1/ 3 bdev
1/ 5 kstore
4/ 5 rocksdb
4/ 5 leveldb
1/ 5 kinetic
1/ 5 fuse
-2/-2 (syslog threshold)
-1/-1 (stderr threshold)
max_recent 10000
max_new 1000
log_file /var/log/ceph/ceph-osd.4.log
--- end dump of recent events ---
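To check the "very similar" impression without the noise of differing load addresses, the frame names can be pulled out of each backtrace and compared. A minimal POSIX shell sketch, assuming the frame format shown in these logs (`N: (symbol+0xOFF) [0xADDR]`); the function name `normalize_bt` is hypothetical, and unnamed `()` frames are dropped.

```shell
# normalize_bt FILE  (hypothetical helper)
# Print only the symbol names from a Ceph-style backtrace, stripping frame
# numbers, offsets and addresses so traces from different runs or binaries
# can be compared with diff. Unnamed "()" frames are skipped.
normalize_bt() {
  sed -n \
    -e '/: (()+0x/d' \
    -e 's/^ *[0-9][0-9]*: (\(.*\)+0x[0-9a-f]*) \[0x[0-9a-f]*\]$/\1/p' \
    "$1"
}
```

Save one backtrace from the export attempt and one from the osd.4 startup into two files (for example, hypothetical `export.bt` and `osd4.bt`), run `normalize_bt` on each into temporary files, and `diff` them. For the traces above, the named frames should differ only by the extra `OSD::init()` frame on the OSD startup path, which points to the same failure in `DBObjectMap::init()` during `FileStore::mount()`.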
What can I do to save PG 1.28? Please let me know if you need more information. So close! =)
Regards,
Hong
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com