Flapping/Crashing OSD

Hi All,

I have an OSD log full of entries like these:

"log [ERR] : 1.9 log bound mismatch, info (46784'1236417,46797'1239418] actual [46784'1235968,46797'1239418]"

"192.168.7.177:6800/15655 >> 192.168.7.183:6802/3348 pipe(0x20e4f00 sd=65 :56394 s=2 pgs=24194 cs=1 l=0 c=0x19668f20).fault, initiating reconnect"

One OSD was also showing as down. I started it up and the data synced as expected, but then the OSD started crashing and restarting in a loop.

The log can be downloaded from http://onlinefusion.co.uk/info/ceph-osd.4.zip (I trimmed the repeating parts, so it is 160KB). A snippet is below.
Any ideas what is wrong with it?

--------------

-1> 2014-02-20 11:54:55.703196 7fbca1278700 0 log [ERR] : 1.9 log bound mismatch, info (46784'1236417,46797'1239418] actual [46784'1235968,46797'1239418]
 0> 2014-02-20 11:55:05.243723 7fbc9f274700 -1 os/DBObjectMap.cc: In function 'virtual bool DBObjectMap::DBObjectMapIteratorImpl::valid()' thread 7fbc9f274700 time 2014-02-20 11:55:05.240689
os/DBObjectMap.cc: 400: FAILED assert(!valid || cur_iter->valid())

 ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60)
 1: /usr/bin/ceph-osd() [0x95b762]
 2: (PG::_scan_list(ScrubMap&, std::vector<hobject_t, std::allocator<hobject_t> >&, bool, ThreadPool::TPHandle&)+0xed7) [0x86e3b7]
 3: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool, ThreadPool::TPHandle&)+0x106) [0x871256]
 4: (PG::replica_scrub(MOSDRepScrub*, ThreadPool::TPHandle&)+0x88e) [0x871f1e]
 5: (OSD::RepScrubWQ::_process(MOSDRepScrub*, ThreadPool::TPHandle&)+0xbd) [0x740b8d]
 6: (ThreadPool::worker(ThreadPool::WorkThread*)+0x68c) [0xa33d9c]
 7: (ThreadPool::WorkThread::entry()+0x10) [0xa34ff0]
 8: (()+0x7f8e) [0x7fbcbe3c6f8e]
 9: (clone()+0x6d) [0x7fbcbc8e5a0d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 0 lockdep
   0/ 0 context
   0/ 0 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 0 buffer
   0/ 0 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 0 journaler
   0/ 5 objectcacher
   0/ 5 client
   0/ 0 osd
   0/ 0 optracker
   0/ 0 objclass
   0/ 0 filestore
   0/ 0 journal
   0/ 0 ms
   1/ 5 mon
   0/ 0 monc
   1/ 5 paxos
   0/ 0 tp
   0/ 0 auth
   1/ 5 crypto
   0/ 0 finisher
   0/ 0 heartbeatmap
   0/ 0 perfcounter
   1/ 5 rgw
   1/ 5 javaclient
   0/ 0 asok
   0/ 0 throttle
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent     10000
  max_new         1000
  log_file /var/log/ceph/ceph-osd.4.log
--- end dump of recent events ---
2014-02-20 11:55:05.343940 7fbc9f274700 -1 *** Caught signal (Aborted) **
 in thread 7fbc9f274700

 ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60)
 1: /usr/bin/ceph-osd() [0x97dc70]
 2: (()+0xfbd0) [0x7fbcbe3cebd0]
 3: (gsignal()+0x37) [0x7fbcbc822037]
 4: (abort()+0x148) [0x7fbcbc825698]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7fbcbd12fe8d]
 6: (()+0x5ef76) [0x7fbcbd12df76]
 7: (()+0x5efa3) [0x7fbcbd12dfa3]
 8: (()+0x5f1de) [0x7fbcbd12e1de]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x43d) [0xa3f64d]
 10: /usr/bin/ceph-osd() [0x95b762]
 11: (PG::_scan_list(ScrubMap&, std::vector<hobject_t, std::allocator<hobject_t> >&, bool, ThreadPool::TPHandle&)+0xed7) [0x86e3b7]
 12: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool, ThreadPool::TPHandle&)+0x106) [0x871256]
 13: (PG::replica_scrub(MOSDRepScrub*, ThreadPool::TPHandle&)+0x88e) [0x871f1e]
 14: (OSD::RepScrubWQ::_process(MOSDRepScrub*, ThreadPool::TPHandle&)+0xbd) [0x740b8d]
 15: (ThreadPool::worker(ThreadPool::WorkThread*)+0x68c) [0xa33d9c]
 16: (ThreadPool::WorkThread::entry()+0x10) [0xa34ff0]
 17: (()+0x7f8e) [0x7fbcbe3c6f8e]
 18: (clone()+0x6d) [0x7fbcbc8e5a0d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---
0> 2014-02-20 11:55:05.343940 7fbc9f274700 -1 *** Caught signal (Aborted) **
 in thread 7fbc9f274700

 ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60)
 1: /usr/bin/ceph-osd() [0x97dc70]
 2: (()+0xfbd0) [0x7fbcbe3cebd0]
 3: (gsignal()+0x37) [0x7fbcbc822037]
 4: (abort()+0x148) [0x7fbcbc825698]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7fbcbd12fe8d]
 6: (()+0x5ef76) [0x7fbcbd12df76]
 7: (()+0x5efa3) [0x7fbcbd12dfa3]
 8: (()+0x5f1de) [0x7fbcbd12e1de]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x43d) [0xa3f64d]
 10: /usr/bin/ceph-osd() [0x95b762]
 11: (PG::_scan_list(ScrubMap&, std::vector<hobject_t, std::allocator<hobject_t> >&, bool, ThreadPool::TPHandle&)+0xed7) [0x86e3b7]
 12: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool, ThreadPool::TPHandle&)+0x106) [0x871256]
 13: (PG::replica_scrub(MOSDRepScrub*, ThreadPool::TPHandle&)+0x88e) [0x871f1e]
 14: (OSD::RepScrubWQ::_process(MOSDRepScrub*, ThreadPool::TPHandle&)+0xbd) [0x740b8d]
 15: (ThreadPool::worker(ThreadPool::WorkThread*)+0x68c) [0xa33d9c]
 16: (ThreadPool::WorkThread::entry()+0x10) [0xa34ff0]
 17: (()+0x7f8e) [0x7fbcbe3c6f8e]
 18: (clone()+0x6d) [0x7fbcbc8e5a0d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--------------
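
Since the full log repeats the same messages many times, a small script can help show which PGs report the "log bound mismatch" and which assert is firing. This is only a minimal sketch, not a Ceph tool: the two regular expressions are assumptions based solely on the line shapes quoted in this post, and the function name summarize is made up for illustration.

```python
import re
import sys
from collections import Counter

# Both patterns are assumptions derived only from the log lines quoted above;
# other Ceph versions may format these messages differently.
MISMATCH_RE = re.compile(
    r"log \[ERR\] : (?P<pg>\S+) log bound mismatch, "
    r"info \((?P<info>[^\]]+)\] actual \[(?P<actual>[^\]]+)\]"
)
ASSERT_RE = re.compile(
    r"(?P<file>\S+): (?P<line>\d+): FAILED assert\((?P<expr>.*)\)"
)


def summarize(lines):
    """Count log bound mismatches per PG and failed asserts per source location."""
    mismatches = Counter()
    asserts = Counter()
    for raw in lines:
        # finditer copes with several log entries fused onto one line.
        for m in MISMATCH_RE.finditer(raw):
            mismatches[m.group("pg")] += 1
        m = ASSERT_RE.search(raw)
        if m:
            key = f"{m.group('file')}:{m.group('line')} {m.group('expr')}"
            asserts[key] += 1
    return dict(mismatches), dict(asserts)


if __name__ == "__main__":
    # Usage: python summarize_osd_log.py /var/log/ceph/ceph-osd.4.log
    with open(sys.argv[1], errors="replace") as fh:
        pg_counts, assert_counts = summarize(fh)
    for pg, n in sorted(pg_counts.items()):
        print(f"log bound mismatch on pg {pg}: {n} time(s)")
    for site, n in sorted(assert_counts.items()):
        print(f"failed assert {site}: {n} time(s)")
```

Run against the log_file path shown in the dump (/var/log/ceph/ceph-osd.4.log), it prints one line per affected PG and one per distinct assert. That makes it easier to check whether the crashes always involve the same PG as the mismatch messages.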
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com