On Mon, Apr 15, 2013 at 2:42 AM, Olivier Bonvalet <ceph.list@xxxxxxxxx> wrote:
> Hi,
>
> I have an OSD process which is regularly shut down by scrub, if I
> understand this trace correctly:
>
> 0> 2013-04-15 09:29:53.708141 7f5a8e3cc700 -1 *** Caught signal (Aborted) **
> in thread 7f5a8e3cc700
>
> ceph version 0.56.4-4-gd89ab0e (d89ab0ea6fa8d0961cad82f6a81eccbd3bbd3f55)
> 1: /usr/bin/ceph-osd() [0x7a6289]
> 2: (()+0xeff0) [0x7f5aa08faff0]
> 3: (gsignal()+0x35) [0x7f5a9f3841b5]
> 4: (abort()+0x180) [0x7f5a9f386fc0]
> 5: (__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7f5a9fc18dc5]
> 6: (()+0xcb166) [0x7f5a9fc17166]
> 7: (()+0xcb193) [0x7f5a9fc17193]
> 8: (()+0xcb28e) [0x7f5a9fc1728e]
> 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x7c9) [0x8f9549]
> 10: (ReplicatedPG::_scrub(ScrubMap&)+0x1a78) [0x57a038]
> 11: (PG::scrub_compare_maps()+0xeb8) [0x696c18]
> 12: (PG::chunky_scrub()+0x2d9) [0x6c37f9]
> 13: (PG::scrub()+0x145) [0x6c4e55]
> 14: (OSD::ScrubWQ::_process(PG*)+0xc) [0x64048c]
> 15: (ThreadPool::worker(ThreadPool::WorkThread*)+0x879) [0x815179]
> 16: (ThreadPool::WorkThread::entry()+0x10) [0x817980]
> 17: (()+0x68ca) [0x7f5aa08f28ca]
> 18: (clone()+0x6d) [0x7f5a9f421b6d]
> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>
> --- logging levels ---
> 0/ 5 none
> 0/ 1 lockdep
> 0/ 1 context
> 1/ 1 crush
> 1/ 5 mds
> 1/ 5 mds_balancer
> 1/ 5 mds_locker
> 1/ 5 mds_log
> 1/ 5 mds_log_expire
> 1/ 5 mds_migrator
> 0/ 1 buffer
> 0/ 1 timer
> 0/ 1 filer
> 0/ 1 striper
> 0/ 1 objecter
> 0/ 5 rados
> 0/ 5 rbd
> 0/ 5 journaler
> 0/ 5 objectcacher
> 0/ 5 client
> 0/ 5 osd
> 0/ 5 optracker
> 0/ 5 objclass
> 1/ 3 filestore
> 1/ 3 journal
> 0/ 5 ms
> 1/ 5 mon
> 0/10 monc
> 0/ 5 paxos
> 0/ 5 tp
> 1/ 5 auth
> 1/ 5 crypto
> 1/ 1 finisher
> 1/ 5 heartbeatmap
> 1/ 5 perfcounter
> 1/ 5 rgw
> 1/ 5 hadoop
> 1/ 5 javaclient
> 1/ 5 asok
> 1/ 1 throttle
> -1/-1 (syslog threshold)
> -1/-1 (stderr threshold)
> max_recent 10000
> max_new 1000
> log_file /var/log/ceph/osd.25.log
> --- end dump of recent events ---
>
>
> I tried to format that OSD and re-inject it into the cluster, but after
> the recovery the problem still occurs.
>
> Since I don't see any hard drive errors in the kernel logs, what can the
> problem be?

Are you saying you saw this problem more than once, and so you
completely wiped the OSD in question, then brought it back into the
cluster, and now it's seeing this error again?
Are any other OSDs experiencing this issue?
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com