Re: Scrub shutdown the OSD process

Gregory Farnum <greg@xxxxxxxxxxx> · Mon, 15 Apr 2013 10:16:30 -0700

On Mon, Apr 15, 2013 at 2:42 AM, Olivier Bonvalet <ceph.list@xxxxxxxxx> wrote:
> Hi,
>
> I have an OSD process that is regularly shut down by scrub, if I
> understand this trace correctly:
>
>      0> 2013-04-15 09:29:53.708141 7f5a8e3cc700 -1 *** Caught signal (Aborted) **
>  in thread 7f5a8e3cc700
>
>  ceph version 0.56.4-4-gd89ab0e (d89ab0ea6fa8d0961cad82f6a81eccbd3bbd3f55)
>  1: /usr/bin/ceph-osd() [0x7a6289]
>  2: (()+0xeff0) [0x7f5aa08faff0]
>  3: (gsignal()+0x35) [0x7f5a9f3841b5]
>  4: (abort()+0x180) [0x7f5a9f386fc0]
>  5: (__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7f5a9fc18dc5]
>  6: (()+0xcb166) [0x7f5a9fc17166]
>  7: (()+0xcb193) [0x7f5a9fc17193]
>  8: (()+0xcb28e) [0x7f5a9fc1728e]
>  9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x7c9) [0x8f9549]
>  10: (ReplicatedPG::_scrub(ScrubMap&)+0x1a78) [0x57a038]
>  11: (PG::scrub_compare_maps()+0xeb8) [0x696c18]
>  12: (PG::chunky_scrub()+0x2d9) [0x6c37f9]
>  13: (PG::scrub()+0x145) [0x6c4e55]
>  14: (OSD::ScrubWQ::_process(PG*)+0xc) [0x64048c]
>  15: (ThreadPool::worker(ThreadPool::WorkThread*)+0x879) [0x815179]
>  16: (ThreadPool::WorkThread::entry()+0x10) [0x817980]
>  17: (()+0x68ca) [0x7f5aa08f28ca]
>  18: (clone()+0x6d) [0x7f5a9f421b6d]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>
> --- logging levels ---
>    0/ 5 none
>    0/ 1 lockdep
>    0/ 1 context
>    1/ 1 crush
>    1/ 5 mds
>    1/ 5 mds_balancer
>    1/ 5 mds_locker
>    1/ 5 mds_log
>    1/ 5 mds_log_expire
>    1/ 5 mds_migrator
>    0/ 1 buffer
>    0/ 1 timer
>    0/ 1 filer
>    0/ 1 striper
>    0/ 1 objecter
>    0/ 5 rados
>    0/ 5 rbd
>    0/ 5 journaler
>    0/ 5 objectcacher
>    0/ 5 client
>    0/ 5 osd
>    0/ 5 optracker
>    0/ 5 objclass
>    1/ 3 filestore
>    1/ 3 journal
>    0/ 5 ms
>    1/ 5 mon
>    0/10 monc
>    0/ 5 paxos
>    0/ 5 tp
>    1/ 5 auth
>    1/ 5 crypto
>    1/ 1 finisher
>    1/ 5 heartbeatmap
>    1/ 5 perfcounter
>    1/ 5 rgw
>    1/ 5 hadoop
>    1/ 5 javaclient
>    1/ 5 asok
>    1/ 1 throttle
>   -1/-1 (syslog threshold)
>   -1/-1 (stderr threshold)
>   max_recent     10000
>   max_new         1000
>   log_file /var/log/ceph/osd.25.log
> --- end dump of recent events ---
>
>
> I tried reformatting that OSD and re-injecting it into the cluster, but
> after recovery the problem still occurs.
>
> Since I don't see any hard drive errors in the kernel logs, what could
> the problem be?

Are you saying you saw this problem more than once, and so you
completely wiped the OSD in question, then brought it back into the
cluster, and now it's seeing this error again?
Are any other OSDs experiencing this issue?
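When reading a trace like the one quoted, the failing assert is typically in the frame just after `ceph::__ceph_assert_fail` (frames are listed innermost first). Here that is frame 10, `ReplicatedPG::_scrub(ScrubMap&)+0x1a78`. Below is a minimal Python sketch that pulls that frame out. The helpers `parse_backtrace` and `asserting_frame` are hypothetical, and the frame-line format is assumed from this one trace, not from any Ceph specification.

```python
import re
from typing import List, NamedTuple, Optional

class Frame(NamedTuple):
    num: int               # frame number as printed in the trace
    symbol: str            # demangled symbol, "()" if unknown, or an object path
    offset: Optional[str]  # offset into the symbol, e.g. "0x1a78" (None if absent)
    addr: str              # absolute return address

# Assumed frame format: " 10: (ReplicatedPG::_scrub(ScrubMap&)+0x1a78) [0x57a038]"
FRAME_RE = re.compile(r"^(\d+):\s+(.*?)\s+\[(0x[0-9a-f]+)\]\s*$")

def parse_backtrace(text: str) -> List[Frame]:
    """Hypothetical helper: extract numbered frames, skipping all other lines."""
    frames = []
    for raw in text.splitlines():
        line = raw.lstrip("> \t")  # drop e-mail quote markers and indentation
        m = FRAME_RE.match(line)
        if not m:
            continue  # headers, NOTE: lines, logging levels, ...
        num, body, addr = int(m.group(1)), m.group(2), m.group(3)
        symbol, offset = body, None
        if body.startswith("(") and body.endswith(")") and "+" in body:
            # "(symbol+0xoff)" -> split on the last "+" only
            symbol, offset = body[1:-1].rsplit("+", 1)
        frames.append(Frame(num, symbol, offset, addr))
    return frames

def asserting_frame(frames: List[Frame]) -> Optional[Frame]:
    """Hypothetical helper: return the caller of ceph::__ceph_assert_fail."""
    for i, f in enumerate(frames):
        if "__ceph_assert_fail" in f.symbol and i + 1 < len(frames):
            return frames[i + 1]
    return None

trace = """
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x7c9) [0x8f9549]
 10: (ReplicatedPG::_scrub(ScrubMap&)+0x1a78) [0x57a038]
 11: (PG::scrub_compare_maps()+0xeb8) [0x696c18]
"""
f = asserting_frame(parse_backtrace(trace))
print(f.symbol, f.offset)  # ReplicatedPG::_scrub(ScrubMap&) 0x1a78
```

Turning that offset into a source line still needs the matching ceph-osd binary, as the NOTE in the trace says (e.g. via `objdump -rdS <executable>`). The OSD log lines just before the abort usually include the assert message itself, which is more useful.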
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com