Some additional info: today at 18:57:40, PG 3.1 [19,5,28] had a scrub date of "2013-03-28 08:38:12.858041", and osd.28 was recovering. Ten minutes later (at 19:07:40), PG 3.1 had a scrub date of today. But at 19:41:04 I saw an error in syslog:

osd.10 52042 heartbeat_check: no reply from osd.28 since 2013-04-17 19:40:43.565511

So, since 19:47:44, PG 3.1 [19,5] has been in the "active+degraded" state, its scrub date has returned to "2013-03-28 08:38:12.858041", and of course osd.28 is DOWN; the process aborted:

     0> 2013-04-17 19:40:46.791010 7f6658f5a700 -1 *** Caught signal (Aborted) **
 in thread 7f6658f5a700

 ceph version 0.56.4-4-gd89ab0e (d89ab0ea6fa8d0961cad82f6a81eccbd3bbd3f55)
 1: /usr/bin/ceph-osd() [0x7a6289]
 2: (()+0xeff0) [0x7f666b488ff0]
 3: (gsignal()+0x35) [0x7f6669f121b5]
 4: (abort()+0x180) [0x7f6669f14fc0]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7f666a7a6dc5]
 6: (()+0xcb166) [0x7f666a7a5166]
 7: (()+0xcb193) [0x7f666a7a5193]
 8: (()+0xcb28e) [0x7f666a7a528e]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x7c9) [0x8f9549]
 10: (ReplicatedPG::_scrub(ScrubMap&)+0x1a78) [0x57a038]
 11: (PG::scrub_compare_maps()+0xeb8) [0x696c18]
 12: (PG::chunky_scrub()+0x2d9) [0x6c37f9]
 13: (PG::scrub()+0x145) [0x6c4e55]
 14: (OSD::ScrubWQ::_process(PG*)+0xc) [0x64048c]
 15: (ThreadPool::worker(ThreadPool::WorkThread*)+0x879) [0x815179]
 16: (ThreadPool::WorkThread::entry()+0x10) [0x817980]
 17: (()+0x68ca) [0x7f666b4808ca]
 18: (clone()+0x6d) [0x7f6669fafb6d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

What I don't understand is why the OSD process crashed instead of marking that PG "corrupted". Is the PG really corrupted, or is this just an OSD bug?

Thanks,
Olivier
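
A minimal sequence for checking the PG's state and scrub date and for retrying the scrub once osd.28 is restarted could look like the sketch below. The PG id 3.1 and osd.28 are taken from the report above, the init invocation assumes a Debian-style sysvinit install, and whether "ceph pg repair" is appropriate depends on what the scrub actually reports:

    # confirm the PG's current state and last scrub date
    ceph pg dump | awk '$1 == "3.1"'
    ceph pg 3.1 query

    # bring the crashed OSD back up (sysvinit layout assumed)
    service ceph start osd.28

    # re-run the scrub; only attempt a repair if it flags
    # inconsistent objects, and check the OSD logs first
    ceph pg scrub 3.1
    ceph pg repair 3.1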