Well,

On 01.03.2012 at 18:15, Oliver Francke wrote:

> Hi *,
>
> after some crashes we still had to take care of some remaining inconsistencies reported via
>     ceph -w
> and friends.
> Well, we traced one of them down via
>     ceph pg dump
> and picked pool 79, pg=79.7, and via /var/log/ceph/osd.2.log found the corresponding file:
>     /data/osd4/current/79.7_head/rb.0.0.00000000136c__head_9FB2FA17
> and the duplicate on
>     /data/osd2/...
> Strange though, they had the same checksum, but a stat error was reported. Anyway, we decided to do a:
>     ceph pg repair 79.7
> ... byebye ceph-osd on node2!
>
> Here is the trace:
>
> === 8-< ===
>
> 2012-03-01 17:49:13.024571 7f3944584700 -- 10.10.10.14:6802/4892 >> 10.10.10.10:6802/19139 pipe(0xfcd2c80 sd=16 pgs=0 cs=0 l=0).connect protocol version mismatch, my 9 != 0
> 2012-03-01 17:49:23.674162 7f395001b700 log [ERR] : 79.7 osd.4: soid 9fb2fa17/rb.0.0.00000000136c/head extra attr _, extra attr snapset

One clarification we did ourselves in the meantime: one copy is missing the xattrs (checked via getfattr; a rough sketch of the comparison follows further below). But why can't this be corrected, and worse, why does this crash happen?

> 2012-03-01 17:49:23.674222 7f395001b700 log [ERR] : 79.7 repair 0 missing, 1 inconsistent objects
> *** Caught signal (Aborted) **
> in thread 7f395001b700
> ceph version 0.42-142-gc9416e6 (commit:c9416e6184905501159e96115f734bdf65a74d28)
> 1: /usr/bin/ceph-osd() [0x5a6b89]
> 2: (()+0xeff0) [0x7f3960ca5ff0]
> 3: (gsignal()+0x35) [0x7f395f2841b5]
> 4: (abort()+0x180) [0x7f395f286fc0]
> 5: (__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7f395fb18dc5]
> 6: (()+0xcb166) [0x7f395fb17166]
> 7: (()+0xcb193) [0x7f395fb17193]
> 8: (()+0xcb28e) [0x7f395fb1728e]
> 9: (ceph::buffer::list::iterator::copy(unsigned int, char*)+0x13e) [0x67c5ce]
> 10: (object_info_t::decode(ceph::buffer::list::iterator&)+0x2c) [0x61663c]
> 11: (PG::repair_object(hobject_t const&, ScrubMap::object*, int, int)+0x3be) [0x68d96e]
> 12: (PG::scrub_finalize()+0x1438) [0x6b8568]
> 13: (OSD::ScrubFinalizeWQ::_process(PG*)+0xc) [0x588edc]
> 14: (ThreadPool::worker()+0xa26) [0x5bc426]
> 15: (ThreadPool::WorkThread::entry()+0xd) [0x585f0d]
> 16: (()+0x68ca) [0x7f3960c9d8ca]
> 17: (clone()+0x6d) [0x7f395f32186d]
> 2012-03-01 17:49:30.017269 7f81b662b780 ceph version 0.42-142-gc9416e6 (commit:c9416e6184905501159e96115f734bdf65a74d28), process ceph-osd, pid 3111
> 2012-03-01 17:49:30.085426 7f81b662b780 filestore(/data/osd2) mount FIEMAP ioctl is NOT supported
> 2012-03-01 17:49:30.085466 7f81b662b780 filestore(/data/osd2) mount did NOT detect btrfs
> 2012-03-01 17:49:30.110409 7f81b662b780 filestore(/data/osd2) mount found snaps <>
> 2012-03-01 17:49:30.110476 7f81b662b780 filestore(/data/osd2) mount: enabling WRITEAHEAD journal mode: btrfs not detected
> 2012-03-01 17:49:31.964977 7f81b662b780 journal _open /dev/sdc1 fd 16: 10737942528 bytes, block size 4096 bytes, directio = 1, aio = 0
> 2012-03-01 17:49:31.967549 7f81b662b780 journal read_entry 9292222464 : seq 67841857 11225 bytes
>
> === 8-< ===
>
> ... after some journal replay, things calmed down, but:
>
> 2012-03-01 17:58:29.470446 log 2012-03-01 17:58:24.242369 osd.2 10.10.10.14:6801/3111 368 : [WRN] bad locator @56 on object @79 loc @56 op osd_op(client.44350.0:1412387 rb.0.0.00000000136c [write 2465792~49152] 56.9fb2fa17) v4
>
> We see this type of message every so often... It relates to the same object, but in what way?
>
> Can't we assume that if both "rb.0.0..." snippets are identical, all is well?
> We had some other inconsistencies, where we had to delete the whole pool to get rid of crappy blocks.
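To make the comparison mentioned above concrete, this is roughly what we did on the two OSDs (a minimal sketch: md5sum merely stands in for whatever checksum tool you prefer, and the osd2 path is only assumed to mirror the osd4 one, since I elided it above):

    # compare the raw object data of the two replicas (identical in our case)
    md5sum /data/osd4/current/79.7_head/rb.0.0.00000000136c__head_9FB2FA17
    md5sum /data/osd2/current/79.7_head/rb.0.0.00000000136c__head_9FB2FA17   # assumed path, analogous to osd4

    # dump all xattrs of both copies; -m '.*' matches every attribute name,
    # -e hex keeps binary values readable. On the bad copy the "_" (object
    # info) and "snapset" attributes are simply not there.
    getfattr -d -m '.*' -e hex /data/osd4/current/79.7_head/rb.0.0.00000000136c__head_9FB2FA17
    getfattr -d -m '.*' -e hex /data/osd2/current/79.7_head/rb.0.0.00000000136c__head_9FB2FA17   # assumed path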
> The ceph-osd died, too, after doing some
>     rbd rm <pool>/<image>
> The one block in question remained, visible via
>     rados ls -p <pool>
>
> Any idea, or better yet, a clue? ;-)
>
> Kind regards,
>
> Oliver.
>
> --
>
> Oliver Francke
>
> filoo GmbH
> Moltkestraße 25a
> 33330 Gütersloh
> HRB4355 AG Gütersloh
>
> Managing directors: S.Grewing | J.Rehpöhler | C.Kunz
>
> Follow us on Twitter: http://twitter.com/filoogmbh
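P.S.: for completeness, this is roughly how we look for the leftover block after such a failed "rbd rm" (a small sketch; the grep pattern and the "rados stat" call are only illustrative, and <pool> is the same placeholder as above):

    # list the pool's objects and look for leftover rbd blocks of the image
    rados ls -p <pool> | grep '^rb\.0\.0\.'
    # inspect the one block that survived the image removal (size/mtime)
    rados -p <pool> stat rb.0.0.00000000136c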