I found a hardware error on the osd server the day before:

Dec 10 05:40:20 zstore kernel: EDAC MC1: 1 CE error on CPU#1Channel#0_DIMM#0 (channel:0 slot:0 page:0x0 offset:0x0 grain:8 syndrome:0x0)

Could it affect the replication process?

2012-12-11 00:15:17.705096 7f22b27f4700 0 log [ERR] : 4.6 osd.0: soid fe0ab176/seodo1.rbd/head//4 size 0 != known size 112
2012-12-11 00:15:17.705100 7f22b27f4700 0 log [ERR] : 4.6 scrub 0 missing, 1 inconsistent objects
2012-12-11 00:15:17.706169 7f22b27f4700 0 log [ERR] : scrub 4.6 fe0ab176/seodo1.rbd/head//4 on disk size (112) does not match object info size (0)
2012-12-11 00:15:17.706452 7f22b27f4700 0 log [ERR] : 4.6 scrub 1 errors
2012-12-11 00:21:58.214974 7f23a5ffb700 0 log [ERR] : 3.5 scrub stat mismatch, got 21841/21839 objects, 199/199 clones, 90932097984/90932097760 bytes.
2012-12-11 00:21:58.214993 7f23a5ffb700 0 log [ERR] : 3.5 scrub 1 errors

2012/12/11 Vladislav Gorbunov <vadikgo@xxxxxxxxx>:
> Looks like the header object on the broken images is empty.
>
> root@bender:~# rados -p iscsi stat seodo1.rbd
> iscsi/seodo1.rbd mtime 1354795057, size 0
>
> root@bender:~# rados -p iscsi stat siri.rbd
> iscsi/siri.rbd mtime 1355151093, size 0
>
> On an accessible image the header size is not empty:
> root@bender:~# rados -p iscsi stat siri1.rbd
> iscsi/siri1.rbd mtime 1355174156, size 112
>
> and the header can't be saved:
> root@bender:~# rados -p iscsi get seodo1.rbd seodo1.header
> 2012-12-11 11:34:06.044164 7fe732f52780 0 wrote 0 byte payload to seodo1.header
>
> Before this header became unreadable, a new osd server was added and
> the cluster was rebalanced. One of the mon servers (mon.0) crashed,
> and I restarted it.
>
> 2012/12/11 Josh Durgin <josh.durgin@xxxxxxxxxxx>:
>> On 12/10/2012 01:54 PM, Vladislav Gorbunov wrote:
>>>
>>> but access to iscsi/seodo1 and iscsi/siri1 fails on every rbd client
>>> host. The data is completely inaccessible.
>>>
>>> root@bender:~# rbd info iscsi/seodo1
>>> *** Caught signal (Segmentation fault) **
>>>  in thread 7fb8c93f5780
>>>  ceph version 0.48.2argonaut (commit:3e02b2fad88c2a95d9c0c86878f10d1beb780bfe)
>>>  1: rbd() [0x41dfea]
>>>  2: (()+0xfcb0) [0x7fb8c796fcb0]
>>>  3: (()+0x16244d) [0x7fb8c6ae444d]
>>>  4: (librbd::read_header_bl(librados::IoCtx&, std::string const&, ceph::buffer::list&, unsigned long*)+0xf9) [0x7fb8c8fadb99]
>>>  5: (librbd::read_header(librados::IoCtx&, std::string const&, rbd_obj_header_ondisk*, unsigned long*)+0x82) [0x7fb8c8fadda2]
>>>  6: (librbd::ictx_refresh(librbd::ImageCtx*)+0x90b) [0x7fb8c8fb05eb]
>>>  7: (librbd::open_image(librbd::ImageCtx*)+0x1b5) [0x7fb8c8fb1165]
>>>  8: (librbd::RBD::open(librados::IoCtx&, librbd::Image&, char const*, char const*)+0x5f) [0x7fb8c8fb16af]
>>>  9: (main()+0x73c) [0x41721c]
>>>  10: (__libc_start_main()+0xed) [0x7fb8c69a376d]
>>>  11: rbd() [0x41a0c9]
>>> 2012-12-11 09:33:14.264755 7fb8c93f5780 -1 *** Caught signal (Segmentation fault) **
>>>  in thread 7fb8c93f5780
>>
>> It sounds like the header object (which rbd uses to determine the
>> prefix for data object names) is corrupted or otherwise inaccessible.
>>
>> Could you save the header object to a file ('rados -p iscsi get seodo1.rbd')
>> and put that file somewhere accessible?
>>
>> Did anything happen to your cluster before this header became
>> unreadable? Any disk problems, or osds crashing?
>>
>> Josh
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel"
in the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
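[Editor's note] Once a header object has been dumped with 'rados get' as Josh asks, it can be sanity-checked offline. The sketch below follows the fixed part of the format-1 on-disk header (struct rbd_obj_header_ondisk in ceph's rbd_types.h: 40-byte magic text, 24-byte data-object name prefix, 4-byte "RBD" signature, 8-byte version, then options/size/snapshot fields, 112 bytes total, which matches the "known size 112" in the scrub log above). It is an illustrative checker, not a ceph tool, and it shows why the 0-byte seodo1.rbd object breaks librbd: an empty header yields no data-object prefix at all.

```python
import struct

# Constants from the rbd format-1 on-disk header (rbd_types.h).
RBD_HEADER_TEXT = b"<<< Rados Block Device Image >>>\n"
RBD_HEADER_SIGNATURE = b"RBD"
RBD_HEADER_SIZE = 112  # fixed part of struct rbd_obj_header_ondisk

def check_rbd_v1_header(data: bytes) -> str:
    """Classify a dumped format-1 rbd header object
    (e.g. the output file of 'rados -p iscsi get seodo1.rbd seodo1.header')."""
    if len(data) == 0:
        # The seodo1.rbd / siri.rbd case in this thread: size 0.
        return "empty"
    if len(data) < RBD_HEADER_SIZE:
        return "truncated"
    # First three fields: magic text, data-object name prefix, signature.
    text, block_name, signature = struct.unpack_from("<40s24s4s", data, 0)
    if not text.startswith(RBD_HEADER_TEXT) or not signature.startswith(RBD_HEADER_SIGNATURE):
        return "corrupt"
    return "ok, data object prefix: %s" % block_name.rstrip(b"\0").decode()

if __name__ == "__main__":
    # Synthetic well-formed header (the prefix "rb.0.1234.5678" is made up).
    hdr = (RBD_HEADER_TEXT.ljust(40, b"\0")
           + b"rb.0.1234.5678".ljust(24, b"\0")
           + b"RBD\0" + b"001\0\0\0\0\0"
           + b"\0" * (RBD_HEADER_SIZE - 76))
    print(check_rbd_v1_header(b""))   # the broken images in this thread
    print(check_rbd_v1_header(hdr))
```

An "empty" result explains the crash path in the trace above (read_header_bl is handed a zero-length buffer); the data objects themselves may still exist under the lost prefix, which is why recovering or reconstructing the 112-byte header is the interesting repair step.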