On Mon, Sep 09, 2013 at 04:30:33PM -0700, Sage Weil wrote:
> On Tue, 10 Sep 2013, Chris Dunlop wrote:
>> G'day,
>>
>> On 0.56.7-1~bpo70+1 I'm getting:
>>
>> # ceph pg dump | grep inconsistent
>> 2013-09-10-08:39:59 2.bc 2776 0 0 0 11521799680 162063 162063 active+clean+inconsistent 2013-09-10 08:38:38.482302 20512'699877 20360'13461026 [6,0] [6,0] 20512'699877 2013-09-10 08:38:38.482264 20512'699877 2013-09-10 08:38:38.482264
>>
>> # ceph pg repair 2.bc
>> instructing pg 2.bc on osd.6 to repair
>>
>> # tail /var/log/ceph/ceph-osd.6.log
>> 2013-09-10 08:17:25.557926 7fef09c14700 0 log [ERR] : repair 2.bc 89ebebc/rbd_data.13a0c74b0dc51.00000000000107ec/head//2 on disk size (4194304) does not match object info size (4104192)
>> 2013-09-10 08:17:27.316112 7fef09c14700 0 log [ERR] : 2.bc repair 1 errors, 0 fixed
>>
>> # ls -l 'ceph-6/current/2.bc_head/DIR_C/DIR_B/DIR_E/rbd\udata.13a0c74b0dc51.00000000000107ec__head_089EBEBC__2'
>> -rw-r--r-- 1 root root 4194304 Sep 8 21:01 ceph-6/current/2.bc_head/DIR_C/DIR_B/DIR_E/rbd\udata.13a0c74b0dc51.00000000000107ec__head_089EBEBC__2
>> # ls -l 'ceph-0/current/2.bc_head/DIR_C/DIR_B/DIR_E/rbd\udata.13a0c74b0dc51.00000000000107ec__head_089EBEBC__2'
>> -rw-r--r-- 1 root root 4194304 Sep 8 21:01 ceph-0/current/2.bc_head/DIR_C/DIR_B/DIR_E/rbd\udata.13a0c74b0dc51.00000000000107ec__head_089EBEBC__2
>>
>> One possible solution would be to simply truncate the objects down to the
>> object info size, as recommended in this case:
>>
>> http://www.spinics.net/lists/ceph-users/msg00793.html
>>
>> However I'm a little concerned about that solution as the on-disk size is
>> exactly 4MB, which I think is the expected size of these objects, and matches
>> the size of all the other objects in the same directory, and the "extra" data
>> looks a little interesting, with "FILE0" blocks in there (what are those?):
>>
>> # cd /var/lib/ceph/osd/ceph-6/current/2.bc_head/DIR_C/DIR_B/DIR_E/
>> # dd if='rbd\udata.13a0c74b0dc51.00000000000107ec__head_089EBEBC__2' bs=1024 skip=4008 | od -c
>> 0000000 F I L E 0 \0 003 \0 312 j o o \0 \0 \0 \0
>> 0000020 001 \0 001 \0 8 \0 001 \0 X 001 \0 \0 \0 004 \0 \0
>> 0000040 \0 \0 \0 \0 \0 \0 \0 \0 006 \0 \0 \0 310 p 017 \0
>> 0000060 002 \0 \0 \0 \0 \0 \0 \0 020 \0 \0 \0 ` \0 \0 \0
>> ...
>> 0002000 F I L E 0 \0 003 \0 002 k o o \0 \0 \0 \0
>> 0002020 001 \0 001 \0 8 \0 001 \0 X 001 \0 \0 \0 004 \0 \0
>> 0002040 \0 \0 \0 \0 \0 \0 \0 \0 006 \0 \0 \0 311 p 017 \0
>> 0002060 002 \0 \0 \0 \0 \0 \0 \0 020 \0 \0 \0 ` \0 \0 \0
>> ...
>> 0004000 F I L E 0 \0 003 \0 023 r o o \0 \0 \0 \0
>> 0004020 001 \0 001 \0 8 \0 001 \0 X 001 \0 \0 \0 004 \0 \0
>> 0004040 \0 \0 \0 \0 \0 \0 \0 \0 006 \0 \0 \0 312 p 017 \0
>> 0004060 002 \0 \0 \0 \0 \0 \0 \0 020 \0 \0 \0 ` \0 \0 \0
>>
>> Is it safe to simply truncate this object, or what other solutions might
>> be applicable?
>
> The alternative is to edit the xattr. That's harder, but better. You'll
> want to grab the user.ceph._ xattr, change the one instance of 4104192 to
> 4194304, and then reset it. You can use
>   ceph-dencoder type object_info_t import /tmp/xattrfile decode dump_json
> to verify that it decodes properly before and after you make the edit. I
> like the 'attr' tool for getting/setting xattrs.

Can ceph-dencoder import the (modified) json and write out the encoded
binary suitable for setting in the xattr? If not, what encoding is the
xattr, so I can work out what I need to do to make the change?
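In case it makes the question clearer, this is roughly what I'm thinking
of doing if it comes down to patching the bytes by hand. It's an untested
sketch: it assumes the 4104192 is stored in the blob as a little-endian
64-bit value, that that byte pattern occurs exactly once, that this is
done with the osd stopped, and presumably the same edit wants doing on
the osd.0 copy. The /tmp names are just examples, and I've used
getfattr/setfattr rather than attr, but either should do:

  cd /var/lib/ceph/osd/ceph-6/current/2.bc_head/DIR_C/DIR_B/DIR_E
  obj='rbd\udata.13a0c74b0dc51.00000000000107ec__head_089EBEBC__2'

  # dump the raw object_info_t blob and check it decodes as-is
  getfattr --only-values -n user.ceph._ "$obj" > /tmp/oi.old
  ceph-dencoder type object_info_t import /tmp/oi.old decode dump_json

  # 4104192 = 0x003ea000, 4194304 = 0x00400000, little-endian on disk
  xxd -p /tmp/oi.old | tr -d '\n' \
    | sed 's/00a03e0000000000/0000400000000000/' \
    | xxd -r -p > /tmp/oi.new

  # verify the edited blob still decodes, then put it back
  ceph-dencoder type object_info_t import /tmp/oi.new decode dump_json
  setfattr -n user.ceph._ -v "0s$(base64 -w0 /tmp/oi.new)" "$obj"

The current xattr on osd.6, for reference: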
# getfattr -n user.ceph._ 'ceph-6/current/2.bc_head/DIR_C/DIR_B/DIR_E/rbd\udata.13a0c74b0dc51.00000000000107ec__head_089EBEBC__2'
getfattr: Removing leading '/' from absolute path names
# file: ceph-6/current/2.bc_head/DIR_C/DIR_B/DIR_E/rbd\134udata.13a0c74b0dc51.00000000000107ec__head_089EBEBC__2
user.ceph._=0sCgjoAAAABANIAAAAAAAAACcAAAByYmRfZGF0YS4xM2EwYzc0YjBkYzUxLjAwMDAwMDAwMDAwMTA3ZWP+/////////7y+nggAAAAAAAIAAAAAAAAABAMQAAAAAgAAAAAAAAD/////AAAAAAAAAACrqgoAAAAAADJPAAB6dwoAAAAAADBOAAACAhUAAAAIS2YBAAAAAADm+g0fAAAAAAAAAAAAoD4AAAAAABlZLFIAh2IgAgIVAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAq6oKAAAAAAAyTwAAAA==

> Is this still bobtail? We haven't seen this sort of corruption since
> then.

Yup. I'll upgrade once the cluster settles down cleanly!

Cheers,

Chris.