On Tue, 10 Sep 2013, Chris Dunlop wrote:
> On Mon, Sep 09, 2013 at 04:30:33PM -0700, Sage Weil wrote:
> > On Tue, 10 Sep 2013, Chris Dunlop wrote:
> >> G'day,
> >>
> >> On 0.56.7-1~bpo70+1 I'm getting:
> >>
> >> # ceph pg dump | grep inconsistent
> >> 2013-09-10-08:39:59 2.bc 2776 0 0 0 11521799680 162063 162063 active+clean+inconsistent 2013-09-10 08:38:38.482302 20512'699877 20360'13461026 [6,0] [6,0] 20512'699877 2013-09-10 08:38:38.482264 20512'699877 2013-09-10 08:38:38.482264
> >>
> >> # ceph pg repair 2.bc
> >> instructing pg 2.bc on osd.6 to repair
> >>
> >> # tail /var/log/ceph/ceph-osd.6.log
> >> 2013-09-10 08:17:25.557926 7fef09c14700 0 log [ERR] : repair 2.bc 89ebebc/rbd_data.13a0c74b0dc51.00000000000107ec/head//2 on disk size (4194304) does not match object info size (4104192)
> >> 2013-09-10 08:17:27.316112 7fef09c14700 0 log [ERR] : 2.bc repair 1 errors, 0 fixed
> >>
> >> # ls -l 'ceph-6/current/2.bc_head/DIR_C/DIR_B/DIR_E/rbd\udata.13a0c74b0dc51.00000000000107ec__head_089EBEBC__2'
> >> -rw-r--r-- 1 root root 4194304 Sep 8 21:01 ceph-6/current/2.bc_head/DIR_C/DIR_B/DIR_E/rbd\udata.13a0c74b0dc51.00000000000107ec__head_089EBEBC__2
> >> # ls -l 'ceph-0/current/2.bc_head/DIR_C/DIR_B/DIR_E/rbd\udata.13a0c74b0dc51.00000000000107ec__head_089EBEBC__2'
> >> -rw-r--r-- 1 root root 4194304 Sep 8 21:01 ceph-0/current/2.bc_head/DIR_C/DIR_B/DIR_E/rbd\udata.13a0c74b0dc51.00000000000107ec__head_089EBEBC__2
> >>
> >> One possible solution would be to simply truncate the objects down to the
> >> object info size, as recommended in this case:
> >>
> >> http://www.spinics.net/lists/ceph-users/msg00793.html
> >>
> >> However I'm a little concerned about that solution, as the on-disk size is
> >> exactly 4MB, which I think is the expected size of these objects, and matches
> >> the size of all the other objects in the same directory, and the "extra" data
> >> looks a little interesting, with "FILE0" blocks in there (what are those?):
> >>
> >> # cd /var/lib/ceph/osd/ceph-6/current/2.bc_head/DIR_C/DIR_B/DIR_E/
> >> # dd if='rbd\udata.13a0c74b0dc51.00000000000107ec__head_089EBEBC__2' bs=1024 skip=4008 | od -c
> >> 0000000 F I L E 0 \0 003 \0 312 j o o \0 \0 \0 \0
> >> 0000020 001 \0 001 \0 8 \0 001 \0 X 001 \0 \0 \0 004 \0 \0
> >> 0000040 \0 \0 \0 \0 \0 \0 \0 \0 006 \0 \0 \0 310 p 017 \0
> >> 0000060 002 \0 \0 \0 \0 \0 \0 \0 020 \0 \0 \0 ` \0 \0 \0
> >> ...
> >> 0002000 F I L E 0 \0 003 \0 002 k o o \0 \0 \0 \0
> >> 0002020 001 \0 001 \0 8 \0 001 \0 X 001 \0 \0 \0 004 \0 \0
> >> 0002040 \0 \0 \0 \0 \0 \0 \0 \0 006 \0 \0 \0 311 p 017 \0
> >> 0002060 002 \0 \0 \0 \0 \0 \0 \0 020 \0 \0 \0 ` \0 \0 \0
> >> ...
> >> 0004000 F I L E 0 \0 003 \0 023 r o o \0 \0 \0 \0
> >> 0004020 001 \0 001 \0 8 \0 001 \0 X 001 \0 \0 \0 004 \0 \0
> >> 0004040 \0 \0 \0 \0 \0 \0 \0 \0 006 \0 \0 \0 312 p 017 \0
> >> 0004060 002 \0 \0 \0 \0 \0 \0 \0 020 \0 \0 \0 ` \0 \0 \0
> >>
> >> Is it safe to simply truncate this object, or what other solutions might
> >> be applicable?
> >
> > The alternative is to edit the xattr. That's harder, but better. You'll
> > want to grab the user.ceph._ xattr, change the one instance of 4104192 to
> > 4194304, and then reset it. You can use
> >
> >   ceph-dencoder type object_info_t import /tmp/xattrfile decode dump_json
> >
> > to verify that it decodes properly before and after you make the edit. I
> > like the 'attr' tool for getting/setting xattrs.
>
> Can ceph-dencoder import the (modified) json and write out the
> encoded binary suitable for setting in the xattr?

It can't, sadly.
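
For what it's worth, the grab-and-verify step might look something like the
following. This is an untested sketch: it uses getfattr rather than the attr
tool, and assumes the osd.6 object path quoted above.

# cd /var/lib/ceph/osd/ceph-6/current/2.bc_head/DIR_C/DIR_B/DIR_E/
# getfattr --only-values -n user.ceph._ 'rbd\udata.13a0c74b0dc51.00000000000107ec__head_089EBEBC__2' > /tmp/xattrfile   # dump the raw, undecoded user.ceph._ value
# ceph-dencoder type object_info_t import /tmp/xattrfile decode dump_json | grep 4104192   # the bogus size should show up here before the edit
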
> If not, what encoding is the xattr, so I can work out what I
> need to do to make the change?

It's little-endian. So 'printf "%x\n" $badsize' and look for that value
with hexedit or whatever, and check your work with ceph-dencoder.

> # getfattr -n user.ceph._ 'ceph-6/current/2.bc_head/DIR_C/DIR_B/DIR_E/rbd\udata.13a0c74b0dc51.00000000000107ec__head_089EBEBC__2'
> getfattr: Removing leading '/' from absolute path names
> # file: ceph-6/current/2.bc_head/DIR_C/DIR_B/DIR_E/rbd\134udata.13a0c74b0dc51.00000000000107ec__head_089EBEBC__2
> user.ceph._=0sCgjoAAAABANIAAAAAAAAACcAAAByYmRfZGF0YS4xM2EwYzc0YjBkYzUxLjAwMDAwMDAwMDAwMTA3ZWP+/////////7y+nggAAAAAAAIAAAAAAAAABAMQAAAAAgAAAAAAAAD/////AAAAAAAAAACrqgoAAAAAADJPAAB6dwoAAAAAADBOAAACAhUAAAAIS2YBAAAAAADm+g0fAAAAAAAAAAAAoD4AAAAAABlZLFIAh2IgAgIVAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAq6oKAAAAAAAyTwAAAA==

> > Is this still bobtail? We haven't seen this sort of corruption since
> > then.
>
> Yup. I'll upgrade once the cluster settles down cleanly!

sage
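
Putting the two answers above together, the edit-and-reset step might look
roughly like this. Again an untested sketch: it assumes osd.6 is stopped
while the xattr is rewritten, uses hexedit as suggested above, and writes
the value back with setfattr's base64 form rather than attr.

# printf '%x\n' 4104192        # the (wrong) size recorded in the object info
3ea000
# printf '%x\n' 4194304        # the size it should be
400000
# cp /tmp/xattrfile /tmp/xattrfile.new
# hexedit /tmp/xattrfile.new   # change the one little-endian sequence 00 A0 3E 00 to 00 00 40 00
# ceph-dencoder type object_info_t import /tmp/xattrfile.new decode dump_json | grep 4194304   # confirm it still decodes, now with the corrected size
# cd /var/lib/ceph/osd/ceph-6/current/2.bc_head/DIR_C/DIR_B/DIR_E/
# setfattr -n user.ceph._ -v "0s$(base64 -w 0 /tmp/xattrfile.new)" 'rbd\udata.13a0c74b0dc51.00000000000107ec__head_089EBEBC__2'

Then start osd.6 again and re-run 'ceph pg repair 2.bc' to check that it now
reports 0 errors.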