Re: OSD repair: on disk size does not match object info size

On Mon, Sep 09, 2013 at 05:14:14PM -0700, Sage Weil wrote:
> On Tue, 10 Sep 2013, Chris Dunlop wrote:
>> On Mon, Sep 09, 2013 at 04:30:33PM -0700, Sage Weil wrote:
>>> On Tue, 10 Sep 2013, Chris Dunlop wrote:
>>>> G'day,
>>>> 
>>>> On 0.56.7-1~bpo70+1 I'm getting:
>>>> 
>>>> # ceph pg dump | grep inconsistent
>>>> 2013-09-10-08:39:59 2.bc        2776    0       0       0       11521799680     162063  162063  active+clean+inconsistent       2013-09-10 08:38:38.482302      20512'699877    20360'13461026  [6,0]   [6,0]   20512'699877    2013-09-10 08:38:38.482264      20512'699877     2013-09-10 08:38:38.482264
>>>> 
>>>> # ceph pg repair 2.bc
>>>> instructing pg 2.bc on osd.6 to repair
>>>> 
>>>> # tail /var/log/ceph/ceph-osd.6.log
>>>> 2013-09-10 08:17:25.557926 7fef09c14700  0 log [ERR] : repair 2.bc 89ebebc/rbd_data.13a0c74b0dc51.00000000000107ec/head//2 on disk size (4194304) does not match object info size (4104192)
>>>> 2013-09-10 08:17:27.316112 7fef09c14700  0 log [ERR] : 2.bc repair 1 errors, 0 fixed
>>>> 
>>>> # ls -l 'ceph-6/current/2.bc_head/DIR_C/DIR_B/DIR_E/rbd\udata.13a0c74b0dc51.00000000000107ec__head_089EBEBC__2'
>>>> -rw-r--r-- 1 root root 4194304 Sep  8 21:01 ceph-6/current/2.bc_head/DIR_C/DIR_B/DIR_E/rbd\udata.13a0c74b0dc51.00000000000107ec__head_089EBEBC__2
>>>> # ls -l 'ceph-0/current/2.bc_head/DIR_C/DIR_B/DIR_E/rbd\udata.13a0c74b0dc51.00000000000107ec__head_089EBEBC__2'
>>>> -rw-r--r-- 1 root root 4194304 Sep  8 21:01 ceph-0/current/2.bc_head/DIR_C/DIR_B/DIR_E/rbd\udata.13a0c74b0dc51.00000000000107ec__head_089EBEBC__2
>>>> 
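Here is a small sketch of pulling the two sizes out of an ERR line like the one above, so they can be compared with the file sizes that ls reports on each replica. The sed expressions are written for this particular message format. They are an assumption, not a format Ceph documents.

```sh
# One ERR message from ceph-osd.6.log (copied from the log output above)
line='repair 2.bc 89ebebc/rbd_data.13a0c74b0dc51.00000000000107ec/head//2 on disk size (4194304) does not match object info size (4104192)'

# Extract each parenthesised number with a BRE capture group
disk=$(printf '%s\n' "$line" | sed -n 's/.*on disk size (\([0-9]*\)).*/\1/p')
info=$(printf '%s\n' "$line" | sed -n 's/.*object info size (\([0-9]*\)).*/\1/p')

echo "on disk: $disk, object info: $info"
```

Both replicas report 4194304 bytes, which matches the on-disk size. That makes the object info size the odd one out.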
>>>> One possible solution would be to simply truncate the objects down to the
>>>> object info size, as recommended in this case:
>>>> 
>>>> http://www.spinics.net/lists/ceph-users/msg00793.html
>>>> 
>>>> However I'm a little concerned about that solution as the on-disk size is
>>>> exactly 4MB, which I think is the expected size of these objects, and matches
>>>> the size of all the other objects in the same directory, and the "extra" data
>>>> looks a little interesting, with "FILE0" blocks in there (what are those?):
>>>> 
>>>> # cd /var/lib/ceph/osd/ceph-6/current/2.bc_head/DIR_C/DIR_B/DIR_E/
>>>> # dd if='rbd\udata.13a0c74b0dc51.00000000000107ec__head_089EBEBC__2' bs=1024 skip=4008 | od -c
>>>> 0000000   F   I   L   E   0  \0 003  \0 312   j   o   o  \0  \0  \0  \0
>>>> 0000020 001  \0 001  \0   8  \0 001  \0   X 001  \0  \0  \0 004  \0  \0
>>>> 0000040  \0  \0  \0  \0  \0  \0  \0  \0 006  \0  \0  \0 310   p 017  \0
>>>> 0000060 002  \0  \0  \0  \0  \0  \0  \0 020  \0  \0  \0   `  \0  \0  \0
>>>> ...
>>>> 0002000   F   I   L   E   0  \0 003  \0 002   k   o   o  \0  \0  \0  \0
>>>> 0002020 001  \0 001  \0   8  \0 001  \0   X 001  \0  \0  \0 004  \0  \0
>>>> 0002040  \0  \0  \0  \0  \0  \0  \0  \0 006  \0  \0  \0 311   p 017  \0
>>>> 0002060 002  \0  \0  \0  \0  \0  \0  \0 020  \0  \0  \0   `  \0  \0  \0
>>>> ...
>>>> 0004000   F   I   L   E   0  \0 003  \0 023   r   o   o  \0  \0  \0  \0
>>>> 0004020 001  \0 001  \0   8  \0 001  \0   X 001  \0  \0  \0 004  \0  \0
>>>> 0004040  \0  \0  \0  \0  \0  \0  \0  \0 006  \0  \0  \0 312   p 017  \0
>>>> 0004060 002  \0  \0  \0  \0  \0  \0  \0 020  \0  \0  \0   `  \0  \0  \0
>>>> 
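The dd offset lines up exactly with the recorded object size. dd's skip= counts bs-sized blocks, and od prints offsets in octal. A few lines of shell arithmetic show where the dump starts and where each FILE0 header sits. The values are copied from the log and commands above.

```sh
# Sizes from the repair log line above
info_size=4104192   # "object info size"
disk_size=4194304   # "on disk size"
bs=1024             # block size passed to dd

# dd's skip= counts blocks of bs bytes, so the dump starts here
echo "skip=$((info_size / bs))"
# bytes on disk beyond the recorded object size
echo "extra=$((disk_size - info_size))"

# od prints offsets in octal; the leading 0 makes $(( )) read them as octal too
for off in 0000000 0002000 0004000; do
  echo "od offset $off = byte $(($off)) of the dump"
done
```

This prints skip=4008 and extra=90112. The dump therefore begins at byte 4104192, exactly the object info size, and the first three FILE0 headers shown sit 1 KiB apart within the 88 KiB tail. Records of that shape plausibly belong to an NTFS MFT, whose entries begin with the signature FILE. That would fit a guest filesystem on the RBD image, though nothing in the thread confirms it.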
>>>> Is it safe to simply truncate this object, or what other solutions might
>>>> be applicable?
>>> 
>>> The alternative is to edit the xattr.  That's harder, but better.  You'll 
>>> want to grab the user.ceph._ xattr, change the one instance of 4104192 to 
>>> 4194304, and then write it back.  You can use
>>> 
>>>  ceph-dencoder type object_info_t import /tmp/xattrfile decode dump_json
>>>
>>> to verify that it decodes properly before and after you make the edit.  I 
>>> like the 'attr' tool for getting/setting xattrs.
>> 
>> Can ceph-dencoder import the (modified) json and write out the
>> encoded binary suitable for setting in the xattr?
> 
> It can't, sadly.
>  
>> If not, what encoding is the xattr, so I can work out what I
>> need to do to make the change?
> 
> It's little-endian.  So run 'printf "%x\n" $badsize', look for that value 
> (with its bytes reversed) using hexedit or similar, and check your work with ceph-dencoder.
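Sage's printf gives the hex digits most-significant first, but the bytes on disk appear in the reverse order. Below is a minimal sketch of producing the byte sequence to search for. It assumes the size field is encoded as an 8-byte (64-bit) little-endian integer. le64_bytes is a hypothetical helper name, not a Ceph tool.

```sh
# Hypothetical helper: print the 8-byte little-endian encoding of a
# non-negative decimal integer as space-separated hex bytes.
le64_bytes() {
  n=$1
  bytes=""
  for i in 1 2 3 4 5 6 7 8; do
    # little-endian: the lowest-order byte is emitted first
    bytes="$bytes$(printf '%02x' $((n % 256))) "
    n=$((n / 256))
  done
  printf '%s\n' "${bytes% }"   # drop the trailing space
}
```

For this object, le64_bytes 4104192 prints 00 a0 3e 00 00 00 00 00, and le64_bytes 4194304 prints 00 00 40 00 00 00 00 00.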

OK, for the record:

# printf '%x\n' 4104192
3ea000
# printf '%x\n' 4194304
400000
# attr -q -g ceph._ 'rbd\udata.13a0c74b0dc51.00000000000107ec__head_089EBEBC__2' > /tmp/attr.1
## Note: little-endian, so the bytes are reversed (0x3ea000 is stored as 00 a0 3e, 0x400000 as 00 00 40)
# xxd /tmp/attr.1 | sed 's/a03e/0040/' | xxd -r > /tmp/attr.2
# diff -u \
  <(ceph-dencoder type object_info_t import /tmp/attr.1 decode dump_json) \
  <(ceph-dencoder type object_info_t import /tmp/attr.2 decode dump_json)
--- /dev/fd/63  2013-09-10 10:28:59.882470249 +1000
+++ /dev/fd/62  2013-09-10 10:28:59.882470249 +1000
@@ -9,7 +9,7 @@
   "version": "20274'699051",
   "prior_version": "20016'685946",
   "last_reqid": "client.91723.0:521009894",
-  "size": 4104192,
+  "size": 4194304,
   "mtime": "2013-09-08 21:01:45.543328",
   "lost": 0,
   "wrlock_by": "unknown.0.0:0",
# attr -s ceph._ 'rbd\udata.13a0c74b0dc51.00000000000107ec__head_089EBEBC__2' < /tmp/attr.2

(And repeat the 'attr -s' on the replica's copy of the object under ceph-0, i.e. on osd.0.)

...and then repair the pg again!
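The sed edit above works only because a0 and 3e happened to fall in the same two-byte group of xxd's output. It could just as easily hit an unrelated match elsewhere in the dump. Below is a somewhat more defensive sketch. Like the sed edit, it assumes the size is an 8-byte little-endian field. It differs in three ways:

* It matches the whole encoding on byte boundaries.
* It refuses to write anything unless that encoding occurs exactly once.
* It uses od and printf instead of xxd.

patch_u64 and le64 are hypothetical names. The result still needs the ceph-dencoder check before 'attr -s'.

```sh
# Hypothetical helper: 8-byte little-endian encoding of $1 as " xx xx ..."
# (a space before every byte keeps later matches byte-aligned).
le64() {
  n=$1
  s=""
  for i in 1 2 3 4 5 6 7 8; do
    s="$s $(printf '%02x' $((n % 256)))"
    n=$((n / 256))
  done
  printf '%s\n' "$s"
}

# Hypothetical helper: copy file $1 to $2, replacing the encoding of the
# value $3 with that of $4. Fails without writing $2 unless $3's encoding
# occurs exactly once.
patch_u64() {
  src=$1 dst=$2
  old=$(le64 "$3") new=$(le64 "$4")
  # Hex-dump every byte (-v disables '*' run suppression), one space before each
  hex=" $(od -An -v -tx1 "$src" | tr '\n' ' ' | tr -s ' ')"
  count=$(printf '%s\n' "$hex" | grep -o "$old" | wc -l)
  if [ "$count" -ne 1 ]; then
    echo "patch_u64: expected 1 match, found $count" >&2
    return 1
  fi
  # Substitute, then write each hex byte back out via a printf octal escape
  for b in $(printf '%s\n' "$hex" | sed "s/$old/$new/"); do
    printf "\\$(printf '%03o' "0x$b")"
  done > "$dst"
}
```

For this object, the call would be patch_u64 /tmp/attr.1 /tmp/attr.2 4104192 4194304. Follow it with the same ceph-dencoder diff as above.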

Thanks for your help.

Chris.