Re: Corruption by missing blocks

Bryan Stillwell <bstillwell@xxxxxxxxxxxxxxx> · Tue, 23 Apr 2013 17:14:34 -0600

I'm testing ceph-fuse now, but while going through the kernel logs on
the client I saw something that might be related:

Apr 23 16:35:28 a1 kernel: [692455.496594] libceph: corrupt inc osdmap epoch 22146 off 102 (ffff88021e0dc802 of ffff88021e0dc79c-ffff88021e0dc802)
Apr 23 16:35:28 a1 kernel: [692455.505154] osdmap: 00000000: 05 00 69 17 a0 33 34 39 4f d7 88 db 46 c9 e1 df  ..i..349O...F...
Apr 23 16:35:28 a1 kernel: [692455.505158] osdmap: 00000010: 0d 6e 82 56 00 00 b0 0c 77 51 00 1a 00 22 ff ff  .n.V....wQ..."..
Apr 23 16:35:28 a1 kernel: [692455.505161] osdmap: 00000020: ff ff ff ff ff ff 00 00 00 00 00 00 00 00 ff ff  ................
Apr 23 16:35:28 a1 kernel: [692455.505163] osdmap: 00000030: ff ff 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Apr 23 16:35:28 a1 kernel: [692455.505166] osdmap: 00000040: 00 00 00 00 00 00 00 00 00 00 01 00 00 00 ff ff  ................
Apr 23 16:35:28 a1 kernel: [692455.505169] osdmap: 00000050: 5c 02 00 00 00 00 03 00 00 00 0c 00 00 00 00 00  \...............
Apr 23 16:35:28 a1 kernel: [692455.505171] osdmap: 00000060: 00 00 02 00 00 00                                ......
Apr 23 16:35:28 a1 kernel: [692455.505174] libceph: osdc handle_map corrupt msg
Apr 23 16:35:28 a1 kernel: [692455.513590] header: 00000000: 90 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Apr 23 16:35:28 a1 kernel: [692455.513593] header: 00000010: 29 00 c4 00 01 00 86 00 00 00 00 00 00 00 00 00  )...............
Apr 23 16:35:28 a1 kernel: [692455.513596] header: 00000020: 00 00 00 00 01 00 00 00 00 00 00 00 00 01 00 00  ................
Apr 23 16:35:28 a1 kernel: [692455.513599] header: 00000030: 00 5d 68 c5 e8                                   .]h..
Apr 23 16:35:28 a1 kernel: [692455.513602]  front: 00000000: 69 17 a0 33 34 39 4f d7 88 db 46 c9 e1 df 0d 6e  i..349O...F....n
Apr 23 16:35:28 a1 kernel: [692455.513605]  front: 00000010: 01 00 00 00 82 56 00 00 66 00 00 00 05 00 69 17  .....V..f.....i.
Apr 23 16:35:28 a1 kernel: [692455.513607]  front: 00000020: a0 33 34 39 4f d7 88 db 46 c9 e1 df 0d 6e 82 56  .349O...F....n.V
Apr 23 16:35:28 a1 kernel: [692455.513610]  front: 00000030: 00 00 b0 0c 77 51 00 1a 00 22 ff ff ff ff ff ff  ....wQ..."......
Apr 23 16:35:28 a1 kernel: [692455.513613]  front: 00000040: ff ff 00 00 00 00 00 00 00 00 ff ff ff ff 00 00  ................
Apr 23 16:35:28 a1 kernel: [692455.513616]  front: 00000050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Apr 23 16:35:28 a1 kernel: [692455.513618]  front: 00000060: 00 00 00 00 00 00 01 00 00 00 ff ff 5c 02 00 00  ............\...
Apr 23 16:35:28 a1 kernel: [692455.513621]  front: 00000070: 00 00 03 00 00 00 0c 00 00 00 00 00 00 00 02 00  ................
Apr 23 16:35:28 a1 kernel: [692455.513624]  front: 00000080: 00 00 00 00 00 00                                ......
Apr 23 16:35:28 a1 kernel: [692455.513627] footer: 00000000: ae ee 1e d8 00 00 00 00 00 00 00 00 01           .............
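
A rough reading of the first log line, offered as a sketch rather than a
verified decode: the addresses in parentheses look like the decoder's
position followed by the start and end of the buffer, so position minus
start should equal the reported "off 102". The front dump also appears to
carry the epoch and the map length as little-endian 32-bit values at
offsets 0x14 and 0x18. That field layout is an assumption read off the
dump, not something confirmed in this thread, and le32 is a helper name
made up for this example.

```shell
#!/bin/sh
# le32 (hypothetical helper): four hex bytes in dump order -> decimal value,
# assuming the field is little-endian.
le32() {
    echo $((0x$4$3$2$1))
}

# All three addresses in the log line share the prefix ffff88021e0d, so only
# the low 16 bits are compared here. This sidesteps overflow of signed
# 64-bit shell arithmetic on the full kernel addresses.
pos=0xc802
start=0xc79c
end=0xc802

echo "offset of failure: $((pos - start))"
echo "buffer length:     $((end - start))"
echo "epoch field:       $(le32 82 56 00 00)"
echo "length field:      $(le32 66 00 00 00)"
```

If that reading is right, the decoder stopped exactly at the end of the
102-byte incremental map, which would be consistent with it expecting more
data than the message contained.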

On Tue, Apr 23, 2013 at 4:41 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
> On Tue, Apr 23, 2013 at 3:37 PM, Bryan Stillwell
> <bstillwell@xxxxxxxxxxxxxxx> wrote:
>> I'm using the kernel client that's built into precise & quantal.
>>
>> I could give the ceph-fuse client a try and see if it has the same
>> issue.  I haven't used it before, so I'll have to do some reading
>> first.
>
> If you've got the time that would be a good data point, and make
> debugging easier if it reproduces. There's not a ton to learn — you
> install the ceph-fuse package (I think it's packaged separately,
> anyway) and then instead of "mount" you run "ceph-fuse -c <ceph.conf
> file> --name client.<name> --keyring <keyring_file>" or similar. :)
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
>
>
>>
>> Bryan
>>
>> On Tue, Apr 23, 2013 at 4:04 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
>>> Sorry, I meant kernel client or ceph-fuse? Client logs would be enough
>>> to start with, I suppose — "debug client = 20" and "debug ms = 1" if
>>> using ceph-fuse; if using the kernel client things get trickier; I'd
>>> have to look at what logging is available without the debugfs stuff
>>> being enabled. :/
>>> -Greg
>>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>>
>>>
>>> On Tue, Apr 23, 2013 at 3:00 PM, Bryan Stillwell
>>> <bstillwell@xxxxxxxxxxxxxxx> wrote:
>>>> I've tried a few different ones:
>>>>
>>>> 1. cp to cephfs mounted filesystem on Ubuntu 12.10 (quantal)
>>>> 2. rsync over ssh to cephfs mounted filesystem on Ubuntu 12.04.2 (precise)
>>>> 3. scp to cephfs mounted filesystem on Ubuntu 12.04.2 (precise)
>>>>
>>>> It's fairly reproducible, so I can collect logs for you.  Which ones
>>>> would you be interested in?
>>>>
>>>> The cluster has been in a couple states during testing (during
>>>> expansion/rebalancing and during an all active+clean state).
>>>>
>>>> BTW, all the nodes are running with the 0.56.4-1precise packages.
>>>>
>>>> Bryan
>>>>
>>>> On Tue, Apr 23, 2013 at 12:56 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
>>>>> On Tue, Apr 23, 2013 at 11:38 AM, Bryan Stillwell
>>>>> <bstillwell@xxxxxxxxxxxxxxx> wrote:
>>>>>> I've run into an issue where after copying a file to my cephfs cluster
>>>>>> the md5sums no longer match.  I believe I've tracked it down to some
>>>>>> parts of the file which are missing:
>>>>>>
>>>>>> $ obj_name=$(cephfs "title1.mkv" show_location -l 0 | grep object_name | sed -e "s/.*:\W*\([0-9a-f]*\)\.[0-9a-f]*/\1/")
>>>>>> $ echo "Object name: $obj_name"
>>>>>> Object name: 10000001120
>>>>>>
>>>>>> $ file_size=$(stat "title1.mkv" | grep Size | awk '{ print $2 }')
>>>>>> $ printf "File size: %d MiB (%d Bytes)\n" $(($file_size/1048576)) $file_size
>>>>>> File size: 20074 MiB (21049178117 Bytes)
>>>>>>
>>>>>> $ blocks=$(((file_size+4194303)/4194304))
>>>>>> $ printf "Blocks: %d\n" $blocks
>>>>>> Blocks: 5019
>>>>>>
>>>>>> $ for b in `seq 0 $(($blocks-1))`; do rados -p data stat ${obj_name}.`printf '%8.8x\n' $b` | grep "error"; done
>>>>>>  error stat-ing data/10000001120.00001076: No such file or directory
>>>>>>  error stat-ing data/10000001120.000011c7: No such file or directory
>>>>>>  error stat-ing data/10000001120.0000129c: No such file or directory
>>>>>>  error stat-ing data/10000001120.000012f4: No such file or directory
>>>>>>  error stat-ing data/10000001120.00001307: No such file or directory
>>>>>>
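
For reference, the missing object suffixes above can be mapped back to
byte ranges in the file. This is only a sketch: it assumes 4 MiB objects
and a simple layout (stripe count 1), the same assumption the block-count
calculation above makes, and object_range is a helper name made up for
this example.

```shell
#!/bin/sh
# Assumption: object N holds bytes [N * 4 MiB, (N + 1) * 4 MiB) of the file.
obj_size=4194304

# object_range (hypothetical helper): hex object suffix -> "index start end",
# where start and end are inclusive byte offsets into the file.
object_range() {
    idx=$((0x$1))
    start=$((idx * obj_size))
    end=$((start + obj_size - 1))
    echo "$idx $start $end"
}

# Suffixes taken from the "error stat-ing" lines above.
for suffix in 00001076 000011c7 0000129c 000012f4 00001307; do
    object_range "$suffix"
done
```

Comparing those ranges in the source file and in the copy (for example
with cmp or dd) should show whether the mismatched data lines up with the
missing objects.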
>>>>>>
>>>>>> Any ideas where to look to investigate what caused these blocks to not
>>>>>> be written?
>>>>>
>>>>> What client are you using to write this? Is it fairly reproducible (so
>>>>> you could collect logs of it happening)?
>>>>>
>>>>> Usually the only times I've seen anything like this were when either
>>>>> the file data was supposed to go into a pool which the client didn't
>>>>> have write permissions on, or when the RADOS cluster was in bad shape
>>>>> and so the data never got flushed to disk. Has your cluster been
>>>>> healthy since you started writing the file out?
>>>>> -Greg
>>>>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>>>>
>>>>>
>>>>>>
>>>>>> Here's the current state of the cluster:
>>>>>>
>>>>>> ceph -s
>>>>>>    health HEALTH_OK
>>>>>>    monmap e1: 1 mons at {a=172.24.88.50:6789/0}, election epoch 1, quorum 0 a
>>>>>>    osdmap e22059: 24 osds: 24 up, 24 in
>>>>>>     pgmap v1783615: 1920 pgs: 1917 active+clean, 3 active+clean+scrubbing+deep; 4667 GB data, 9381 GB used, 4210 GB / 13592 GB avail
>>>>>>    mdsmap e437: 1/1/1 up {0=a=up:active}
>>>>>>
>>>>>> Here's my current crushmap:
>>>>>>
>>>>>> # begin crush map
>>>>>>
>>>>>> # devices
>>>>>> device 0 osd.0
>>>>>> device 1 osd.1
>>>>>> device 2 osd.2
>>>>>> device 3 osd.3
>>>>>> device 4 osd.4
>>>>>> device 5 osd.5
>>>>>> device 6 osd.6
>>>>>> device 7 osd.7
>>>>>> device 8 osd.8
>>>>>> device 9 osd.9
>>>>>> device 10 osd.10
>>>>>> device 11 osd.11
>>>>>> device 12 osd.12
>>>>>> device 13 osd.13
>>>>>> device 14 osd.14
>>>>>> device 15 osd.15
>>>>>> device 16 osd.16
>>>>>> device 17 osd.17
>>>>>> device 18 osd.18
>>>>>> device 19 osd.19
>>>>>> device 20 osd.20
>>>>>> device 21 osd.21
>>>>>> device 22 osd.22
>>>>>> device 23 osd.23
>>>>>>
>>>>>> # types
>>>>>> type 0 osd
>>>>>> type 1 host
>>>>>> type 2 rack
>>>>>> type 3 row
>>>>>> type 4 room
>>>>>> type 5 datacenter
>>>>>> type 6 pool
>>>>>>
>>>>>> # buckets
>>>>>> host b1 {
>>>>>>         id -2           # do not change unnecessarily
>>>>>>         # weight 2.980
>>>>>>         alg straw
>>>>>>         hash 0  # rjenkins1
>>>>>>         item osd.0 weight 0.500
>>>>>>         item osd.1 weight 0.500
>>>>>>         item osd.2 weight 0.500
>>>>>>         item osd.3 weight 0.500
>>>>>>         item osd.4 weight 0.500
>>>>>>         item osd.20 weight 0.480
>>>>>> }
>>>>>> host b2 {
>>>>>>         id -4           # do not change unnecessarily
>>>>>>         # weight 4.680
>>>>>>         alg straw
>>>>>>         hash 0  # rjenkins1
>>>>>>         item osd.5 weight 0.500
>>>>>>         item osd.6 weight 0.500
>>>>>>         item osd.7 weight 2.200
>>>>>>         item osd.8 weight 0.500
>>>>>>         item osd.9 weight 0.500
>>>>>>         item osd.21 weight 0.480
>>>>>> }
>>>>>> host b3 {
>>>>>>         id -5           # do not change unnecessarily
>>>>>>         # weight 3.480
>>>>>>         alg straw
>>>>>>         hash 0  # rjenkins1
>>>>>>         item osd.10 weight 0.500
>>>>>>         item osd.11 weight 0.500
>>>>>>         item osd.12 weight 1.000
>>>>>>         item osd.13 weight 0.500
>>>>>>         item osd.14 weight 0.500
>>>>>>         item osd.22 weight 0.480
>>>>>> }
>>>>>> host b4 {
>>>>>>         id -6           # do not change unnecessarily
>>>>>>         # weight 3.480
>>>>>>         alg straw
>>>>>>         hash 0  # rjenkins1
>>>>>>         item osd.15 weight 0.500
>>>>>>         item osd.16 weight 1.000
>>>>>>         item osd.17 weight 0.500
>>>>>>         item osd.18 weight 0.500
>>>>>>         item osd.19 weight 0.500
>>>>>>         item osd.23 weight 0.480
>>>>>> }
>>>>>> pool default {
>>>>>>         id -1           # do not change unnecessarily
>>>>>>         # weight 14.620
>>>>>>         alg straw
>>>>>>         hash 0  # rjenkins1
>>>>>>         item b1 weight 2.980
>>>>>>         item b2 weight 4.680
>>>>>>         item b3 weight 3.480
>>>>>>         item b4 weight 3.480
>>>>>> }
>>>>>>
>>>>>> # rules
>>>>>> rule data {
>>>>>>         ruleset 0
>>>>>>         type replicated
>>>>>>         min_size 2
>>>>>>         max_size 10
>>>>>>         step take default
>>>>>>         step chooseleaf firstn 0 type host
>>>>>>         step emit
>>>>>> }
>>>>>> rule metadata {
>>>>>>         ruleset 1
>>>>>>         type replicated
>>>>>>         min_size 2
>>>>>>         max_size 10
>>>>>>         step take default
>>>>>>         step chooseleaf firstn 0 type host
>>>>>>         step emit
>>>>>> }
>>>>>> rule rbd {
>>>>>>         ruleset 2
>>>>>>         type replicated
>>>>>>         min_size 1
>>>>>>         max_size 10
>>>>>>         step take default
>>>>>>         step chooseleaf firstn 0 type host
>>>>>>         step emit
>>>>>> }
>>>>>>
>>>>>> # end crush map
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Bryan
>>>>>> _______________________________________________
>>>>>> ceph-users mailing list
>>>>>> ceph-users@xxxxxxxxxxxxxx
>>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>>
>> --
>>
>>
>> Bryan Stillwell
>> SENIOR SYSTEM ADMINISTRATOR
>>
>> E: bstillwell@xxxxxxxxxxxxxxx
>> O: 303.228.5109
>> M: 970.310.6085
