Re: Corruption by missing blocks

On Tue, Apr 23, 2013 at 3:37 PM, Bryan Stillwell
<bstillwell@xxxxxxxxxxxxxxx> wrote:
> I'm using the kernel client that's built into precise & quantal.
>
> I could give the ceph-fuse client a try and see if it has the same
> issue.  I haven't used it before, so I'll have to do some reading
> first.

If you've got the time, that would be a good data point, and it would
make debugging easier if the issue reproduces there. There's not a ton
to learn: you
install the ceph-fuse package (I think it's packaged separately,
anyway) and then instead of "mount" you run "ceph-fuse -c <ceph.conf
file> --name client.<name> --keyring <keyring_file>" or similar. :)
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com


>
> Bryan
>
> On Tue, Apr 23, 2013 at 4:04 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
>> Sorry, I meant kernel client or ceph-fuse? Client logs would be enough
>> to start with, I suppose — "debug client = 20" and "debug ms = 1" if
>> using ceph-fuse; if you're using the kernel client things get trickier; I'd
>> have to look at what logging is available without the debugfs stuff
>> being enabled. :/
>> -Greg
>> Software Engineer #42 @ http://inktank.com | http://ceph.com
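
For reference, one way to set those options is a [client] section in
ceph.conf on the machine running ceph-fuse. This is a sketch, not a
definitive config: the log file path is an assumption, and you may want
to scope the section to the specific client.<name> you mount with.

```ini
# Hypothetical ceph.conf fragment: verbose client-side logging for a
# ceph-fuse mount. These logs grow quickly, so remove the settings
# once debugging is done.
[client]
    debug client = 20
    debug ms = 1
    log file = /var/log/ceph/client.$name.$pid.log
```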
>>
>>
>> On Tue, Apr 23, 2013 at 3:00 PM, Bryan Stillwell
>> <bstillwell@xxxxxxxxxxxxxxx> wrote:
>>> I've tried a few different ones:
>>>
>>> 1. cp to cephfs mounted filesystem on Ubuntu 12.10 (quantal)
>>> 2. rsync over ssh to cephfs mounted filesystem on Ubuntu 12.04.2 (precise)
>>> 3. scp to cephfs mounted filesystem on Ubuntu 12.04.2 (precise)
>>>
>>> It's fairly reproducible, so I can collect logs for you.  Which ones
>>> would you be interested in?
>>>
>>> The cluster has been in a couple states during testing (during
>>> expansion/rebalancing and during an all active+clean state).
>>>
>>> BTW, all the nodes are running with the 0.56.4-1precise packages.
>>>
>>> Bryan
>>>
>>> On Tue, Apr 23, 2013 at 12:56 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
>>>> On Tue, Apr 23, 2013 at 11:38 AM, Bryan Stillwell
>>>> <bstillwell@xxxxxxxxxxxxxxx> wrote:
>>>>> I've run into an issue where after copying a file to my cephfs cluster
>>>>> the md5sums no longer match.  I believe I've tracked it down to some
>>>>> parts of the file which are missing:
>>>>>
>>>>> $ obj_name=$(cephfs "title1.mkv" show_location -l 0 | grep object_name
>>>>> | sed -e "s/.*:\W*\([0-9a-f]*\)\.[0-9a-f]*/\1/")
>>>>> $ echo "Object name: $obj_name"
>>>>> Object name: 10000001120
>>>>>
>>>>> $ file_size=$(stat "title1.mkv" | grep Size | awk '{ print $2 }')
>>>>> $ printf "File size: %d MiB (%d Bytes)\n" $(($file_size/1048576)) $file_size
>>>>> File size: 20074 MiB (21049178117 Bytes)
>>>>>
>>>>> $ blocks=$((file_size/4194304+1))
>>>>> $ printf "Blocks: %d\n" $blocks
>>>>> Blocks: 5019
>>>>>
>>>>> $ for b in `seq 0 $(($blocks-1))`; do rados -p data stat
>>>>> ${obj_name}.`printf '%8.8x\n' $b` | grep "error"; done
>>>>>  error stat-ing data/10000001120.00001076: No such file or directory
>>>>>  error stat-ing data/10000001120.000011c7: No such file or directory
>>>>>  error stat-ing data/10000001120.0000129c: No such file or directory
>>>>>  error stat-ing data/10000001120.000012f4: No such file or directory
>>>>>  error stat-ing data/10000001120.00001307: No such file or directory
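
For what it's worth, the object-name arithmetic in the session above can
be sanity-checked on its own. This is a quick sketch with the values
copied from the output earlier in the thread (prefix, file size, and the
default 4 MiB CephFS object size), not anything cluster-specific:

```shell
# Reproduce the RADOS object-name arithmetic from the session above.
obj_name=10000001120      # inode-derived object prefix, from show_location
file_size=21049178117     # bytes, from stat
obj_size=4194304          # default 4 MiB CephFS object size

# Number of objects backing the file, rounding up the final partial object
blocks=$(( (file_size + obj_size - 1) / obj_size ))
echo "blocks=$blocks"

# Expected first and last object names (suffix is the zero-padded hex index)
printf '%s.%8.8x\n' "$obj_name" 0
printf '%s.%8.8x\n' "$obj_name" $((blocks - 1))
```

That agrees with the 5019 computed above, and the missing objects
(0x1076 through 0x1307, i.e. indices 4214-4871) fall well inside the
valid range, so they should exist.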
>>>>>
>>>>>
>>>>> Any ideas where to look to investigate what caused these blocks to not
>>>>> be written?
>>>>
>>>> What client are you using to write this? Is it fairly reproducible (so
>>>> you could collect logs of it happening)?
>>>>
>>>> Usually the only times I've seen anything like this were when either
>>>> the file data was supposed to go into a pool which the client didn't
>>>> have write permissions on, or when the RADOS cluster was in bad shape
>>>> and so the data never got flushed to disk. Has your cluster been
>>>> healthy since you started writing the file out?
>>>> -Greg
>>>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>>>
>>>>
>>>>>
>>>>> Here's the current state of the cluster:
>>>>>
>>>>> ceph -s
>>>>>    health HEALTH_OK
>>>>>    monmap e1: 1 mons at {a=172.24.88.50:6789/0}, election epoch 1, quorum 0 a
>>>>>    osdmap e22059: 24 osds: 24 up, 24 in
>>>>>     pgmap v1783615: 1920 pgs: 1917 active+clean, 3
>>>>> active+clean+scrubbing+deep; 4667 GB data, 9381 GB used, 4210 GB /
>>>>> 13592 GB avail
>>>>>    mdsmap e437: 1/1/1 up {0=a=up:active}
>>>>>
>>>>> Here's my current crushmap:
>>>>>
>>>>> # begin crush map
>>>>>
>>>>> # devices
>>>>> device 0 osd.0
>>>>> device 1 osd.1
>>>>> device 2 osd.2
>>>>> device 3 osd.3
>>>>> device 4 osd.4
>>>>> device 5 osd.5
>>>>> device 6 osd.6
>>>>> device 7 osd.7
>>>>> device 8 osd.8
>>>>> device 9 osd.9
>>>>> device 10 osd.10
>>>>> device 11 osd.11
>>>>> device 12 osd.12
>>>>> device 13 osd.13
>>>>> device 14 osd.14
>>>>> device 15 osd.15
>>>>> device 16 osd.16
>>>>> device 17 osd.17
>>>>> device 18 osd.18
>>>>> device 19 osd.19
>>>>> device 20 osd.20
>>>>> device 21 osd.21
>>>>> device 22 osd.22
>>>>> device 23 osd.23
>>>>>
>>>>> # types
>>>>> type 0 osd
>>>>> type 1 host
>>>>> type 2 rack
>>>>> type 3 row
>>>>> type 4 room
>>>>> type 5 datacenter
>>>>> type 6 pool
>>>>>
>>>>> # buckets
>>>>> host b1 {
>>>>>         id -2           # do not change unnecessarily
>>>>>         # weight 2.980
>>>>>         alg straw
>>>>>         hash 0  # rjenkins1
>>>>>         item osd.0 weight 0.500
>>>>>         item osd.1 weight 0.500
>>>>>         item osd.2 weight 0.500
>>>>>         item osd.3 weight 0.500
>>>>>         item osd.4 weight 0.500
>>>>>         item osd.20 weight 0.480
>>>>> }
>>>>> host b2 {
>>>>>         id -4           # do not change unnecessarily
>>>>>         # weight 4.680
>>>>>         alg straw
>>>>>         hash 0  # rjenkins1
>>>>>         item osd.5 weight 0.500
>>>>>         item osd.6 weight 0.500
>>>>>         item osd.7 weight 2.200
>>>>>         item osd.8 weight 0.500
>>>>>         item osd.9 weight 0.500
>>>>>         item osd.21 weight 0.480
>>>>> }
>>>>> host b3 {
>>>>>         id -5           # do not change unnecessarily
>>>>>         # weight 3.480
>>>>>         alg straw
>>>>>         hash 0  # rjenkins1
>>>>>         item osd.10 weight 0.500
>>>>>         item osd.11 weight 0.500
>>>>>         item osd.12 weight 1.000
>>>>>         item osd.13 weight 0.500
>>>>>         item osd.14 weight 0.500
>>>>>         item osd.22 weight 0.480
>>>>> }
>>>>> host b4 {
>>>>>         id -6           # do not change unnecessarily
>>>>>         # weight 3.480
>>>>>         alg straw
>>>>>         hash 0  # rjenkins1
>>>>>         item osd.15 weight 0.500
>>>>>         item osd.16 weight 1.000
>>>>>         item osd.17 weight 0.500
>>>>>         item osd.18 weight 0.500
>>>>>         item osd.19 weight 0.500
>>>>>         item osd.23 weight 0.480
>>>>> }
>>>>> pool default {
>>>>>         id -1           # do not change unnecessarily
>>>>>         # weight 14.620
>>>>>         alg straw
>>>>>         hash 0  # rjenkins1
>>>>>         item b1 weight 2.980
>>>>>         item b2 weight 4.680
>>>>>         item b3 weight 3.480
>>>>>         item b4 weight 3.480
>>>>> }
>>>>>
>>>>> # rules
>>>>> rule data {
>>>>>         ruleset 0
>>>>>         type replicated
>>>>>         min_size 2
>>>>>         max_size 10
>>>>>         step take default
>>>>>         step chooseleaf firstn 0 type host
>>>>>         step emit
>>>>> }
>>>>> rule metadata {
>>>>>         ruleset 1
>>>>>         type replicated
>>>>>         min_size 2
>>>>>         max_size 10
>>>>>         step take default
>>>>>         step chooseleaf firstn 0 type host
>>>>>         step emit
>>>>> }
>>>>> rule rbd {
>>>>>         ruleset 2
>>>>>         type replicated
>>>>>         min_size 1
>>>>>         max_size 10
>>>>>         step take default
>>>>>         step chooseleaf firstn 0 type host
>>>>>         step emit
>>>>> }
>>>>>
>>>>> # end crush map
>>>>>
>>>>>
>>>>> Thanks,
>>>>> Bryan
>>>>> _______________________________________________
>>>>> ceph-users mailing list
>>>>> ceph-users@xxxxxxxxxxxxxx
>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
> --
>
>
> Bryan Stillwell
> SENIOR SYSTEM ADMINISTRATOR
>
> E: bstillwell@xxxxxxxxxxxxxxx
> O: 303.228.5109
> M: 970.310.6085
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




