I'm using the kernel client that's built into precise & quantal. I
could give the ceph-fuse client a try and see if it has the same
issue. I haven't used it before, so I'll have to do some reading
first.

Bryan

On Tue, Apr 23, 2013 at 4:04 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
> Sorry, I meant kernel client or ceph-fuse? Client logs would be enough
> to start with, I suppose — "debug client = 20" and "debug ms = 1" if
> using ceph-fuse; if using the kernel client things get trickier; I'd
> have to look at what logging is available without the debugfs stuff
> being enabled. :/
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
>
>
> On Tue, Apr 23, 2013 at 3:00 PM, Bryan Stillwell
> <bstillwell@xxxxxxxxxxxxxxx> wrote:
>> I've tried a few different ones:
>>
>> 1. cp to a cephfs-mounted filesystem on Ubuntu 12.10 (quantal)
>> 2. rsync over ssh to a cephfs-mounted filesystem on Ubuntu 12.04.2 (precise)
>> 3. scp to a cephfs-mounted filesystem on Ubuntu 12.04.2 (precise)
>>
>> It's fairly reproducible, so I can collect logs for you. Which ones
>> would you be interested in?
>>
>> The cluster has been in a couple of states during testing (during
>> expansion/rebalancing and in an all active+clean state).
>>
>> BTW, all the nodes are running the 0.56.4-1precise packages.
>>
>> Bryan
>>
>> On Tue, Apr 23, 2013 at 12:56 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
>>> On Tue, Apr 23, 2013 at 11:38 AM, Bryan Stillwell
>>> <bstillwell@xxxxxxxxxxxxxxx> wrote:
>>>> I've run into an issue where, after copying a file to my cephfs
>>>> cluster, the md5sums no longer match. I believe I've tracked it
>>>> down to some parts of the file that are missing:
>>>>
>>>> $ obj_name=$(cephfs "title1.mkv" show_location -l 0 | grep object_name | sed -e "s/.*:\W*\([0-9a-f]*\)\.[0-9a-f]*/\1/")
>>>> $ echo "Object name: $obj_name"
>>>> Object name: 10000001120
>>>>
>>>> $ file_size=$(stat "title1.mkv" | grep Size | awk '{ print $2 }')
>>>> $ printf "File size: %d MiB (%d Bytes)\n" $(($file_size/1048576)) $file_size
>>>> File size: 20074 MiB (21049178117 Bytes)
>>>>
>>>> $ blocks=$((file_size/4194304+1))
>>>> $ printf "Blocks: %d\n" $blocks
>>>> Blocks: 5019
>>>>
>>>> $ for b in `seq 0 $(($blocks-1))`; do rados -p data stat ${obj_name}.`printf '%8.8x\n' $b` | grep "error"; done
>>>> error stat-ing data/10000001120.00001076: No such file or directory
>>>> error stat-ing data/10000001120.000011c7: No such file or directory
>>>> error stat-ing data/10000001120.0000129c: No such file or directory
>>>> error stat-ing data/10000001120.000012f4: No such file or directory
>>>> error stat-ing data/10000001120.00001307: No such file or directory
>>>>
>>>> Any ideas on where to look to investigate why these blocks never
>>>> got written?
>>>
>>> What client are you using to write this? Is it fairly reproducible (so
>>> you could collect logs of it happening)?
>>>
>>> Usually the only times I've seen anything like this were when either
>>> the file data was supposed to go into a pool which the client didn't
>>> have write permissions on, or when the RADOS cluster was in bad shape
>>> and so the data never got flushed to disk. Has your cluster been
>>> healthy since you started writing the file out?
>>> -Greg
>>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>>
>>>
>>>>
>>>> Here's the current state of the cluster:
>>>>
>>>> ceph -s
>>>>    health HEALTH_OK
>>>>    monmap e1: 1 mons at {a=172.24.88.50:6789/0}, election epoch 1, quorum 0 a
>>>>    osdmap e22059: 24 osds: 24 up, 24 in
>>>>    pgmap v1783615: 1920 pgs: 1917 active+clean, 3 active+clean+scrubbing+deep; 4667 GB data, 9381 GB used, 4210 GB / 13592 GB avail
>>>>    mdsmap e437: 1/1/1 up {0=a=up:active}
>>>>
>>>> Here's my current crushmap:
>>>>
>>>> # begin crush map
>>>>
>>>> # devices
>>>> device 0 osd.0
>>>> device 1 osd.1
>>>> device 2 osd.2
>>>> device 3 osd.3
>>>> device 4 osd.4
>>>> device 5 osd.5
>>>> device 6 osd.6
>>>> device 7 osd.7
>>>> device 8 osd.8
>>>> device 9 osd.9
>>>> device 10 osd.10
>>>> device 11 osd.11
>>>> device 12 osd.12
>>>> device 13 osd.13
>>>> device 14 osd.14
>>>> device 15 osd.15
>>>> device 16 osd.16
>>>> device 17 osd.17
>>>> device 18 osd.18
>>>> device 19 osd.19
>>>> device 20 osd.20
>>>> device 21 osd.21
>>>> device 22 osd.22
>>>> device 23 osd.23
>>>>
>>>> # types
>>>> type 0 osd
>>>> type 1 host
>>>> type 2 rack
>>>> type 3 row
>>>> type 4 room
>>>> type 5 datacenter
>>>> type 6 pool
>>>>
>>>> # buckets
>>>> host b1 {
>>>>         id -2           # do not change unnecessarily
>>>>         # weight 2.980
>>>>         alg straw
>>>>         hash 0  # rjenkins1
>>>>         item osd.0 weight 0.500
>>>>         item osd.1 weight 0.500
>>>>         item osd.2 weight 0.500
>>>>         item osd.3 weight 0.500
>>>>         item osd.4 weight 0.500
>>>>         item osd.20 weight 0.480
>>>> }
>>>> host b2 {
>>>>         id -4           # do not change unnecessarily
>>>>         # weight 4.680
>>>>         alg straw
>>>>         hash 0  # rjenkins1
>>>>         item osd.5 weight 0.500
>>>>         item osd.6 weight 0.500
>>>>         item osd.7 weight 2.200
>>>>         item osd.8 weight 0.500
>>>>         item osd.9 weight 0.500
>>>>         item osd.21 weight 0.480
>>>> }
>>>> host b3 {
>>>>         id -5           # do not change unnecessarily
>>>>         # weight 3.480
>>>>         alg straw
>>>>         hash 0  # rjenkins1
>>>>         item osd.10 weight 0.500
>>>>         item osd.11 weight 0.500
>>>>         item osd.12 weight 1.000
>>>>         item osd.13 weight 0.500
>>>>         item osd.14 weight 0.500
>>>>         item osd.22 weight 0.480
>>>> }
>>>> host b4 {
>>>>         id -6           # do not change unnecessarily
>>>>         # weight 3.480
>>>>         alg straw
>>>>         hash 0  # rjenkins1
>>>>         item osd.15 weight 0.500
>>>>         item osd.16 weight 1.000
>>>>         item osd.17 weight 0.500
>>>>         item osd.18 weight 0.500
>>>>         item osd.19 weight 0.500
>>>>         item osd.23 weight 0.480
>>>> }
>>>> pool default {
>>>>         id -1           # do not change unnecessarily
>>>>         # weight 14.620
>>>>         alg straw
>>>>         hash 0  # rjenkins1
>>>>         item b1 weight 2.980
>>>>         item b2 weight 4.680
>>>>         item b3 weight 3.480
>>>>         item b4 weight 3.480
>>>> }
>>>>
>>>> # rules
>>>> rule data {
>>>>         ruleset 0
>>>>         type replicated
>>>>         min_size 2
>>>>         max_size 10
>>>>         step take default
>>>>         step chooseleaf firstn 0 type host
>>>>         step emit
>>>> }
>>>> rule metadata {
>>>>         ruleset 1
>>>>         type replicated
>>>>         min_size 2
>>>>         max_size 10
>>>>         step take default
>>>>         step chooseleaf firstn 0 type host
>>>>         step emit
>>>> }
>>>> rule rbd {
>>>>         ruleset 2
>>>>         type replicated
>>>>         min_size 1
>>>>         max_size 10
>>>>         step take default
>>>>         step chooseleaf firstn 0 type host
>>>>         step emit
>>>> }
>>>>
>>>> # end crush map
>>>>
>>>>
>>>> Thanks,
>>>> Bryan
>>>> _______________________________________________
>>>> ceph-users mailing list
>>>> ceph-users@xxxxxxxxxxxxxx
>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

--
Bryan Stillwell
SENIOR SYSTEM ADMINISTRATOR

E: bstillwell@xxxxxxxxxxxxxxx
O: 303.228.5109
M: 970.310.6085
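
A minimal sketch of how the "debug client = 20" / "debug ms = 1" settings Greg
mentions could be applied when testing ceph-fuse. The [client] section is the
usual place for client-side options; the log path and mount point below are
only examples, and the monitor address is taken from the ceph -s output in the
quoted thread:

# /etc/ceph/ceph.conf on the client machine (example log path)
[client]
        debug client = 20
        debug ms = 1
        log file = /var/log/ceph/$name.$pid.log

# mount with ceph-fuse and re-run one of the copies that produced a bad md5sum
$ sudo mkdir -p /mnt/cephfs
$ sudo ceph-fuse -m 172.24.88.50:6789 /mnt/cephfs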
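
Similarly, a rough way to check the two failure modes Greg describes (file data
aimed at a pool the client can't write to, or an unhealthy cluster at write
time) might look like the following; "title1.mkv" and the "data" pool are from
the original report, and the exact caps to look for depend on how the client
keys were created:

$ cephfs title1.mkv show_layout    # confirm which pool the file's objects go to
$ ceph auth list                   # check the client key has write caps on that pool
$ ceph -w                          # watch cluster state while repeating the copy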
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com