On Tue, 23 Apr 2013, Bryan Stillwell wrote:
> I'm testing this now, but while going through the logs I saw something
> that might have something to do with this:
>
> Apr 23 16:35:28 a1 kernel: [692455.496594] libceph: corrupt inc osdmap epoch 22146 off 102 (ffff88021e0dc802 of ffff88021e0dc79c-ffff88021e0dc802)

Oh, that's not right...  What kernel version is this?  Which ceph version?

Thanks-
sage

> Apr 23 16:35:28 a1 kernel: [692455.505154] osdmap: 00000000: 05 00 69 17 a0 33 34 39 4f d7 88 db 46 c9 e1 df  ..i..349O...F...
> Apr 23 16:35:28 a1 kernel: [692455.505158] osdmap: 00000010: 0d 6e 82 56 00 00 b0 0c 77 51 00 1a 00 22 ff ff  .n.V....wQ..."..
> Apr 23 16:35:28 a1 kernel: [692455.505161] osdmap: 00000020: ff ff ff ff ff ff 00 00 00 00 00 00 00 00 ff ff  ................
> Apr 23 16:35:28 a1 kernel: [692455.505163] osdmap: 00000030: ff ff 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> Apr 23 16:35:28 a1 kernel: [692455.505166] osdmap: 00000040: 00 00 00 00 00 00 00 00 00 00 01 00 00 00 ff ff  ................
> Apr 23 16:35:28 a1 kernel: [692455.505169] osdmap: 00000050: 5c 02 00 00 00 00 03 00 00 00 0c 00 00 00 00 00  \...............
> Apr 23 16:35:28 a1 kernel: [692455.505171] osdmap: 00000060: 00 00 02 00 00 00                                ......
> Apr 23 16:35:28 a1 kernel: [692455.505174] libceph: osdc handle_map corrupt msg
> Apr 23 16:35:28 a1 kernel: [692455.513590] header: 00000000: 90 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> Apr 23 16:35:28 a1 kernel: [692455.513593] header: 00000010: 29 00 c4 00 01 00 86 00 00 00 00 00 00 00 00 00  )...............
> Apr 23 16:35:28 a1 kernel: [692455.513596] header: 00000020: 00 00 00 00 01 00 00 00 00 00 00 00 00 01 00 00  ................
> Apr 23 16:35:28 a1 kernel: [692455.513599] header: 00000030: 00 5d 68 c5 e8                                   .]h..
> Apr 23 16:35:28 a1 kernel: [692455.513602] front: 00000000: 69 17 a0 33 34 39 4f d7 88 db 46 c9 e1 df 0d 6e  i..349O...F....n
> Apr 23 16:35:28 a1 kernel: [692455.513605] front: 00000010: 01 00 00 00 82 56 00 00 66 00 00 00 05 00 69 17  .....V..f.....i.
> Apr 23 16:35:28 a1 kernel: [692455.513607] front: 00000020: a0 33 34 39 4f d7 88 db 46 c9 e1 df 0d 6e 82 56  .349O...F....n.V
> Apr 23 16:35:28 a1 kernel: [692455.513610] front: 00000030: 00 00 b0 0c 77 51 00 1a 00 22 ff ff ff ff ff ff  ....wQ..."......
> Apr 23 16:35:28 a1 kernel: [692455.513613] front: 00000040: ff ff 00 00 00 00 00 00 00 00 ff ff ff ff 00 00  ................
> Apr 23 16:35:28 a1 kernel: [692455.513616] front: 00000050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> Apr 23 16:35:28 a1 kernel: [692455.513618] front: 00000060: 00 00 00 00 00 00 01 00 00 00 ff ff 5c 02 00 00  ............\...
> Apr 23 16:35:28 a1 kernel: [692455.513621] front: 00000070: 00 00 03 00 00 00 0c 00 00 00 00 00 00 00 02 00  ................
> Apr 23 16:35:28 a1 kernel: [692455.513624] front: 00000080: 00 00 00 00 00 00                                ......
> Apr 23 16:35:28 a1 kernel: [692455.513627] footer: 00000000: ae ee 1e d8 00 00 00 00 00 00 00 00 01           .............
>
> On Tue, Apr 23, 2013 at 4:41 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
> > On Tue, Apr 23, 2013 at 3:37 PM, Bryan Stillwell
> > <bstillwell@xxxxxxxxxxxxxxx> wrote:
> >> I'm using the kernel client that's built into precise & quantal.
> >>
> >> I could give the ceph-fuse client a try and see if it has the same
> >> issue.  I haven't used it before, so I'll have to do some reading
> >> first.
> >
> > If you've got the time that would be a good data point, and make
> > debugging easier if it reproduces. There's not a ton to learn -- you
> > install the ceph-fuse package (I think it's packaged separately,
> > anyway) and then instead of "mount" you run "ceph-fuse -c <ceph.conf
> > file> --name client.<name> --keyring <keyring_file>" or similar. :)
> > -Greg
> > Software Engineer #42 @ http://inktank.com | http://ceph.com
> >
> >
> >>
> >> Bryan
> >>
> >> On Tue, Apr 23, 2013 at 4:04 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
> >>> Sorry, I meant kernel client or ceph-fuse? Client logs would be enough
> >>> to start with, I suppose -- "debug client = 20" and "debug ms = 1" if
> >>> using ceph-fuse; if using the kernel client things get trickier; I'd
> >>> have to look at what logging is available without the debugfs stuff
> >>> being enabled. :/
> >>> -Greg
> >>> Software Engineer #42 @ http://inktank.com | http://ceph.com
> >>>
> >>>
> >>> On Tue, Apr 23, 2013 at 3:00 PM, Bryan Stillwell
> >>> <bstillwell@xxxxxxxxxxxxxxx> wrote:
> >>>> I've tried a few different ones:
> >>>>
> >>>> 1. cp to cephfs mounted filesystem on Ubuntu 12.10 (quantal)
> >>>> 2. rsync over ssh to cephfs mounted filesystem on Ubuntu 12.04.2 (precise)
> >>>> 3. scp to cephfs mounted filesystem on Ubuntu 12.04.2 (precise)
> >>>>
> >>>> It's fairly reproducible, so I can collect logs for you.  Which ones
> >>>> would you be interested in?
> >>>>
> >>>> The cluster has been in a couple states during testing (during
> >>>> expansion/rebalancing and during an all active+clean state).
> >>>>
> >>>> BTW, all the nodes are running with the 0.56.4-1precise packages.
> >>>>
> >>>> Bryan
> >>>>
> >>>> On Tue, Apr 23, 2013 at 12:56 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
> >>>>> On Tue, Apr 23, 2013 at 11:38 AM, Bryan Stillwell
> >>>>> <bstillwell@xxxxxxxxxxxxxxx> wrote:
> >>>>>> I've run into an issue where after copying a file to my cephfs cluster
> >>>>>> the md5sums no longer match.  I believe I've tracked it down to some
> >>>>>> parts of the file which are missing:
> >>>>>>
> >>>>>> $ obj_name=$(cephfs "title1.mkv" show_location -l 0 | grep object_name | sed -e "s/.*:\W*\([0-9a-f]*\)\.[0-9a-f]*/\1/")
> >>>>>> $ echo "Object name: $obj_name"
> >>>>>> Object name: 10000001120
> >>>>>>
> >>>>>> $ file_size=$(stat "title1.mkv" | grep Size | awk '{ print $2 }')
> >>>>>> $ printf "File size: %d MiB (%d Bytes)\n" $(($file_size/1048576)) $file_size
> >>>>>> File size: 20074 MiB (21049178117 Bytes)
> >>>>>>
> >>>>>> $ blocks=$((file_size/4194304+1))
> >>>>>> $ printf "Blocks: %d\n" $blocks
> >>>>>> Blocks: 5019
> >>>>>>
> >>>>>> $ for b in `seq 0 $(($blocks-1))`; do rados -p data stat ${obj_name}.`printf '%8.8x\n' $b` | grep "error"; done
> >>>>>> error stat-ing data/10000001120.00001076: No such file or directory
> >>>>>> error stat-ing data/10000001120.000011c7: No such file or directory
> >>>>>> error stat-ing data/10000001120.0000129c: No such file or directory
> >>>>>> error stat-ing data/10000001120.000012f4: No such file or directory
> >>>>>> error stat-ing data/10000001120.00001307: No such file or directory
> >>>>>>
> >>>>>> Any ideas where to look to investigate what caused these blocks to not
> >>>>>> be written?
> >>>>>
> >>>>> What client are you using to write this? Is it fairly reproducible (so
> >>>>> you could collect logs of it happening)?
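
For reference, a rough sketch of what the ceph-fuse mount and logging Greg
describes above might look like; the config path, keyring path, client name,
log location, and mount point here are placeholders, not values taken from
this cluster:

    $ sudo mkdir -p /mnt/cephfs
    $ sudo ceph-fuse -c /etc/ceph/ceph.conf --name client.admin \
          --keyring /etc/ceph/ceph.client.admin.keyring /mnt/cephfs

    # in ceph.conf, to capture the client-side debug logs mentioned above:
    [client]
        debug client = 20
        debug ms = 1
        log file = /var/log/ceph/$name.log

Once the test copy is done, the mount can be torn down with a normal
"umount /mnt/cephfs" (or "fusermount -u /mnt/cephfs").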
> >>>>>
> >>>>> Usually the only times I've seen anything like this were when either
> >>>>> the file data was supposed to go into a pool which the client didn't
> >>>>> have write permissions on, or when the RADOS cluster was in bad shape
> >>>>> and so the data never got flushed to disk. Has your cluster been
> >>>>> healthy since you started writing the file out?
> >>>>> -Greg
> >>>>> Software Engineer #42 @ http://inktank.com | http://ceph.com
> >>>>>
> >>>>>
> >>>>>>
> >>>>>> Here's the current state of the cluster:
> >>>>>>
> >>>>>> ceph -s
> >>>>>>    health HEALTH_OK
> >>>>>>    monmap e1: 1 mons at {a=172.24.88.50:6789/0}, election epoch 1, quorum 0 a
> >>>>>>    osdmap e22059: 24 osds: 24 up, 24 in
> >>>>>>    pgmap v1783615: 1920 pgs: 1917 active+clean, 3 active+clean+scrubbing+deep; 4667 GB data, 9381 GB used, 4210 GB / 13592 GB avail
> >>>>>>    mdsmap e437: 1/1/1 up {0=a=up:active}
> >>>>>>
> >>>>>> Here's my current crushmap:
> >>>>>>
> >>>>>> # begin crush map
> >>>>>>
> >>>>>> # devices
> >>>>>> device 0 osd.0
> >>>>>> device 1 osd.1
> >>>>>> device 2 osd.2
> >>>>>> device 3 osd.3
> >>>>>> device 4 osd.4
> >>>>>> device 5 osd.5
> >>>>>> device 6 osd.6
> >>>>>> device 7 osd.7
> >>>>>> device 8 osd.8
> >>>>>> device 9 osd.9
> >>>>>> device 10 osd.10
> >>>>>> device 11 osd.11
> >>>>>> device 12 osd.12
> >>>>>> device 13 osd.13
> >>>>>> device 14 osd.14
> >>>>>> device 15 osd.15
> >>>>>> device 16 osd.16
> >>>>>> device 17 osd.17
> >>>>>> device 18 osd.18
> >>>>>> device 19 osd.19
> >>>>>> device 20 osd.20
> >>>>>> device 21 osd.21
> >>>>>> device 22 osd.22
> >>>>>> device 23 osd.23
> >>>>>>
> >>>>>> # types
> >>>>>> type 0 osd
> >>>>>> type 1 host
> >>>>>> type 2 rack
> >>>>>> type 3 row
> >>>>>> type 4 room
> >>>>>> type 5 datacenter
> >>>>>> type 6 pool
> >>>>>>
> >>>>>> # buckets
> >>>>>> host b1 {
> >>>>>>         id -2           # do not change unnecessarily
> >>>>>>         # weight 2.980
> >>>>>>         alg straw
> >>>>>>         hash 0  # rjenkins1
> >>>>>>         item osd.0 weight 0.500
> >>>>>>         item osd.1 weight 0.500
> >>>>>>         item osd.2 weight 0.500
> >>>>>>         item osd.3 weight 0.500
> >>>>>>         item osd.4 weight 0.500
> >>>>>>         item osd.20 weight 0.480
> >>>>>> }
> >>>>>> host b2 {
> >>>>>>         id -4           # do not change unnecessarily
> >>>>>>         # weight 4.680
> >>>>>>         alg straw
> >>>>>>         hash 0  # rjenkins1
> >>>>>>         item osd.5 weight 0.500
> >>>>>>         item osd.6 weight 0.500
> >>>>>>         item osd.7 weight 2.200
> >>>>>>         item osd.8 weight 0.500
> >>>>>>         item osd.9 weight 0.500
> >>>>>>         item osd.21 weight 0.480
> >>>>>> }
> >>>>>> host b3 {
> >>>>>>         id -5           # do not change unnecessarily
> >>>>>>         # weight 3.480
> >>>>>>         alg straw
> >>>>>>         hash 0  # rjenkins1
> >>>>>>         item osd.10 weight 0.500
> >>>>>>         item osd.11 weight 0.500
> >>>>>>         item osd.12 weight 1.000
> >>>>>>         item osd.13 weight 0.500
> >>>>>>         item osd.14 weight 0.500
> >>>>>>         item osd.22 weight 0.480
> >>>>>> }
> >>>>>> host b4 {
> >>>>>>         id -6           # do not change unnecessarily
> >>>>>>         # weight 3.480
> >>>>>>         alg straw
> >>>>>>         hash 0  # rjenkins1
> >>>>>>         item osd.15 weight 0.500
> >>>>>>         item osd.16 weight 1.000
> >>>>>>         item osd.17 weight 0.500
> >>>>>>         item osd.18 weight 0.500
> >>>>>>         item osd.19 weight 0.500
> >>>>>>         item osd.23 weight 0.480
> >>>>>> }
> >>>>>> pool default {
> >>>>>>         id -1           # do not change unnecessarily
> >>>>>>         # weight 14.620
> >>>>>>         alg straw
> >>>>>>         hash 0  # rjenkins1
> >>>>>>         item b1 weight 2.980
> >>>>>>         item b2 weight 4.680
> >>>>>>         item b3 weight 3.480
> >>>>>>         item b4 weight 3.480
> >>>>>> }
> >>>>>>
> >>>>>> # rules
> >>>>>> rule data {
> >>>>>>         ruleset 0
> >>>>>>         type replicated
> >>>>>>         min_size 2
> >>>>>>         max_size 10
> >>>>>>         step take default
> >>>>>>         step chooseleaf firstn 0 type host
> >>>>>>         step emit
> >>>>>> }
> >>>>>> rule metadata {
> >>>>>>         ruleset 1
> >>>>>>         type replicated
> >>>>>>         min_size 2
> >>>>>>         max_size 10
> >>>>>>         step take default
> >>>>>>         step chooseleaf firstn 0 type host
> >>>>>>         step emit
> >>>>>> }
> >>>>>> rule rbd {
> >>>>>>         ruleset 2
> >>>>>>         type replicated
> >>>>>>         min_size 1
> >>>>>>         max_size 10
> >>>>>>         step take default
> >>>>>>         step chooseleaf firstn 0 type host
> >>>>>>         step emit
> >>>>>> }
> >>>>>>
> >>>>>> # end crush map
> >>>>>>
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Bryan
> >>
> >>
> >> --
> >>
> >> Bryan Stillwell
> >> SENIOR SYSTEM ADMINISTRATOR
> >>
> >> E: bstillwell@xxxxxxxxxxxxxxx
> >> O: 303.228.5109
> >> M: 970.310.6085
> >
>
> --
>
> Bryan Stillwell
> SENIOR SYSTEM ADMINISTRATOR
>
> E: bstillwell@xxxxxxxxxxxxxxx
> O: 303.228.5109
> M: 970.310.6085
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
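
The object-by-object check Bryan posted earlier in the thread can be rolled
into a small script for reuse. A rough sketch, assuming the default 4 MiB
object size and the "data" pool; the grep/sed patterns are lifted from
Bryan's commands, while the wrapper itself (and the use of "stat -c %s" for
the file size) is just one way to glue them together:

    #!/bin/bash
    # Print the RADOS objects backing a cephfs file that are missing.
    # Usage: ./check_missing_objects.sh <file>
    file="$1"

    # Object name prefix for the file (derived from its inode number).
    obj_name=$(cephfs "$file" show_location -l 0 | grep object_name |
               sed -e "s/.*:\W*\([0-9a-f]*\)\.[0-9a-f]*/\1/")

    # Number of 4 MiB objects needed to cover the whole file.
    file_size=$(stat -c %s "$file")
    blocks=$((file_size / 4194304 + 1))

    # Stat every expected object and print only the ones that don't exist.
    for b in $(seq 0 $((blocks - 1))); do
        rados -p data stat "${obj_name}.$(printf '%8.8x' "$b")" 2>&1 |
            grep "error"
    done

Any lines it prints correspond to 4 MiB regions of the file that were never
written out, which is the symptom seen with title1.mkv above.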