All the OSDs are backed by xfs. Each RBD is formatted with ext4.

Thanks for the response.

On Mon, Feb 11, 2013 at 6:12 PM, Mike Lowe <j.michael.lowe@xxxxxxxxx> wrote:
> Are your RBDs backed by btrfs? I struggled for a very long time with corruption of RBD images until Sage and Samuel helped find a btrfs bug that can truncate sparse files if they are written to at a lower offset right after a higher offset. The fix for this is now in 3.8-rc7, and the commit is here:
> https://git.kernel.org/?p=linux/kernel/git/josef/btrfs-next.git;a=commit;h=d468abec6b9fd7132d012d33573ecb8056c7c43f
>
> On Feb 11, 2013, at 6:06 PM, Travis Rhoden <trhoden@xxxxxxxxx> wrote:
>
>> Hey folks,
>>
>> Noticed this today and it has me stumped.
>>
>> I have a 10 GB raw VM disk image that I've placed inside an ext4-formatted RBD. When I do this, it gets corrupted in weird ways. I was prepared to show fsck results to demonstrate this, but then I found an easier way: just compare the sha1sum of the file. Here's what I see.
>>
>> Disk image sitting on a regular (non-RBD) ext4 filesystem:
>> # sha1sum disk.img
>> cfd37c33b9de926644f7b13e604374348662bc60  disk.img
>>
>> Same disk image copied into RBD #1:
>> # cp -p disk.img /mnt/rbd1
>> # sha1sum /mnt/rbd1/disk.img
>> cfd37c33b9de926644f7b13e604374348662bc60  /mnt/rbd1/disk.img
>>
>> Great, they match. But then comes the problematic RBD:
>> # cp -p disk.img /mnt/rbd2
>> # sha1sum /mnt/rbd2/disk.img
>> a28d0735c0f0863a3f84151122da75a56bf5022b  /mnt/rbd2/disk.img
>>
>> They don't match. I can also confirm that fsck'ing the filesystem contained in disk.img reveals numerous errors in the last case, while it is clean in the first two.
>>
>> I'm running 0.48.2argonaut on this particular cluster. The RBDs were mapped with the kernel client; the kernel is 3.2.0-29-generic, running on Ubuntu 12.04.1.
>>
>> The only weird thing I've observed is that while the copy was going to RBD #2, I saw this in ceph -w:
>>
>> 2013-02-11 22:18:14.134683 osd.2 [WRN] client.7830 10.40.30.0:0/1548040543 misdirected client.7830.1:48034857 4.127 to osd.2 not [4,2] in e2459/2459
>> 2013-02-11 22:18:14.135159 osd.2 [WRN] client.7830 10.40.30.0:0/1548040543 misdirected client.7830.1:48034858 4.127 to osd.2 not [4,2] in e2459/2459
>> 2013-02-11 22:18:14.136699 osd.2 [WRN] client.7830 10.40.30.0:0/1548040543 misdirected client.7830.1:48034859 4.127 to osd.2 not [4,2] in e2459/2459
>> 2013-02-11 22:18:14.139479 osd.2 [WRN] client.7830 10.40.30.0:0/1548040543 misdirected client.7830.1:48034860 4.127 to osd.2 not [4,2] in e2459/2459
>> 2013-02-11 22:18:14.139588 osd.2 [WRN] client.7830 10.40.30.0:0/1548040543 misdirected client.7830.1:48034861 4.127 to osd.2 not [4,2] in e2459/2459
>> 2013-02-11 22:18:14.139667 osd.2 [WRN] client.7830 10.40.30.0:0/1548040543 misdirected client.7830.1:48034862 4.127 to osd.2 not [4,2] in e2459/2459
>> 2013-02-11 22:18:14.139748 osd.2 [WRN] client.7830 10.40.30.0:0/1548040543 misdirected client.7830.1:48034863 4.127 to osd.2 not [4,2] in e2459/2459
>> 2013-02-11 22:18:14.139827 osd.2 [WRN] client.7830 10.40.30.0:0/1548040543 misdirected client.7830.1:48034864 4.127 to osd.2 not [4,2] in e2459/2459
>>
>> I hadn't seen this one before.
>>
>> Full disclosure:
>>
>> I had a ceph node failure last week (a week ago today) where all three OSD processes on one of my nodes were killed by the OOM killer. I haven't had a chance to go back and look for errors, gather logs, or ask the list for advice on what went wrong.
>> Restarting my OSDs brought everything back in line -- the cluster handled the failed OSDs just fine, with one exception: one of my RBDs went read-only/write-protected. Even after the cluster was back to HEALTH_OK, it remained read-only. I had to unmount, unmap, remap, and remount that RBD to get it back. It just so happens that that RBD is the one giving me problems now, so they could be related. =)
>>
>> It's a small cluster:
>>
>> # ceph -s
>>    health HEALTH_OK
>>    monmap e1: 3 mons at {a=10.40.30.0:6789/0,b=10.40.30.1:6789/0,c=10.40.30.2:6789/0}, election epoch 4, quorum 0,1,2 a,b,c
>>    osdmap e2459: 9 osds: 9 up, 9 in
>>    pgmap v9525714: 2880 pgs: 2880 active+clean; 2841 GB data, 5649 GB used, 11109 GB / 16758 GB avail
>>    mdsmap e1: 0/0/1 up
>>
>> # ceph osd tree
>> dumped osdmap tree epoch 2459
>> # id    weight  type name              up/down  reweight
>> -1      18      pool default
>> -3      18        rack unknownrack
>> -2      6           host ceph0
>> 0       2             osd.0            up       1
>> 1       2             osd.1            up       1
>> 2       2             osd.2            up       1
>> -4      6           host ceph1
>> 3       2             osd.3            up       1
>> 4       2             osd.4            up       1
>> 5       2             osd.5            up       1
>> -5      6           host ceph2
>> 6       2             osd.6            up       1
>> 7       2             osd.7            up       1
>> 8       2             osd.8            up       1
>>
>> But yeah, I'm just stumped about why files going into that particular RBD get corrupted. I tried a smaller file (~140 MB) and it was fine. I haven't done enough testing yet to find the threshold for corruption, or whether it only happens with specific file types. I did a similar test with qcow2 images (10 GB virtual, 4.4 GB actual), and the fsck results were the same -- immediate corruption inside that RBD. I did not capture the sha1sums for those files, though; I expect they would differ. =)
>>
>> Thanks,
>>
>> - Travis
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users@xxxxxxxxxxxxxx
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
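
For anyone who wants to check whether a btrfs-backed OSD host is exposed to the sparse-file issue Mike describes above, here is a minimal sketch of the triggering write pattern (a write at a high offset followed immediately by a write at a lower offset). The file path is a placeholder and this is only an illustration of the access pattern, not a confirmed reproducer:

    # Placeholder path on a btrfs mount; adjust to your setup.
    f=/mnt/btrfs-test/sparsefile

    # Write 4K at a high offset (64M), then 4K at a lower offset (1M),
    # without truncating the file in between.
    dd if=/dev/urandom of="$f" bs=4K count=1 seek=16384 conv=notrunc
    dd if=/dev/urandom of="$f" bs=4K count=1 seek=256 conv=notrunc

    # On an affected kernel the reported size can come back smaller than
    # the expected 64M + 4K; with the fix in 3.8-rc7 it should not.
    stat -c 'size=%s blocks=%b' "$f"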
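
And, for completeness, a rough consolidation of the verification and recovery steps Travis describes (the checksum comparison and the unmount/unmap/map/mount cycle). The pool, image, device, and mount-point names below are placeholders, not values taken from the cluster above:

    # Placeholders; substitute your own pool, image, device, and mount point.
    POOL=rbd
    IMAGE=myimage
    DEV=/dev/rbd2
    MNT=/mnt/rbd2

    # Compare the copy inside the RBD against the original; a mismatch
    # means the data was corrupted on the way in.
    sha1sum disk.img "$MNT/disk.img"

    # The remount cycle used to clear the stuck read-only state:
    umount "$MNT"
    rbd unmap "$DEV"
    rbd map "$IMAGE" --pool "$POOL"   # note which /dev/rbdN it comes back as
    mount "$DEV" "$MNT"               # assumes it reappears as the same device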