Hi,

today we had a catastrophic filesystem corruption in one of our virtual
machines; after fsck, ~100 MB ended up in lost+found :-( So I think we
hit the same bug (ceph-0.45.2, sparse rbd images).

Is there any progress on this topic? Any hint on how we could help with
it would be appreciated. I've also put a few consolidated snippets below
Guido's quoted mail, in case they help someone reproduce this.

Greetings
Stefan Majer

On Tue, Jun 12, 2012 at 2:31 PM, Guido Winkelmann
<guido-ceph@xxxxxxxxxxxxxxxxx> wrote:
> On Monday, June 11, 2012, 09:30:42, Sage Weil wrote:
>> If you can reproduce it with 'debug filestore = 20' too, that will be
>> better, as it will tell us what the FIEMAP ioctl is returning.
>
> I ran another test run with 'debug filestore = 20'.
>
>> Also, if you can attach/post the contents of the object itself
>> (rados -p rbd get rb.0.1.0000000002a0 /tmp/foo), we can make sure the
>> object has the right data (and that the sparse-read operation librbd
>> is doing is the culprit).
>
> I tried that, with the block name that the steps further below gave me:
>
> rados -p rbd get rb.0.13.00000000045a block
>
> When I looked into the block, it looked like a bunch of temp files from
> the portage system with padding in between, although it should be random
> data... I think I got the wrong block after all...
>
> Here's what I did:
>
> Run the iotester again:
>
> testserver-rbd11 iotester # date ; ./iotester /var/iotest ; date
> Tue Jun 12 13:51:58 CEST 2012
> Wrote 100 MiB of data in 2004 milliseconds
> [snip lots of irrelevant lines]
> Wrote 100 MiB of data in 2537 milliseconds
> Read 100 MiB of data in 3794 milliseconds
> Read 100 MiB of data in 10150 milliseconds
> Digest wrong for file "/var/iotest/4299a48eca63c75d6773bec3565190aa3b33c46e"
> Tue Jun 12 13:55:00 CEST 2012
>
> Run the fiemap tool on that file:
>
> testserver-rbd11 ~ # ./fiemap /var/iotest/4299a48eca63c75d6773bec3565190aa3b33c46e
> File /var/iotest/4299a48eca63c75d6773bec3565190aa3b33c46e has 1 extents:
> #  Logical          Physical         Length           Flags
> 0: 0000000000000000 0000000116900000 0000000000100000 0001
>
>> As for the log:
>>
>> First, map the offset to an rbd block. For example, taking the
>> 'Physical' value of 00000000a8200000 from above:
>>
>> $ printf "%012x\n" $((0x00000000a8200000 / (4096*1024) ))
>> 0000000002a0
>
> That gave me
>
> $ printf "%012x\n" $((0x0000000116900000 / (4096*1024) ))
> 00000000045a
>
>> Then figure out what the object name prefix is:
>>
>> $ rbd info <imagename> | grep prefix
>> block_name_prefix: rb.0.1
>
> Result: block_name_prefix: rb.0.13
>
>> Then add the block number, 0000000002a0, to that, e.g.
>> rb.0.1.0000000002a0.
>
> Result: rb.0.13.00000000045a
>
>> Then map that back to an osd with
>>
>> $ ceph osd map rbd rb.0.1.0000000002a0
>> osdmap e19 pool 'rbd' (2) object 'rb.0.1.0000000002a0' -> pg 2.a2e06f65
>> (2.5) -> up [0,2] acting [0,2]
>
> That gives me
>
> [root@storage1 ~]# ceph osd map rbd rb.0.13.00000000045a 2> /dev/null
> osdmap e101 pool 'rbd' (2) object 'rb.0.13.00000000045a' -> pg 2.80b039fb
> (2.7b) -> up [2,1] acting [2,1]
>
>> You'll see the osd ids listed in brackets after 'acting'. We want the
>> first one, 0 in my example. The log from that OSD is what we need.
>
> Okay, I'm attaching the compressed log for osd.2 and the compressed block
> to the issue report in the Redmine.
>
> Guido
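
A few consolidated bits, in case they help anyone reproduce or narrow this
down. First, the logging Sage asked for: a sketch that turns up filestore
logging on an OSD node so the osd log records what the FIEMAP ioctl
returns. Config path and restart invocation vary by distro, so adjust
before use:

#!/bin/bash
# Sketch: enable verbose filestore logging on an OSD node.
# Appends an [osd] section to the cluster config; if one already
# exists, add the option there instead.
cat >> /etc/ceph/ceph.conf <<'EOF'
[osd]
    debug filestore = 20
EOF
# Restart the local osds to pick up the change; the exact init
# script name and arguments depend on your distribution.
/etc/init.d/ceph restart osd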
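
If you don't have the standalone fiemap tool handy, filefrag from
e2fsprogs goes through the same FIEMAP ioctl on kernels that support it,
so it can cross-check the tool's extent list:

# Print the extent map (logical/physical offsets and lengths) of the
# file that failed its digest check.
filefrag -v /var/iotest/4299a48eca63c75d6773bec3565190aa3b33c46e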
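
The offset-to-object mapping steps Sage outlined condense into a small
helper. An untested sketch: pool, image, and offset are arguments, and it
assumes the default 4 MiB rbd object size:

#!/bin/bash
# Usage: ./rbd-locate.sh <pool> <image> <physical-offset>
# e.g.:  ./rbd-locate.sh rbd testimage 0x0000000116900000
POOL=$1; IMAGE=$2; OFFSET=$3

# Object index within the image: offset / 4 MiB, zero-padded to
# 12 hex digits.
BLOCK=$(printf "%012x" $(( OFFSET / (4096 * 1024) )))

# Object name prefix, e.g. rb.0.13.
PREFIX=$(rbd -p "$POOL" info "$IMAGE" | grep prefix | awk '{print $2}')

OBJ="$PREFIX.$BLOCK"
echo "object: $OBJ"

# Placement group and OSDs; the first osd listed after 'acting' is
# the one whose log we want.
ceph osd map "$POOL" "$OBJ"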
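
And to rule out fetching the wrong block (which seems to be what happened
above), you can compare the object rados returns against the same 4 MiB
window read straight off the device inside the guest. Another sketch:
/dev/vda is a placeholder for whatever device backs /var/iotest, it
assumes the filesystem sits directly on the rbd device (no partition
offset to add), and 0x45a is the object index computed above:

# Inside the guest: pull the 4 MiB window at object index 0x45a
# (dd wants a decimal skip count, hence the arithmetic expansion).
dd if=/dev/vda of=/tmp/guest-block bs=4M skip=$((0x45a)) count=1

# On a cluster node: fetch the corresponding object.
rados -p rbd get rb.0.13.00000000045a /tmp/rados-block

# Compare; a sparse or short object may be smaller than 4 MiB, in
# which case limit the comparison, e.g.
# cmp -n $(stat -c%s /tmp/rados-block) ...
cmp /tmp/guest-block /tmp/rados-block && echo match || echo MISMATCH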
--
Stefan Majer