Re: Random data corruption in VM, possibly caused by rbd

Hi,

Today we had a catastrophic fs corruption in one of our virtual
machines; after fsck, ~100MB ended up in lost+found :-(
So I think we hit the same bug (ceph 0.45.2, sparse rbd images).

Is there any progress on this topic? Any hint on how we could help
with this would be welcome.

Greetings
Stefan Majer

On Tue, Jun 12, 2012 at 2:31 PM, Guido Winkelmann
<guido-ceph@xxxxxxxxxxxxxxxxx> wrote:
> On Monday, 11 June 2012, 09:30:42, Sage Weil wrote:
>> If you can reproduce it with 'debug filestore = 20' too, that will be
>> better, as it will tell us what the FIEMAP ioctl is returning.
>
> I ran another test run with 'debug filestore = 20'.
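>
> (In case anyone wants to reproduce this: the setting can simply go into the
> [osd] section of ceph.conf, followed by an OSD restart -- roughly:)
>
>   [osd]
>       debug filestore = 20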
>
>> Also, if
>> you can attach/post the contents of the object itself (rados -p rbd get
>> rb.0.1.0000000002a0 /tmp/foo) we can make sure the object has the right
>> data (and the sparse-read operation that librbd is doing is the culprit).
>
> I tried that, with the block name that the steps further below gave me:
>
> rados -p rbd get rb.0.13.00000000045a block
>
> When I looked into the block, it looked like a bunch of temp files from the
> portage system with padding in between, although it should be random data... I
> think I got the wrong block after all...
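>
> ("Looked into" here means nothing fancier than dumping the fetched file --
> e.g. something like this, where 'block' is just the local file rados wrote:)
>
>   hexdump -C block | less
>   strings block | head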
>
> Here's what I did:
> Run the iotester again:
> testserver-rbd11 iotester # date ; ./iotester /var/iotest ; date
> Tue Jun 12 13:51:58 CEST 2012
> Wrote 100 MiB of data in 2004 milliseconds
> [snip lots of irrelevant lines]
> Wrote 100 MiB of data in 2537 milliseconds
> Read 100 MiB of data in 3794 milliseconds
> Read 100 MiB of data in 10150 milliseconds
> Digest wrong for file "/var/iotest/4299a48eca63c75d6773bec3565190aa3b33c46e"
> Tue Jun 12 13:55:00 CEST 2012
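>
> (For anyone who wants to run a similar check without the iotester, a crude
> sketch: write random data, record the checksum, drop the page cache so the
> re-read has to go back to the rbd device, and compare -- paths made up:)
>
>   dd if=/dev/urandom of=/var/iotest/testfile bs=1M count=100
>   sha1sum /var/iotest/testfile > /tmp/testfile.sha1
>   sync; echo 3 > /proc/sys/vm/drop_caches
>   sha1sum -c /tmp/testfile.sha1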
>
> Run the fiemap tool on that file:
>
> testserver-rbd11 ~ # ./fiemap
> /var/iotest/4299a48eca63c75d6773bec3565190aa3b33c46e
> File /var/iotest/4299a48eca63c75d6773bec3565190aa3b33c46e has 1 extents:
> #       Logical          Physical         Length           Flags
> 0:      0000000000000000 0000000116900000 0000000000100000 0001
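>
> (The fiemap tool is presumably just a thin wrapper around the FIEMAP ioctl;
> filefrag -v from e2fsprogs reports the same extents if you want to
> cross-check, though it prints offsets in filesystem blocks, not bytes:)
>
>   filefrag -v /var/iotest/4299a48eca63c75d6773bec3565190aa3b33c46e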
>
>> As for the log:
>>
>> First, map the offset to an rbd block.  For example, taking the 'Physical'
>> value of 00000000a8200000 from above:
>>
>> $ printf "%012x\n" $((0x00000000a8200000 / (4096*1024) ))
>> 0000000002a0
>
> That gave me
>
>  $ printf "%012x\n" $((0x0000000116900000 / (4096*1024) ))
> 00000000045a
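>
> (The remainder of the same division gives the byte offset of the extent
> inside that object, which is useful for comparing data later:)
>
>  $ printf "%d\n" $((0x0000000116900000 % (4096*1024) ))
> 1048576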
>
>> Then figure out what the object name prefix is:
>>
>> $ rbd info <imagename> | grep prefix
>>         block_name_prefix: rb.0.1
>
> Result: block_name_prefix: rb.0.13
>
>> Then add the block number, 0000000002a0 to that, e.g. rb.0.1.0000000002a0.
>
> Result: rb.0.13.00000000045a
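>
> (In principle the 1 MiB at that offset inside the fetched object should
> match the file byte for byte -- roughly the comparison below -- but note
> that if /var sits on a partition or LVM volume inside the guest, the fiemap
> offsets are relative to that device and the partition start would have to
> be added first, which could also explain getting the "wrong" block:)
>
>   dd if=block bs=1M skip=1 count=1 2>/dev/null | md5sum
>   md5sum /var/iotest/4299a48eca63c75d6773bec3565190aa3b33c46e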
>
>> Then map that back to an osd with
>>
>> $ ceph osd map rbd rb.0.1.0000000002a0
>> osdmap e19 pool 'rbd' (2) object 'rb.0.1.0000000002a0' -> pg 2.a2e06f65
>> (2.5) -> up [0,2] acting [0,2]
>
> That gives me
> [root@storage1 ~]# ceph osd map rbd rb.0.13.00000000045a 2> /dev/null
> osdmap e101 pool 'rbd' (2) object 'rb.0.13.00000000045a' -> pg 2.80b039fb
> (2.7b) -> up [2,1] acting [2,1]
>
>> You'll see the osd ids listed in brackets after 'acting'.  We want the
>> first one, 0 in my example.  The log from that OSD is what we need.
>
> Okay, I'm attaching the compressed log for osd.2 and the compressed block to
> the issue report in Redmine.
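>
> (If someone only wants the relevant part: with debug filestore = 20 the
> filestore lines include the object name, so grepping the OSD log for it --
> roughly as below, the exact log path depends on the configuration --
> narrows things down a lot:)
>
>   grep 'rb.0.13.00000000045a' /var/log/ceph/osd.2.log | less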
>
>        Guido



-- 
Stefan Majer

