Re: data loss when flattening a cloned image on giant

wuxingyi <wuxingyigfs@xxxxxxxxxxx> · Tue, 26 Jan 2016 08:11:11 +0000

Really sorry for the bad formatting; I am posting it here again.

I found that data is lost when flattening a cloned image on giant (0.87.2). The problem can be easily reproduced by running the following script:
    #!/bin/bash
    ceph osd pool create wuxingyi 1 1
    rbd create --image-format 2 wuxingyi/disk1.img --size 8
    #writing "FOOBAR" at offset 0
    python writetooffset.py disk1.img 0 FOOBAR
    rbd snap create wuxingyi/disk1.img@SNAPSHOT
    rbd snap protect wuxingyi/disk1.img@SNAPSHOT

    echo "start cloning"
    rbd clone wuxingyi/disk1.img@SNAPSHOT wuxingyi/CLONEIMAGE

    #writing "WUXINGYI" at offset 4M  of cloned image
    python writetooffset.py CLONEIMAGE $((4*1048576)) WUXINGYI
    rbd snap create wuxingyi/CLONEIMAGE@CLONEDSNAPSHOT

    #overwriting the data at offset 4M of cloned image with "HEHEHEHE"
    python writetooffset.py CLONEIMAGE $((4*1048576)) HEHEHEHE

    echo "start flattening CLONEIMAGE"
    rbd flatten wuxingyi/CLONEIMAGE

    echo "before rollback"
    rbd export wuxingyi/CLONEIMAGE &&  hexdump -C CLONEIMAGE
    rm CLONEIMAGE -f
    rbd snap rollback wuxingyi/CLONEIMAGE@CLONEDSNAPSHOT
    echo "after rollback"
    rbd export wuxingyi/CLONEIMAGE &&  hexdump -C CLONEIMAGE
    rm CLONEIMAGE -f

where writetooffset.py is a simple Python script that writes the given data at the given offset of the image:
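    #!/usr/bin/python
    #coding=utf-8
    # usage: writetooffset.py <image> <offset> <data>
    import sys
    import rbd
    import rados

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    ioctx = cluster.open_ioctx('wuxingyi')
    image = rbd.Image(ioctx, sys.argv[1])
    try:
        # write the data (argv[3]) at the byte offset (argv[2])
        image.write(sys.argv[3], int(sys.argv[2]))
    finally:
        # close everything so the write is flushed before the script exits
        image.close()
        ioctx.close()
        cluster.shutdown()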

The output is something like:

before rollback
Exporting image: 100% complete...done.
00000000  46 4f 4f 42 41 52 00 00  00 00 00 00 00 00 00 00  |FOOBAR..........|
00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00400000  48 45 48 45 48 45 48 45  00 00 00 00 00 00 00 00  |HEHEHEHE........|
00400010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00800000
Rolling back to snapshot: 100% complete...done.
after rollback
Exporting image: 100% complete...done.
00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00400000  57 55 58 49 4e 47 59 49  00 00 00 00 00 00 00 00  |WUXINGYI........|
00400010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00800000
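
To check the exported file without reading the hexdump by eye, something like the following could be used. This is only a sketch: check_export and EXPECTED_AFTER_ROLLBACK are names I made up for illustration, and it assumes the file is the one written by "rbd export" in the script above.

```python
def check_export(path, expected):
    """Return a list of (offset, want, got) tuples, one per mismatch.

    expected maps a byte offset to the bytes that should be found there.
    """
    mismatches = []
    with open(path, 'rb') as f:
        for offset, want in sorted(expected.items()):
            f.seek(offset)
            got = f.read(len(want))
            if got != want:
                mismatches.append((offset, want, got))
    return mismatches


# What the image should contain after rolling back to CLONEDSNAPSHOT:
# the parent's data at offset 0 and the clone's first write at 4M.
EXPECTED_AFTER_ROLLBACK = {
    0: b'FOOBAR',
    4 * 1048576: b'WUXINGYI',
}
```

Calling check_export('CLONEIMAGE', EXPECTED_AFTER_ROLLBACK) after the second export should return an empty list if nothing was lost; for the output shown above it would report a mismatch at offset 0.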

It is easy to see that the first object of the image is lost after the rollback. I found that the data loss happens during flattening: only a "head" version of the first object is created, whereas a "snapid" version of the object should also be created and written when flattening.
However, when running this script on upstream code, I cannot hit this problem. I looked through the upstream code but could not find which commit fixes this bug. I also found that the whole state machine dealing with RBD layering has changed a lot since the giant release.
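
To make the "head" versus "snapid" point concrete, here is a toy model of what I think is going on. It is NOT how librbd is implemented; ToyClone, run and the copy_to_snaps flag are all made up for illustration, objects are just dict entries, and an empty byte string stands in for an all-zero object.

```python
class ToyClone(object):
    """Toy model of a cloned image (hypothetical, not librbd)."""

    def __init__(self, parent):
        self.parent = dict(parent)  # objects of the protected parent snapshot
        self.head = {}              # objects written in the clone itself
        self.snaps = {}             # snapshot name -> objects existing at that snapshot

    def write(self, objno, data):
        self.head[objno] = data

    def read(self, objno):
        if objno in self.head:
            return self.head[objno]
        if self.parent is not None and objno in self.parent:
            return self.parent[objno]  # fall through to the parent
        return b''                     # stands in for an all-zero object

    def snap_create(self, name):
        self.snaps[name] = dict(self.head)

    def flatten(self, copy_to_snaps):
        # copy every parent object the clone does not have yet, then detach
        for objno, data in self.parent.items():
            self.head.setdefault(objno, data)
            if copy_to_snaps:
                # also give each existing snapshot its own version
                for objs in self.snaps.values():
                    objs.setdefault(objno, data)
        self.parent = None

    def snap_rollback(self, name):
        self.head = dict(self.snaps[name])


def run(copy_to_snaps):
    """Replay the steps of the reproduction script on the toy model."""
    img = ToyClone({0: b'FOOBAR'})
    img.write(1, b'WUXINGYI')
    img.snap_create('CLONEDSNAPSHOT')
    img.write(1, b'HEHEHEHE')
    img.flatten(copy_to_snaps)
    img.snap_rollback('CLONEDSNAPSHOT')
    return img.read(0), img.read(1)
```

In this model run(False) loses object 0 after the rollback, like the giant output above, while run(True) keeps FOOBAR because the flatten also wrote the object into the snapshot.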

Could you please give me some hints on which commits I should backport?
Thanks!
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com