Really sorry for the bad formatting; I am posting it here again.

I found data loss when flattening a cloned image on giant (0.87.2). The problem can be easily reproduced by running the following script:

#!/bin/bash
ceph osd pool create wuxingyi 1 1
rbd create --image-format 2 wuxingyi/disk1.img --size 8
# write "FOOBAR" at offset 0
python writetooffset.py disk1.img 0 FOOBAR
rbd snap create wuxingyi/disk1.img@SNAPSHOT
rbd snap protect wuxingyi/disk1.img@SNAPSHOT

echo "start cloning"
rbd clone wuxingyi/disk1.img@SNAPSHOT wuxingyi/CLONEIMAGE

# write "WUXINGYI" at offset 4M of the cloned image
python writetooffset.py CLONEIMAGE $((4*1048576)) WUXINGYI
rbd snap create wuxingyi/CLONEIMAGE@CLONEDSNAPSHOT

# modify the data at offset 4M of the cloned image
python writetooffset.py CLONEIMAGE $((4*1048576)) HEHEHEHE

echo "start flattening CLONEIMAGE"
rbd flatten wuxingyi/CLONEIMAGE

echo "before rollback"
rbd export wuxingyi/CLONEIMAGE && hexdump -C CLONEIMAGE
rm CLONEIMAGE -f

rbd snap rollback wuxingyi/CLONEIMAGE@CLONEDSNAPSHOT
echo "after rollback"
rbd export wuxingyi/CLONEIMAGE && hexdump -C CLONEIMAGE
rm CLONEIMAGE -f

where writetooffset.py is a simple Python script that writes the given data at the given offset of the image:

#!/usr/bin/python
#coding=utf-8
# Usage: writetooffset.py <image> <offset> <data>
import sys
import rbd
import rados

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('wuxingyi')
image = rbd.Image(ioctx, sys.argv[1])
# write(data, offset): write the data at the given byte offset
image.write(sys.argv[3], int(sys.argv[2]))
# release the image, the pool context and the cluster handle
image.close()
ioctx.close()
cluster.shutdown()

The output is something like:

before rollback
Exporting image: 100% complete...done.
00000000 46 4f 4f 42 41 52 00 00 00 00 00 00 00 00 00 00 |FOOBAR..........|
00000010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00400000 48 45 48 45 48 45 48 45 00 00 00 00 00 00 00 00 |HEHEHEHE........|
00400010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00800000
Rolling back to snapshot: 100% complete...done.
after rollback
Exporting image: 100% complete...done.
00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00400000 57 55 58 49 4e 47 59 49 00 00 00 00 00 00 00 00 |WUXINGYI........|
00400010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00800000

We can easily see that the first object of the image is lost: after the rollback, offset 0 should still read "FOOBAR" (the clone inherited it from the parent when CLONEDSNAPSHOT was taken), but it reads as zeros. I found that the data loss happens during flattening: there is only a "head" version of the first object, whereas a "snapid" version of the object should also be created and written when flattening.

However, when I run this script against upstream code, I cannot reproduce the problem. I looked through the upstream code but could not find which commit fixes this bug. I also noticed that the whole state machine dealing with RBD layering has changed a lot since the giant release.

Could you please give me some hints on which commits I should backport?

Thanks
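
In case it helps with narrowing this down, here is a minimal sketch of how two exported copies of the image could be compared object by object, instead of reading the hexdump by eye. It is only an illustration: the function name differing_objects, the file names before.img and after.img, and the 4 MiB object size (the RBD default, which the script above relies on when it writes at offset 4M) are assumptions and not part of the original scripts.

```python
import sys

# Assumption: the image uses the default RBD object size of 4 MiB.
OBJECT_SIZE = 4 * 1024 * 1024


def differing_objects(expected, actual, object_size=OBJECT_SIZE):
    """Return the indices of object-sized chunks whose contents differ."""
    length = max(len(expected), len(actual))
    diffs = []
    for index, start in enumerate(range(0, length, object_size)):
        end = start + object_size
        if expected[start:end] != actual[start:end]:
            diffs.append(index)
    return diffs


def read_file(path):
    """Read a whole exported image into memory (fine for an 8 MB test image)."""
    with open(path, "rb") as handle:
        return handle.read()


# Hypothetical usage: compare_exports.py before.img after.img
if __name__ == "__main__" and len(sys.argv) == 3:
    expected = read_file(sys.argv[1])
    actual = read_file(sys.argv[2])
    for index in differing_objects(expected, actual):
        print("object %d (offset 0x%08x) differs" % (index, index * OBJECT_SIZE))
```

One possible way to use it: export the snapshot before flattening with "rbd export wuxingyi/CLONEIMAGE@CLONEDSNAPSHOT before.img", export the image again after the rollback as after.img, and pass both files to the script. If the flatten really drops the snapshot version of the first object, only object 0 should be reported.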