Hi Jason,

Thanks for the reply.

We are not sure this issue occurs only on cloned images. We think it may be
a generic synchronization issue. Our production and test setups are all based
on Hammer, so we haven't had a chance to touch Jewel, but we will try Jewel
later. We don't use cache tiering in our test environment.

So far we haven't found an easy way to reproduce this issue. However, we
will keep working on a test case to verify the root cause. In any case, if my
colleague Xuehan's root cause analysis is correct, this would be a serious
defect in Ceph's snapshot mechanism.

You mentioned the fix is scheduled to be included in Hammer 0.94.10. Is a
fix already available?

Thanks,
Zhongyan

On Mon, Feb 20, 2017 at 9:17 PM, Jason Dillaman <jdillama at redhat.com> wrote:

> AFAIK, that fix is scheduled to be included in Hammer 0.94.10 (which
> hasn't been released yet).
>
> Is this issue only occurring on cloned images? Since Hammer is nearly
> end-of-life, can you repeat this issue on Jewel? Are the affected
> images using cache tiering? Can you determine an easy-to-reproduce
> case?
>
> On Sun, Feb 19, 2017 at 10:21 PM, Zhongyan Gu <zhongyan.gu at gmail.com>
> wrote:
> > BTW, we used a Hammer version with the following fix. That issue was
> > also reported by us during the earlier backup testing.
> > https://github.com/ceph/ceph/pull/12218/files
> > librbd: diffs to clone's first snapshot should include parent diffs
> >
> > Zhongyan
> >
> > On Mon, Feb 20, 2017 at 11:13 AM, Zhongyan Gu <zhongyan.gu at gmail.com>
> > wrote:
> >>
> >> Hi Sage and Jason,
> >>
> >> My company is building a backup system based on the rbd export-diff and
> >> import-diff commands.
> >>
> >> However, in recent tests we found some strange behavior in export-diff.
> >> Long story short: sometimes, when repeatedly executing rbd
> >> export-diff --from-snap snap1 image@snap2 - | md5sum, md5sum returns
> >> different values.
> >>
> >> The details are:
> >>
> >> We use two Ceph RBD clusters: A for online VM usage and B for backup
> >> usage.
> >>
> >> A specific VM image is cloned from a parent image. Initially, our
> >> backup system does a full backup with the rbd export/import commands.
> >> Then every day we do an incremental backup with the rbd
> >> export-diff/import-diff commands.
> >>
> >> To ensure data consistency, we also compare the md5 of the online VM
> >> images at snapN with that of the backup VM images at snapN.
> >>
> >> Our tests found that sometimes, for some VM images, the md5 check fails:
> >> the online VM image at snapN doesn't match the backup VM image at snapN.
> >>
> >> To narrow down this issue, we manually generated the incremental file
> >> with rbd export-diff between the specific snaps and found that its md5
> >> didn't match the file generated by the backup scripts.
> >>
> >> Comparing those two binary files, we found only a small difference: some
> >> bytes are not the same.
> >>
> >> Could this be an export-diff bug? As far as I know, if we create two
> >> snaps, the diff between them should always be the same. So why does
> >> export-diff not work as expected and return different md5 sums? Is this
> >> a corner case that was not well considered, or has anyone else had the
> >> same experience?
> >> BTW, we ran a fio I/O workload in the VMs for 24 hours during the backup
> >> test.
> >>
> >> Thanks,
> >>
> >> Zhongyan
>
> --
> Jason
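
For reference, a minimal sketch of how two export-diff output files, like the
ones compared in the quoted message, could be checked byte by byte to see
where they differ. This is only an illustration under assumptions: the
function names md5_of and differing_ranges are hypothetical and are not part
of rbd or of our backup scripts, and the inputs are assumed to be regular
files already written to disk by rbd export-diff.

```python
import hashlib
import os


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def differing_ranges(path_a, path_b, chunk_size=1 << 20):
    """Return (start, end) byte ranges, end exclusive, where two files differ.

    If the files have different lengths, the tail of the longer file is
    reported as (part of) the last range.
    """
    size_a = os.path.getsize(path_a)
    size_b = os.path.getsize(path_b)
    common = min(size_a, size_b)
    ranges = []
    start = None  # start offset of the differing range currently open
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while offset < common:
            n = min(chunk_size, common - offset)
            a = fa.read(n)
            b = fb.read(n)
            if a != b:
                # Only walk the chunk byte by byte when it actually differs.
                for i in range(n):
                    if a[i] != b[i]:
                        if start is None:
                            start = offset + i
                    elif start is not None:
                        ranges.append((start, offset + i))
                        start = None
            elif start is not None:
                # Previous chunk ended inside a differing range; close it.
                ranges.append((start, offset))
                start = None
            offset += n
    if size_a != size_b:
        if start is None:
            start = common
        ranges.append((start, max(size_a, size_b)))
    elif start is not None:
        ranges.append((start, common))
    return ranges
```

Calling differing_ranges("manual.diff", "script.diff") (both file names are
placeholders) would list the offsets that differ, which could then be mapped
back to records in the diff stream. It does not explain why the bytes differ.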