Hi Jens-

On Thu, 14 Feb 2013, Jens Kristian Søgaard wrote:
> Hi Sage,
>
> > block device level. We plan to implement an incremental backup function for
> > the relative change between two snapshots (or a snapshot and the head).
> > It's O(n) the size of the device vs the number of files, but should be more
> > efficient for all but the most sparse of images. The implementation should
> > be simple; the challenge is mostly around the incremental file format,
> > probably.
> > That doesn't help you now, but would be a relatively self-contained piece of
> > functionality for someone to contribute to RBD. This isn't a top
>
> I'm very interested in having an incremental backup tool for Ceph, so if it
> is possible for me to do, I would like to take a shot at implementing it. It
> will be a spare time project, so I cannot say how fast it will progress
> though.
>
> If you have any details on how you would like to see the implementation work,
> please let me know!

Great to hear you're interested in this! There is a feature open in the
tracker:

http://tracker.ceph.com/issues/4084

(Not that there is much information there yet!)

I think this breaks down into a few different pieces:

1) Decide what output format to use. We want something that resembles a
portable, standard way of representing an incremental set of changes to a
block device (or large file). I'm not sure what is out there, but we should
look carefully before making up our own format.

2) Expose changed objects between rados snapshots. This is some generic
functionality we would bake into librbd that would probably work similarly
to how read_iterate() currently does (you specify a callback). We probably
also want to provide this information directly to a user, so that they can
get a dump of (offset, length) pairs for integration with their own tool.
I expect this is just a core librbd method.

3) Write a dumper based on #2 that outputs in the format from #1.
The callback would (instead of printing file offsets) write the data to the
output stream with appropriate metadata indicating which part of the image
it is. Ideally the output part would be modular, too, so that we can come
back later and implement support for new formats easily. The output data
stream should be able to be directed at stdout or a file.

4) Write an importer for #1. It would take as input an existing image,
assumed to be in the state of the reference snapshot, and write all the
changed bits. Take input from stdin or a file.

5) If necessary, extend the above so that image resize events are properly
handled.

Probably the trickiest bit here is #2, as it will probably involve adding
some low-level rados operations to efficiently query the snapshot state from
the client. With this (and any of the rest), we can help figure out how to
integrate it cleanly.

My suggestion is to start with #1, though (and make sure the rest of this
all makes sense to everyone).

Thanks!
sage
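
To make pieces 2 through 4 a bit more concrete, here is a rough sketch in
Python. Everything in it is an assumption for illustration only: the record
layout (a big-endian offset and length followed by the data) is made up, not
a proposed format, and iterate_changes(), write_diff() and apply_diff() are
hypothetical names, not librbd calls. The changed extents are found by
comparing two in-memory images block by block, which only stands in for the
rados-level snapshot query described in #2.

```python
import io
import struct

# Hypothetical record header: 8-byte offset, 8-byte length, big-endian.
HEADER = struct.Struct(">QQ")
BLOCK = 4096  # comparison granularity; an arbitrary choice for this sketch


def iterate_changes(old, new, callback, block=BLOCK):
    """Stand-in for #2: call callback(offset, data) for each changed block.

    Compares two byte strings instead of querying rados snapshot state.
    """
    for off in range(0, len(new), block):
        chunk = new[off:off + block]
        if chunk != old[off:off + block]:
            callback(off, chunk)


def write_diff(old, new, out):
    """#3: dump changed extents to a binary stream as (offset, length, data)."""
    def emit(off, data):
        out.write(HEADER.pack(off, len(data)))
        out.write(data)
    iterate_changes(old, new, emit)


def apply_diff(image, inp):
    """#4: apply records from inp to image, a bytearray in the old state.

    Grows the image if a record extends past its end; shrinking (part of
    #5) is not handled here.
    """
    while True:
        head = inp.read(HEADER.size)
        if not head:
            break
        if len(head) != HEADER.size:
            raise ValueError("truncated record header")
        off, length = HEADER.unpack(head)
        data = inp.read(length)
        if len(data) != length:
            raise ValueError("truncated record data")
        end = off + length
        if end > len(image):
            image.extend(b"\0" * (end - len(image)))
        image[off:end] = data
    return image


if __name__ == "__main__":
    old = bytes(10000)
    new = bytearray(old)
    new[5000:5003] = b"abc"
    stream = io.BytesIO()          # could equally be a file or stdout buffer
    write_diff(old, bytes(new), stream)
    stream.seek(0)
    assert bytes(apply_diff(bytearray(old), stream)) == bytes(new)
```

Keeping the change iterator separate from the writer is what would let a
different output format be plugged in later without touching the part that
finds the changes.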