On Tue, 1 Feb 2011, Colin McCabe wrote:
> Yehuda is probably right though... it's not 100% clear that the
> benefits outweigh the disadvantages, given that it would need an extra
> lookup for every operation. In the end it's something that probably
> will take some experimentation to get right.

Right. The nice thing about RBD is its simplicity: there is almost no
metadata, just the block size, image size, and object name prefix. That's
enough to name the object holding the data you want, and that object may
or may not exist, depending on whether it has been written to. There are
no consistency concerns.

When I mentioned an allocation bitmap before, I meant simply a bitmap
specifying whether each block exists, which would let us avoid looking for
an object in the parent image. In its simplest form, you would mark the
image read-only, then generate the bitmap once. Anything more complicated
than that and you have to worry about keeping the metadata consistent with
the data.

CAS (content-addressable storage), for example, requires lots of metadata:
if I want to read block 1234, I have to look up in some table that says
1234 has sha1 FOO, and then go read that object. Writing is a whole other
story. Again, doing CAS on a read-only image simplifies things greatly,
but I don't think we should go down that road now.

Mainly I'm interested in feedback on the simple layering use-case...

sage
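A minimal Python sketch of the simple layering read path described above. It assumes a dict-backed stand-in for the object store, a hypothetical `<prefix>.<12-hex-digit block index>` object naming scheme, and whole-block reads and writes (no partial writes or copy-up). `Image`, `ObjectStore`, `freeze`, and `may_have` are illustrative names, not RBD's API. The point is that a child read tries its own object first and only goes to the parent when the parent's bitmap (built once, after the parent is made read-only) says the block exists.

```python
from typing import Dict, Optional


def object_name(prefix: str, block_size: int, offset: int) -> str:
    """Map a byte offset to the object holding it.

    Hypothetical format: "<prefix>.<block index as 12 hex digits>".
    """
    return f"{prefix}.{offset // block_size:012x}"


class ObjectStore:
    """In-memory stand-in for an object pool; counts get() lookups."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}
        self.lookups = 0

    def get(self, name: str) -> Optional[bytes]:
        self.lookups += 1
        return self.objects.get(name)

    def put(self, name: str, data: bytes) -> None:
        self.objects[name] = data


class Image:
    """Metadata is just prefix, block size, and image size (plus an optional parent)."""

    def __init__(self, prefix: str, block_size: int, image_size: int,
                 store: ObjectStore, parent: Optional["Image"] = None) -> None:
        self.prefix = prefix
        self.block_size = block_size
        self.image_size = image_size
        self.store = store
        self.parent = parent
        self.read_only = False
        self.bitmap: Optional[bytearray] = None  # None means "not generated yet"

    def num_blocks(self) -> int:
        return (self.image_size + self.block_size - 1) // self.block_size

    def _name(self, idx: int) -> str:
        return object_name(self.prefix, self.block_size, idx * self.block_size)

    def write_block(self, idx: int, data: bytes) -> None:
        if self.read_only:
            raise PermissionError("image is read-only")
        self.store.put(self._name(idx), data)

    def freeze(self) -> None:
        """Mark the image read-only, then build the allocation bitmap once.

        The bitmap cannot go stale because no further writes are allowed.
        """
        self.read_only = True
        n = self.num_blocks()
        bm = bytearray((n + 7) // 8)
        for idx in range(n):
            # One-time scan of the stand-in store (not counted as lookups).
            if self._name(idx) in self.store.objects:
                bm[idx // 8] |= 1 << (idx % 8)
        self.bitmap = bm

    def may_have(self, idx: int) -> bool:
        if idx >= self.num_blocks():
            return False
        if self.bitmap is None:
            return True  # no bitmap: we have to go look for the object
        return bool(self.bitmap[idx // 8] & (1 << (idx % 8)))

    def read_block(self, idx: int) -> bytes:
        data = self.store.get(self._name(idx))  # our own object first
        if data is not None:
            return data
        if self.parent is not None and self.parent.may_have(idx):
            return self.parent.read_block(idx)  # fall through to the parent
        return bytes(self.block_size)  # never written anywhere: zeros
```

With the parent frozen, a read of a block that neither image has written costs one lookup instead of two, because the bitmap rules out the parent. Without `freeze()`, `may_have` must answer "maybe," which is the extra-lookup-per-operation cost raised in the quoted text.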