Replying to a response that was off-list:

On Wed, May 5, 2010 at 9:02 AM, Martin Fick <mogulguy@xxxxxxxxx> wrote:
>
> --- On Wed, 5/5/10, Alex Elsayed <eternaleye@xxxxxxxxx> wrote:
>
> > > ...This would open up the use of RBD devices for linux
> > > containers or linux vservers which could run on any
> > > machine in a cluster (similar to the idea of using it
> > > with kvm/qemu).
> >
> > As it currently stands you could likely run a vserver or an
> > OpenVZ/Virtuozzo/LXC container on Ceph (the distributed FS)
> > directly, rather than layering a local FS over RBD. Also,
> > this would probably provide better performance in the end.
>
> Could you please explain why you would think that this
> would provide better performance in the end? I would think
> that a simpler local filesystem (with remote reads/writes)
> could outperform ceph in most situations that would matter
> for virtual systems (i.e. low latencies for small
> reads/writes), would it not?

I would recommend benchmarking to get empirical results rather than
going on my presumptions, but in Ceph the metadata servers cache
metadata and the OSDs journal writes, so any writes that fit in the
journal should be quite fast.

Also, RBD has no way of knowing which reads/writes are 'small' within
the block device, because it works by splitting the disk image into
4 MB chunks and dealing with those. That means even small reads and
writes are handled at a minimum granularity of 4 MB (there is a small
sketch of the chunk arithmetic at the end of this mail).

> > ...This is probably a better solution for container-based
> > virtualization than RBD-based options, due to the advantage
> > one can take of all guests sharing a kernel with the host.
>
> I am not sure I understand why you are saying the guest/host
> sharing thing is an advantage that would benefit using ceph
> over RBD, could you please expound?

This is an advantage in the container virtualization case because you
can (say) mount the entire Ceph FS on the host and simply run the
containers from a very basic LXC or other container config, treating
the Ceph filesystem as just another directory tree from the point of
view of the container. This simplifies your container config, and
gives the advantages I named earlier (online resize, etc.).

> > RBD is more likely to be useful for full virtualization
> > like KVM,
>
> Again, why so specifically?

Because for containers, the config is simplest when you can hand them
a directory tree, but for full virtualization, the config is simplest
when you can hand them a block device (a rough sketch of both wirings
is at the end of this mail). Simplicity reduces the number of places
where errors can creep in.

> I agree that ceph would also have its advantages, but
> RBD-based solutions would likely have some advantages
> that ceph will never have. RBD allows one to use any
> local filesystem with any semantics/features that one
> wishes. RBD is simpler. RBD is likely currently more
> mature than ceph?

Ceph has POSIX (or as close as possible) semantics, matching local
filesystems, and it provides more features than any local FS except
btrfs, which is similarly under heavy development. RBD is actually a
rather recent addition; the first mailing list message about it was on
March 7th, 2010, whereas Ceph has been in development since 2007.

I am posting this to the mailing list as well, as others may find it
interesting.
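
To make the 4 MB chunking above concrete, here is a minimal Python
sketch of the kind of offset arithmetic involved. The 4 MB object size
is the default mentioned above, but the function and naming are purely
illustrative and not taken from the actual RBD code:

# Illustrative only: map a byte range on the virtual block device onto
# fixed-size backing objects, the way RBD splits an image into 4 MB chunks.
# The object size matches the default discussed above; the rest is a sketch.

OBJECT_SIZE = 4 * 1024 * 1024  # 4 MB

def objects_for_io(offset, length):
    """Return (object_index, offset_within_object, byte_count) for each
    backing object touched by an I/O of `length` bytes at `offset`."""
    spans = []
    end = offset + length
    while offset < end:
        index = offset // OBJECT_SIZE
        within = offset % OBJECT_SIZE
        count = min(OBJECT_SIZE - within, end - offset)
        spans.append((index, within, count))
        offset += count
    return spans

# A 512-byte write at a 6 MB offset falls entirely inside object 1:
print(objects_for_io(6 * 1024 * 1024, 512))   # [(1, 2097152, 512)]

The point being that RBD only sees which 4 MB objects an I/O touches;
it has no higher-level notion of 'small' versus 'large' requests.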
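
And to illustrate the directory-tree versus block-device distinction
above, here is a rough Python sketch that just assembles the two kinds
of guest wiring. The config keys, paths, and the qemu "rbd:" drive
syntax are assumptions for illustration only; check the LXC and qemu
documentation for the real details:

# Illustrative only: the two wiring styles discussed above.
# Container case: hand the guest a directory tree (here, a subdirectory
# of a Ceph FS mounted on the host).  Full-virt case: hand the guest a
# block device (here, an RBD image via qemu's rbd driver).  All names
# and paths are assumptions, not taken from this thread.

def lxc_config(name, cephfs_mount="/mnt/ceph"):
    """Container: root the guest in a directory on the mounted Ceph FS."""
    return "\n".join([
        "lxc.utsname = %s" % name,
        "lxc.rootfs = %s/containers/%s" % (cephfs_mount, name),
    ])

def kvm_command(name, image):
    """Full virtualization: attach a whole RBD image as the guest's disk."""
    return ["kvm", "-name", name,
            "-drive", "file=rbd:rbd/%s,if=virtio" % image]

print(lxc_config("guest1"))
print(" ".join(kvm_command("guest1", "guest1-disk")))

Either way the guest itself stays simple; the difference is only in
what the host hands it.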