Replying to a response that was off-list:

On Wed, May 5, 2010 at 9:02 AM, Martin Fick <mogulguy@xxxxxxxxx> wrote:
>
> --- On Wed, 5/5/10, Alex Elsayed <eternaleye@xxxxxxxxx> wrote:
>
> > > ...This would open up the use of RBD devices for linux
> > > containers or linux vservers which could run on any
> > > machine in a cluster (similar to the idea of using it
> > > with kvm/qemu).
> >
> > As it currently stands you could likely run a vserver or an
> > OpenVZ/Virtuozzo/LXC container on Ceph (the distributed FS)
> > directly, rather than layering a local FS over RBD. Also,
> > this would probably provide better performance in the end.
>
> Could you please explain why you would think that this
> would provide better performance in the end? I would think
> that a simpler local filesystem (with remote reads/writes)
> could outperform ceph in most situations that would matter
> for virtual systems (i.e. low latencies for small
> reads/writes), would it not?

I would recommend benchmarking to get empirical results rather than
going on my presumptions, but in Ceph the metadata servers cache
metadata and the OSDs journal writes, so any writes that fit in the
journal should be quite fast.

Also, RBD has no way of knowing which reads/writes are 'small' within
the block device, because it works by splitting the disk image into
4 MB chunks and dealing with those. That means even small reads and
writes are handled at a minimum granularity of 4 MB (there is a small
sketch of the chunk arithmetic at the end of this mail).

> > ...This is probably a better solution for container-based
> > virtualization than RBD-based options, due to the advantage
> > one can take of all guests sharing a kernel with the host.
>
> I am not sure I understand why you are saying the guest/host
> sharing thing is an advantage that would benefit using ceph
> over RBD, could you please expound?

This is an advantage in the container virtualization case because you
can (say) mount the entire Ceph FS on the host and simply run the
containers from a very basic LXC or other container config, treating
the Ceph filesystem as just another directory tree from the point of
view of the container. This simplifies your container config, and
gives the advantages I named earlier (online resize, etc.).

> > RBD is more likely to be useful for full virtualization
> > like KVM,
>
> Again, why so specifically?

Because for containers, the config is simplest when you can hand them
a directory tree, but for full virtualization, the config is simplest
when you can hand them a block device (a rough sketch of both wirings
is at the end of this mail). Simplicity reduces the number of places
where errors can creep in.

> I agree that ceph would also have its advantages, but
> RBD-based solutions would likely have some advantages
> that ceph will never have. RBD allows one to use any
> local filesystem with any semantics/features that one
> wishes. RBD is simpler. RBD is likely currently more
> mature than ceph?

Ceph has POSIX (or as close as possible) semantics, matching local
filesystems, and it provides more features than any local FS except
btrfs, which is similarly under heavy development. RBD is actually a
rather recent addition; the first mailing list message about it was on
March 7th, 2010, whereas Ceph has been in development since 2007.

I am posting this to the mailing list as well, as others may find it
interesting.
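
To make the 4 MB chunking above concrete, here is a minimal Python
sketch of the kind of offset arithmetic involved. The 4 MB object size
is the default mentioned above, but the function and naming are purely
illustrative and not taken from the actual RBD code:

# Illustrative only: map a byte range on the virtual block device onto
# fixed-size backing objects, the way RBD splits an image into 4 MB chunks.
# The object size matches the default discussed above; the rest is a sketch.

OBJECT_SIZE = 4 * 1024 * 1024  # 4 MB

def objects_for_io(offset, length):
    """Return (object_index, offset_within_object, byte_count) for each
    backing object touched by an I/O of `length` bytes at `offset`."""
    spans = []
    end = offset + length
    while offset < end:
        index = offset // OBJECT_SIZE
        within = offset % OBJECT_SIZE
        count = min(OBJECT_SIZE - within, end - offset)
        spans.append((index, within, count))
        offset += count
    return spans

# A 512-byte write at a 6 MB offset falls entirely inside object 1:
print(objects_for_io(6 * 1024 * 1024, 512))   # [(1, 2097152, 512)]

The point being that RBD only sees which 4 MB objects an I/O touches;
it has no higher-level notion of 'small' versus 'large' requests.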
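
And to illustrate the directory-tree versus block-device distinction
above, here is a rough Python sketch that just assembles the two kinds
of guest wiring. The config keys, paths, and the qemu "rbd:" drive
syntax are assumptions for illustration only; check the LXC and qemu
documentation for the real details:

# Illustrative only: the two wiring styles discussed above.
# Container case: hand the guest a directory tree (here, a subdirectory
# of a Ceph FS mounted on the host).  Full-virt case: hand the guest a
# block device (here, an RBD image via qemu's rbd driver).  All names
# and paths are assumptions, not taken from this thread.

def lxc_config(name, cephfs_mount="/mnt/ceph"):
    """Container: root the guest in a directory on the mounted Ceph FS."""
    return "\n".join([
        "lxc.utsname = %s" % name,
        "lxc.rootfs = %s/containers/%s" % (cephfs_mount, name),
    ])

def kvm_command(name, image):
    """Full virtualization: attach a whole RBD image as the guest's disk."""
    return ["kvm", "-name", name,
            "-drive", "file=rbd:rbd/%s,if=virtio" % image]

print(lxc_config("guest1"))
print(" ".join(kvm_command("guest1", "guest1-disk")))

Either way the guest itself stays simple; the difference is only in
what the host hands it.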