Re: Replacing DRBD use with RBD

On Wed, May 5, 2010 at 1:02 PM, Alex Elsayed <eternaleye@xxxxxxxxx> wrote:
> Replying to a response that was off-list:
>
> On Wed, May 5, 2010 at 9:02 AM, Martin Fick <mogulguy@xxxxxxxxx> wrote:
>>
>> --- On Wed, 5/5/10, Alex Elsayed <eternaleye@xxxxxxxxx> wrote:
>>
>> > >...This would open up the use of RBD devices for linux
>> > > containers or linux vservers which could run on any
>> > > machine in a cluster (similar to the idea of using it
>> > > with kvm/qemu).
>> >
>> > As it currently stands you could likely run a vserver or an
>> > OpenVZ/Virtuozzo/LXC container on Ceph (the distributed FS)
>> > directly, rather than layering a local FS over RBD. Also,
>> > this would probably provide better performance in the end.
>>
>> Could you please explain why you would think that this
>> would provide better performance in the end?  I would think
>> that a simpler local filesystem (with remote read writes)
>> could outperform ceph in most situations that would matter
>> for virtual systems (i.e. low latencies for small
>> read/writes), would it not?
>
> I would recommend benchmarking to have empirical results rather
> than going with my presumptions, but in Ceph the metadata servers
> cache the metadata and the OSDs journal writes, so any writes which
> fit in the journal will be quite fast. Also, RBD has no way of
> knowing what reads/writes are 'small' in the RBD block device,
> because it works by splitting the disk image into 4MB chunks and
> deals with those. That means that even small reads and writes
> have a minimum size of 4MB.

Just a correction: although rbd stripes data over 4MB objects, it can
do reads and writes at sector granularity, that is, 512 bytes.
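
For illustration, here is a tiny sketch of the layout math (not the
actual rbd code; it just assumes the 4MB default object size mentioned
above):

  OBJECT_SIZE = 4 * 1024 * 1024  # rbd's 4MB object size

  def rbd_extents(offset, length):
      """Map a byte range on the block device to
      (object_index, offset_within_object, length) extents."""
      extents = []
      while length > 0:
          obj = offset // OBJECT_SIZE
          off = offset % OBJECT_SIZE
          n = min(length, OBJECT_SIZE - off)
          extents.append((obj, off, n))
          offset += n
          length -= n
      return extents

  # A single 512-byte write at byte offset 5GB + 512 touches
  # just 512 bytes inside one object:
  print(rbd_extents(5 * 1024**3 + 512, 512))   # [(1280, 512, 512)]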

>
>>
>> > ...This is probably a better solution for container-based
>> > virtualization than RBD-based options, due to the advantage
>> > one can take of all guests sharing a kernel with the host.
>>
>> I am not sure I understand why you are saying the guest/host
>> sharing thing is an advantage that would benefit using ceph
>> over RBD, could you please expound?
>
> This is an advantage in the container virtualization case because
> you can (say) mount the entire Ceph FS on the host and simply run
> the containers from a very basic LXC or other container config,
> treating the Ceph filesystem as just another directory tree from the
> point of view of the container. This simplifies your container
> config, and gives the advantages I named earlier (online resize, etc).
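
To make that concrete, here is a rough sketch of the host-side setup
(the monitor address, secret, paths and container name are all made
up, and the lxc config is stripped down to the bare minimum):

  import subprocess

  MON = "192.168.0.1:6789"                    # hypothetical monitor address
  CEPH_MNT = "/mnt/ceph"
  ROOTFS = CEPH_MNT + "/containers/guest1"    # just a directory in the ceph tree

  # Mount the Ceph filesystem once on the host.
  subprocess.check_call(["mount", "-t", "ceph", MON + ":/", CEPH_MNT,
                         "-o", "name=admin,secret=XXXX"])

  # A very basic LXC config that points the container's root at a
  # directory inside the mounted Ceph filesystem.
  conf = "lxc.utsname = guest1\nlxc.rootfs = %s\n" % ROOTFS
  open("/tmp/guest1.conf", "w").write(conf)

  # Start the container; to it, cephfs is just another directory tree.
  subprocess.check_call(["lxc-start", "-n", "guest1", "-f", "/tmp/guest1.conf"])
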
>
>> > RBD is more likely to be useful for full virtualization
>> > like KVM,
>>
>> Again, why so specifically?
>
> Because for containers, the config is simplest when you can hand
> them a directory tree, but for full virtualization, the config is
> simplest when you can hand them a block device. Simplicity reduces
> the number of potential points where errors can be introduced.
>
>> I agree that ceph would also have its advantages, but
>> RBD based solutions would likely have some advantages
>> that ceph will never have.  RBD allows one to use any
>> local filesystem with any semantics/features that one
>> wishes. RBD is simpler.  RBD is likely currently more
>> mature than ceph?
>
> Ceph has POSIX (or as close as possible) semantics, matching local
> filesystems, and provides more features than any local FS except
> BtrFS, which is similarly under heavy development.
>
> RBD is actually a rather recent addition - the first mailing
> list message about it was on March 7th, 2010, whereas Ceph has
> been in development since 2007.

It is a recent addition, although most of it uses the same ceph
filesystem infrastructure that has been in development since 2007, so
in a sense rbd is just a small extension of the ceph filesystem. The
ceph filesystem is indeed much more mature and has undergone much more
extensive testing. Hopefully, rbd is simple enough that it won't take
too long to get it on par with the fs.

Thanks,
Yehuda
