Re: Basic Ceph questions

Craig Lewis <clewis@xxxxxxxxxxxxxxxxxx> · Wed, 8 Oct 2014 18:37:19 -0700

Comments inline.
On Tue, Oct 7, 2014 at 5:51 PM, Marcus White <roastedseaweed.k@xxxxxxxxx> wrote:
Hello,

Some basic Ceph questions, would appreciate your help :) Sorry about the number and detail in advance!

a. Ceph RADOS is strongly consistent and different from usual object, does that mean all metadata also, container and account etc is all consistent and everything is updated in the path of the client operation itself, for a single site?

Yes.  In a single site, it's CP out of CAP.

b. If it is strongly consistent, is that the case across sites also? How can it be performant across geo sites if that is the case? If it's choosing consistency over partitioning and availability... For object, I read somewhere that it is now eventually consistent (local CP, remotely AP) via DR. Gets a bit confusing with all the literature out there. If it is DR, isn't that slightly different from the Swift case?

If you're referring to RadosGW Federation, no.  That replication is async.  The replication has several delays built in, so the fastest you could expect to see your data show up in the secondary is about a minute.  It takes longer if the file takes a while to transfer, or if you have a lot of activity to replicate.

Each site is still CP.  There is just delay getting data from the primary to the secondary.

If you want CP in multiple locations, that's doable by creating one cluster that spans both locations, and tuning the CRUSH rules to make sure the object is written to both locations. You really want a low latency connection between the two sites.

I tested one cluster in two colos with 20ms of latency between them.  It worked, but it was noticeably slow.  I went with two clusters and async replication.
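
To put rough numbers on why a spanning cluster feels slow, here is a minimal sketch of the write path, assuming a replicated write is acknowledged to the client only after the primary OSD has heard back from every replica. The function name, the 0.5 ms local round trip, and the reading of the 20 ms figure as a round-trip time are all assumptions made for illustration, not measurements.

```python
def write_latency_ms(client_to_primary_rtt, primary_to_replica_rtts, disk_ms=0.0):
    """Estimate the latency of one synchronous replicated write, in ms.

    client_to_primary_rtt: round trip between the client and the primary OSD.
    primary_to_replica_rtts: round trips from the primary to each replica.
    disk_ms: hypothetical commit time (replicas are assumed to commit in parallel).
    """
    # Replicas are written in parallel, so the slowest one sets the pace.
    slowest_replica = max(primary_to_replica_rtts, default=0.0)
    return client_to_primary_rtt + slowest_replica + disk_ms


LOCAL_RTT = 0.5   # assumed round trip inside one site
WAN_RTT = 20.0    # the 20 ms between colos, treated here as a round trip

# All replicas in the same site as the client.
single_site = write_latency_ms(LOCAL_RTT, [LOCAL_RTT, LOCAL_RTT])

# Primary is local, one replica lives in the other colo.
spanning_local_primary = write_latency_ms(LOCAL_RTT, [LOCAL_RTT, WAN_RTT])

# Primary is in the other colo, and it still has to reach a replica across the link.
spanning_remote_primary = write_latency_ms(WAN_RTT, [LOCAL_RTT, WAN_RTT])

print(single_site, spanning_local_primary, spanning_remote_primary)
```

Under these assumptions every write pays at least one inter-site round trip, and two when the primary happens to be on the far side, which is consistent with the "noticeably slow" result above.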

c. For block, is it CP on a single site and then usual DR to another site using snapshotting?

Yes.
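
One common way to do snapshot-based DR for RBD is to ship incremental diffs between snapshots with `rbd export-diff` and `rbd import-diff`. The thread itself only confirms "snapshotting", so treat the sketch below as one possible approach. The helper name, pool, image, snapshot names, and remote host are hypothetical, and the helper only builds the command strings rather than running them.

```python
import shlex


def incremental_replication_commands(pool, image, prev_snap, new_snap, remote_host):
    """Build the shell commands for one snapshot-based replication cycle.

    All argument values are placeholders supplied by the caller.
    """
    spec = f"{pool}/{image}"
    # 1. Take a new point-in-time snapshot on the primary site.
    create = ["rbd", "snap", "create", f"{spec}@{new_snap}"]
    # 2. Stream only the blocks changed since prev_snap to stdout ("-").
    export = ["rbd", "export-diff", "--from-snap", prev_snap, f"{spec}@{new_snap}", "-"]
    # 3. Apply that diff from stdin to the image copy on the secondary site.
    remote = ["ssh", remote_host, "rbd", "import-diff", "-", spec]
    pipeline = " | ".join(shlex.join(cmd) for cmd in (export, remote))
    return shlex.join(create), pipeline


create_cmd, pipeline_cmd = incremental_replication_commands(
    "rbd", "vm-disk", "snap1", "snap2", "backup-site"
)
print(create_cmd)
print(pipeline_cmd)
```

The secondary copy has to already contain the earlier snapshot (`snap1` here) for the diff to apply, so the very first transfer is a full export rather than a diff.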

d. For block, is it just a linux block device or is it SCSI? Is it a custom device driver running within Linux which hooks into the block layer? Trying to understand the layering diagram.

I'm a bit out of my element here, but there is a kernel module and a FUSE module.  The kernel module connects RBD images to a /dev/rbd/... block device.  It can then be used however you would use any other block device.  Most people put a filesystem on it, but that's not required.  I'm really unfamiliar with the FUSE module.

Several people are exporting RBD images via iSCSI and Fibre Channel.
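
As a rough illustration of the kernel-module path, the sketch below builds the commands for mapping an image and putting a filesystem on it. `rbd map` is the real CLI subcommand. The helper name, pool, image, mount point, and the device path passed in (whatever `rbd map` prints on your system) are placeholders, and XFS is just one choice of filesystem.

```python
import shlex


def map_and_mount_commands(pool, image, device, mountpoint):
    """Return the shell commands to map an RBD image and mount it.

    device is the block device reported by `rbd map`; it is passed in
    rather than guessed, since the name depends on the host.
    """
    steps = [
        ["rbd", "map", f"{pool}/{image}"],  # kernel module exposes the image as a block device
        ["mkfs.xfs", device],               # optional: any filesystem, or none at all
        ["mount", device, mountpoint],
    ]
    return [shlex.join(step) for step in steps]


for command in map_and_mount_commands("rbd", "vm-disk", "/dev/rbd0", "/mnt/vm-disk"):
    print(command)
```

Once mapped, the device behaves like any other Linux block device, so the `mkfs` and `mount` steps can be dropped if something else (a VM, LVM, an iSCSI target) consumes it raw.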

e. Do the snapshot, compression features come from the underlying file system?

It depends on the filesystem.  Ceph will emulate any required features that the FS doesn't support.  For example, ext4 and XFS have no snapshots, so Ceph has to track them itself.  On Btrfs, Ceph uses the native snapshots, and it's much quicker because of it.

f. What is the plan for deduplication? If that comes from the local file system, how would it deduplicate across nodes to achieve the best dedup ratio?

I don't believe Ceph does anything with de-dup.  If the FS underneath has it turned on, it can de-dup the stuff it sees, but there's no cluster-wide de-dup.
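
A toy illustration of why per-node de-dup falls short of cluster-wide de-dup: each node's filesystem can only collapse duplicates that it holds itself. The node names, chunk contents, and placement below are made up for the example and have nothing to do with how Ceph actually places data.

```python
import hashlib


def unique_chunks(chunks):
    """Return the set of content hashes for a collection of byte chunks."""
    return {hashlib.sha256(chunk).hexdigest() for chunk in chunks}


# Hypothetical placement: which chunks ended up on which storage node.
nodes = {
    "node-a": [b"alpha", b"beta", b"alpha"],
    "node-b": [b"alpha", b"gamma"],
    "node-c": [b"beta", b"gamma", b"gamma"],
}

# Chunks stored with no de-dup at all.
total = sum(len(chunks) for chunks in nodes.values())

# Each node de-dups only what it sees locally.
per_node = sum(len(unique_chunks(chunks)) for chunks in nodes.values())

# An idealized cluster-wide de-dup would see every chunk at once.
all_chunks = [chunk for chunks in nodes.values() for chunk in chunks]
cluster_wide = len(unique_chunks(all_chunks))

print(total, per_node, cluster_wide)
```

In this made-up layout the local filesystems get 8 chunks down to 6, while a cluster-wide view would need only 3, because the same content lands on more than one node.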