Re: Basic Ceph questions

Just curious, what kind of applications use RBD? It can't be
applications that need high-speed SAN storage performance
characteristics, can it?

Most people seem to be using it as storage for OpenStack.

I've heard of people using RBD + Heartbeat to build an HA NFS server while they wait for CephFS to be production-ready.

People who are re-exporting images via iSCSI and Fibre Channel are probably doing something different.  If I had to hazard a guess, I'd say they're running some sort of HA clustered service, like a database.  That's the traditional use for shared storage.
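
For what it's worth, applications can also talk to RBD directly through librbd, with no kernel block device in the path at all.  Here's a rough sketch using the python-rados and python-rbd bindings -- the pool name 'rbd' and the image name 'app-image' are just placeholders, and it assumes a reachable cluster with the usual /etc/ceph/ceph.conf and keyring:

    # Sketch: an application using an RBD image directly via librbd,
    # bypassing the kernel block layer.  Pool/image names are placeholders.
    import rados
    import rbd

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')  # mon addresses, keyring
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx('rbd')                   # pool name (placeholder)
        try:
            rbd.RBD().create(ioctx, 'app-image', 4 * 1024**3)  # 4 GiB image
            image = rbd.Image(ioctx, 'app-image')
            try:
                image.write(b'hello from librbd', 0)        # write at offset 0
                print(image.read(0, 17))                    # read it back
            finally:
                image.close()
        finally:
            ioctx.close()
    finally:
        cluster.shutdown()

The kernel client (rbd map) is the other route, for when something genuinely needs a /dev/rbd* block device -- like the NFS re-export setup above.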


 

For VMs, I am trying to visualize how the RBD device would be exposed.
Where does the driver live, exactly? If it's exposed via libvirt and
QEMU, does the kernel driver run in the host OS and communicate with
a backend Ceph cluster? If yes, does librbd provide a target (SCSI?)
interface that the kernel driver connects to? I'm trying to visualize
what the stack looks like, and the flow of I/Os for block devices.

I'll have to leave that for others to answer. 


>>
>> b. If it is strongly consistent, is that the case across sites also?
>> How can it be performant across geo sites if that is the case? If it's
>> choosing consistency over partitioning and availability... For object,
>> I read somewhere that it is now eventually consistent (local CP,
>> remotely AP) via DR. It gets a bit confusing with all the literature out
>> there. If it is DR, isn't that slightly different from the Swift case?
>
>
> If you're referring to RadosGW Federation, no.  That replication is async.
> The replication has several delays built in, so the fastest you could see
> your data show up in the secondary is about a minute.  Longer if the file
> takes a while to transfer, or you have a lot of activity to replicate.
>
> Each site is still CP.  There is just a delay in getting data from the
> primary to the secondary.
In that case, it is like Swift, only done differently. The async makes
it eventually consistent across sites, no?

I'm not sure regarding Swift; that's also outside my experience.

But yes, the async replication is eventually consistent, with no hard guarantee.  Problems during replication can cause the clusters to get out of sync.  The replication agent will retry failures, but it doesn't store that information anywhere; if you restart the replication agent while it has known failures, those failures won't be retried.  Every one of the errors is logged, though, so I was able to manually download and re-upload the file to the primary cluster, which triggered re-replication.
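
If you need to do the same, it's roughly something like the sketch below, using boto against the primary RadosGW endpoint (just a sketch -- the endpoint, credentials, bucket, and key names are placeholders):

    # Sketch of the manual repair: re-upload an object on the primary zone so
    # the replication agent picks it up again.  Endpoint, credentials, bucket
    # and key names are placeholders.
    import boto
    import boto.s3.connection

    conn = boto.connect_s3(
        aws_access_key_id='ACCESS_KEY',
        aws_secret_access_key='SECRET_KEY',
        host='rgw-primary.example.com',      # primary RadosGW endpoint (placeholder)
        calling_format=boto.s3.connection.OrdinaryCallingFormat(),
    )
    bucket = conn.get_bucket('my-bucket')                # placeholder bucket
    key = bucket.get_key('path/to/object')               # the object that failed to replicate
    data = key.get_contents_as_string()                  # download from the primary
    key.set_contents_from_string(data)                   # re-upload; triggers re-replication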

So far, all of the inconsistencies have shown up by comparing bucket listings.  I'm in the process of manually verifying checksums (my application stores a SHA-256 for every object uploaded), and so far I haven't found any mismatches in files that were marked as successfully replicated.
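
The verification pass looks roughly like this (again a sketch with placeholder endpoint/bucket names; it assumes, purely for illustration, that the checksum lives in a 'sha256' user-metadata field on each object -- adapt it to wherever your application actually keeps it):

    # Sketch: compare each object's recorded SHA-256 against what the secondary
    # zone actually serves.  Endpoint/bucket names and the 'sha256' metadata
    # field are placeholders.
    import hashlib

    import boto
    import boto.s3.connection

    conn = boto.connect_s3(
        aws_access_key_id='ACCESS_KEY',
        aws_secret_access_key='SECRET_KEY',
        host='rgw-secondary.example.com',    # secondary RadosGW endpoint (placeholder)
        calling_format=boto.s3.connection.OrdinaryCallingFormat(),
    )
    bucket = conn.get_bucket('my-bucket')                 # placeholder bucket
    for listed in bucket.list():
        key = bucket.get_key(listed.name)                 # HEAD fetches user metadata
        expected = key.get_metadata('sha256')             # recorded at upload time (assumed field)
        actual = hashlib.sha256(key.get_contents_as_string()).hexdigest()
        if expected and expected != actual:
            print('checksum mismatch: %s' % listed.name)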
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
