The primary OSD for an object is responsible for replication. In a healthy cluster the workflow is as follows:
- Client looks up primary OSD in CRUSH map
- Client sends object to be written to primary OSD
- Primary OSD looks up the replica OSD(s) in its CRUSH map
- Primary OSD contacts the replica OSD(s) and sends them the object
- All OSDs commit object to local journal
- Replica OSD(s) report back to the primary that the write is committed
- Only after the primary OSD has received the commit ack(s) from the replica OSD(s) and committed the write to its own local journal does it ack the write to the client
- Client receives ack and knows that the object is safely stored and replicated in the cluster
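To make that concrete, here is a minimal sketch of the client side using the librados C API. The pool name "data" and object name "myobject" are just placeholders; the point is that the synchronous write call does not return success until the whole workflow above has completed:

#include <rados/librados.h>
#include <stdio.h>

int main(void)
{
    rados_t cluster;
    rados_ioctx_t ioctx;
    const char buf[] = "hello";
    int ret;

    /* Connect as client.admin using the default ceph.conf search path. */
    ret = rados_create(&cluster, NULL);
    if (ret < 0) return 1;
    rados_conf_read_file(cluster, NULL);
    ret = rados_connect(cluster);
    if (ret < 0) return 1;

    /* The io context maps object names to PGs/OSDs via the CRUSH map;
     * this is the "client looks up primary OSD" step above. */
    ret = rados_ioctx_create(cluster, "data", &ioctx);
    if (ret < 0) { rados_shutdown(cluster); return 1; }

    /* Synchronous write: this does not return success until the primary
     * OSD has commit acks from all replica OSD(s) and from its own
     * journal, i.e. the full workflow above has finished. */
    ret = rados_write_full(ioctx, "myobject", buf, sizeof(buf));
    printf("write %s\n", ret < 0 ? "failed" : "replicated and committed");

    rados_ioctx_destroy(ioctx);
    rados_shutdown(cluster);
    return ret < 0 ? 1 : 0;
}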
Ceph has a strong consistency model and will not tell the client the write is complete until it is replicated in the cluster.
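If you want to see that distinction explicitly, the async librados API separates "complete" (the write is in memory on all replica OSDs) from "safe" (the write is committed to the journal on all replica OSDs). A rough sketch, assuming the same ioctx setup as above; waiting for "safe" is exactly the point at which a failed OSD can no longer lose your data:

#include <rados/librados.h>

/* Assumes 'ioctx' was set up as in the previous sketch. */
static int write_and_wait_safe(rados_ioctx_t ioctx)
{
    rados_completion_t comp;
    const char buf[] = "hello";
    int ret;

    ret = rados_aio_create_completion(NULL, NULL, NULL, &comp);
    if (ret < 0) return ret;

    ret = rados_aio_write_full(ioctx, "myobject", comp, buf, sizeof(buf));
    if (ret < 0) { rados_aio_release(comp); return ret; }

    /* wait_for_complete = acked from memory on all replicas;
     * wait_for_safe = committed to the journal on all replicas.
     * Blocking on "safe" guarantees the object survives an OSD crash. */
    rados_aio_wait_for_safe(comp);
    ret = rados_aio_get_return_value(comp);
    rados_aio_release(comp);
    return ret;
}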
On Thu, Mar 12, 2015 at 12:26 PM, tombo <tombo@xxxxxx> wrote:
Hello,
I need to understand how replication is accomplished, and who takes care of it: the OSD itself? We are using librados to read/write to the cluster. If librados does not do parallel writes according to the desired number of object copies, could it happen that objects are waiting in the journal for a flush when the OSD goes down, leaving them stuck in the journal? Or do they already have copies on other OSDs, which would mean librados is responsible for redundancy?
Thanks for the explanation.
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com