Gary,
Yes, Ceph can provide a highly available infrastructure. A few pointers:
- IO will stop if fewer than a majority of the monitors are functioning
properly, so run a minimum of 3 Ceph monitors distributed across your
data center's failure domains. If you choose to add more monitors, add
them in pairs to maintain an odd number.
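  For example, to confirm the monitors have formed a quorum (the exact
  output format varies a bit between Ceph releases):

      ceph mon stat
      ceph quorum_status --format json-pretty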
- Run a replication size of at least 2 (many operators choose 3 for
more redundancy). Ceph will gracefully handle failure of the primary
OSD once it is automatically or manually marked "down".
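  For example, to set the replica count on a pool (the pool name
  "volumes" is just a placeholder here; min_size is the number of
  replicas that must be available for IO to continue):

      ceph osd pool set volumes size 3
      ceph osd pool set volumes min_size 2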
- Design your CRUSH map to mimic the failure domains in your datacenter
(root, room, row, rack, chassis, host, etc). Use CRUSH chooseleaf
rules to spread replicas across the largest failure domain that has
more entries than your replication factor. For instance, replicate
across racks rather than hosts if your cluster is large enough.
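  As a rough sketch, a rule in a decompiled CRUSH map that spreads
  replicas across racks might look like the following (the rule name
  and ruleset number are placeholders, and it assumes your root bucket
  is named "default"):

      rule replicated_across_racks {
          ruleset 1
          type replicated
          min_size 1
          max_size 10
          step take default
          step chooseleaf firstn 0 type rack
          step emit
      }

  You can pull, edit, and push the map with "ceph osd getcrushmap",
  "crushtool -d" / "crushtool -c", and "ceph osd setcrushmap".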
- Don't set the nodown flag ("ceph osd set nodown") on your cluster,
as it prevents OSDs from being marked down automatically when they
become unavailable, substantially diminishing the HA capabilities.
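  If you are unsure whether the flag is set, it shows up in the
  cluster flags, and you can clear it:

      ceph osd dump | grep flags
      ceph osd unset nodown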
Cheers,
Mike Dawson
On 12/2/2013 11:43 AM, Gary Harris (gharris) wrote:
Hi, I have a question about Ceph high availability when integrated with
OpenStack.
Assuming you have all OpenStack controller nodes in HA mode, would you
actually have an HA Ceph implementation as well, meaning two primary
OSDs, or both pointing to the primary?
Or, do the client requests get forwarded automatically to the secondary
OSD should the primary fail?
(Excuse the simplifications.) :)
So my assumption is that if the primary fails, the mon would detect it
and a new primary OSD candidate would be presented to clients?
Thanks,
Gary
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com