On 24/08/2015 15:11, Julien Escario wrote:
> Hello,
> First, let me say I'm really a noob with Ceph since I have only read
> some documentation.
>
> I'm now trying to deploy a Ceph cluster for testing purposes. The
> cluster is based on 3 (more if necessary) hypervisors running Proxmox
> 3.4.
>
> Before going further, I have an essential question: is Ceph usable
> for multi-site storage?

It depends on what you really need it to do (access patterns and
behaviour when a link goes down).

> Long story:
> My goal is to run hypervisors on 2 datacenters separated by 4ms of
> latency.

Note: unless you are studying Ceph's behaviour in this situation, this
"goal" is in fact a method to reach a goal. If you describe the actual
goal you might get different suggestions.

> Bandwidth is 1Gbps currently but will be upgraded in the near future.
>
> So is it possible to run an active/active Ceph cluster to get shared
> storage between the two sites?

It is, but it probably won't behave correctly in your case: the latency
and the bandwidth will hurt a lot. Any application requiring that data
is confirmed stored on disk will be hit by the 4ms latency, and the
1Gbps will have to be shared between inter-site replication traffic and
regular VM disk accesses. Your storage will most probably behave like a
very slow single hard drive shared between all your VMs. Some workloads
might still work correctly (for example if you don't have any
significant writes and most of your data fits in caches).

When the link between your 2 datacenters is severed, in the worst case
(no quorum reachable, or a crushmap that won't allow each pg to reach
min_size with only one datacenter) everything will freeze. In the best
case (giving priority to a single datacenter by running more monitors
on it and using a crushmap storing at least min_size replicas on it),
everything will keep running on that datacenter when the link goes
down.

You can work around part of the performance problems by using 3-way
replication, with 2 replicas on your primary datacenter and 1 on the
secondary where all OSDs are configured with primary affinity 0 (see
the crushmap and command sketches just below). All reads will then be
served from the primary datacenter and only writes will go to the
secondary. You'll have to run all your VMs on the primary datacenter
and set up your monitors so that the elected leader is on the primary
datacenter (if I remember correctly the leader is the monitor with the
lowest rank, which is derived from the monitors' IP:port addresses
rather than from their names). You'll have a copy of your data on the
secondary datacenter in case of a disaster on the primary, but
recovering will be hard: you'll have to reach a quorum of monitors in
the secondary datacenter and I'm not sure how to proceed if you only
have one monitor out of 3 there, for example.
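For concreteness, here's a minimal sketch of the crushmap rule such a
2+1 placement could use. The bucket names (dc1, dc2) and the ruleset
number are made up: it assumes you have declared two buckets of type
datacenter in your crush hierarchy and that the pool has size 3.

    # decompile the current crushmap, edit it, recompile and inject it
    ceph osd getcrushmap -o crush.bin
    crushtool -d crush.bin -o crush.txt
    # ... add the rule below to crush.txt ...
    crushtool -c crush.txt -o crush.new
    ceph osd setcrushmap -i crush.new

    rule rbd_2plus1 {
            ruleset 1
            type replicated
            min_size 2
            max_size 3
            # 2 replicas on hosts of the primary datacenter
            step take dc1
            step chooseleaf firstn 2 type host
            step emit
            # 1 replica on a host of the secondary datacenter
            step take dc2
            step chooseleaf firstn 1 type host
            step emit
    }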
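The runtime side would then look roughly like this. The pool name
(vmpool) is made up, osd.6 to osd.8 stand for the OSDs of the secondary
datacenter, and you may need 'mon osd allow primary affinity = true' on
your monitors before the primary-affinity commands are accepted:

    ceph osd pool set vmpool size 3
    ceph osd pool set vmpool min_size 2
    ceph osd pool set vmpool crush_ruleset 1
    # primary affinity 0: these OSDs are never chosen as primaries, so
    # clients read from (and funnel writes through) the primary site
    ceph osd primary-affinity osd.6 0
    ceph osd primary-affinity osd.7 0
    ceph osd primary-affinity osd.8 0
    # check which monitor currently leads the quorum
    ceph quorum_status | grep quorum_leader_name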
> Of course, I'll have to be sure that no machine is running at the
> same time on both sites.

With your bandwidth and latency, and without knowing more about your
workloads, it's probable that running VMs on both sites will get you
very slow IOs. Multi-datacenter replication for simple object storage
using RGW seems to work, but RBD volume accesses are usually more
demanding.

> Hypervisor will be in charge of this.
>
> Is there a way to ask Ceph to keep at least one copy (or two) on
> each site and to make all block reads come from the nearest location?
> I'm aware that writes would have to be replicated and there's only a
> synchronous mode for this.
>
> I've read a lot of documentation and use cases about Ceph and it
> seems some say it can be used for this kind of replication and
> others say it can't. Whether erasure coding is needed isn't clear
> either.

Don't use erasure coding for RBD volumes: you'd need a cache tier on
top of the erasure-coded pool, it seems tricky to get right and it
might not be fully tested yet (I've seen a snapshot bug discussed here
last week).

Best regards,

Lionel
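PS: for the record, a cache tier in front of an erasure-coded pool is
set up roughly like this. All the names and parameters below are made
up, and as said above I wouldn't trust this for RBD volumes yet:

    # hypothetical profile/pool names and pg counts
    ceph osd erasure-code-profile set ecprofile k=2 m=1
    ceph osd pool create ecpool 128 128 erasure ecprofile
    ceph osd pool create cachepool 128
    ceph osd tier add ecpool cachepool
    ceph osd tier cache-mode cachepool writeback
    ceph osd tier set-overlay ecpool cachepool
    # clients then target ecpool and their IOs are transparently
    # served through cachepool (you'd also have to tune hit_set_* and
    # target_max_* for flushing/eviction to work sensibly)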