Re: Ceph storage project for virtualization

Hi Egoitz,

I don't think it is a good idea, but I can't comment on whether it's possible because I don't know Ceph's inner workings well enough; maybe others can comment.

This is what worries me:
"

Each datacenter's redundant NFS service will be composed of two NFS gateways accessing the OSDs of each placement group that are located in the same datacenter. I planned to achieve this with OSD weights, getting the CRUSH algorithm to build the map so that each datacenter ends up having, as master, the OSD of its own datacenter in each placement group. Obviously, slave OSD replicas will exist in the other datacenters, and I don't rule out using erasure coding in some manner.

"
First, I don't think you got OSD weights right. Also, any write will be synchronous to all replicas, which is why I asked about latencies first. You may be able to read from DC-local "master" PGs (I recall someone doing this with host-local PGs...).
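If you want DC-local primaries, the knob for that is primary affinity, not the CRUSH weight (weight only decides how much data an OSD gets, not which OSD acts as primary). A minimal sketch, assuming osd.10-osd.13 are the OSDs outside the local DC (the IDs are made up, check "ceph osd tree" for your actual layout):

  # Make remote OSDs unlikely to be chosen as primary
  # (0.0 = avoid the primary role whenever another replica can take it)
  for osd in 10 11 12 13; do
      ceph osd primary-affinity osd.$osd 0.0
  done

  # Verify: PRI-AFF column in "ceph osd tree", and the acting primary per PG
  ceph osd tree
  ceph pg dump pgs_brief | head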

In the best case you'll have your data in a corner-case configuration, which may trigger strange bugs and/or behaviour not seen elsewhere.

I wouldn't like to be in such a position, but I don't know how valuable your data is...

I think it would be best to determine the inter-DC network latency first; if you can choose the DCs, then choose wisely, with low enough latency ;) Then see if a regular Ceph storage configuration gives you good enough performance.
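For a rough baseline (the pool and host names below are just examples, and remember every write waits for the slowest replica to acknowledge):

  # Raw RTT between DCs
  ping -c 100 host-in-other-dc

  # Quick Ceph-level numbers on a throwaway pool
  ceph osd pool create latencytest 32
  rados bench -p latencytest 30 write -t 16 --no-cleanup
  rados bench -p latencytest 30 rand -t 16
  rados -p latencytest cleanup
  ceph osd pool delete latencytest latencytest --yes-i-really-really-mean-it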

Another option would be to run a DC-local Ceph cluster in each datacenter and mirror to the other DCs.
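For RBD images that would be rbd-mirror; very roughly, and from memory (pool/site/image names are invented, the rbd-mirror daemon has to run on the receiving cluster, and the exact commands are in the docs):

  # On site-a: per-image, snapshot-based mirroring on the pool
  rbd mirror pool enable vmpool image

  # Exchange peers via a bootstrap token
  rbd mirror pool peer bootstrap create --site-name site-a vmpool > token
  # ...copy the token to site-b and there:
  rbd mirror pool peer bootstrap import --site-name site-b vmpool token

  # Back on site-a: enable mirroring per image and schedule snapshots
  rbd mirror image enable vmpool/vm-disk-1 snapshot
  rbd mirror snapshot schedule add --pool vmpool 30m

Note this gives you an asynchronous copy for disaster recovery, not a stretched cluster.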

Cheers

On 5/3/24 at 11:50, egoitz@xxxxxxxxxxxxx wrote:
Hi Eneko!

I don't really have that data, but I was planning to have as master OSDs
only the ones in the same datacenter as the hypervisor using the
storage. The other datacenters would hold just replicas. I assume you
ask because replication is fully synchronous.

Well, to go step by step: imagine for the moment that the failure domain
is a rack, and all the replicas are in the same datacenter, in different
racks and rows. In that case the latency should be acceptably low.
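For that rack-level case, the kind of CRUSH rule I had in mind is something like this (bucket and pool names are made up, just to show the idea):

  ceph osd getcrushmap -o crushmap.bin
  crushtool -d crushmap.bin -o crushmap.txt

  # rule added to crushmap.txt: all replicas inside datacenter dc1,
  # never two copies in the same rack
  rule dc1_rack_replicated {
      id 10
      type replicated
      step take dc1
      step chooseleaf firstn 0 type rack
      step emit
  }

  crushtool -c crushmap.txt -o crushmap.new
  ceph osd setcrushmap -i crushmap.new
  ceph osd pool set vmpool crush_rule dc1_rack_replicated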

My question was more related to the redundant NFS and whether you have
experience with similar setups. First of all I was trying to find out
whether what I'm planning is feasible.

Thank you so much :)

Cheers!

On 2024-03-05 11:43, Eneko Lacunza wrote:

Hi Egoitz,

What is the network latency between the datacenters?

Cheers

On 5/3/24 at 11:31, egoitz@xxxxxxxxxxxxx wrote:

Hi!

I have been reading some Ceph ebooks and documentation and learning
about it. The goal of all this is to create rock-solid storage for
virtual machines. After all that learning I have not been able to answer
this question by myself, so I was wondering if perhaps you could clarify
my doubt.

Let's imagine three datacenters, each one with, for instance, 4
virtualization hosts. As I was planning to build a solution for
different hypervisors, I have been thinking of the following environment.

- I planned to have my Ceph storage (with different pools inside) with
OSDs in three different datacenters (the datacenter being the failure domain).

- Each datacenter's hosts will access a redundant NFS service in their
own datacenter.

- Each datacenter's redundant NFS service will be composed of two NFS
gateways accessing the OSDs of each placement group that are located in
the same datacenter. I planned to achieve this with OSD weights, getting
the CRUSH algorithm to build the map so that each datacenter ends up
having, as master, the OSD of its own datacenter in each placement
group. Obviously, slave OSD replicas will exist in the other
datacenters, and I don't rule out using erasure coding in some manner.
(See the CRUSH rule sketch after this list.)

- The NFS gateways could be Ceph's own redundant NFS gateway service (I
have seen they have now developed something for this purpose:
https://docs.ceph.com/en/quincy/mgr/nfs/), or perhaps two separate
Debian machines accessing Ceph with RADOS and sharing that storage to
the hypervisors over NFS. In the Debian case I have heard of good
results using pacemaker/corosync to provide HA for the NFS service
(between 0.5 and 3 seconds for failover until the service is up again).
A rough pacemaker sketch follows below this list.
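Regarding the "masters in the own datacenter" idea above, this is roughly the CRUSH rule I was imagining for one of the datacenters (bucket names are invented, 'dcs-other-than-dc1' would be an extra bucket holding only the other datacenters, and I understand each datacenter would need its own rule and pool):

  rule dc1_primary_first {
      id 11
      type replicated
      # first replica (the acting primary by default) from the local DC
      step take dc1
      step chooseleaf firstn 1 type host
      step emit
      # remaining replicas spread over the other datacenters
      step take dcs-other-than-dc1
      step chooseleaf firstn -1 type datacenter
      step emit
  }

I don't know if this is sane, which is part of what I'm asking.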
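And for the pacemaker/corosync variant on two Debian gateways, the rough shape I had in mind (VIP, network and export directory are invented examples; the exported directory would sit on a mapped RBD or a CephFS mount):

  pcs resource create nfs_vip ocf:heartbeat:IPaddr2 ip=192.0.2.10 cidr_netmask=24 op monitor interval=10s
  pcs resource create nfs_daemon ocf:heartbeat:nfsserver nfs_shared_infodir=/var/lib/nfs op monitor interval=10s
  pcs resource create nfs_export ocf:heartbeat:exportfs directory=/srv/ceph-nfs clientspec=192.0.2.0/24 options=rw,sync fsid=1 op monitor interval=10s
  pcs resource group add nfs_ha nfs_vip nfs_daemon nfs_export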

What do you think about this plan? Do you see it as feasible? We will
also work with KVM, where we could access Ceph directly, but I would
also need to provide storage for Xen and VMware.

Thank you so much in advance,

Cheers!

Eneko Lacunza
Zuzendari teknikoa | Director técnico
Binovo IT Human Project

Tel. +34 943 569 206 | https://www.binovo.es
Astigarragako Bidea, 2 - 2º izda. Oficina 10-11, 20180 Oiartzun

https://www.youtube.com/user/CANALBINOVO
https://www.linkedin.com/company/37269706/
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
