Re: cephadm docs on HA NFS

On 22/07/21 10:31 -0400, Jeff Layton wrote:
I think we probably need to redo this bit of documentation:

   https://docs.ceph.com/en/latest/cephadm/nfs/#high-availability-nfs

I would just spin up a patch, but I think we might also just want to
reconsider recommending an ingress controller at all.

Some people seem to be taking this to mean that they can shoot down one
of the nodes in the NFS server cluster, and the rest will just pick up
the load. That's not at all how this works.

If an NFS cluster node goes down, then it _must_ be resurrected in some
fashion, period. Otherwise, the MDS will eventually (in 5 minutes) time out
the state it held and the NFS clients will not be able to reclaim their
state.
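
To make the timing constraint concrete, here is a minimal sketch of the rule
described above. It is a toy model, not Ceph code: the function name
can_reclaim and the constant are hypothetical, and the 5-minute value is
taken from the message itself (the real timeout is an MDS setting and may
differ on a given cluster).

```python
# Assumption: the 5 minute figure comes from the message above; the real
# timeout is an MDS setting and may differ per cluster.
MDS_STATE_TIMEOUT_SECONDS = 5 * 60


def can_reclaim(down_at, back_up_at, timeout=MDS_STATE_TIMEOUT_SECONDS):
    """Return True if the NFS node came back before the MDS dropped its state.

    Times are seconds on any common clock. This models only the rule
    "restart within the timeout or clients lose their state"; it does not
    talk to a real cluster.
    """
    if back_up_at < down_at:
        raise ValueError("node cannot come back before it went down")
    return (back_up_at - down_at) < timeout


print(can_reclaim(0, 120))  # True: restarted after 2 minutes
print(can_reclaim(0, 600))  # False: restarted after 10 minutes
```

The point of the sketch is that the surviving nodes never appear in the
check: only the failed node's own return time matters.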

Given that, the bulleted list at the start of the doc above is wrong. We
cannot do any sort of failover if there is a host failure. My assumption
was that the orchestrator took care of starting up an NFS server
elsewhere if the host it was running on went down. Is that not the case?

In any case, I think we should reconsider recommending an ingress
controller at all. It's really just another point of failure, and a lot
of people seem to be misconstruing the guarantees it offers.

Round-robin DNS would be a better option in this situation, and it
wouldn't be as problematic if we want to support things like live
shrinking the cluster in the future.
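
As a rough illustration of what round-robin DNS gives a client, the sketch
below lists every address a name resolves to and models a server that
rotates the record order on each query. The function names and the example
addresses are hypothetical; port 2049 is the standard NFS port. Real
resolvers and caches may order answers differently, so treat this as a
model of the idea rather than a description of any particular DNS server.

```python
import socket


def resolve_all(hostname, port=2049):
    """Return each distinct address the system resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    addresses = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        if sockaddr[0] not in addresses:
            addresses.append(sockaddr[0])
    return addresses


def round_robin_answers(addresses, queries):
    """Model a DNS server that rotates its records once per query.

    Each simulated client simply takes the first record of the answer.
    """
    picked = []
    for i in range(queries):
        k = i % len(addresses)
        rotated = addresses[k:] + addresses[:k]
        picked.append(rotated[0])
    return picked


# Hypothetical NFS server addresses, for illustration only.
servers = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
print(round_robin_answers(servers, 6))
```

This spreads new mounts across the servers without an extra hop, but a
client that is already mounted on a failed address still depends on that
server coming back, as described earlier in this message.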
--
Jeff Layton <jlayton@xxxxxxxxxx>

_______________________________________________
Dev mailing list -- dev@xxxxxxx
To unsubscribe send an email to dev-leave@xxxxxxx


IIUC this section should not have "high availability" in its title.

You have implemented active-active load sharing. But one NFS cluster node cannot take over the work of another, and there is no mechanism (as there is in Kubernetes) to detect that a node is down and restart it, so there is currently no HA. Arguably the k8s approach is closer to eventual recovery than to HA, but I think you are saying that the cephadm-backed orchestrator does not offer even a relatively weak availability mechanism of that kind.

Have I got this right?

-- Tom Barron
