On 26/07/21 20:50 +0200, Andreas Weisker wrote:
On 26.07.21 at 20:26, Tom Barron wrote:
On 26/07/21 09:17 -0500, Sage Weil wrote:
The design is for cephadm to resurrect the failed ganesha rank on
another host, just like k8s does. The current gap in the
implementation is that the cephadm refresh is pretty slow (~10m) which
means that probably won't happen before the NFS state times out. We
are working on improving the refresh/responsiveness right now, but
you're right that the current code isn't yet complete.
I think a PR that updates the docs to note that the ingress mode isn't
yet complete, and also adds a section that documents how to do
round-robin DNS would be the best immediate step?
sage
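For reference, the round-robin DNS approach Sage mentions amounts to publishing one A record per NFS host under a single name, so clients spread across the ganesha daemons at resolve time. A minimal sketch of such a zone fragment (the name `nfs.example.com` and the 192.0.2.x addresses are placeholders, not from the thread):

```
; hypothetical zone fragment: round-robin DNS for an NFS cluster
; clients resolving nfs.example.com receive these A records in rotating order
nfs.example.com.  60  IN  A  192.0.2.11   ; host running one ganesha rank
nfs.example.com.  60  IN  A  192.0.2.12   ; host running another ganesha rank
```

A short TTL (60s here) limits how long clients keep resolving to a failed host; note this balances new connections only and is not a substitute for a highly available IP.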
OK, good to know that ganesha will be resurrected on another node
(as I had been thinking earlier) and that the refresh delay is being
worked on.
Before Jeff's note I had also been assuming that the ingress isn't
just another single point of failure but behaves more like a k8s
ingress. Is that correct or not? FWIW, OpenStack wants to work
with IPs rather than DNS/round robin so this does matter.
On my system, it didn't, because I directly specified two nodes for
ganesha instead of just the number of daemons. Maybe that should also
be made clearer in the documentation.
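The distinction Andreas describes can be sketched with two cephadm placement specs; a pinned-hosts placement ties daemons to those nodes, while a count-based placement lets the orchestrator reschedule a failed rank elsewhere. The service id and hostnames below are placeholders:

```yaml
# Variant 1 (hypothetical): daemons pinned to explicit hosts --
# cephadm cannot move a failed rank to a third node.
service_type: nfs
service_id: mynfs
placement:
  hosts:
    - host1
    - host2
---
# Variant 2 (hypothetical): count-based placement -- cephadm may
# resurrect a failed rank on any eligible host in the cluster.
service_type: nfs
service_id: mynfs
placement:
  count: 2
```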
Thanks. For my use case (OpenStack Manila) we need one highly
available IP per NFS cluster, with an ability to scale the daemons
behind it to share load.
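The "one highly available IP per NFS cluster, with scalable daemons behind it" shape corresponds to cephadm's ingress service (haproxy plus keepalived in front of the NFS backend). A hedged sketch of such a spec, with the service id, ports, and virtual IP chosen for illustration only:

```yaml
# Hypothetical ingress spec: a single virtual IP fronting the
# nfs.mynfs daemons; clients only ever see virtual_ip.
service_type: ingress
service_id: nfs.mynfs
placement:
  count: 2
spec:
  backend_service: nfs.mynfs   # the NFS service this ingress load-balances
  frontend_port: 2049          # port clients mount against on the virtual IP
  monitor_port: 9049           # haproxy status page
  virtual_ip: 203.0.113.10/24  # the one HA IP handed to OpenStack Manila
```

Scaling the backend then means raising the NFS service's placement count; the virtual IP stays fixed.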
Btw, I tried to send another finding to the users list and it got
rejected as spam or something. I tried iSCSI for ESXi and got segfaults
in librados2.0 and lost paths all over the place. If this is of
interest, please let me know how to give you more information.
BR
Andreas
_______________________________________________
Dev mailing list -- dev@xxxxxxx
To unsubscribe send an email to dev-leave@xxxxxxx