On Mon, Jul 26, 2021 at 1:26 PM Tom Barron <tbarron@xxxxxxxxxx> wrote:
> On 26/07/21 09:17 -0500, Sage Weil wrote:
> >The design is for cephadm to resurrect the failed ganesha rank on
> >another host, just like k8s does. The current gap in the
> >implementation is that the cephadm refresh is pretty slow (~10m),
> >which means that probably won't happen before the NFS state times
> >out. We are working on improving the refresh/responsiveness right
> >now, but you're right that the current code isn't yet complete.
> >
> >I think a PR that updates the docs to note that the ingress mode
> >isn't yet complete, and also adds a section that documents how to do
> >round-robin DNS, would be the best immediate step?
>
> OK, good to know that ganesha will be resurrected on another node (as
> I had been thinking earlier) and that the refresh delay is being
> worked on.
>
> Before Jeff's note I had also been assuming that the ingress isn't
> just another single point of failure but behaves more like a k8s
> ingress. Is that correct or not? FWIW, OpenStack wants to work with
> IPs rather than DNS/round robin, so this does matter.

Yeah, ingress is meant to fill a role similar to a k8s ingress, where
that role is roughly "whatever magic is necessary to make traffic
distributed and highly available". Currently we use keepalived and
haproxy, although that implementation could conceivably be switched
around in the future. With cephadm, the endpoint is a single
user-specified virtual IP. We haven't yet implemented the
orchestrator+k8s glue to control k8s ingress services, but my
(limited) understanding is that there is a broad range of k8s ingress
implementations that may have a slightly different model (e.g.,
ingress using AWS services may dynamically allocate an IP and/or DNS
name instead of asking the user to provide one).

To make NFS work using round-robin DNS, the user would need to
extract the list of IPs for the ganesha daemons from cephadm (e.g.,
by examining 'ceph orch ps --daemon-type nfs --format json'), probably
on a periodic basis, in case failures or configuration changes lead
cephadm to redeploy ganesha daemons elsewhere in the cluster.

I'm working on a documentation patch to describe this approach.
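Roughly, that periodic step could look something like the (untested)
sketch below. The JSON field names ('hostname', 'status_desc') are
from memory and may need adjusting per release, and resolving the
reported host names to addresses is just one way to get the IPs:

    #!/usr/bin/env python3
    # Untested sketch: list the addresses of the currently running
    # ganesha daemons so they can be published as round-robin A records.
    import json
    import socket
    import subprocess

    out = subprocess.check_output(
        ['ceph', 'orch', 'ps', '--daemon-type', 'nfs', '--format', 'json'])

    ips = set()
    for daemon in json.loads(out):
        # Field names are assumptions and may vary by release.
        if daemon.get('status_desc') != 'running':
            continue  # skip stopped/errored ganeshas
        # Assumes the host names cephadm reports are resolvable.
        ips.add(socket.gethostbyname(daemon['hostname']))

    for ip in sorted(ips):
        print(ip)

The output would feed whatever manages the A records behind a single
NFS service name, re-run periodically so the records track any
ganeshas that cephadm redeploys; for OpenStack, which wants raw IPs,
the same list could be consumed directly.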
sage

> Thanks!
>
> -- Tom Barron
>
> >
> >On Thu, Jul 22, 2021 at 9:32 AM Jeff Layton <jlayton@xxxxxxxxxx> wrote:
> >>
> >> I think we probably need to redo this bit of documentation:
> >>
> >> https://docs.ceph.com/en/latest/cephadm/nfs/#high-availability-nfs
> >>
> >> I would just spin up a patch, but I think we might also just want to
> >> reconsider recommending an ingress controller at all.
> >>
> >> Some people seem to be taking this to mean that they can shoot down one
> >> of the nodes in the NFS server cluster, and the rest will just pick up
> >> the load. That's not at all how this works.
> >>
> >> If an NFS cluster node goes down, then it _must_ be resurrected in some
> >> fashion, period. Otherwise, the MDS will eventually (in 5 mins) time out
> >> the state it held, and the NFS clients will not be able to reclaim their
> >> state.
> >>
> >> Given that, the bulleted list at the start of the doc above is wrong. We
> >> cannot do any sort of failover if there is a host failure. My assumption
> >> was that the orchestrator took care of starting up an NFS server
> >> elsewhere if the host it was running on went down. Is that not the case?
> >>
> >> In any case, I think we should reconsider recommending an ingress
> >> controller at all. It's really just another point of failure, and a lot
> >> of people seem to be misconstruing what guarantees it offers.
> >>
> >> Round-robin DNS would be a better option in this situation, and it
> >> wouldn't be as problematic if we want to support things like live
> >> shrinking the cluster in the future.
> >> --
> >> Jeff Layton <jlayton@xxxxxxxxxx>

_______________________________________________
Dev mailing list -- dev@xxxxxxx
To unsubscribe send an email to dev-leave@xxxxxxx