On Fri, Jul 30, 2021 at 10:39 AM Jeff Layton <jlayton@xxxxxxxxxx> wrote:
> Ok, got it. So basically you're mainly using haproxy to implement some
> poor-man's virtual IP handling. That obviously works, and you _could_
> just set up multiple haproxy addresses to allow for scale-out.

Right. And you can set up multiple ingress services today in front of
the same (set of) ganesha daemon(s), each with its own virtual IP.

> As a side note, someone asking the other day about HA ganesha mentioned
> using ucarp, which looks pretty simple.
>
> https://wiki.greentual.com/index.php/Ucarp
>
> It may be worth considering that instead, but it may not give you much
> if you need to deal with haproxy for RGW anyway.

Interesting! At first blush it looks more elegant than keepalived, but
given that keepalived seems to work okay it might not be worth fiddling
with.

> That said, floating a VIP between machines is probably more efficient
> than proxying packets.

Yeah, it just doesn't have the same scale-out properties. What isn't
completely clear to me is whether there are cases where we will need to
scale ganesha based on CPU/memory before we approach network limits. If
so, then that extra hop makes sense, but if ganesha is lightweight and
scales up, then making ganesha the only gateway makes more sense. (With
RGW it's clear that we need the haproxy layer in front for performance,
not to mention the other HTTP-related features in haproxy that we can
take advantage of.)

> For migration plans, nothing is fully baked. Implementing migration to
> allow for taking a node out of the cluster live is not _too_ difficult.
> I have a rough draft implementation of that here:
>
> https://github.com/jtlayton/nfs-ganesha/commits/fsloc
>
> With that you can mark an export's config on a node with
> "Moved = true;" and the node's clients should vacate it soon afterward.

One thing to keep in mind: currently all of the ganesha daemons are
sharing most of their config--specifically including the export config
blocks. We'd want a way to set "Moved = true" (or some equivalent) in
one ganesha daemon's top-level config so we can drain everything...
(a rough sketch of both shapes follows below)

> What I don't have yet is a way to allow the client to reclaim state on
> a different node. That mechanism could potentially be useful for
> breaking the constraint that reclaiming hosts always need to go to the
> same server. That would also enable us to live-move clients for
> balancing as well. I haven't sat down to design anything like that yet
> so I don't know how difficult it is, but I think it's possible.

If that big ganesha limitation were lifted, then our set of options
would expand dramatically. We could have a live standby and fast
failover, load balance via haproxy or anything else, expand/contract
the cluster, etc., without worrying about clients moving between
ganeshas. I wonder if it would be less work and ultimately more
flexible than doing the delegations to make ganesha responsible for
moving clients around...

sage
_______________________________________________
Dev mailing list -- dev@xxxxxxx
To unsubscribe send an email to dev-leave@xxxxxxx
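
A minimal sketch of the two config shapes under discussion, written in
ganesha's export config syntax. The first EXPORT block shows the
per-export "Moved = true;" drain flag from Jeff's fsloc branch; the
Export_Id, Path, Pseudo, and FSAL values here are placeholders, not
taken from any real deployment. The second part is purely hypothetical:
a per-daemon, top-level equivalent of the kind asked for above, which
does not exist in ganesha today.

    # Per-export drain flag (fsloc branch): mark one export on one node
    # and that node's clients should vacate it soon afterward.
    EXPORT {
        Export_Id = 100;      # placeholder values for illustration
        Path = /;
        Pseudo = /cephfs;
        FSAL {
            Name = CEPH;
        }
        Moved = true;         # clients are told this export has moved
    }

    # Hypothetical daemon-wide override (does NOT exist today): something
    # in one daemon's top-level config, e.g. a block along the lines of
    #
    #     NFS_CORE_PARAM {
    #         Moved = true;
    #     }
    #
    # that would drain every export served by that daemon, even though
    # the EXPORT blocks themselves are shared across the cluster.

The appeal of the second form is exactly that the EXPORT blocks are
shared by all of the ganesha daemons: a daemon-local, top-level setting
would let one daemon be drained without editing the common export
config.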