Re: how should we manage ganesha's export tables from ceph-mgr ?

On 04/01/19 12:05, Jeff Layton wrote:
> On Fri, 2019-01-04 at 10:35 +0000, Ricardo Dias wrote:
>>
>> On 03/01/19 19:05, Jeff Layton wrote:
>>> Ricardo,
>>>
>>> We chatted earlier about a new ceph mgr module that would spin up EXPORT
>>> blocks for ganesha and stuff them into a RADOS object. Here's basically
>>> what we're aiming for. I think it's pretty similar to what SuSE's
>>> solution is doing so I think it'd be good to collaborate here.
>>
>> Just to make things clearer: we (SUSE) didn't write a downstream-specific
>> implementation. The implementation we developed targets the upstream
>> ceph-dashboard code.
>>
>> The dashboard backend code to manage ganesha exports is almost done. We
>> haven't opened a PR yet because we are still finishing the frontend
>> code, which might require the backend to change a bit.
>>
>> The current code is located here:
>> https://github.com/rjfd/ceph/tree/wip-dashboard-nfs
>>
>>> Probably I should write this up in a doc somewhere, but here's what I'd
>>> envision. First an overview:
>>>
>>> The Rook.io ceph+ganesha CRD basically spins up nfs-ganesha pods under
>>> k8s that don't export anything by default and have a fairly stock
>>> config. Each ganesha daemon that is started has a boilerplate config
>>> file that ends with a %url include like this:
>>>
>>>     %url rados://<pool>/<namespace>/conf-<nodeid>
>>>
>>> The nodeid in this case is the unique nodeid within a cluster of ganesha
>>> servers using the rados_cluster recovery backend in ganesha. Rook
>>> enumerates these starting with 'a' and going through 'z' (and then 'aa',
>>> 'ab', etc.). So node 'a' would have a config object called "conf-a".
>>>
>>
>> This was the same assumption we made, and the current implementation
>> can already manage the exports of different servers (configuration objects).
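
For reference, the per-daemon boilerplate config that ends with the %url
include would look roughly like this (block and option names are from
memory and may differ slightly between ganesha versions; pool and
namespace are placeholders):

    NFSv4 {
        # use the rados_cluster recovery backend mentioned above
        RecoveryBackend = rados_cluster;
    }

    RADOS_KV {
        # pool/namespace/nodeid used by the recovery backend
        pool = "mypool";
        namespace = "mynamespace";
        nodeid = "a";
    }

    # pull in the per-node export configuration managed from the mgr
    %url rados://mypool/mynamespace/conf-a
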
>>
>>> What we currently lack is the code to set up those conf-<nodeid>
>>> objects. I know you have some code to do this sort of configuration via
>>> the dashboard and a REST API. Would it make more sense to split this bit
>>> out into a separate module, which would also allow it to be usable from
>>> the command line?
>>
>> Yes, and no :) The benefit of splitting the code into a separate module
>> is that other mgr modules could manage ganesha exports through the mgr
>> "_remote" call infrastructure, and that someone could manage ganesha
>> exports without enabling the dashboard module.
>>
>> Regarding CLI commands: since the dashboard code exposes export
>> management through a REST API, we can always use curl to call it
>> (although the command will be more verbose).
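
For example, a curl call would look roughly like this (host, port and
the auth token are placeholders; the export JSON would be the same as in
the script example below):

    $ curl -k -X POST https://<mgr-host>:<port>/api/nfs-ganesha/export \
        -H "Authorization: Bearer <token>" \
        -H "Content-Type: application/json" \
        -d '{ ...export definition as below... }'
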
>>
>> In the dashboard source directory we have a small bash script to help
>> call the REST API from the CLI. Here's an example of creating an export
>> with the current implementation:
>>
>> $ ./run-backend-api-request.sh POST /api/nfs-ganesha/export \
>>   '{
>>      "hostname": "node1.domain",
>>      "path": "/foo",
>>      "fsal": {"name": "CEPH", "user_id": "admin", "fs_name": "myfs"},
>>      "pseudo": "/foo",
>>      "tag": null,
>>      "access_type": "RW",
>>      "squash": "no_root_squash",
>>      "protocols": [4],
>>      "transports": ["TCP"],
>>      "clients": [{
>>        "addresses": ["10.0.0.0/8"],
>>        "access_type": "RO",
>>        "squash": "root"
>>      }]}'
>>
>> The JSON fields and structure are similar to the ganesha export
>> configuration structure.
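
For instance, the JSON above corresponds roughly to a ganesha export
block like the following (option names from memory and may vary a bit
between ganesha versions; the export ID is generated by the dashboard):

    EXPORT {
        Export_ID = 1;
        Path = "/foo";
        Pseudo = "/foo";
        Access_Type = "RW";
        Squash = "no_root_squash";
        Protocols = 4;
        Transports = "TCP";

        FSAL {
            Name = "CEPH";
            User_Id = "admin";
            Filesystem = "myfs";
        }

        CLIENT {
            Clients = 10.0.0.0/8;
            Access_Type = "RO";
            Squash = "root";
        }
    }
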
>>
>> We also have other commands:
>>
>> # list all exports
>> $ ./run-backend-api-request.sh GET /api/nfs-ganesha/export
>>
>> # get an export
>> $ ./run-backend-api-request.sh GET \
>> 	/api/nfs-ganesha/export/<hostname>/<id>
>>
>> # update an export
>> $ ./run-backend-api-request.sh PUT \
>> 	/api/nfs-ganesha/export/<hostname>/<id> <json string>
>>
>> # delete an export
>> $ ./run-backend-api-request.sh DELETE \
>> 	/api/nfs-ganesha/export/<hostname>/<id>
>>
>>
>> In the dashboard implementation, the server configuration is identified
>> by the <hostname> field, which does not need to be a real hostname.
>> The dashboard keeps a map between the hostname and the rados object URL
>> that stores the configuration of the server.
>>
> 
> Ok, that all sounds fine, actually. I think we can probably live with
> the REST API for this.
> 
> It might be good to rename the "hostname" field to something more
> generic (maybe nodeid). The rados_cluster recovery backend for ganesha
> requires a unique nodeid for each node. If it's not specified then it
> will use the hostname.

Sounds good to me.

> 
>> This host/rados_url map can be bootstrapped in two ways:
>> a) automatically: when an orchestrator backend is available, the
>> dashboard asks the orchestrator for this information.
>> b) manually: the dashboard provides some CLI commands to add this
>> information. Example:
>> $  ceph dashboard ganesha-host-add <hostname> <rados_url>
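
For example, for node 'a' above (pool and namespace being placeholders):

$ ceph dashboard ganesha-host-add node-a rados://mypool/mynamespace/conf-a
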
>>
> 
> I'll have to think about this bit.
> 
> The main use case we're interested in currently is Openstack Manila:
> 
>     https://wiki.openstack.org/wiki/Manila
> 
> It has its own REST API, and admins can request new servers to be
> started or volumes to be created and exported.
> 
> What I had envisioned was that requests to manila would get translated
> into requests to do things like:
> 
> - create volumes and subvolumes
> - ask the orchestrator to spin up a new daemon
> - modify the conf-* objects and ask the orchestrator to send the
>   daemons a SIGHUP
> 
> I think what you're proposing should be fine there. Probably I just need
> to pull down your wip branch and play with it to better understand.

I think it should all work in the above use case if the dashboard is
using the orchestrator. After the orchestrator spins up the new daemon,
the dashboard will have access to the new daemon's configuration without
manual intervention.
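
For illustration, the manual equivalent of that last step (modifying the
conf-* objects and signaling the daemons), bypassing the orchestrator,
would be something like the following; object names, file names, pool
and namespace are just placeholders:

    # store the new export block as its own object
    $ rados -p mypool -N mynamespace put export-100 export-100.conf

    # append the corresponding %url include to node a's config object
    $ echo '%url rados://mypool/mynamespace/export-100' > url.tmp
    $ rados -p mypool -N mynamespace append conf-a url.tmp

    # make the daemon(s) re-read their configuration
    $ pkill -HUP ganesha.nfsd
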

> 
>>
>>> My thinking was that we'd probably want to create a new mgr module for
>>> that, and could wire it up to the command line with something like:
>>>
>>>     $ ceph nfsexport create --id=100			\
>>> 			--pool=mypool			\
>>> 			--namespace=mynamespace		\
>>> 			--type=cephfs			\
>>> 			--volume=myfs			\
>>> 			--subvolume=/foo		\
>>> 			--pseudo=/foo			\
>>> 			--cephx_userid=admin		\
>>> 			--cephx_key=<base64 key>	\
>>> 			--client=10.0.0.0/8,ro,root	\
>>> 			--client=admhost,rw,none
>>>
>>> ...the "client" is a string that would be a tuple of client access
>>> string, r/o or r/w, and the userid squashing mode, and could be
>>> specified multiple times.
>>
>> The above command is similar to what we provide in the REST API, with the
>> difference that the dashboard generates the export ID.
>>
>> Do you think it is important for the user to explicitly specify the
>> export ID?
>>
> 
> No, it'd be fine to autogenerate those in some fashion.
> 
>>> We'd also want to add a way to remove and enumerate exports. Maybe:
>>>
>>>     $ ceph nfsexport ls
>>>     $ ceph nfsexport rm --id=100
>>>
>>> So the create command above would create an object called "export-100"
>>> in the given rados_pool/rados_namespace. 
>>>
>>> From there, we'd need to also be able to "link" and "unlink" these
>>> export objects into the config files for each daemon. So if I have a
>>> cluster of 2 servers with nodeids "a" and "b":
>>>
>>>     $ ceph nfsexport link --pool=mypool			\
>>> 			--namespace=mynamespace		\
>>> 			--id=100 			\
>>> 			--node=a			\
>>> 			--node=b
>>>
>>> ...with a corresponding "unlink" command. That would append objects
>>> called "conf-a" and "conf-b" with this line:
>>>
>>>     %url rados://mypool/mynamespace/export-100
>>>
>>> ...and then call into the orchestrator to send a SIGHUP to the daemons
>>> to make them pick up the new configs. We might also want to sanity check
>>> whether any conf-* files are still linked to the export-* files before
>>> removing those objects.
>>>
>>> Thoughts?
>>
>> I got a bit lost with this link/unlink part. In the current dashboard
>> implementation, when we create an export, the implementation adds the
>> export configuration to the rados://<pool>/<namespace>/conf-<nodeid>
>> object and calls the orchestrator to update/restart the service.
>>
>> It looks to me like you are separating export creation from export
>> deployment: first you create the export, and then you add it to the
>> service configuration.
>>
>> We can also implement this two-step behavior in the dashboard, and in
>> the dashboard Web UI we can have a checkbox where the user can specify
>> whether to apply the new export right away or not.
>>
>> In the dashboard, we will also implement a "copy" command to copy an
>> export configuration to another ganesha server. That will help with
>> creating similar exports on different servers.
>> Another option would be, instead of having a single "<hostname>" field
>> in the create-export function, to accept a list of <hostname>s.
>>
> 
> The two-step process was not so much for immediacy as to eliminate the
> need to replicate all of the EXPORT blocks across a potentially large
> series of objects. If we (e.g.) needed to modify a CLIENT block to allow
> a new subnet to have access, we'd only need to change the one object and
> then SIGHUP all of the daemons.
> 
> That said, if replicating those blocks across multiple objects is
> simpler then we'll adapt.

After thinking more about this, I think the approach you suggest, using
an object for each export and then linking it into the servers'
configurations, makes more sense, and avoids the need for a "Copy
Export" operation.
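
With that approach, each conf-<nodeid> object ends up containing only
%url lines pointing at the shared export objects, e.g.:

    %url rados://mypool/mynamespace/export-100
    %url rados://mypool/mynamespace/export-101

so changing e.g. a CLIENT block only requires touching one export-*
object and then SIGHUPing the daemons.
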

Since you will also be consuming the dashboard REST API, in addition to
the dashboard frontend, I'll open a PR with just the backend
implementation, so that it can be merged quickly without waiting for the
frontend to be ready.


-- 
Ricardo Dias
Senior Software Engineer - Storage Team
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton,
HRB 21284
(AG Nürnberg)
