On 08/01/19 14:13, Jeff Layton wrote:
> On Fri, 2019-01-04 at 12:35 +0000, Ricardo Dias wrote:
>>
>> On 04/01/19 12:05, Jeff Layton wrote:
>>> On Fri, 2019-01-04 at 10:35 +0000, Ricardo Dias wrote:
>>>> On 03/01/19 19:05, Jeff Layton wrote:
>>>>> Ricardo,
>>>>>
>>>>> We chatted earlier about a new ceph mgr module that would spin up EXPORT blocks for ganesha and stuff them into a RADOS object. Here's basically what we're aiming for. I think it's pretty similar to what SuSE's solution is doing, so I think it'd be good to collaborate here.
>>>>
>>>> Just to make things clearer: we (SUSE) didn't build a downstream-specific implementation. The implementation we developed targets the upstream ceph-dashboard code.
>>>>
>>>> The dashboard backend code to manage ganesha exports is almost done. We still haven't opened a PR because we are still finishing the frontend code, which might make the backend change a bit.
>>>>
>>>> The current code is located here:
>>>> https://github.com/rjfd/ceph/tree/wip-dashboard-nfs
>>>>
>>>>> Probably I should write this up in a doc somewhere, but here's what I'd envision. First an overview:
>>>>>
>>>>> The Rook.io ceph+ganesha CRD basically spins up nfs-ganesha pods under k8s that don't export anything by default and have a fairly stock config. Each ganesha daemon that is started has a boilerplate config file that ends with a %url include like this:
>>>>>
>>>>> %url rados://<pool>/<namespace>/conf-<nodeid>
>>>>>
>>>>> The nodeid in this case is the unique nodeid within a cluster of ganesha servers using the rados_cluster recovery backend in ganesha. Rook enumerates these starting with 'a' and going through 'z' (and then 'aa', 'ab', etc.). So node 'a' would have a config object called "conf-a".
>>>>
>>>> This was the same assumption we made, and the current implementation can manage the exports of different servers (configuration objects).
>>>>
>>>>> What we currently lack is the code to set up those conf-<nodeid> objects. I know you have some code to do this sort of configuration via the dashboard and a REST API. Would it make more sense to split this bit out into a separate module, which would also allow it to be usable from the command line?
>>>>
>>>> Yes, and no :) The benefit of splitting the code into a separate module is that other mgr modules could manage ganesha exports through the mgr "_remote" call infrastructure, and that someone could manage ganesha exports without enabling the dashboard module.
>>>>
>>>> Regarding CLI commands, since the dashboard code exposes the export management through a REST API, we can always use curl to call it (although that makes for a more verbose command).
>>>>
>>>> In the dashboard source directory we have a small bash script to help with calling the REST API from the CLI.
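
Aside, for anyone reading this in the archives: without the helper script, plain curl works too. The commands below are only a sketch -- the /api/auth token exchange and the admin/admin vstart credentials are my assumptions about a typical dashboard setup, not something taken from the script, so adjust the URL, credentials and headers to your environment:

$ DASH=https://localhost:8443    # dashboard URL, placeholder
$ TOKEN=$(curl -sk -H 'Content-Type: application/json' \
      -d '{"username": "admin", "password": "admin"}' \
      "$DASH/api/auth" | \
      python -c 'import json,sys; print(json.load(sys.stdin)["token"])')

# list all exports (same as the GET example further down)
$ curl -sk -H "Authorization: Bearer $TOKEN" "$DASH/api/nfs-ganesha/export"
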
>>>> Here's an example of an export creation using the current implementation:
>>>>
>>>> $ ./run-backend-api-request.sh POST /api/nfs-ganesha/export \
>>>> '{
>>>>    "hostname": "node1.domain", \
>>>>    "path": "/foo", \
>>>>    "fsal": {"name": "CEPH", "user_id": "admin", "fs_name": "myfs"}, \
>>>>    "pseudo": "/foo", \
>>>>    "tag": null, \
>>>>    "access_type": "RW", \
>>>>    "squash": "no_root_squash", \
>>>>    "protocols": [4], \
>>>>    "transports": ["TCP"], \
>>>>    "clients": [{ \
>>>>        "addresses": ["10.0.0.0/8"], \
>>>>        "access_type": "RO", \
>>>>        "squash": "root" \
>>>>    }]}'
>>>>
>>>> The JSON fields and structure are similar to the ganesha export configuration structure.
>>>>
>>>> We also have other commands:
>>>>
>>>> # list all exports
>>>> $ ./run-backend-api-request.sh GET /api/nfs-ganesha/export
>>>>
>>>> # get an export
>>>> $ ./run-backend-api-request.sh GET \
>>>>     /api/nfs-ganesha/export/<hostname>/<id>
>>>>
>>>> # update an export
>>>> $ ./run-backend-api-request.sh PUT \
>>>>     /api/nfs-ganesha/export/<hostname>/<id> <json string>
>>>>
>>>> # delete an export
>>>> $ ./run-backend-api-request.sh DELETE \
>>>>     /api/nfs-ganesha/export/<hostname>/<id>
>>>>
>>>> In the dashboard implementation, the server configuration is identified by the <hostname> field, which does not need to be a real hostname. The dashboard keeps a map between the hostname and the RADOS object URL that stores the configuration of that server.
>>>
>>> Ok, that all sounds fine, actually. I think we can probably live with the REST API for this.
>>>
>>> It might be good to rename the "hostname" field to something more generic (maybe nodeid). The rados_cluster recovery backend for ganesha requires a unique nodeid for each node. If it's not specified then it will use the hostname.
>>
>> Sounds good to me.
>>
>>>> The bootstrap of this host/rados_url map can be done in two ways:
>>>> a) automatically: when an orchestrator backend is available, the dashboard asks the orchestrator for this information.
>>>> b) manually: the dashboard provides some CLI commands to add this information. Example:
>>>>    $ ceph dashboard ganesha-host-add <hostname> <rados_url>
>>>
>>> I'll have to think about this bit.
>>>
>>> The main use case we're interested in currently is OpenStack Manila:
>>>
>>> https://wiki.openstack.org/wiki/Manila
>>>
>>> It has its own REST API, and admins can request new servers to be started or volumes to be created and exported.
>>>
>>> What I had envisioned was that requests to manila would get translated into requests to do things like:
>>>
>>> create volumes and subvolumes
>>> ask the orchestrator to spin up a new daemon
>>> modify the conf-* objects and ask the orchestrator to send daemons a SIGHUP
>>>
>>> I think what you're proposing should be fine there. Probably I just need to pull down your wip branch and play with it to better understand.
>>
>> I think it should all work in the above use case if the dashboard is using the orchestrator. After the orchestrator spins up the new daemon, the dashboard will have access to the new daemon's configuration without manual intervention.
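
For reference, the export-creation JSON further up maps onto a ganesha EXPORT block roughly like the one below. This is written from memory rather than pasted from a generated object, so treat it as a sketch -- the FSAL parameter names in particular ("Filesystem" for fs_name) should be double-checked against the FSAL_CEPH documentation:

EXPORT {
    Export_ID = 1;                  # generated by the dashboard backend
    Path = "/foo";
    Pseudo = "/foo";
    Access_Type = RW;
    Squash = No_Root_Squash;
    Protocols = 4;
    Transports = TCP;

    FSAL {
        Name = CEPH;
        User_Id = "admin";
        Filesystem = "myfs";        # from "fs_name" in the JSON
    }

    CLIENT {
        Clients = 10.0.0.0/8;
        Access_Type = RO;
        Squash = Root;
    }
}
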
>>
>>>>> My thinking was that we'd probably want to create a new mgr module for that, and could wire it up to the command line with something like:
>>>>>
>>>>> $ ceph nfsexport create --id=100 \
>>>>>       --pool=mypool \
>>>>>       --namespace=mynamespace \
>>>>>       --type=cephfs \
>>>>>       --volume=myfs \
>>>>>       --subvolume=/foo \
>>>>>       --pseudo=/foo \
>>>>>       --cephx_userid=admin \
>>>>>       --cephx_key=<base64 key> \
>>>>>       --client=10.0.0.0/8,ro,root \
>>>>>       --client=admhost,rw,none
>>>>>
>>>>> ...the "client" is a string that would be a tuple of client access string, r/o or r/w, and the userid squashing mode, and could be specified multiple times.
>>>>
>>>> The above command is similar to what we provide in the REST API, with the difference that the dashboard generates the export ID.
>>>>
>>>> Do you think it is important for the user to explicitly specify the export ID?
>>>
>>> No, it'd be fine to autogenerate those in some fashion.
>>>
>>>>> We'd also want to add a way to remove and enumerate exports. Maybe:
>>>>>
>>>>> $ ceph nfsexport ls
>>>>> $ ceph nfsexport rm --id=100
>>>>>
>>>>> So the create command above would create an object called "export-100" in the given rados_pool/rados_namespace.
>>>>>
>>>>> From there, we'd need to also be able to "link" and "unlink" these export objects into the config files for each daemon. So if I have a cluster of 2 servers with nodeids "a" and "b":
>>>>>
>>>>> $ ceph nfsexport link --pool=mypool \
>>>>>       --namespace=mynamespace \
>>>>>       --id=100 \
>>>>>       --node=a \
>>>>>       --node=b
>>>>>
>>>>> ...with a corresponding "unlink" command. That would append this line to the objects called "conf-a" and "conf-b":
>>>>>
>>>>> %url rados://mypool/mynamespace/export-100
>>>>>
>>>>> ...and then call into the orchestrator to send a SIGHUP to the daemons to make them pick up the new configs. We might also want to sanity check whether any conf-* files are still linked to the export-* files before removing those objects.
>>>>>
>>>>> Thoughts?
>>>>
>>>> I got a bit lost with this link/unlink part. In the current dashboard implementation, when we create an export, the implementation adds the export configuration to the rados://<pool>/<namespace>/conf-<nodeid> object and calls the orchestrator to update/restart the service.
>>>>
>>>> It looks to me like you are separating export creation from export deployment: first you create the export, and then you add it to the service configuration.
>>>>
>>>> We can also implement this two-step behavior in the dashboard, and in the dashboard Web UI we can have a checkbox where the user can specify whether to apply the new export right away or not.
>>>>
>>>> In the dashboard, we will also implement a "copy" command to copy an export configuration to another ganesha server. That will help with creating similar exports on different servers.
>>>> Another option would be, instead of having a single "<hostname>" field in the create-export function, to accept a list of <hostname>s.
>>>
>>> The two-step process was not so much for immediacy as to eliminate the need to replicate all of the EXPORT blocks across a potentially large series of objects. If we (e.g.) needed to modify a CLIENT block to allow a new subnet to have access, we'd only need to change the one object and then SIGHUP all of the daemons.
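
Just to check my own understanding of that layout: after linking, conf-a would simply contain one %url include per linked export object, with the EXPORT { } block itself living only in the export-* object. So conf-a would look something like this (purely illustrative; export-101 here is a hypothetical second export):

%url rados://mypool/mynamespace/export-100
%url rados://mypool/mynamespace/export-101

Editing a CLIENT block would then mean rewriting a single export-* object and SIGHUPing the daemons, as you describe.
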
>>>
>>> That said, if replicating those blocks across multiple objects is simpler, then we'll adapt.
>>
>> After thinking more about this, I think the approach you suggest, with an object for each export that is then linked into the servers' configurations, makes more sense and avoids the need for a "Copy Export" operation.
>>
>> Since you will also be consuming the dashboard REST API, besides the dashboard frontend, I'll open a PR with just the backend implementation, so that it can be merged quickly without waiting for the frontend to be ready.
>>
>
> Thanks! I spent some time playing around with your branch and I think it looks like it'll work just fine for us.
>
> Just a few notes for anyone else that wants to do this. As Ricardo mentioned separately, RADOS namespace support is not yet plumbed in, so I'm using a separate "nfs-ganesha" pool here to house the RADOS objects needed for ganesha's configs and recovery backend:
>
> $ MON=3 OSD=1 MGR=1 MDS=1 ../src/vstart.sh -n
>
> $ ganesha-rados-grace -p nfs-ganesha `hostname`
>
> $ rados create -p nfs-ganesha conf-`hostname`
>
> $ ceph dashboard ganesha-host-add `hostname` rados://nfs-ganesha/conf-`hostname`
>
> $ ./run-backend-api-request.sh POST /api/nfs-ganesha/export "`cat ~/export.json`"
>
> ...where export.json is something like:
>
> -----------------[snip]------------------
> {
>   "hostname": "server_hostname",
>   "path": "/foo",
>   "fsal": {"name": "CEPH", "user_id": "admin", "fs_name": "myfs"},
>   "pseudo": "/foo",
>   "tag": null,
>   "access_type": "RW",
>   "squash": "no_root_squash",
>   "protocols": [4],
>   "transports": ["TCP"],
>   "clients": [{
>     "addresses": ["10.0.0.0/8"],
>     "access_type": "RO",
>     "squash": "root"
>   }]
> }
> -----------------[snip]-------------------
>
> This creates an object with a ganesha config EXPORT block which looks valid. We may need to tweak it a bit, but I think this should work just fine. Thanks for posting the above steps!
>
> I know the web UI is still pretty raw, but here are some comments anyway:
>
> For safety reasons, the default Access Type should probably be "RO", and the default Squash mode should be "Root" or maybe even "All". You may also want to somehow ensure that the admin consciously decides to export to the world instead of making that the default when no client is specified.

This is very valuable information. I have never administered an NFS-Ganesha server and therefore don't have experience with what the defaults should be. Thanks for the suggestions.

>
> It'd be nice to be able to granularly select the NFSv4 minorversions. If you exclusively have NFSv4.1+ clients, then the grace period can be lifted early after a restart. That's a big deal for continuous operation. In our clustered configurations, we plan to not support anything before v4.1 by default.

I didn't know about the existence of minorversions. Where can I get the list of all possible values for the protocol version?

>
> We'll probably need some way to specify the fsal.user_id field in the UI too. Maybe a dropdown box that enumerates the available principals?

Yes, and that has already been done by Tiago Melo (tmelo on IRC) in his development branch. I believe he has added a dropdown with the list of cephx users.

>
> That's all for now. I think what I'll probably do is close out my PR to add NFS support to the orchestrator and concentrate on wiring the rook orchestrator into what you have, since it's more complete.
>
> Cheers!
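
Following up on my minorversions question above: from what I can tell, NFSv4 minor versions 0, 1 and 2 exist and are selected via the "minor_versions" parameter of ganesha's NFSV4 config block, with all of them enabled by default. That is from memory, so the parameter name should be double-checked against the ganesha-config man page, but restricting a daemon to v4.1+ would then look roughly like:

NFSV4 {
    # allow only NFSv4.1 and NFSv4.2 clients, so that the grace period
    # can be lifted early after a restart
    minor_versions = 1, 2;
}
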
--
Ricardo Dias
Senior Software Engineer - Storage Team
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton,
HRB 21284 (AG Nürnberg)