On Wed, 2019-01-09 at 23:37 +0000, Ricardo Dias wrote:
> On 08/01/19 14:13, Jeff Layton wrote:
> > On Fri, 2019-01-04 at 12:35 +0000, Ricardo Dias wrote:
> > > On 04/01/19 12:05, Jeff Layton wrote:
> > > > On Fri, 2019-01-04 at 10:35 +0000, Ricardo Dias wrote:
> > > > > On 03/01/19 19:05, Jeff Layton wrote:
> > > > > > Ricardo,
> > > > > >
> > > > > > We chatted earlier about a new ceph mgr module that would spin up EXPORT blocks for ganesha and stuff them into a RADOS object. Here's basically what we're aiming for. I think it's pretty similar to what SuSE's solution is doing so I think it'd be good to collaborate here.

> > > > > Just to make things clearer, we (SUSE) didn't implement a specific downstream implementation. The implementation we developed targets the upstream ceph-dashboard code.
> > > > >
> > > > > The dashboard backend code to manage ganesha exports is almost done. We still haven't opened a PR because we are still finishing the frontend code, which might make the backend change a bit.
> > > > >
> > > > > The current code is located here:
> > > > > https://github.com/rjfd/ceph/tree/wip-dashboard-nfs

> > > > > > Probably I should write this up in a doc somewhere, but here's what I'd envision. First an overview:
> > > > > >
> > > > > > The Rook.io ceph+ganesha CRD basically spins up nfs-ganesha pods under k8s that don't export anything by default and have a fairly stock config. Each ganesha daemon that is started has a boilerplate config file that ends with a %url include like this:
> > > > > >
> > > > > > %url rados://<pool>/<namespace>/conf-<nodeid>
> > > > > >
> > > > > > The nodeid in this case is the unique nodeid within a cluster of ganesha servers using the rados_cluster recovery backend in ganesha. Rook enumerates these starting with 'a' and going through 'z' (and then 'aa', 'ab', etc.). So node 'a' would have a config object called "conf-a".

> > > > > This was the same assumption we made, and the current implementation code can manage the exports of different servers (configuration objects).

> > > > > > What we currently lack is the code to set up those conf-<nodeid> objects. I know you have some code to do this sort of configuration via the dashboard and a REST API. Would it make more sense to split this bit out into a separate module, which would also allow it to be usable from the command line?

> > > > > Yes, and no :) I think the benefit of splitting the code into a separate module is the possibility of other mgr modules managing ganesha exports using the mgr "_remote" call infrastructure, or of someone wanting to manage ganesha exports without enabling the dashboard module.
> > > > >
> > > > > Regarding CLI commands, since the dashboard code exposes the export management through a REST API, we can always use curl to call it (although it will be a more verbose command).
> > > > >
> > > > > In the dashboard source directory we have a small bash script to help with calling the REST API from the CLI.
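For reference, a raw curl call against the same endpoint would look roughly like the sketch below. The https://localhost:8443 address and the $TOKEN variable holding a dashboard auth token are assumptions for a local test cluster, not details taken from the dashboard code; obtaining the token is left out here:

    $ curl -k \
          -H "Authorization: Bearer $TOKEN" \
          -H "Accept: application/json" \
          https://localhost:8443/api/nfs-ganesha/export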
> > > > > Here's an example of an export creation using the current implementation:
> > > > >
> > > > > $ ./run-backend-api-request.sh POST /api/nfs-ganesha/export \
> > > > >   '{
> > > > >     "hostname": "node1.domain",
> > > > >     "path": "/foo",
> > > > >     "fsal": {"name": "CEPH", "user_id": "admin", "fs_name": "myfs"},
> > > > >     "pseudo": "/foo",
> > > > >     "tag": null,
> > > > >     "access_type": "RW",
> > > > >     "squash": "no_root_squash",
> > > > >     "protocols": [4],
> > > > >     "transports": ["TCP"],
> > > > >     "clients": [{
> > > > >       "addresses": ["10.0.0.0/8"],
> > > > >       "access_type": "RO",
> > > > >       "squash": "root"
> > > > >     }]}'
> > > > >
> > > > > The JSON fields and structure are similar to the ganesha export configuration structure.
> > > > >
> > > > > We also have other commands:
> > > > >
> > > > > # list all exports
> > > > > $ ./run-backend-api-request.sh GET /api/nfs-ganesha/export
> > > > >
> > > > > # get an export
> > > > > $ ./run-backend-api-request.sh GET \
> > > > >     /api/nfs-ganesha/export/<hostname>/<id>
> > > > >
> > > > > # update an export
> > > > > $ ./run-backend-api-request.sh PUT \
> > > > >     /api/nfs-ganesha/export/<hostname>/<id> <json string>
> > > > >
> > > > > # delete an export
> > > > > $ ./run-backend-api-request.sh DELETE \
> > > > >     /api/nfs-ganesha/export/<hostname>/<id>
> > > > >
> > > > > In the dashboard implementation, the server configuration is identified by the <hostname> field, which does not need to be a real hostname. The dashboard keeps a map between the hostname and the rados object URL that stores the configuration of the server.

> > > > Ok, that all sounds fine, actually. I think we can probably live with the REST API for this.
> > > >
> > > > It might be good to rename the "hostname" field to something more generic (maybe nodeid). The rados_cluster recovery backend for ganesha requires a unique nodeid for each node. If it's not specified then it will use the hostname.

> > > Sounds good to me.

> > > > > The bootstrap of this host/rados_url map can be done in two ways:
> > > > > a) automatically: when an orchestrator backend is available, the dashboard asks the orchestrator for this information.
> > > > > b) manually: the dashboard provides some CLI commands to add this information. Example:
> > > > >    $ ceph dashboard ganesha-host-add <hostname> <rados_url>

> > > > I'll have to think about this bit.
> > > >
> > > > The main use case we're interested in currently is OpenStack Manila:
> > > >
> > > > https://wiki.openstack.org/wiki/Manila
> > > >
> > > > It has its own REST API, and admins can request new servers to be started or volumes to be created and exported.
> > > >
> > > > What I had envisioned was that requests to manila would get translated into requests to do things like:
> > > >
> > > > create volumes and subvolumes
> > > > ask the orchestrator to spin up a new daemon
> > > > modify the conf-* objects and ask the orchestrator to send daemons a SIGHUP
> > > >
> > > > I think what you're proposing should be fine there. Probably I just need to pull down your wip branch and play with it to better understand.

> > > I think all should work in the above use case if the dashboard is using the orchestrator.
> > > After the orchestrator spins up the new daemon, the dashboard will have access to the new daemon configuration without manual intervention.

> > > > > > My thinking was that we'd probably want to create a new mgr module for that, and could wire it up to the command line with something like:
> > > > > >
> > > > > > $ ceph nfsexport create --id=100 \
> > > > > >     --pool=mypool \
> > > > > >     --namespace=mynamespace \
> > > > > >     --type=cephfs \
> > > > > >     --volume=myfs \
> > > > > >     --subvolume=/foo \
> > > > > >     --pseudo=/foo \
> > > > > >     --cephx_userid=admin \
> > > > > >     --cephx_key=<base64 key> \
> > > > > >     --client=10.0.0.0/8,ro,root \
> > > > > >     --client=admhost,rw,none
> > > > > >
> > > > > > ...the "client" is a string that would be a tuple of client access string, r/o or r/w, and the userid squashing mode, and could be specified multiple times.

> > > > > The above command is similar to what we provide in the REST API, with the difference that the dashboard generates the export ID.
> > > > >
> > > > > Do you think it is important for the user to explicitly specify the export ID?

> > > > No, it'd be fine to autogenerate those in some fashion.

> > > > > > We'd also want to add a way to remove and enumerate exports. Maybe:
> > > > > >
> > > > > > $ ceph nfsexport ls
> > > > > > $ ceph nfsexport rm --id=100
> > > > > >
> > > > > > So the create command above would create an object called "export-100" in the given rados_pool/rados_namespace.
> > > > > >
> > > > > > From there, we'd need to also be able to "link" and "unlink" these export objects into the config files for each daemon. So if I have a cluster of 2 servers with nodeids "a" and "b":
> > > > > >
> > > > > > $ ceph nfsexport link --pool=mypool \
> > > > > >     --namespace=mynamespace \
> > > > > >     --id=100 \
> > > > > >     --node=a \
> > > > > >     --node=b
> > > > > >
> > > > > > ...with a corresponding "unlink" command. That would append this line to the objects called "conf-a" and "conf-b":
> > > > > >
> > > > > > %url rados://mypool/mynamespace/export-100
> > > > > >
> > > > > > ...and then call into the orchestrator to send a SIGHUP to the daemons to make them pick up the new configs. We might also want to sanity check whether any conf-* files are still linked to the export-* files before removing those objects.
> > > > > >
> > > > > > Thoughts?

> > > > > I got a bit lost with this link/unlink part. In the current dashboard implementation, when we create an export the implementation will add the export configuration into the rados://<pool>/<namespace>/conf-<nodeid> object and call the orchestrator to update/restart the service.
> > > > >
> > > > > It looks to me that you are separating the export creation from the export deployment. First you create the export, and then you add it to the service configuration.
> > > > >
> > > > > We can also implement this two-step behavior in the dashboard implementation, and in the dashboard Web UI we can have a checkbox where the user can specify whether they want to apply the new export right away or not.
> > > > >
> > > > > In the dashboard, we will also implement a "copy" command to copy an export configuration to another ganesha server. That will help with creating similar exports in different servers.
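To make the proposed "link" step a little more concrete, here is a rough sketch of what it would boil down to at the RADOS level, reusing the pool, namespace, export ID and nodeids from the example above (a hypothetical mgr module implementing this would presumably do the equivalent internally and then ask the orchestrator to SIGHUP the daemons):

    # append the %url include for export-100 to each node's conf object
    $ echo '%url rados://mypool/mynamespace/export-100' > /tmp/link.conf
    $ rados -p mypool --namespace=mynamespace append conf-a /tmp/link.conf
    $ rados -p mypool --namespace=mynamespace append conf-b /tmp/link.conf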
> > > > > Another option would be, instead of having a single "<hostname>" field in the create export function, to have a list of <hostname>s.

> > > > The two-step process was not so much for immediacy as to eliminate the need to replicate all of the EXPORT blocks across a potentially large series of objects. If we (e.g.) needed to modify a CLIENT block to allow a new subnet to have access, we'd only need to change the one object and then SIGHUP all of the daemons.
> > > >
> > > > That said, if replicating those blocks across multiple objects is simpler then we'll adapt.

> > > After thinking more about this, I think the approach you suggest of using an object for each export and then linking it to the servers' configuration makes more sense, and avoids the need for a "Copy Export" operation.
> > >
> > > Since you will also consume the dashboard REST API besides the dashboard frontend, I'll open a PR with just the backend implementation, so that it can be merged quickly without waiting for the frontend to be ready.

> > Thanks! I spent some time playing around with your branch and I think it looks like it'll work just fine for us.
> >
> > Just a few notes for anyone else that wants to do this. As Ricardo mentioned separately, RADOS namespace support is not yet plumbed in, so I'm using a separate "nfs-ganesha" pool here to house the RADOS objects needed for ganesha's configs and recovery backend:
> >
> > $ MON=3 OSD=1 MGR=1 MDS=1 ../src/vstart.sh -n
> > $ ganesha-rados-grace -p nfs-ganesha `hostname`
> > $ rados create -p nfs-ganesha conf-`hostname`
> > $ ceph dashboard ganesha-host-add `hostname` rados://nfs-ganesha/conf-`hostname`
> > $ ./run-backend-api-request.sh POST /api/nfs-ganesha/export "`cat ~/export.json`"
> >
> > ...where export.json is something like:
> >
> > -----------------[snip]------------------
> > {
> >   "hostname": "server_hostname",
> >   "path": "/foo",
> >   "fsal": {"name": "CEPH", "user_id": "admin", "fs_name": "myfs"},
> >   "pseudo": "/foo",
> >   "tag": null,
> >   "access_type": "RW",
> >   "squash": "no_root_squash",
> >   "protocols": [4],
> >   "transports": ["TCP"],
> >   "clients": [{
> >     "addresses": ["10.0.0.0/8"],
> >     "access_type": "RO",
> >     "squash": "root"
> >   }]
> > }
> > -----------------[snip]-------------------
> >
> > This creates an object with a ganesha config EXPORT block which looks valid. We may need to tweak it a bit, but I think this should work just fine.

> Thanks for posting the above steps!

> > I know the web UI is still pretty raw, but here are some comments anyway:
> >
> > For safety reasons, the default Access Type should probably be "RO", and the default Squash mode should be "Root" or maybe even "All". You may also want to somehow ensure that the admin consciously decides to export to the world instead of making that the default when no client is specified.

> This is very valuable information. I have never administered an NFS ganesha server and therefore don't have experience with what the defaults should be. Thanks for the suggestions.

No problem. We definitely want this to be a "safe by default" design, as much as possible. Getting exports wrong is a great way to compromise security in some environments.

> > It'd be nice to be able to granularly select the NFSv4 minorversions.
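For context, ganesha's NFSv4 config block is where the served minor versions are normally restricted; a minimal sketch of a "v4.1 and later only" setup might look like the snippet below (the exact option spelling should be checked against the ganesha-core-config documentation rather than taken from this thread):

    NFSv4 {
        # only allow NFSv4.1 and NFSv4.2 clients
        Minor_Versions = 1, 2;
    }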
> > If you exclusively have NFSv4.1+ clients, then the grace period can be lifted early after a restart. That's a big deal for continuous operation. In our clustered configurations, we plan to not support anything before v4.1 by default.

> I didn't know about the existence of minorversions. Where can I get the list of all possible values for the protocol version?

Those are all governed by the IETF RFCs. Basically we have v4.0, v4.1 and v4.2 so far, and I wouldn't worry about anything beyond that at this point. We may eventually end up with a v4.3, but we're sort of moving to a model that is based on feature flags, so that may not ever materialize.

> > We'll probably need some way to specify the fsal.user_id field in the UI too. Maybe a dropdown box that enumerates the available principals?

> Yes, and that's already been done by Tiago Melo (tmelo on IRC) in his development branch. I believe he has added a dropdown with the list of cephx users.

Nice.

> > That's all for now. I think what I'll probably do is close out my PR to add NFS support to the orchestrator and concentrate on wiring the rook orchestrator into what you have, since it's more complete.

FWIW... after I took a closer look, I think the PR I had to add NFS support to the orchestrator is mostly orthogonal to your changes, so I think we'll probably want to merge the latest version of it after all.

-- 
Jeff Layton <jlayton@xxxxxxxxxx>