Re: Teuthology & Rook (& DeepSea, ceph-ansible, ...)

Great questions. At a high level, to run tests against Rook, I would
expect the process to be:
- Install Kubernetes
- Install Rook/Ceph
- Start pods where the Ceph clients can consume the RBD/CephFS mounts
or use an S3 endpoint (a minimal example of such a client pod is
sketched below).
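
For the last step, here is a rough sketch of a client pod consuming an
RBD volume through a PersistentVolumeClaim. The PVC and pod names are
just placeholders, and the rook-ceph-block StorageClass is the one
created by Rook's example storageclass.yaml:

kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-rbd-pvc
spec:
  storageClassName: rook-ceph-block
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: rbd-test-client
spec:
  containers:
  - name: client
    image: busybox
    command: ["sleep", "3600"]
    volumeMounts:
    - name: data
      mountPath: /mnt/rbd
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: test-rbd-pvc
EOF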

More specific comments inline...

Travis

On Wed, Apr 24, 2019 at 8:49 AM Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
>
> Hello Travis, all,
> I’ve been looking at the interfaces our ceph-qa-suite tasks expect
> from the underlying teuthology and Ceph deployment tasks to try and
> 1) narrow them down into something we can implement against other
> backends (ceph-ansible, Rook, DeepSea, etc)
> 2) see how those interfaces need to be adapted to suit the differences
> between physical hosts and kubernetes pods.
>
> Some very brief background about teuthology: it expects you to select
> a group of hosts (eg smithi001, smithi002), to map those hosts to
> specific roles (eg a host with osd.1, mon.a, client.0 and another with
> osd.2, mon.b, client.1, client.2), and to then run specific tasks
> against those configurations (eg install, ceph, kclient, fio).  (Those
> following along at home who want more details may wish to view one of
> the talks I’ve given on teuthology, eg
> https://www.youtube.com/watch?v=gj1OXrKdSrs .)
>

After you select the hosts, will this involve setting up Kubernetes
before deploying Rook? Or will the hosts come from a pool that already
has Kubernetes running?

It is possible to tell Rook which nodes to use for which daemons if
desired. See the "placement" configuration settings in cluster.yaml.
Basically it requires setting a Kubernetes label on the node and then
referencing the label(s) in cluster.yaml, as sketched below.
https://rook.io/docs/rook/master/ceph-cluster-crd.html#placement-configuration-settings
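
For example (the label key/value and node name below are placeholders,
not required values), you would label the node and then reference the
label in the CephCluster's placement section:

kubectl label node node1 ceph-role=mon

# Then in cluster.yaml, under spec.placement, something along these lines:
#   placement:
#     mon:
#       nodeAffinity:
#         requiredDuringSchedulingIgnoredDuringExecution:
#           nodeSelectorTerms:
#           - matchExpressions:
#             - key: ceph-role
#               operator: In
#               values: ["mon"]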

> The touch points between a ceph-qa-suite task and the remote hardware
> are actually not a very large interface in direct function terms, but
> some of the functions are very large themselves so we’ll need to
> rework them a bit. I’ve taken pretty extensive notes at
> https://pad.ceph.com/p/teuthology-rook, but I’ll summarize here.
>
> The important touch points are 1) the “install” task, 2) the “ceph”
> task, and 3) the “RemoteProcess” abstraction.
>
> The install task
> (https://github.com/ceph/teuthology/blob/master/teuthology/task/install/__init__.py)
> is actually not too hard in terms of follow-on tasks. Its job is
> simply to get the system ready for any following tasks. In raw
> teuthology/ceph-qa-suite this includes installing the Ceph packages
> from shaman, plus any other special pieces we need from our own builds
> or the default distribution (Samba, python3, etc). Presumably for Rook
> this would mean setting up Kubernetes (Vasu has a PR enabling that in
> teuthology at https://github.com/ceph/teuthology/pull/1262) — or
> perhaps pointing at an existing cluster — and setting configurations
> so that Rook would install container images reflecting the Ceph build
> we want to test instead of its defaults. (I’m sure these are all very
> big tasks that I’m skipping over, but I want to focus on the
> teuthology/qa-suite interfaces for now.)
>

Once Kubernetes is started, starting Rook should be as "simple" as
creating the needed manifests such as common.yaml, operator.yaml, and
cluster.yaml. The cluster.yaml is where you will need to modify the
settings for how the Ceph daemons are launched. For example, you can
set which Ceph image to launch. The default examples only show
released images such as ceph/ceph:v14, but you can insert any
container build of Ceph here. You can also specify how many mons to
run, on which nodes you want the daemons placed, which devices to use
for OSDs (or all devices), etc.
https://github.com/rook/rook/tree/master/cluster/examples/kubernetes/ceph
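
A rough sketch of that flow, assuming the example manifests above are
checked out locally. The custom image name is just a placeholder for
whatever Ceph build is under test, and the sed pattern would need to
match whatever default image the example cluster.yaml ships with:

cd rook/cluster/examples/kubernetes/ceph
kubectl create -f common.yaml
kubectl create -f operator.yaml

# Point the cluster at the Ceph image under test before creating it:
sed -i 's|image: ceph/ceph:v14.*|image: my-registry/ceph:my-test-build|' cluster.yaml
kubectl create -f cluster.yaml

# Watch the pods in the rook-ceph namespace come up:
kubectl -n rook-ceph get pods -w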

> The ceph task itself
> (https://github.com/ceph/ceph/blob/master/qa/tasks/ceph.py) is pretty
> large and supports a big set of functionality. It’s responsible for
> actually turning on the Ceph cluster, cleaning up when the test is
> over, and providing some validation. This includes stuff like running
> with valgrind, options to make sure the cluster goes healthy or scrubs
> at the end of a test, checking for issues in the logs, etc. However,
> most of that stuff can be common code once we have the right
> interfaces. The parts that get shared out to other tasks are 1)
> functions to stop and restart specific daemons, 2) functions to check
> if a cluster is healthy and to wait for failures, 3) the “task”
> function that serves to actually start up the Ceph cluster, and most
> importantly 4) exposing a “DaemonGroup” that links to the
> “RemoteProcess” representing each Ceph daemon in the system. I presume
> 1-3 are again not too complicated to map onto Rook commands we can get
> at programmatically.
>

Depending on the action, there are a few ways to approach running the
commands (concrete examples are sketched after this list):
- Restarting daemons: `kubectl delete pod <pod>`; Kubernetes will
recreate the pod from its deployment.
- To check the Rook operator's view of cluster health, you can query
the "status" section of the CephCluster CR: `kubectl get cephcluster
rook-ceph -o yaml`.
- To run any ceph command, it's recommended to do so from the "rook
toolbox" pod. You can connect to that pod with, for example: `kubectl
exec -it <pod> -- ceph status`. See
https://rook.io/docs/rook/master/ceph-toolbox.html
- You could also connect directly to any of the daemon pods with
`kubectl exec`, although the ceph.conf isn't always set up in the
default location in those pods. And note that any config you change
inside the pod will be lost when the daemon restarts.
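
For example (the pod names below are illustrative; in practice you
would discover them with `kubectl -n rook-ceph get pods`, and the
toolbox manifest and its app=rook-ceph-tools label come from the Rook
example toolbox.yaml):

# Restart a daemon by deleting its pod; its deployment recreates it:
kubectl -n rook-ceph delete pod rook-ceph-mon-a-xxxxxxxxxx-yyyyy

# Check the operator's view of cluster health from the CephCluster CR:
kubectl -n rook-ceph get cephcluster rook-ceph -o yaml

# Run arbitrary ceph commands through the toolbox pod:
kubectl -n rook-ceph create -f toolbox.yaml
TOOLBOX=$(kubectl -n rook-ceph get pod -l app=rook-ceph-tools -o name)
kubectl -n rook-ceph exec -it "$TOOLBOX" -- ceph status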

> The most interesting part of this interface, and of the teuthology
> model more generally, is the RemoteProcess. Teuthology was created to
> interface with machines via a module called “orchestra”
> (https://github.com/ceph/teuthology/tree/master/teuthology/orchestra)
> that wraps SSH connections to remote nodes. That means you can invoke
> “remote.run” on host objects that passes a literal shell command and
> get back a RemoteProcess object
> (https://github.com/ceph/teuthology/blob/master/teuthology/orchestra/run.py#L21)
> representing it. On that RemoteProcess you can wait() until it’s done
> and/or look at the exitstatus(), you can query if it’s finished()
> running. And you can access the stdin, stdout, and stderr channels!
> Most of this usage tends to fall into a few patterns: stdout is used
> to get output, stderr is mostly used for prettier error output in the
> logs, and stdin is used in a few places for input but is mostly used
> as a signal to tasks to shut down when the channel closes.
>

Hopefully you can get a lot of this remote-execution functionality
with `kubectl exec`. However, if you're waiting for the daemon to exit
(or for anything else that causes the pod to die), you would instead
need to query the Kubernetes API for the pod's running status and then
capture the output with `kubectl logs` (both stdout and stderr are
captured there). A rough sketch follows.
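
Something along these lines, with placeholder pod/namespace names,
would approximate the wait()/exitstatus()/stdout parts of
RemoteProcess:

POD=rook-ceph-osd-0-xxxxxxxxxx-yyyyy
NS=rook-ceph

# Poll until the pod is no longer Running (roughly wait()/finished()):
while [ "$(kubectl -n "$NS" get pod "$POD" -o jsonpath='{.status.phase}')" = "Running" ]; do
  sleep 5
done

# stdout and stderr of the container both end up in the pod log:
kubectl -n "$NS" logs "$POD" > "$POD".log

# Exit code of the terminated container (roughly exitstatus()); if the
# container was restarted, it would be under lastState instead:
kubectl -n "$NS" get pod "$POD" \
  -o jsonpath='{.status.containerStatuses[0].state.terminated.exitCode}'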

> It’s definitely possible to define all those options as higher-level
> interfaces and that’s probably the eventual end goal, but it’ll be a
> hassle to convert all the existing tests up front.
>
> So I’d like to know how this all sounds. In particular, how
> implausible is it that we can ssh into Ceph containers and execute
> arbitrary shell commands? Is there a good replacement interface for
> most of what I’ve described above? While a lot of the role-to-host
> mapping doesn’t matter, in a few test cases it is critical — is there
> a good way to deal with that (are tags flexible enough for us to force
> this model through)?
>

Hopefully the info above helps answer these questions.

> Anybody have any other thoughts I’ve missed out on?
> -Greg


