Re: Teuthology & Rook (& DeepSea, ceph-ansible, ...)

Sebastian Wagner <sebastian.wagner@xxxxxxxx> · Mon, 29 Apr 2019 12:36:43 +0200

Hi Greg,

Let me share my experience with automatically testing the Rook orchestrator.

On 24.04.19 at 16:49, Gregory Farnum wrote:
> The ceph task itself
> (https://github.com/ceph/ceph/blob/master/qa/tasks/ceph.py) is pretty
> large and supports a big set of functionality. It’s responsible for
> actually turning on the Ceph cluster, cleaning up when the test is
> over, and providing some validation. This includes stuff like running
> with valgrind, options to make sure the cluster goes healthy or scrubs
> at the end of a test, checking for issues in the logs, etc. However,
> most of that stuff can be common code once we have the right
> interfaces. The parts that get shared out to other tasks are 1)
> functions to stop and restart specific daemons, 2) functions to check
> if a cluster is healthy and to wait for failures, 3) the “task”
> function that serves to actually start up the Ceph cluster, and most
> importantly 4) exposing a “DaemonGroup” that links to the
> “RemoteProcess” representing each Ceph daemon in the system. I presume
> 1-3 are again not too complicated to map onto Rook commands we can get
> at programmatically.

This sounds very much incompatible with how a Rook cluster is deployed in
my scenario:

> https://github.com/sebastian-philipp/test-rook-orchestrator/blob/f0fbaaaa63cfc5ee6a2ebafb44a2af292b706138/fixtures.py#L42-L64

For example:

* There is no control over which physical processes are running
* There is no access to physical log files
* You're not supposed to start or stop individual daemons
* Pods are restarted automatically, so waiting for failed daemons will
  be hard and only possible by looking for CrashLoopBackOff
* No remote processes (except for calling ceph commands in a separate,
  isolated pod)
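
To illustrate the CrashLoopBackOff point, here is a minimal sketch of what
such a check could look like. It assumes the pod list was already fetched
with `kubectl get pods -o json` and parsed; the function name and the
sample data are hypothetical and not taken from the linked fixtures.

```python
def crash_looping_pods(pod_list):
    """Return names of pods that have a container waiting in CrashLoopBackOff.

    pod_list is the parsed JSON from `kubectl get pods -o json`.
    """
    names = []
    for pod in pod_list.get("items", []):
        statuses = pod.get("status", {}).get("containerStatuses") or []
        for status in statuses:
            # "waiting" is only present while the container is not running.
            waiting = status.get("state", {}).get("waiting") or {}
            if waiting.get("reason") == "CrashLoopBackOff":
                names.append(pod["metadata"]["name"])
                break  # one crashing container is enough to flag the pod
    return names


# Hand-written sample in the shape kubectl emits; the pod names are made up.
SAMPLE = {
    "items": [
        {
            "metadata": {"name": "rook-ceph-mon-a"},
            "status": {"containerStatuses": [{"state": {"running": {}}}]},
        },
        {
            "metadata": {"name": "rook-ceph-osd-0"},
            "status": {
                "containerStatuses": [
                    {
                        "restartCount": 7,
                        "state": {"waiting": {"reason": "CrashLoopBackOff"}},
                    }
                ]
            },
        },
    ]
}

print(crash_looping_pods(SAMPLE))
```

A poll like this only sees the state at the moment of the query, so a
container that is briefly running between restarts can be missed.
Comparing `restartCount` across polls may be a more robust signal.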

Adding support for anything in this list, like executing remote
processes inside running pods, would introduce a very tight coupling
between Rook and Teuthology, as the processes and daemons started by
Rook are an implementation detail of Rook itself.

Things that are easy:

* Calling the `ceph` command
* Running kubectl
* Querying Pods
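
As a rough sketch of the "easy" path: calling `ceph` through kubectl in a
separate pod could look like the following. The namespace `rook-ceph`, the
pod name, and all helper names are assumptions for illustration. The
command runner is injectable, so the example runs without a cluster.

```python
import json
import subprocess


def kubectl_argv(namespace, *args):
    """Build a kubectl command line scoped to one namespace."""
    return ["kubectl", "--namespace", namespace, *args]


def ceph_argv(tools_pod, *ceph_args, namespace="rook-ceph"):
    """Build `kubectl exec <pod> -- ceph <args> --format json`.

    tools_pod is the name of a pod that has the ceph CLI and a keyring,
    e.g. a toolbox pod (assumption: such a pod exists in the cluster).
    """
    return kubectl_argv(
        namespace, "exec", tools_pod, "--", "ceph", *ceph_args, "--format", "json"
    )


def run_json(argv, runner=subprocess.check_output):
    """Run argv and parse its stdout as JSON."""
    return json.loads(runner(argv))


def fake_runner(argv):
    # Stand-in for a real cluster; returns a trimmed, made-up status document.
    return b'{"health": {"status": "HEALTH_OK"}}'


argv = ceph_argv("rook-ceph-tools-example", "status")
status = run_json(argv, runner=fake_runner)
print(" ".join(argv))
print(status["health"]["status"])
```

With the default runner, the same call would shell out to the real
kubectl binary and raise `subprocess.CalledProcessError` on a non-zero
exit code.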

> 
> The most interesting part of this interface, and of the teuthology
> model more generally, is the RemoteProcess. Teuthology was created to
> interface with machines via a module called “orchestra”
> (https://github.com/ceph/teuthology/tree/master/teuthology/orchestra)
> that wraps SSH connections to remote nodes. That means you can invoke
> “remote.run” on host objects that passes a literal shell command and
> get back a RemoteProcess object
> (https://github.com/ceph/teuthology/blob/master/teuthology/orchestra/run.py#L21)
> representing it. On that RemoteProcess you can wait() until it’s done
> and/or look at the exitstatus(), you can query if it’s finished()
> running. And you can access the stdin, stdout, and stderr channels!
> Most of this usage tends to fall into a few patterns: stdout is used
> to get output, stderr is mostly used for prettier error output in the
> logs, and stdin is used in a few places for input but is mostly used
> as a signal to tasks to shut down when the channel closes.
> 
> So I’d like to know how this all sounds.

Agreeing with Sage here.

> In particular, how
> implausible is it that we can ssh into Ceph containers and execute
> arbitrary shell commands?

I'd recommend against doing this.

> Is there a good replacement interface for
> most of what I’ve described above? While a lot of the role-to-host
> mapping doesn’t matter, in a few test cases it is critical — is there
> a good way to deal with that (are tags flexible enough for us to force
> this model through)?

Rook supports running Pods on specific hosts, but that's not yet
supported in the mgr/rook orchestrator.
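
For context, a minimal sketch of what host pinning looks like on the Rook
side, assuming the `placement` section of the CephCluster resource and
standard Kubernetes node affinity on the `kubernetes.io/hostname` label.
The helper name and host names are hypothetical, and nothing here goes
through the mgr/rook orchestrator.

```python
def pin_to_hosts(hostnames):
    """Build a placement entry that restricts pods to the given hosts."""
    return {
        "nodeAffinity": {
            "requiredDuringSchedulingIgnoredDuringExecution": {
                "nodeSelectorTerms": [
                    {
                        "matchExpressions": [
                            {
                                "key": "kubernetes.io/hostname",
                                "operator": "In",
                                "values": sorted(hostnames),
                            }
                        ]
                    }
                ]
            }
        }
    }


# Fragment of a CephCluster spec: pin mons to two (made-up) nodes.
placement = {"mon": pin_to_hosts(["node-b", "node-a"])}
print(placement)
```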

> Anybody have any other thoughts I’ve missed out on?

As long as the tests use only a very limited subset of Teuthology,
it might be possible to run them in a Rook environment. Maybe some
of the Dashboard tests?

> -Greg
> 

-- 
SUSE Linux GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah, HRB 21284 (AG Nürnberg)