Re: Teuthology & Rook (& DeepSea, ceph-ansible, ...)

On Wed, Apr 24, 2019 at 11:11 AM Travis Nielsen <tnielsen@xxxxxxxxxx> wrote:
>
> Great questions. At a high level to run tests against rook, I would
> expect the process to be:
> - Install Kubernetes
> - Install Rook/Ceph
> - Start pods where the ceph clients can consume the RBD/CephFS mounts
> or use an S3 endpoint.
>
> More specific comments inline...
>
> Travis
>
> On Wed, Apr 24, 2019 at 8:49 AM Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
> >
> > Hello Travis, all,
> > I’ve been looking at the interfaces our ceph-qa-suite tasks expect
> > from the underlying teuthology and Ceph deployment tasks to try and
> > 1) narrow them down into something we can implement against other
> > backends (ceph-ansible, Rook, DeepSea, etc)
> > 2) see how those interfaces need to be adapted to suit the differences
> > between physical hosts and kubernetes pods.
> >
> > Some very brief background about teuthology: it expects you to select
> > a group of hosts (eg smithi001, smithi002), to map those hosts to
> > specific roles (eg a host with osd.1, mon.a, client.0 and another with
> > osd.2, mon.b, client.1, client.2), and to then run specific tasks
> > against those configurations (eg install, ceph, kclient, fio).  (Those
> > following along at home who want more details may wish to view one of
> > the talks I’ve given on teuthology, eg
> > https://www.youtube.com/watch?v=gj1OXrKdSrs .)
> >
>
> After you select the hosts, will this involve setting up Kubernetes
> before deploying Rook? Or will the hosts come from a pool that already
> has Kubernetes running?

I think for ease of integration into the existing system it will
probably start with us setting up Kubernetes ourselves, as I allude to
below with Vasu's PR. But Sage has suggested pretty strongly in the
past that we might want to use Kubernetes to run our infrastructure,
and part of the point of going through this exercise is that when we
do make that change, we know exactly what we need to implement (and
nothing outside of that setup needs to change).

>
> It is possible to tell rook which nodes to use for which daemons if
> desired. See the "placement" configuration settings in the
> cluster.yaml. Basically it requires setting a Kubernetes label on the
> node, then specifying the label(s) in the cluster.yaml.
> https://rook.io/docs/rook/master/ceph-cluster-crd.html#placement-configuration-settings

Cool!
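
For anyone following along at home, my reading of that doc is that it
would look roughly like the sketch below (untested; "smithi001" and
the "ceph-role" label are just examples, and exact field names may
differ by Rook version):

    # Label the node(s) we want a given daemon type on
    kubectl label node smithi001 ceph-role=mon

    # Then in cluster.yaml, under spec:
    placement:
      mon:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: ceph-role
                operator: In
                values:
                - mon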

>
> > The touch points between a ceph-qa-suite task and the remote hardware
> > are actually not a very large interface in direct function terms, but
> > some of the functions are very large themselves so we’ll need to
> > rework them a bit. I’ve taken pretty extensive notes at
> > https://pad.ceph.com/p/teuthology-rook, but I’ll summarize here.
> >
> > The important touch points are 1) the “install” task, 2) the “ceph”
> > task, and 3) the “RemoteProcess” abstraction.
> >
> > The install task
> > (https://github.com/ceph/teuthology/blob/master/teuthology/task/install/__init__.py)
> > is actually not too hard in terms of follow-on tasks. Its job is
> > simply to get the system ready for any following tasks. In raw
> > teuthology/ceph-qa-suite this includes installing the Ceph packages
> > from shaman, plus any other special pieces we need from our own builds
> > or the default distribution (Samba, python3, etc). Presumably for Rook
> > this would mean setting up Kubernetes (Vasu has a PR enabling that in
> > teuthology at https://github.com/ceph/teuthology/pull/1262) — or
> > perhaps pointing at an existing cluster — and setting configurations
> > so that Rook would install container images reflecting the Ceph build
> > we want to test instead of its defaults. (I’m sure these are all very
> > big tasks that I’m skipping over, but I want to focus on the
> > teuthology/qa-suite interfaces for now.)
> >
>
> Once Kubernetes is started, starting Rook should be as "simple" as
> creating the needed manifests such as common.yaml, operator.yaml, and
> cluster.yaml. The cluster.yaml is where you will need to modify the
> settings for how to launch the Ceph daemons. For example, you can set
> which Ceph image to launch. The default examples only show released
> images such as ceph/ceph:v14, but you can insert any container build
> of Ceph here. You can also specify how many mons to run, on which
> nodes you want the daemons placed, what devices to use for OSDs (or
> all devices), etc.
> https://github.com/rook/rook/tree/master/cluster/examples/kubernetes/ceph
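
(For my own notes, I'm picturing the teuthology task driving roughly
the following; the image tag is a placeholder for whichever build we
actually want to test, and the mon count is just an example:)

    # Apply the example manifests from the directory linked above
    kubectl apply -f common.yaml
    kubectl apply -f operator.yaml

    # Edit cluster.yaml before applying it, e.g. under spec:
    #   cephVersion:
    #     image: ceph/ceph:v14   # swap in our own container build here
    #   mon:
    #     count: 3
    kubectl apply -f cluster.yaml
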
>
> > The ceph task itself
> > (https://github.com/ceph/ceph/blob/master/qa/tasks/ceph.py) is pretty
> > large and supports a big set of functionality. It’s responsible for
> > actually turning on the Ceph cluster, cleaning up when the test is
> > over, and providing some validation. This includes stuff like running
> > with valgrind, options to make sure the cluster goes healthy or scrubs
> > at the end of a test, checking for issues in the logs, etc. However,
> > most of that stuff can be common code once we have the right
> > interfaces. The parts that get shared out to other tasks are 1)
> > functions to stop and restart specific daemons, 2) functions to check
> > if a cluster is healthy and to wait for failures, 3) the “task”
> > function that serves to actually start up the Ceph cluster, and most
> > importantly 4) exposing a “DaemonGroup” that links to the
> > “RemoteProcess” representing each Ceph daemon in the system. I presume
> > 1-3 are again not too complicated to map onto Rook commands we can get
> > at programmatically.
> >
>
> Depending on the action, there are a few ways to run the commands:
> - Restarting daemons: `kubectl delete pod <pod>`
> - To check cluster health as the Rook operator sees it, query the
> "status" section of the CephCluster CR: `kubectl get cephcluster
> rook-ceph -o yaml`.
> - To run any ceph command, it's recommended to do so from the "rook
> toolbox" pod. You can connect to that pod with something like
> `kubectl exec -it <pod> -- ceph status`. See
> https://rook.io/docs/rook/master/ceph-toolbox.html
> - You could also connect directly to any of the daemon pods with
> `kubectl exec`, although the ceph.conf isn't always set up in the
> default location in those pods. And note that any config you change
> inside a pod will be lost when the daemon is restarted.
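
(To script the status checks above rather than eyeball them, I'm
imagining something like the following; the exact status fields and
the toolbox pod label are assumptions on my part and may vary by Rook
version:)

    # Cluster state as the operator reports it in the CR status
    kubectl -n rook-ceph get cephcluster rook-ceph -o jsonpath='{.status.state}'

    # Ceph's own view of health, via the toolbox pod
    TOOLBOX=$(kubectl -n rook-ceph get pod -l app=rook-ceph-tools -o name)
    kubectl -n rook-ceph exec -it $TOOLBOX -- ceph status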

Hmm, do these Kubernetes commands let us specify how nicely to kill
the daemon? Sometimes we just want it down now-ish, sometimes we want
to force a hard kill (SIGABRT), etc.
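
I'm guessing something along these lines would cover the extremes,
but I'd love confirmation (these are plain kubectl flags, nothing
Rook-specific, and I'm assuming pkill is available in the daemon
image):

    # Polite: Kubernetes sends SIGTERM and waits out the grace period
    kubectl -n rook-ceph delete pod <osd-pod>

    # Impatient: skip the grace period entirely
    kubectl -n rook-ceph delete pod <osd-pod> --grace-period=0 --force

    # Hard kill: SIGABRT the daemon inside the pod first
    kubectl -n rook-ceph exec <osd-pod> -- pkill -ABRT ceph-osd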


