Re: Fwd: Running a single teuthology job locally using containers

Hi Ivo,

On 11/09/2015 21:07, Ivo Jimenez wrote:
> Hi Loic,
> 
> Based on your feedback, a few action-items emerged for improving this
> containerized approach to running teuthology jobs:
> 
>  1. use install-deps.sh for installing dependencies
>  2. modify the sshd configuration so that the ssh port is specified at
>     runtime via an environment variable. This has the consequence of
>     being able to use --net=host and thus more than one remote can run
>     locally (for jobs with multiple remotes).
>  3. add an option to provide a sha1 so that the code gets checked out
>     as part of the entrypoint of the container and gets built.

Would it be something like http://tracker.ceph.com/issues/13031 ? 
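
For 2 and 3 I imagine something along these lines (a sketch only: SSH_PORT and CEPH_SHA1 are whatever variable names end up being chosen, not necessarily what the ivotron/cephdev image uses, and the build step assumes the usual autotools workflow):

```bash
#!/bin/bash
# Illustrative container entrypoint: check out and build a given sha1,
# then start sshd on a port chosen at run time.
set -e

: "${SSH_PORT:=22}"       # item 2: sshd port picked at run time
: "${CEPH_SHA1:=master}"  # item 3: commit to check out and build

cd /ceph
git checkout "$CEPH_SHA1"
./install-deps.sh                                 # item 1
./autogen.sh && ./configure && make -j"$(nproc)"

mkdir -p /var/run/sshd                            # needed on Ubuntu base images
exec /usr/sbin/sshd -D -p "$SSH_PORT"
```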

>  4. write a 'dockerize-config' script for taking a failed job's YAML
>     file and modify it so that it can run with containers.

+1 :-)

>  5. write a 'failed-devenv' script that given a url to a failed job
>     (a) fetches the YAML file (b) runs the dockerize-config script (c)
>     checks out the corresponding sha1 version (d) compiles the code

In this case it might be easier to have the original teuthology-suite command available as part of the job config (original-cli-call: teuthology-suite --ceph foo --....) and simply re-run that command with --filter="$description", where $description is the description of the job. That is a lot more stable / reliable than reworking the config.yaml file. Dockerizing the config.yaml is still very useful for any existing job archive where there is not enough information to reschedule the job with --filter. But if the original command is available, it is much easier to just re-issue teuthology-suite with its arguments changed to use --filter.
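
For instance, assuming config.yaml ends up carrying both the proposed original-cli-call field and the usual description field, re-running the one failed job could boil down to (a naive sketch, not actual tooling):

```bash
# Sketch only: re-issue the recorded teuthology-suite command, narrowed
# down to the single failed job via --filter.
description=$(python -c "import yaml; print(yaml.safe_load(open('config.yaml'))['description'])")
original=$(python -c "import yaml; print(yaml.safe_load(open('config.yaml'))['original-cli-call'])")
eval "$original --filter=\"\$description\""
```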

>  6. write a 'run-failed-job' that (a) re-builds the code (b)
>     instantiates one container for each specified remote and (c)
>     executes the job.

+2000 

Here is how we do it for backports : http://tracker.ceph.com/projects/ceph-releases/wiki/HOWTO_run_integration_and_upgrade_tests#Re-scheduling-failed-or-dead-jobs-from-an-existing-suite

It's simple but tedious. It would be awesome to have something easier and shorter :-)
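
Concretely, with the (hypothetical, not yet written) scripts from items 5 and 6, the whole thing could shrink to something like:

```bash
# Entirely hypothetical sketch of the dev/build/test loop from items 5 and 6.
failed-devenv "$FAILED_JOB_URL"      # (5) fetch YAML, dockerize it, check out the sha1, build
"$EDITOR" src/                       # work on a fix
run-failed-job docker-config.yaml    # (6) rebuild, one container per remote, run the job
```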

> 
> I've implemented 1-3 and am working on 4-6. In short, the goal of all
> the above is to capture the dev/build/test loop and make it easier to
> go from 'failed job' to 'working on a fix'. The high-level sequence is
> (1) run 'make-failed-devenv' so you get the dev environment for the
> failed job (2) work on a fix and (3) invoke 'run-failed-job' and
> inspect results (possibly going back to 2 if needed).
> 
> Thoughts on 4-6?
> 
> cheers,
> ivo
> 

Cheers

> 
> On Thu, Sep 3, 2015 at 3:23 PM, Loic Dachary <loic@xxxxxxxxxxx> wrote:
>>
>>
>> On 03/09/2015 23:45, Ivo Jimenez wrote:
>>> On Thu, Sep 3, 2015 at 3:09 AM Loic Dachary <loic@xxxxxxxxxxx> wrote:
>>>>
>>>>>  2. Initialize a `cephdev` container (the following assumes `$PWD` is
>>>>>     the folder containing the ceph code in your machine):
>>>>>
>>>>>     ```bash
>>>>>     docker run \
>>>>>       --name remote0 \
>>>>>       -p 2222:22 \
>>>>>       -d -e AUTHORIZED_KEYS="`cat ~/.ssh/id_rsa.pub`" \
>>>>>       -v `pwd`:/ceph \
>>>>>       -v /dev:/dev \
>>>>>       -v /tmp/ceph_data/$RANDOM:/var/lib/ceph \
>>>>>       --cap-add=SYS_ADMIN --privileged \
>>>>>       --device /dev/fuse \
>>>>>       ivotron/cephdev
>>>>>     ```
>>>>
>>>> $PWD is ceph built from sources ? Could you share the dockerfile you used to create ivotron/cephdev ?
>>>
>>>
>>> Yes, the idea is to wrap your ceph folder in a container so that it
>>> becomes a target for teuthology. The link to the dockerfile:
>>>
>>> https://github.com/ivotron/docker-cephdev
>>
>> You may want to use install-deps.sh instead of apt-get build-dep to get the dependency list from the sources instead of a presumably older one from the package repositories.
>>>
>>>>
>>>>>
>>>>> Caveats:
>>>>>
>>>>>   * only a single job can be executed and has to be manually
>>>>>     assembled. I plan to work on supporting suites, which, in short,
>>>>>     implies stripping out the `install` task from existing suites and
>>>>>     leaving only the `install.ship_utilities` subtask instead (the
>>>>>     container image has all the dependencies in it already).
>>>>
>>>> Maybe there could be a script to transform config files such as http://qa-proxy.ceph.com/teuthology/loic-2015-09-02_15:41:18-rbd-master---basic-multi/1042448/config.yaml into a config file suitable for this use case ?
>>>
>>>
>>> that's what I have in mind but haven't looked into it yet. I was
>>> thinking about extending teuthology-suite with a --filter-tasks flag
>>> so that we can remove the unwanted tasks, in a similar way to how
>>> --filter leaves some suites out.
>>>
>>>>
>>>> Together with git clone -b $sha1 + make in the container, it would be a nice way to replay / debug a failed job using a single vm and without going through packages.
>>>
>>>
>>> that'd be relatively straightforward to accomplish, at least the
>>> docker side of things (a dockerfile that is given the $SHA1). Prior to
>>> that, we'd need a script that extracts the failed job from
>>> paddles (does this exist already?), creates a new sha1-predicated
>>
>> What do you mean by "extract the failed job" ? Do you expect paddles to have more information than the config.yaml file ( loic-2015-09-02_15:41:18-rbd-master---basic-multi/1042448/config.yaml for instance) ?
>>
>>> container and passes the yaml file of the failed job to teuthology
>>> (which would be invoked with the hypothetical --filter-tasks flag
>>> mentioned above).
>>
>> It's probably more than just filtering out tasks. What about a script that would
>>
>>    dockerize-config < config.yaml > docker-config.yaml
>>
>> and be smart enough to do whatever is necessary to transform an existing config.yaml so that it is suitable to run on docker targets. And fail loudly if it can't ;-)
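
A first cut of that filter could be as small as the following; it only does the one transformation discussed in this thread (replace the package-based install task with install.ship_utilities) and assumes the task shows up as a plain "- install:" line. Anything smarter really wants a YAML parser and should indeed fail loudly.

```bash
#!/bin/bash
# dockerize-config sketch: reads a job's config.yaml on stdin, writes
# the dockerized version on stdout. Only swaps the install task for
# install.ship_utilities; the container image is already built, so no
# packages need to be installed.
exec sed -e 's/^- install:/- install.ship_utilities:/'
```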
>>
>>>
>>>>
>>>>>   * I have only tried the above with the `radosbench` and `ceph-fuse`
>>>>>     tasks. Using `--cap-add=ALL` and `-v /lib/modules:/lib/modules`
>>>>>     flags allows a container to load kernel modules so, in principle,
>>>>>     it should work for `rbd` and  `kclient` tasks but I haven't tried
>>>>>     it yet.
>>>>>   * For jobs specifying multiple remotes, multiple containers can be
>>>>>     launched (one per remote). While it is possible to run these
>>>>>     on the same docker host, the way ceph daemons dynamically
>>>>>     bind to ports in the 6800-7300 range makes it difficult to
>>>>>     determine which ports to expose from each container (exposing the
>>>>>     same port from multiple containers in the same host is not
>>>>>     allowed, for obvious reasons). So either each remote runs on a
>>>>>     distinct docker host machine, or a deterministic port assignment
>>>>>     is implemented such that, for example, 6800 is always assigned to
>>>>>     osd.0, regardless of where it runs.
>>>>
>>>> Would docker run --publish-all=true help ?
>>>
>>>
>>> That option doesn't work with --net=container, which is what we are
>>> using in this case since we remap the container's sshd port 22. In
>>> other words, for --publish-all to work we need to use --net=host but
>>> that disables the virtual network that docker provides. An alternative
>>> would be to configure the base image we're using
>>> (https://github.com/tutumcloud/tutum-ubuntu/) so that the port that
>>> sshd uses is passed in an env var.
>>
>> Why not use --net=host then ?
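
With --net=host and the sshd port configurable as in item 2 above, the run command from the beginning of the thread would look roughly like this (SSH_PORT is an assumed variable name, not necessarily what the image ends up using):

```bash
# Sketch only: no port mapping with --net=host, so each remote's
# container picks its own sshd port via an (assumed) SSH_PORT variable.
docker run \
  --name remote0 \
  --net=host \
  -d -e AUTHORIZED_KEYS="`cat ~/.ssh/id_rsa.pub`" \
  -e SSH_PORT=2222 \
  -v `pwd`:/ceph \
  -v /dev:/dev \
  -v /tmp/ceph_data/$RANDOM:/var/lib/ceph \
  --cap-add=SYS_ADMIN --privileged \
  --device /dev/fuse \
  ivotron/cephdev
```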
>>
>>>
>>>>
>>>>
>>>> Clever hack, congrats :-)
>>>
>>>
>>> thanks!
>>>
>>
>> --
>> Loïc Dachary, Artisan Logiciel Libre
>>

-- 
Loïc Dachary, Artisan Logiciel Libre
