On Wed, Apr 4, 2018 at 12:55 AM, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
> We then moved on to running teuthology in non-sepia labs. It turns out
> all four companies present have done this, with varying levels of
> trouble. The biggest issue at an institutional level is just that the
> teuthology code base still embeds a lot of assumptions about accessing
> the sepia lab — even after the work to support running in openstack
> clouds, it expects to be querying ceph.com subdomains for packages and
> a bunch of other things, and many of them are hard-coded into the
> source rather than being configurable. (Even leaving aside the
> teuthology core code, I know that I have certainly written tests that
> pull random tarballs from ceph.com/qa, and although that ought to
> function it means you need external network bandwidth sufficient to
> download them on every run!) So far every group has done the tedious
> patching themselves to change this. Hopefully some of them can share
> the patches they’ve made so we can get an idea of the problem size,
> and if nobody else volunteers sooner the next group who needs an
> install can make it configurable instead of simply swapping out URLs?
> :)
>
> PROBLEM TOPIC: replace hard-coded Sepia lab URLs with configurables

I think these are pretty easy to handle in general. Many of them are
already in /etc/teuthology.yaml, and whatever is still hidden in the
code can be moved to site-specific configs. PRs are welcome and should
be easy to merge once the logs show they are doing the right thing. We
have very little visibility into how teuthology is run in other labs.

> One of the bigger-picture questions I had was whether we need to
> rethink the way we use or build teuthology, given the number of
> components it has these days (Shaman, Pulpito, beanstalkd for job
> scheduling, Jenkins for building and some other stuff, I think a
> “Chacra” but I don’t know what for, etc).
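To make the "move it to site-specific configs" idea concrete, the pattern for each hard-coded URL would look roughly like the sketch below. The key name `package_base_url` and the default value are made up for illustration; they are not teuthology's actual schema, just the shape of the change (site config wins, old hard-coded value becomes the fallback):

```python
# Hypothetical sketch: a previously hard-coded Sepia/ceph.com URL
# becomes a lookup against the parsed site config (the dict that
# /etc/teuthology.yaml would be loaded into elsewhere).

DEFAULT_PACKAGE_BASE = "https://download.ceph.com"  # the old hard-coded value


def package_base_url(site_conf):
    """Prefer the site-configured URL; fall back to the old default."""
    return site_conf.get("package_base_url", DEFAULT_PACKAGE_BASE)
```

A lab behind a firewall would then only need one line in its site config (e.g. `package_base_url: http://mirror.example.lab`) instead of a source patch.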
> It would be great if we could share resources with other open-source
> projects doing similar things. After discussion, it sounds like for
> institutions it is not a problem to install and keep those pieces
> running (at least, in comparison to the other challenges of actually
> developing and testing Ceph).
>
> I asked specifically about replacing beanstalkd, since I know that’s
> been a long-term todo item and when there is lab contention that is
> not a sufficient scheduler. The room seemed to think that might be
> nice but isn’t a big issue unless the lab doesn’t have enough capacity
> for the testing we need. Moreover, any replacement would have to be a
> full job scheduling system and that would require a lot more
> (teuthology-based) custom coding and (site-based) configuration.
>
> We would like to do more test framework sharing with other groups, but
> the gains for us are limited unless we manage to build out a serious
> community. The Sepia maintainers have explored sharing with the CentOS
> CI system in the past, and Joao brought up OpenQA, but all the systems
> we’ve explored so far don’t handle the build volume we need and mostly
> replicate smaller components of our overall system. (Both of those
> projects are focused on building existing packages, installing them on
> a single node, and making sure they can be turned on; they don’t
> handle cross-system orchestration, deep testing, or any kind of job
> scheduling beyond “there is a machine to build the package now”.) John
> suggested the best areas for collaboration are around building
> packages and displaying test results if we want to explore that, but
> in terms of distributed testing we would be donating far more time
> than we could plausibly get back.

I have similar thoughts. Different lab systems will have different
configurations, and there are quite a few assumptions in the code (the
number of devices per node, DNS hostnames, etc.) that many tests rely
on to work properly.
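On the beanstalkd point above: it is essentially just a priority work queue. Jobs are `put` into a named tube with a numeric priority (lower is more urgent) and workers `reserve` them in order. A toy model of that behaviour (purely illustrative, not teuthology or beanstalkd code) shows both what it does and what it lacks, namely any fairness or resource awareness across users, which is exactly what a "full job scheduling system" would have to add:

```python
import heapq
import itertools


class Tube:
    """Toy model of a beanstalkd tube: a priority FIFO of job payloads.
    Lower priority number is served first; ties keep insertion order."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker preserves FIFO order

    def put(self, priority, payload):
        heapq.heappush(self._heap, (priority, next(self._seq), payload))

    def reserve(self):
        """Pop the next job, as a teuthology worker would."""
        _, _, payload = heapq.heappop(self._heap)
        return payload


tube = Tube()
tube.put(100, "rados suite run")
tube.put(10, "urgent release validation")
tube.put(100, "fs suite run")
```

Here the urgent job jumps the queue, but nothing stops one user from flooding the tube ahead of everyone else, which is the lab-contention problem described above.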
There will always be cases where one has to configure teuthology for a
specific lab (unless testing on VMs); the minimum requirements for a
lab could be better documented. It is also not easy to integrate with
other CI systems, because many of them assume single-node tests and
have no notion of scale testing. I think a more meaningful approach
would be to reduce the dependency on the integrated components (Shaman,
Pulpito, etc., and in some cases provide a switch to turn them on/off),
make them site-configurable, and keep the teuthology core clean and
more shareable across labs.

> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html
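The "switch to turn them on/off" idea could be a per-site section in the same config file. The component names below are real, but the `components` key layout and the default-on behaviour are invented for illustration; teuthology has no such schema today:

```python
# Hypothetical sketch of site-level switches for integrated components
# (Shaman, Pulpito, ...). Config layout is made up for illustration.

DEFAULTS = {"shaman": True, "pulpito": True}


def component_enabled(name, site_conf):
    """True unless the site config explicitly disables the component."""
    switches = site_conf.get("components", {})
    return switches.get(name, DEFAULTS.get(name, True))


# A lab without a Shaman instance could then skip build queries entirely:
site_conf = {"components": {"shaman": False}}
```

Defaulting everything to "on" keeps Sepia's behaviour unchanged while letting a smaller lab disable the pieces it does not run.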