We identified several under-tested components in the Ceph project. Some of these are simply tests that haven't been written: NFS-Ganesha has light testing with RGW, but none with CephFS; Samba's testing is very light. More interesting is that none of the installers/orchestrators/normal process management (Ansible or DeepSea with systemd; containers under Kubernetes) are currently tested in teuthology. Changing that is a strong desire for most of the integrators, but it's a large project covering both teuthology's internal implementation and its testing tasks. Right now, teuthology directly invokes Ceph processes via ssh and relies on that for control, for checking state (i.e., that the process is still running), and for easy logging of issues, and that approach has spilled over into important "task" modules such as the thrasher and cluster managers. There were rumors of individual efforts that might have been started to enable testing of a normal deployment, but nobody in the room knew for sure.

PROBLEM TOPIC: support testing orchestration frameworks and the normal init system in teuthology.

Orit also discussed RGW in this context. She noted that RGW has good coverage of the basic S3 functionality, but that more advanced features tend to lack tests because they aren't a good fit for the way we currently use teuthology. Her specific example was bucket sharding: trivial tests exist to make sure the commands operate and don't immediately break, but actually stressing the sharding code requires millions of entries with ongoing IO, and dumping that much data into a cluster simply takes too long to reasonably be part of every suite run right now. (A rough sketch of what such a stress workload might look like is at the end of this mail.) So most testing is infrequent, manual, and ad hoc. After discussion we suggested that developers should build those tests even if they can't be run regularly right now, because they can at least be run by teams prior to releases, and it's still cheaper and more reliable to find machine time than to make a person run them all by hand. I committed to discussing with the Ceph Leadership Team whether it would be appropriate to start setting aside a small portion of time in the sepia lab to regularly do larger-scale tests like this, once they exist. (We suggested one or two days a month.)

PROBLEM TOPIC: build scale tests in separate suites and reserve lab time to run them.

The topic of distribution testing came up briefly. In recent history the lab has run Ubuntu (one or two LTS releases) and CentOS (the latest release), with a mostly random mix unless your job demanded a specific OS. I believe Flipkart mentioned Debian as a possible target; we certainly build packages for Debian and a few other distros that aren't tested in the lab at all. The main issue with adding distros is that, in addition to keeping the images up to date, they require (minor) changes to teuthology that we can't maintain without somebody committing to them. When that happens, we're happy to bring in new systems: both RHEL and SUSE have been added to the sepia lab and teuthology in the last few months.

A general note: we recently changed to doing a full OS provision on every test (via FOG), so if you want a random mix of OSes you now need to specify that. There's a new teuthology "+" file operator for saying "select any one of these yaml fragments for each test" (a small illustrative sketch of a distro facet is also at the end of this mail).
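As a rough illustration of the bucket-sharding scale test discussed above, here is a minimal sketch of a loader that keeps small-object writes going long enough to force the bucket index through several reshards. This is a hypothetical example, not anything we have in the tree: the endpoint URL, credentials, bucket name, and object count are all placeholders, and a real run would need far more keys and probably several parallel writers.

    #!/usr/bin/env python
    # Hypothetical sketch only: flood one RGW bucket with small objects so the
    # bucket index has to reshard while IO is still in flight. The endpoint,
    # credentials, bucket name, and counts are placeholders.
    import boto3

    s3 = boto3.client(
        's3',
        endpoint_url='http://rgw.example.com:8000',   # assumed RGW endpoint
        aws_access_key_id='ACCESS_KEY',
        aws_secret_access_key='SECRET_KEY',
    )

    bucket = 'shard-stress'
    s3.create_bucket(Bucket=bucket)

    # Keep small-object writes going; a real stress run would use millions of
    # keys and multiple concurrent clients.
    for i in range(1000000):
        s3.put_object(Bucket=bucket, Key='obj-%08d' % i, Body=b'x')
        if i % 100000 == 0:
            # Listing while writes continue exercises the sharded index paths.
            s3.list_objects_v2(Bucket=bucket, MaxKeys=1)

The point is less the exact loop than that something like it can run unattended on reserved lab machines, rather than requiring a person to drive the whole thing by hand.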
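And for the "+" operator note, a hedged sketch of what a distro-selection facet in a suite directory might look like. The directory name and version numbers are made up for illustration; the os_type/os_version keys are the usual contents of a teuthology distro fragment, but check the teuthology docs for the exact semantics of the operator file.

    # Illustrative layout only (names are hypothetical):
    supported-distros/
        +                    # the operator file described above
        centos_latest.yaml
        ubuntu_latest.yaml

    # A fragment such as centos_latest.yaml just pins the OS, e.g.:
    os_type: centos
    os_version: "7.5"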