Re: Cephalocon QA: Missing test coverage

Vasu Kulkarni <vakulkar@xxxxxxxxxx> · Wed, 4 Apr 2018 13:16:45 -0700

On Wed, Apr 4, 2018 at 12:54 AM, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
> We identified several under-tested components in the Ceph project.
> Several of these consisted of tests that simply weren’t written:
> NFS-Ganesha has light testing in RGW, but none with CephFS; Samba’s
> testing is very light.
>
> Significantly more interesting is that none of the
> installers/orchestrators/normal process management (Ansible or DeepSea
> with systemd; containers under Kubernetes) are currently tested in
> teuthology. Changing that is a big desire for most of the integrators,
> but is a large project covering both the internal implementation and
> testing tasks. Right now, teuthology directly invokes Ceph processes
> via ssh and relies on that for control, for checking state (i.e., the
> process is still running), and for easy logging of issues, and that
> has spilled over into important “task” modules such as the thrasher
> and cluster managers. There were rumors of individual efforts that
> might have been started to enable testing of a normal deployment, but
> nobody in the room knew for sure.
> PROBLEM TOPIC: support testing orchestration frameworks and the normal
> init system in teuthology

Correcting the ceph-ansible testing part:

We have been running ceph-ansible/ceph-deploy tests for quite some time, and
they exercise systemd internally. There is also a systemd task in smoke that
explicitly tests the processes for correctness.

    a) http://pulpito.ceph.com/?suite=ceph-ansible
    b) In smoke: https://github.com/ceph/ceph/blob/master/qa/tasks/systemd.py
          http://pulpito.ceph.com/teuthology-2018-04-04_07:02:02-smoke-master-testing-basic-ovh/2352423
          http://pulpito.ceph.com/teuthology-2018-04-04_07:02:02-smoke-master-testing-basic-ovh/2352436
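
To give a sense of the kind of check involved, here is a minimal sketch in
Python. It is not the code in qa/tasks/systemd.py: the function names and
the unit names are made up for illustration, and it assumes it runs
locally on a host with systemd rather than over teuthology's remote
connection.

```python
import subprocess


def unit_is_active(unit, run=subprocess.run):
    """Return True if `systemctl is-active <unit>` prints 'active'.

    `run` is injectable (hypothetical design choice) so the check can be
    exercised without a real systemd.
    """
    result = run(
        ["systemctl", "is-active", unit],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
        check=False,  # a non-zero exit just means "not active"
    )
    return result.stdout.strip() == "active"


def inactive_units(units, run=subprocess.run):
    """Return the subset of `units` that systemd does not report as active."""
    return [unit for unit in units if not unit_is_active(unit, run=run)]
```

Something like inactive_units(["ceph-mon@a", "ceph-osd@0"]) would then
return the daemons that are not running; the unit names here are only
examples.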

But more work definitely needs to be done to integrate better with the
thrashers, and I am hopeful we will fix this issue soon, at least for
some suites: http://tracker.ceph.com/issues/23488

On the container side we don't have any tests, and I believe we should
start by fixing the install guide and recommendations so that we can
then fix this in the suites.

>
> Orit also discussed RGW in this context. She noted that RGW has good
> coverage of the basic S3 functionality but that more advanced features
> tend to miss some tests because they aren’t a good fit for the way we
> currently use teuthology. Her specific example was bucket sharding:
> trivial tests exist to make sure the commands operate and don’t
> immediately break, but actually stressing the sharding code requires
> millions of entries with ongoing IO, and dumping that much data into a
> cluster simply takes too long to reasonably be part of every suite run
> right now. So most testing is infrequent, manual, and ad-hoc.
> After discussion we suggested developers should build those tests even
> if they can’t be run regularly right now, because they can at least be
> run by teams prior to releases and it’s still cheaper and more
> reliable to find machine time than make a person do them all. I
> committed to discussing with the Ceph Leadership team whether it would
> be appropriate to start setting aside a small portion of time in the
> sepia lab to regularly do larger-scale tests like this, once they
> exist. (We suggested one or two days a month.)
> PROBLEM TOPIC: build scale tests in separate suites and reserve lab
> time to run them.
>
> The topic of distribution testing came up briefly. In modern history
> the lab has run Ubuntu (one or two LTSes) and CentOS (the latest
> release), with a mostly random mix unless your job demanded a specific
> OS. I believe Flipkart mentioned Debian as a possible target; we
> certainly build packages for Debian and a few other distros that
> aren’t tested in the lab at all. But the main issue with adding
> distros is that in addition to needing to keep the images up-to-date,
> they require (minor) changes to teuthology that we can’t keep alive
> without somebody committing to them. In cases where that happens,
> we’re happy to bring in new systems: both RHEL and Suse have been
> added to the sepia lab and teuthology in the last few months.
> A general note: we recently changed to doing a full OS provision on
> every test (via FOG), so if you want a random mix of OSes you now need
> to specify that. There’s a new teuthology “+” file operator for saying
> “select any one of these yaml frags for each test”.
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html