We identified several under-tested components in the Ceph project. Some of these are simply tests that haven't been written: NFS-Ganesha has light testing with RGW, but none with CephFS; Samba's testing is very light. More interesting is that none of the installers/orchestrators/normal process management (Ansible or DeepSea with systemd; containers under Kubernetes) are currently tested in teuthology. Changing that is a strong desire for most of the integrators, but it's a large project covering both teuthology's internal implementation and its testing tasks. Right now, teuthology directly invokes Ceph processes via ssh and relies on that for control, for checking state (i.e., that the process is still running), and for easy logging of issues, and that approach has spilled over into important "task" modules such as the thrasher and cluster managers. There were rumors of individual efforts that might have been started to enable testing of a normal deployment, but nobody in the room knew for sure.

PROBLEM TOPIC: support testing orchestration frameworks and the normal init system in teuthology.

Orit also discussed RGW in this context. She noted that RGW has good coverage of the basic S3 functionality, but that more advanced features tend to lack tests because they aren't a good fit for the way we currently use teuthology. Her specific example was bucket sharding: trivial tests exist to make sure the commands operate and don't immediately break, but actually stressing the sharding code requires millions of entries with ongoing IO, and dumping that much data into a cluster simply takes too long to reasonably be part of every suite run right now. (A rough sketch of what such a stress workload might look like is at the end of this mail.) So most testing is infrequent, manual, and ad hoc. After discussion we suggested that developers should build those tests even if they can't be run regularly right now, because they can at least be run by teams prior to releases, and it's still cheaper and more reliable to find machine time than to make a person run them all by hand. I committed to discussing with the Ceph Leadership Team whether it would be appropriate to start setting aside a small portion of time in the sepia lab to regularly do larger-scale tests like this, once they exist. (We suggested one or two days a month.)

PROBLEM TOPIC: build scale tests in separate suites and reserve lab time to run them.

The topic of distribution testing came up briefly. In recent history the lab has run Ubuntu (one or two LTS releases) and CentOS (the latest release), with a mostly random mix unless your job demanded a specific OS. I believe Flipkart mentioned Debian as a possible target; we certainly build packages for Debian and a few other distros that aren't tested in the lab at all. The main issue with adding distros is that, in addition to keeping the images up to date, they require (minor) changes to teuthology that we can't maintain without somebody committing to them. When that happens, we're happy to bring in new systems: both RHEL and SUSE have been added to the sepia lab and teuthology in the last few months.

A general note: we recently changed to doing a full OS provision on every test (via FOG), so if you want a random mix of OSes you now need to specify that. There's a new teuthology "+" file operator for saying "select any one of these yaml fragments for each test" (a small illustrative sketch of a distro facet is also at the end of this mail).
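As a rough illustration of the bucket-sharding scale test discussed above, here is a minimal sketch of a loader that keeps small-object writes going long enough to force the bucket index through several reshards. This is a hypothetical example, not anything we have in the tree: the endpoint URL, credentials, bucket name, and object count are all placeholders, and a real run would need far more keys and probably several parallel writers.

    #!/usr/bin/env python
    # Hypothetical sketch only: flood one RGW bucket with small objects so the
    # bucket index has to reshard while IO is still in flight. The endpoint,
    # credentials, bucket name, and counts are placeholders.
    import boto3

    s3 = boto3.client(
        's3',
        endpoint_url='http://rgw.example.com:8000',   # assumed RGW endpoint
        aws_access_key_id='ACCESS_KEY',
        aws_secret_access_key='SECRET_KEY',
    )

    bucket = 'shard-stress'
    s3.create_bucket(Bucket=bucket)

    # Keep small-object writes going; a real stress run would use millions of
    # keys and multiple concurrent clients.
    for i in range(1000000):
        s3.put_object(Bucket=bucket, Key='obj-%08d' % i, Body=b'x')
        if i % 100000 == 0:
            # Listing while writes continue exercises the sharded index paths.
            s3.list_objects_v2(Bucket=bucket, MaxKeys=1)

The point is less the exact loop than that something like it can run unattended on reserved lab machines, rather than requiring a person to drive the whole thing by hand.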
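And for the "+" operator note, a hedged sketch of what a distro-selection facet in a suite directory might look like. The directory name and version numbers are made up for illustration; the os_type/os_version keys are the usual contents of a teuthology distro fragment, but check the teuthology docs for the exact semantics of the operator file.

    # Illustrative layout only (names are hypothetical):
    supported-distros/
        +                    # the operator file described above
        centos_latest.yaml
        ubuntu_latest.yaml

    # A fragment such as centos_latest.yaml just pins the OS, e.g.:
    os_type: centos
    os_version: "7.5"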