This mail provides a braindump introduction to a testing system I've been
working on for libvirt drivers. Following that is a short guide on how to
actually write test cases, to give a better flavour of what it's all about.
Be warned, this is a very long email :-)


libvirt TCK : Technology Compatibility Kit
==========================================

The libvirt TCK provides a framework for performing testing of the
integration between libvirt drivers, the underlying virt hypervisor
technology, related operating system services and system configuration.
The idea (and name) is motivated by the Java TCK.

In particular the libvirt TCK is intended to address the following
scenarios

 - Validate that a new libvirt driver is in compliance with the
   (possibly undocumented!) driver API semantics

 - Validate that an update to an existing driver does not change the
   API semantics in a non-compliant manner

 - Validate that a new hypervisor release is still providing
   compatibility with the corresponding libvirt driver usage

 - Validate that an OS distro deployment consisting of a hypervisor
   and libvirt release is configured correctly

Thus the libvirt TCK will allow developers, administrators and users to
determine the level of compatibility of their platform, evaluate whether
it will meet their needs, and become aware of any regressions that may
have occurred since a previous test run.

In relation to other libvirt testing, the split of responsibility will be

  libvirt testsuite (aka $CHECKOUT/tests)
     - unit testing of specific internal APIs
     - functional testing of the libvirtd using the 'test'+'remote' drivers
     - functional testing of the virsh command using the 'test' driver

  libvirt TCK
     - functional/integration testing of the 'live' drivers


Framework requirements
======================

The libvirt TCK is built using Perl in order to take advantage of the
advanced, yet very simple, testing frameworks available with Perl.
Thus the libvirt interactions will all be done via the libvirt Perl
bindings, Sys::Virt (or perl-Sys-Virt RPMs).

The framework is thus built on the following Perl modules

 - Test::More     - simple framework for writing individual tests
 - TAP::Harness   - simple framework for running sets of tests
 - Sys::Virt      - binding for libvirt API
 - XML::Writer    - module for generating XML documents
 - XML::Twig      - module for parsing XML documents & XPath
 - Config::Record - module for parsing simple configuration files

There are a handful of other modules these depend on, but these are the
most important 'top level' modules in use. They are all currently
available within Fedora 10, with the exception of perl-accessors and
perl-TAP-Formatter-HTML. A similar situation exists for RHEL-5 once the
modules from EPEL-5 are taken into account.

These modules are all well tested, actively maintained parts of Perl /
CPAN, so they are easily available for practically every other operating
system.

As a convenience I have published repositories for Fedora 10 and RHEL-5
on x86_64 including these modules. Use the following YUM configs

For Fedora 10:

  [libvirt-tck]
  name=libvirt TCK
  baseurl=http://berrange.fedorapeople.org/libvirt-tck/yum-f10/x86_64/
  gpgcheck=0
  enabled=1

For RHEL-5:

  [libvirt-tck]
  name=libvirt TCK
  baseurl=http://berrange.fedorapeople.org/libvirt-tck/yum-rhel5/x86_64/
  gpgcheck=0
  enabled=1

NB: These repos assume you have already updated to the latest 0.6.2
libvirt RPM available for the distro in question.


Overview of framework structure
===============================

For following discussions, it may be convenient to refer to the source
code of the framework.
This is available from

  hg clone http://libvirt.org/hg/libvirt-tck

First there are a couple of Perl modules to provide assistance when
dealing with libvirt / writing tests

 - Sys::Virt::TCK in lib/Sys/Virt/TCK.pm

   The core module for connecting to libvirt, creating a clean
   environment (ie blowing away all existing domains), and generating
   simple XML configs for guests

 - Sys::Virt::TCK::DomainBuilder in lib/Sys/Virt/TCK/DomainBuilder.pm

   A helper for constructing XML configs for guest domains

 - Sys::Virt::TCK::Capabilities in lib/Sys/Virt/TCK/Capabilities.pm

   A helper for parsing the libvirt capabilities XML

 - Sys::Virt::TCK::TAP::XMLFormatter in lib/Sys/Virt/TCK/TAP/XMLFormatter.pm

   A plugin for TAP::Harness that is able to record all test results
   in a structured XML document.

As mentioned before, the framework is built around the Test::More and
TAP::Harness modules from Perl. These already come with a simple command
called 'prove' for running tests & reporting on results. It has a rather
baffling array of options, so to make it simpler to run the libvirt TCK,
there is a small program

 - bin/libvirt-tck (installed to /usr/bin/libvirt-tck)

Given no arguments, this will connect using the default hypervisor URI
and a previously obtained kernel+initrd, run all the tests currently
available for the libvirt TCK, and report on failures. It comes with a
number of options to alter the output format or choose different
configurations. 'man 1 libvirt-tck' will produce details.

The actual tests themselves are simply short Perl scripts using the
Test::More, Sys::Virt and Sys::Virt::TCK modules. Each test decides what
aspect it wants to test, then implements that logic and checks the
results. As a demonstration, there are 4 initial scripts

 - scripts/domain/050-transient-lifecycle.t

   Creates a guest from XML, destroys it, and then verifies that
   it has actually gone away.
 - scripts/domain/060-persistent-lifecycle.t

   Defines a guest config XML, starts it, destroys it, verifies that
   the config still exists, and then undefines the config and verifies
   that it has actually gone.

 - scripts/domain/070-transient-to-persistent.t

   Creates a guest from XML, then defines a persistent config for it,
   destroys the running guest, and then verifies the config is still
   present.

 - scripts/domain/080-unique-identifiers.t

   Defines a guest, and then tries to define / create more guests with
   a clashing name or UUID, and verifies that suitable errors are
   raised by libvirt.

Even these 4 simple proof of concept scripts have highlighted some
horrible problems

 - The QEMU driver 'define domain' method doesn't check for name or
   UUID uniqueness correctly (well, at all)

 - After starting an inactive domain, the remote driver does not update
   the 'ID' field in the virDomainPtr

 - After destroying an active domain, the remote driver does not update
   the 'ID' field in the virDomainPtr

 - When defining a persistent config for an already running domain,
   the Xen XM driver blows away the current 'ID' field for the running
   domain, replacing it with -1.

 - QEMU refuses to boot kernel+initrd unless at least one disk image
   is provided

In looking at some other things to test I specifically noticed that there
is terrible inconsistency in the virErrorPtr error codes used for
reporting problems. For each API there needs to be a formal core set of
error codes that will always be used for a certain set of conditions.
eg, when looking up a domain by name, if no such domain exists the driver
*must* always return VIR_ERR_NO_DOMAIN.


Output information
==================

The libvirt-tck tool outputs results in a number of formats.
The default format is a simple plain text summary listing each test
case, its pass/fail state, and details of each check failure.

A more verbose text format outputs the full Perl TAP (Test Anything
Protocol) format results as described in 'man 3 TAP' or
'man 3 Test::Harness::TAP'.

For producing pretty web pages, it is possible to request an HTML output
format.

Ultimately though it will be desirable to do automated analysis and
comparison of results across releases, OS, drivers, etc. To assist in
creating tools to do this, an XML format is also provided.

There are some examples of these formats, when run against RHEL-5 Xen,
Fedora 10 QEMU (libvirt 0.6.2), and Fedora 10 QEMU (libvirt 0.6.2 plus a
bunch of code fixes to make it pass)

In pretty HTML format:

  http://berrange.fedorapeople.org/libvirt-tck/results/libvirt-tck-rhel-5.html
  http://berrange.fedorapeople.org/libvirt-tck/results/libvirt-tck-f10-broken.html
  http://berrange.fedorapeople.org/libvirt-tck/results/libvirt-tck-f10-fixed.html

In full plain text format:

  http://berrange.fedorapeople.org/libvirt-tck/results/libvirt-tck-rhel-5.txt
  http://berrange.fedorapeople.org/libvirt-tck/results/libvirt-tck-f10-broken.txt
  http://berrange.fedorapeople.org/libvirt-tck/results/libvirt-tck-f10-fixed.txt

In formal XML format:

  http://berrange.fedorapeople.org/libvirt-tck/results/libvirt-tck-rhel-5.xml
  http://berrange.fedorapeople.org/libvirt-tck/results/libvirt-tck-f10-broken.xml
  http://berrange.fedorapeople.org/libvirt-tck/results/libvirt-tck-f10-fixed.xml


Running the test suite
======================

For those feeling brave it is possible to try out the current test suite.

The best bet is to install the perl-Sys-Virt-TCK RPM from the YUM repos
listed above.
Then create a config file

  # cat > /home/fred/qemu.cfg <<EOF
  uri = "qemu:///session"
  kernel=/home/fred/vmlinuz
  initrd=/home/fred/initrd.img
  disk=/home/fred/empty.img
  EOF

For current tests, it is sufficient to have a kernel + initrd that is
able to boot and just sit there being bored. The Fedora 10 PXEboot
install kernels suit this job perfectly

  # wget http://download.fedora.redhat.com/pub/fedora/linux/releases/10/Fedora/x86_64/os/images/pxeboot/initrd.img
  # wget http://download.fedora.redhat.com/pub/fedora/linux/releases/10/Fedora/x86_64/os/images/pxeboot/vmlinuz

An empty disk image is needed to keep QEMU happy - it won't boot a
kernel/initrd without a disk image, so just create a 10 MB image

  # dd if=/dev/zero of=empty.img bs=1M count=10

Next, start an unprivileged instance of libvirtd, eg as your normal user
account

  # /usr/sbin/libvirtd

Now it should be possible to run

  # libvirt-tck -c /home/fred/qemu.cfg

It will no doubt show many failures. To get more information add the
-v flag

  # libvirt-tck -c /home/fred/qemu.cfg -v

Or to get XML format

  # libvirt-tck -c /home/fred/qemu.cfg --format xml


What's still to do
==================

The code as it stands is the bare minimum to get a proof of concept
working for testing of domain APIs for the Xen and QEMU drivers. The test
suite though is intended to be independent of any driver, and also to
allow for coverage of all the libvirt APIs.

Off the top of my head, some important things that need doing

 - libvirt-tck-prepare

   For now it was sufficient to just grab the kernel+initrd from the
   Fedora 10 pxeboot location, but not every kind of virtualization
   can boot off a kernel + initrd. Older Xen HVM cannot do this. VMWare
   cannot do this. OpenVZ / LXC have no concept of separate kernels, etc.

   The libvirt-tck-prepare command would automate the process of
   setting up some pre-requisite pieces.
   Specifically it would

     - Download kernel+initrd.img from a suitable location
     - Build a bootable ISO image using the kernel+initrd
     - Create a virtual root filesystem, populated with busybox
       commands (for LXC/OpenVZ)
     - Create an empty disk image

 - Be more intelligent about building domain XML configs. In particular
   look at the capabilities XML to decide whether to create a config
   that boots off kernel+initrd vs ISO vs a virtual root filesystem for
   containers

 - Add helpers for building network, storage and interface XML configs

 - Expand on configuration to allow the admin to indicate some resources
   that can be safely used during tests

     - A friendly NFS server & some of its exports
     - A spare disk (eg /dev/sdb) that can be played with and trashed
     - A spare network interface (or two) that can be played with
     - A spare PCI device that can be detached from host
     - A spare USB device that can be detached from host
     - A path with X GB of free space to play with for storage pool usage
     - A friendly iSCSI server & some of its targets

   If any of those resources were not available, the test cases needing
   them would simply be skipped. This is very easy to cope with in the
   Test::More framework

     plan skip_all => "no iscsi server available"
         unless $tck->has_config("iscsi-server");

 - Fix up all the horribly broken areas of libvirt that this uncovers.
   This will entail deciding what the semantics for various edge cases
   are with each API, deciding what error codes need to be formally
   defined for each API, and figuring out how to implement/fix the
   necessary semantics in the drivers.

 - A way to record specific test failures as 'known broken' for a
   particular driver / platform combination.

 - A tool to take an XML report from the TCK, mask out all tests that
   are 'known broken', and then report on the remaining problems which
   need fixing.
 - A tool to compare two XML reports and show interesting differences
   in functionality, and / or bugs


How this might be used
======================

The original motivating goal for this is obviously to improve the overall
quality control of libvirt and the things it interacts with. There are a
number of scenarios in which I see this being used:

 - Fedora rawhide updates to a new release of QEMU
     => Run the TCK to make sure it didn't break anything (new) in
        libvirt

 - Declaring feature freeze for a libvirt release
     => Run the TCK to determine what the state of each driver is.
        Decide what problems should be release blockers. It is expected
        that some drivers may have long term failures, due to features
        that will not be implemented

 - Released new libvirt
     => Provide online 'reference' reports for the new libvirt release
        against various platforms. Allows OS distributors to determine
        whether their changes cause regressions, or if it is a
        known-broken item.

 - App developer looking to understand feature support
     => Look at the TCK reports for the driver and decide if it
        implements enough of the functionality to be worth supporting.

The key factor I think is that it is unreasonable to expect that the TCK
will complete without failures for every libvirt driver, let alone every
OS. Some features will simply be impossible to implement for certain
platforms. Thus the key is in tracking what areas are known to be broken,
to make it possible to identify regressions in areas that are expected to
work. The known broken areas may also provide motivation for new feature
development in associated tools.


libvirt TCK: An introduction to writing tests
=============================================

The libvirt TCK provides a framework for testing the correct operation of
libvirt drivers and their integration with the host operating system
virtualization services.
Since the focus is on functional integration testing, the tests are
driven from the public libvirt API, closely replicating the kind of usage
expected from applications using libvirt.

The tests are written in Perl, primarily using the libvirt Perl language
bindings (Sys::Virt / perl-Sys-Virt), and the common Test::More
framework. The libvirt TCK also provides a number of helper modules to
simplify the process of creating interesting tests.


Output format for a test case
=============================

To enable automated reporting and analysis of test results, there is a
well defined output format that tests must follow. A single test case
consists of a sequence of checks, each with a pass/fail status, the
aggregate status giving the pass/fail state of the test as a whole. This
information is presented in a simple, line oriented text format.

The general format can be summarized as

    1..N
    ok 1 Description # Directive
    # Diagnostic
    not ok 2 Description
    ....
    ok N Description

The first line here defines the plan, giving the expected number of
checks that will be run in the test case. This enables the test harness
to determine if a test case crashed / exited earlier than expected
without running all checks. Each following line starts with the word 'ok'
or 'not ok' to indicate the state of the check, followed by the check
number (assigned incrementally), and a description of the check
performed. Diagnostic comments can be output using a leading '#' to
assist in debugging / interpreting the results.

A more realistic example would be

    1..4
    ok 1 - Input file opened
    not ok 2 - First line of the input valid
    ok 3 - Read the rest of the file
    not ok 4 - Summarized correctly # TODO Not written yet

This is more or less all that it is necessary to know about the output
format, though far more detail can be found by reading the Perl
documentation for 'TAP' (aka Test Anything Protocol), available either in
'man 3 TAP' or 'man 3 Test::Harness::TAP' depending on the Perl version.
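
To make the format concrete, here is a minimal sketch that produces it by
hand with nothing but print statements. The 'tap_lines' function and the
check descriptions are invented for this illustration and are not part of
the TCK; real test scripts would normally let Test::More generate these
lines.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Render a list of checks as TAP lines: the plan first, then one
# 'ok' / 'not ok' line per check, numbered from 1.
# Each check is an array ref: [ $passed, $description ].
# (Hypothetical helper, written only to illustrate the format.)
sub tap_lines {
    my @checks = @_;
    my @lines  = ("1.." . scalar(@checks));
    my $number = 0;
    for my $check (@checks) {
        my ($passed, $description) = @$check;
        $number++;
        push @lines, ($passed ? "ok" : "not ok") . " $number - $description";
    }
    return @lines;
}

print "$_\n" for tap_lines(
    [1, "Input file opened"],
    [0, "First line of the input valid"],
);
```

Because the plan line is emitted before any check runs, a harness reading
this output can tell a script that stopped early apart from one that
completed with failures.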


Writing tests with a compliant output format
============================================

As if that output format were not simple enough to understand and
generate, there are helper modules to make this even easier to deal with.
The Perl Test::More module provides a set of useful functions that can be
invoked to perform checks.

The first step is to declare how many checks are intended to be run.
This is typically done at the time Test::More is imported into the script

    use Test::More tests => 15;

The rest of the test case should be a Perl script that implements the
logic you wish to test. At key points throughout the script, checks can
be inserted to validate the state. Depending on the type of check
desired, there are a number of helper functions available.

For a simple boolean condition, the 'ok' function can be used

    ok($boolean, $description);

eg

    my $id = $dom->get_id;
    ok($id >= 0, "virtual domain ID is greater than or equal to 0");

To compare two pieces of data for equality (or inequality), the
'is'/'isnt' functions are preferred. The value under test comes first,
followed by the value it is compared against:

    is($actual, $expect, $description);
    isnt($actual, $expect, $description);

eg

    my $name = $dom->get_name;
    is($name, "apache", "virtual domain has name 'apache'");

To compare a list or hash table, a deep comparison is required. NB, if
comparing lists, it will also often be desirable to sort their elements.
Since 'list_domains' returns domain objects, they are mapped to their
names before sorting

    my @names = sort { $a cmp $b } map { $_->get_name } $conn->list_domains;
    is_deeply(\@names, ['apache', 'dns'], $description);

Finally, to output a diagnostic message, the 'diag' command is suitable

    diag("Checking that the running guest has an ID > -1");

There are quite a few other variations on these functions, and extensive
documentation can be found in the 'Test::More' manual page.


Helpers for writing libvirt TCK test checks
===========================================

While the above functions are useful for testing simple properties and
conditions, they can be a little tedious to use when having to deal with
exceptions and objects.
The libvirt TCK thus provides a couple of helper functions.

The first thing when writing a test is to get a connection to libvirt
and make sure the test environment is clean, ie there are no existing
guests lying around. If anything goes wrong here, we need to bail out
and not bother with the rest of the test. We also want to ensure cleanup
when the test case finishes. For this there is a simple boilerplate piece
of code that can be included

    use Sys::Virt::TCK;

    my $tck = Sys::Virt::TCK->new();
    my $conn = eval { $tck->setup(); };
    BAIL_OUT "failed to setup test harness: $@" if $@;
    END { $tck->cleanup if $tck; }

Going line by line, this first imports the 'Sys::Virt::TCK' package and
its functions. Then it creates an instance of the 'Sys::Virt::TCK'
object. Then it runs the 'setup' method to obtain a libvirt connection,
catching any error that may be thrown. The fourth line will abort the
entire test if an error occurred during setup. The final line registers
an 'END' block which will perform cleanup when Perl exits.

When testing APIs, it will often be necessary to create / define real
guest domains with a config. Much of the time the test won't care about
the exact config, just wanting a minimal generic domain config that is
highly likely to work without error. For such cases, a nice simple API is
provided:

    my $xml = $tck->generic_domain("test")->as_xml;

This creates an XML document for a guest that is of the correct OS and
domain type to be able to run on the current hypervisor, with a name of
'test', and a single disk. It is possible to set further parameters if
required.
For example, to set an explicit UUID, give 3 virtual CPUs and turn on
ACPI:

    my $xml = $tck->generic_domain("test")->vcpus(3)
                  ->uuid("11111111-1111-2222-3333-444444444444")
                  ->with_acpi()->as_xml();

Notice how it allows for chaining the method calls together to build the
domain config, turning it into XML at the last step.

If testing a method that is expected to return a virtual domain object
(ie an instance of Sys::Virt::Domain), the 'ok_domain' helper should be
used. This takes 2 or 3 parameters. The first is the code block to be
checked, the second is a description and the optional third parameter is
the expected name of the guest domain. eg to test domain creation from an
XML doc

    my $dom;
    ok_domain { $dom = $conn->create_domain($xml) }
        "created a running domain", "test";

This creates a new running guest from '$xml', and checks that it
succeeded and returned a domain object with an expected name of 'test'.
If an exception was thrown during guest creation this will be reported
as an error. If the guest has the incorrect name, that will also be
reported as an error.

If testing a method that is expected to throw an exception and thus not
return a value, the 'ok_error' helper should be used. This takes 2 or 3
parameters. The first is the code block to be checked, the second is a
description and the optional third parameter is the expected error code.
eg to ensure that the guest named 'test' does not exist, and that an
error is raised when attempting to do a lookup for it

    ok_error { $conn->get_domain_by_name("test") }
        "no such domain error raised",
        Sys::Virt::Error::ERR_NO_DOMAIN;

This code block attempts to look up a domain based on its name. For
success, it requires that the domain does not exist and that libvirt
throws an exception with the code VIR_ERR_NO_DOMAIN. If that does not
happen, then a failure will be reported.
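
For the curious, a helper in the style of 'ok_error' could be built on
top of Test::More roughly as follows. This is only a sketch of the
general technique and not the TCK's actual implementation: the names
'block_raises', 'my_ok_error' and 'My::FakeError', and the error code 42,
are all invented here so that the example runs without libvirt. The only
assumption about the real bindings is that error objects offer a 'code'
method.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Test::More tests => 2;
use Scalar::Util qw(blessed);

# Stand-in for a libvirt error object (hypothetical class, used so
# the sketch has no dependency on Sys::Virt).
{
    package My::FakeError;
    sub new  { my ($class, $code) = @_; return bless { code => $code }, $class }
    sub code { return $_[0]->{code} }
}

# Returns 1 if the code block dies and, when an expected code is
# given, the exception is an object reporting that code. Returns 0
# otherwise, including when the block does not die at all.
sub block_raises {
    my ($block, $code) = @_;
    my $survived = eval { $block->(); 1 };
    my $err = $@;
    return 0 if $survived;
    return 1 unless defined $code;
    return (blessed($err) && $err->can('code') && $err->code == $code) ? 1 : 0;
}

# The (&$;$) prototype is what allows the 'helper { ... } "desc", $code'
# calling style, with no comma after the block.
sub my_ok_error(&$;$) {
    my ($block, $description, $code) = @_;
    # Report failures at the caller's line rather than inside this helper
    local $Test::Builder::Level = $Test::Builder::Level + 1;
    return ok(block_raises($block, $code), $description);
}

my $ERR_NO_DOMAIN = 42;    # arbitrary value chosen for this sketch

my_ok_error { die My::FakeError->new($ERR_NO_DOMAIN) }
    "missing domain raises the expected code", $ERR_NO_DOMAIN;

my_ok_error { die "plain string error\n" }
    "any exception passes when no code is given";
```

Keeping the pass/fail decision in a plain function separate from the call
to 'ok' makes the logic easy to exercise on its own, while the wrapper
stays a thin layer that only adds the check to the test output.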


A real test case example walkthrough
====================================

This example will illustrate how to test the operation of persistent
virtual domains. Our plan for the test is to run the following sequence
of operations

 - Define a new inactive guest from XML
 - Start the guest config
 - Stop the running guest
 - Undefine the now inactive guest config

There will be certain sanity checks at various stages. For example,
after starting the guest, it will check that the guest ID is greater than
zero. After stopping the guest, it will check the ID is -1. After
undefining the guest, it will check that another lookup fails, to
validate that it really went away.

The first step is to write the core algorithm in Perl code using the
Sys::Virt APIs. Very simplified it looks like this

    my $conn = ...get a libvirt connection...
    my $xml = "....the xml config...";

    my $dom;

    $dom = $conn->define_domain($xml);
    $dom->create;
    $dom->destroy;
    $dom->undefine;

Now it is time to start putting in sanity checks. When defining the
domain, it is necessary to check that it returned a real domain object,
and that no exception is thrown. The 'ok_domain' method can be used for
that. It is also wise to print a diagnostic message before doing
anything interesting. So the define_domain line turns into

    diag "Defining inactive domain config again";
    ok_domain { $dom = $conn->define_domain($xml) }
        "defined persistent domain config";

After then starting the domain, the test will check that it has a proper
unique ID number. So the 'create' line turns into

    diag "Starting inactive domain config";
    $dom->create;
    ok($dom->get_id() > 0, "running domain has an ID > 0");

Since this is testing persistent domains, after stopping the running
guest, it should still be possible to look it up.
Thus the line that stops the guest gains a check for its ID number,
followed by another check that the guest is still present

    diag "Destroying the running domain";
    $dom->destroy();
    is($dom->get_id(), -1, "inactive domain has an ID == -1");

    diag "Checking there is still an inactive domain config";
    my $dom1;
    ok_domain { $dom1 = $conn->get_domain_by_name("test") }
        "the inactive domain object";

Finally, after undefining the guest it is necessary to validate that it
really has gone away, by trying to look it up based on name, and checking
that an error is raised

    diag "Undefining the inactive domain config";
    $dom->undefine;
    ok_error { $conn->get_domain_by_name("test") }
        "NO_DOMAIN error raised from missing domain",
        Sys::Virt::Error::ERR_NO_DOMAIN;


The completed example test script
=================================

It is good practice to include a short documentation comment in test
scripts to outline what the script intends to validate. The Perl POD
format is useful for this (see 'man perlpod' for more info). Taking this
into account, the complete example script looks like

    # -*- perl -*-
    #
    # Copyright (C) 2009 A N Other

    =pod

    =head1 NAME

    example-persistent-domain.t - Persistent domain lifecycle

    =head1 DESCRIPTION

    The test case validates the core lifecycle operations on
    persistent domains. A persistent domain is one with a
    configuration enabling it to be tracked when inactive.

    =cut

    use strict;
    use warnings;

    use Test::More tests => 5;

    use Sys::Virt::TCK;

    my $tck = Sys::Virt::TCK->new();
    my $conn = eval { $tck->setup(); };
    BAIL_OUT "failed to setup test harness: $@" if $@;
    END { $tck->cleanup if $tck; }

    my $xml = $tck->generic_domain("test")->as_xml;

    my $dom;

    diag "Defining inactive domain config again";
    ok_domain { $dom = $conn->define_domain($xml) }
        "defined persistent domain config";

    diag "Starting inactive domain config";
    $dom->create;
    ok($dom->get_id() > 0, "running domain has an ID > 0");

    diag "Destroying the running domain";
    $dom->destroy();
    is($dom->get_id(), -1, "inactive domain has an ID == -1");

    diag "Checking there is still an inactive domain config";
    my $dom1;
    ok_domain { $dom1 = $conn->get_domain_by_name("test") }
        "the inactive domain object";

    diag "Undefining the inactive domain config";
    $dom->undefine;
    ok_error { $conn->get_domain_by_name("test") }
        "NO_DOMAIN error raised from missing domain",
        Sys::Virt::Error::ERR_NO_DOMAIN;


Running the test script
=======================

Having created the test script it can be run directly using Perl, simply
by setting an environment variable pointing to the config file

    # export LIBVIRT_TCK_CONFIG=/etc/libvirt-tck/xen.cfg
    # perl example-persistent-domain.t

If the libvirt driver being tested were bug-free it would result in the
following output

    1..5
    # Defining inactive domain config again
    ok 1 - defined persistent domain config
    # Starting inactive domain config
    ok 2 - running domain has an ID > 0
    # Destroying the running domain
    ok 3 - inactive domain has an ID == -1
    # Checking there is still an inactive domain config
    ok 4 - the inactive domain object
    # Undefining the inactive domain config
    ok 5 - NO_DOMAIN error raised from missing domain

If something went wrong, it might look like

    1..5
    # Defining inactive domain config again
    ok 1 - defined persistent domain config
    # Starting inactive domain config
    not ok 2 - running domain has an ID > 0
    #   Failed test 'running domain has an ID > 0'
    #   at /home/berrange/ex line 39.
    # Destroying the running domain
    libvirt error code: 7, message: invalid domain pointer in no domain with matching id -1
    # Looks like you planned 5 tests but only ran 2.
    # Looks like you failed 1 test of 2 run.
    # Looks like your test died just after 2.

Notice that since the test script declared upfront that it intended to
run 5 checks, Perl was able to detect that it aborted earlier than
expected.

-- 
|: Red Hat, Engineering, London   -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org  -o-  http://virt-manager.org  -o-  http://ovirt.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-  F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|