Over the many years that libvirt has been in existence, we have developed a huge number of features, and the rate of change in the codebase shows no sign of slowing down. Unfortunately, the rate of regressions and/or new bugs also shows no sign of decreasing. As libvirt is used ever more widely, poor quality releases are having an increasingly serious negative impact on how downstream consumers perceive libvirt.

In addition, the introduction of the new access control facilities will have the unfortunate side-effect of increasing the severity of many bugs. Historically a read-write connection to libvirtd could be considered equivalent to root, so a bug affecting an authenticated user would not be considered a privilege escalation, merely a denial of service. With the introduction of access control, a read-write connection to libvirtd may have substantially fewer privileges than root. The result is that many (arguably all) crasher bugs in libvirtd which were previously considered mere denial of service issues, and not worthy of security alerts, will now be classed as potential privilege escalation flaws.

It is clear that the current rate of defects in libvirt code is no longer acceptable.

There are a number of test suites targeting libvirt, but this document will focus on the unit testing framework, rather than the integration test harnesses. The rationale for this is

 - Unit tests are run by all developers daily via 'make check'
 - Unit tests gate the release process
 - Unit tests are run every time any RPM is built
 - Unit tests do not have a dependency on external system infrastructure / setup
 - Unit tests are directly synchronized with the code commits since they're in the same GIT repo

Unit testing techniques
=======================

One of the challenges of unit testing is how to isolate the test suite from the broader environment of the machine being used for development.
For example, the test suite must not rely on information located in the filesystem outside the source tree (eg it must not touch /proc or /sys). In addition, it must be careful not to negatively impact anything running on the developer's machine (eg it must not spawn libvirtd in a way which would clash with an existing running libvirtd).

There is no silver bullet to achieve isolation of the test suite from the host OS environment. Instead a variety of different techniques must be applied, depending on the scenario to be addressed.

Code structure
--------------

First, and foremost, code structure has an enormous impact on the ability to unit test code. Ensuring there are clear boundaries & interfaces between different areas of code is key to allowing pieces to be unit tested in isolation from each other. For example, testing of the QEMU command line generator can only be achieved because the code for converting from XML to ARGV has been isolated from the code which actually starts up the QEMU process.

While some areas of libvirt code are very well modularized, others are not. When writing new unit tests it is not at all unexpected to have to re-factor existing code to further modularize it. Such refactoring is usually good for the overall readability of the code, as well as facilitating testing.

Intermediate data forms
-----------------------

As hinted at above, a key idea when refactoring code to make it more amenable to testing is to introduce an intermediate data form. For example, when generating the QEMU command line arguments from an XML document, the virCommandPtr object serves as the intermediate data form. Instead of having the API which builds the command line directly execute the QEMU binary, it emits a virCommandPtr object as an intermediate data form. The contents of this object can then be validated against pre-defined expected command lines.
As a counter-example, the network filter code is very hard to unit test as currently designed, because it does not have any intermediate data form. The methods which process the network filter configuration build up command lines and then directly execute them as they go. There is no way to extract the command line data without it being executed. To enable unit testing it is necessary to refactor the network filter code to introduce a clear intermediate data form which can represent the commands that will be executed.

Static analysis
---------------

While the elephant in the room is Coverity, there is a large amount of static analysis that can be done directly in the libvirt source tree. Indeed, all of the 'syntax-check' rules can be considered static analysis. While these rules all focus on style issues, there is no reason why this should be the case.

What makes static analysis tools like Coverity so hard to write is the problem of understanding the semantics of C code. Fortunately, when doing static analysis of a specific project (like libvirt), there is often no need to solve the general case. If certain style patterns are used consistently in a project, these can make it possible to write project-specific analysis rules with little more than a few regular expressions or other simple scripting statements.

For example, the two style checks 'src/check-drivername.pl' and 'src/check-driverimpls.pl' enforce a consistent naming convention on each internal driver method. With this naming pattern enforced, it is now possible to write a script which validates that every driver method includes a call to the correct access control hook. This is project-specific static analysis that could not be done with Coverity.

For more advanced problems it is possible to make use of a framework such as CIL to perform static analysis. CIL is an OCaml project which can parse most C code, providing an in-memory representation of its structure.
Then it has a number of modules to perform data flow analysis on the code. Combined with knowledge of project coding patterns, this tool allows the creation of scripts to validate mutex lock acquisition ordering rules, or correct error reporting on all error exit paths.

Monkey patching / Duck punching
-------------------------------

One of the pain-points with a project such as libvirt is finding a way to test parts of the code which rely on external system state, such as /proc or /sys filesystem data. It is not always possible to refactor the code to test it in isolation from the system services. In such cases, the technique of monkey patching or duck punching (better known from the world of scripting languages) can in fact be used from C.

The idea behind the technique is to replace the APIs / objects used by the code under test with a set of "mock" objects or API implementations which simulate their operation, without relying on external state.

The trick to applying this technique in the C world with native code is to rely on the dynamic linker's library preload facility, via LD_PRELOAD. Any symbol exported by a library listed in the LD_PRELOAD env variable will override symbols with the same name in the libraries that the test suite is actually linked to. So, for example, if the preloaded library exports an 'open' symbol, this will replace the 'open' symbol exported by the standard C library.

To use this technique it is necessary to first understand which interfaces need to be replaced. If attempting to test some code which uses files in some location under /sys/, one might change the implementation of 'open' so that, when passed a filename starting with '/sys', it re-writes the path to point to a location under the test suite directory.

This technique is used in the vircgrouptest.c / vircgroupmock.c files to allow comprehensive testing of libvirt's cgroup management code, without ever actually touching the cgroup filesystem.
It is also used by the securityselinuxtest.c / securityselinuxhelper.c files to stub out the syscalls which write SELinux attributes.

Fuzz testing
------------

This refers to the technique of taking an API or interface and giving it data containing intentional inaccuracies or edge cases. For example, an API accepting a 'const char *str' might be given a NULL string, or a zero length string, or a string containing non-printable control characters. An API accepting an 'int' might be tested with various boundary conditions such as -1, 0, INT_MIN, INT_MAX.

In the case of an XML document, one technique would be to incrementally remove attributes and/or elements to see if the parser will ever crash on a NULL pointer (it should either succeed or return an error, never crash). Alternatively, change the values in various attributes as one would with parameters passed to an API. If an attribute appears to be a number, replace it with a string. If an attribute appears to be an enum, replace it with some bogus value. Try zero length strings for various attributes, or again try non-printable control characters.

In the case of an RPC protocol, the idea would be to make (intelligent) changes to packets in an attempt to trick the server into accepting bad data, or even crashing.

It is not practical to test all possible weird data sets, so the trick with fuzz testing is to intelligently tailor the data to the interface being tested. eg if a string parameter represents a file path, there's no point just randomly changing characters in the path, since that'll just generate loads of other valid file paths. Instead it is necessary to make more planned changes, like removing the leading '/' to produce a relative path, or inserting a number of '../../../' dirs to trick the application into writing to an incorrect location, and things of that nature.
For any moderately complex parser, intelligently designed fuzz testing is likely to identify a wide range of bugs.

Error testing
-------------

Validating correct operation is great, but there is plenty of code living in error handling paths which will never be exercised this way. Since they are seldom run, and by their nature deal with unexpected data or events, the error paths are often a good source of bugs. The fuzz testing technique is one very effective way of exercising error handling paths, but should not be considered the only possible one. The monkey patching techniques can also be applied to the task of error checking.

Alternatively the code structure may facilitate the injection of error conditions in a well defined manner. For example, all libvirt memory allocation goes via a wrapper API, instead of calling malloc() directly. It is possible to tell the wrapper API to fail specific allocation attempts. By running the test suite repeatedly, failing a different memory allocation each time, it is possible to exhaustively test the error handling of the tested code.

Areas requiring testing
=======================

The simplest way to identify areas lacking code coverage is to pick a random libvirt source file - the odds are that it has insufficient testing. The more systematic way is to look at the code coverage reports. Run 'autogen.sh' with the --enable-test-coverage arg, then build and run 'make check'. Once the tests have completed, run 'make cov' to generate an HTML code coverage report.

Some known areas needing work are:

Cgroups setup for QEMU
----------------------

For a given QEMU guest XML configuration there is a well defined set of cgroups parameters that are expected to be set. The vircgroupmock.c file demonstrates how to monkey patch the cgroups filesystem, allowing unit tests to be written. The vircgroupmock.c helper could be reused to create a test suite for validating the operation of the qemu_cgroups.c APIs.
Cgroups setup for LXC
---------------------

The same as the QEMU test, but for the lxc_cgroups.c file.

SELinux security driver labelling
---------------------------------

For a given XML configuration, there is a well defined set of file labels that the SELinux security driver must apply to the filesystem on VM startup. The securityselinuxhelper.c file demonstrates how to monkey patch the SELinux library file labelling APIs from unit tests. Currently the securityselinuxlabeltest.c test suite checks the labels set on disks and serial devices. There are many other aspects of domain XML that imply labelling, in particular PCI/USB host devices & kernel/initrd boot images. It is also necessary to deal with backing files for qcow2 format disks.

Audit messages
--------------

Upon QEMU or LXC guest startup a number of audit records must be logged detailing various pieces of metadata about the guest. By monkey patching the audit_log_user_message() and audit_open() library APIs with an LD_PRELOAD library, it would be possible to perform unit testing of the domain_audit.c APIs to ensure that correct audit records are logged for various different guest configurations.

Network filter
--------------

As described earlier, the network filter code is really badly designed from the point of view of unit testing. It needs to be refactored to have an intermediate data form that represents the list of iptables/ebtables rules expected for a given XML configuration. Currently it generates shell scripts containing iptables commands, optionally re-directing via the 'firewall-cmd' program for integration with firewalld. Refactoring to unit test this offers a prime opportunity to remove the use of shell entirely.

The core observation about the nwfilter code is that it consists of lists of commands to be run to create ebtables/iptables rules. Periodically there are commands which are able to roll back the state changes made by earlier commands when an error occurs.
The idea for the intermediate data form would thus be to introduce the concept of a "script" object, as a higher level interface than that offered by virCommandPtr. A script would comprise one or more sets of command lines for making changes. At various points it would be possible to add a checkpoint marker, together with one or more sets of command lines that would be executed when errors occur which require rollback.

The network filter code would thus output "script" objects whose command sets could be validated by a unit test suite. These script objects could then be translated into virCommandPtr objects for actual execution by separate code, or even translated into DBus API calls for direct communication with firewalld, bypassing the inefficient firewall-cmd program. The unit test would consist of XML data files with network filter configuration rules, alongside expected script command lines, similar to how the qemu xml -> argv test works.

virsh testing
-------------

The virsh command has rarely had any serious direct testing of its own. Its code is mostly exercised indirectly by test suites doing testing of QEMU or other drivers. Libvirt has a hypervisor driver that is explicitly targeted at testing tools like virsh - the "test" driver. This is a 100% in-memory driver that stubs out all the libvirt APIs with semi-serious semantics. It is used to do automated testing of large parts of virt-install and virt-manager. It could be used to get near 100% test coverage of the virsh command from within unit tests. Since it is all in-memory there is no scope for it to be affected by the host system state.

libvirtd testing
----------------

Testing libvirtd is a tricky proposition, because merely starting the daemon will have an effect on host OS state, particularly if any resources are set to autostart. It can also clash with any libvirtd process the developer is already running. The test driver is able to be run inside libvirtd.
If it were possible to disable all the other hypervisor drivers in libvirtd, the test driver could be used to exercise the RPC dispatch code in libvirtd in a safe, contained manner. This could be particularly good for doing scalability testing of libvirtd with large numbers of objects. eg since the test driver doesn't actually run anything, it is possible to start many 1000's of "test" guests with the only memory overhead being that of the parsed XML configuration. This would allow the test suite to identify & validate limits in the RPC protocol or inefficiencies in libvirtd.

Lock driver client
------------------

The virtlockd client code recently had a bug where it generated invalid RPC packets in response to certain operations. It is not desirable to have the unit tests actually run the full virtlockd daemon just to validate the client code. It would, however, be possible to have a unit test which provided a fake virtlockd UNIX domain socket for the client to connect to. It would blindly read any data off the socket & validate it against pre-defined RPC packets. This approach is already used with some success in the QEMU JSON monitor test suite.

QEMU disk snapshots
-------------------

The QEMU snapshot code is some of the most complex code in the QEMU driver, so it would be desirable to get test coverage of it. The easiest way to do this would be via the libvirt TCK integration test harness. There is, however, value in refactoring some of the code to allow its testing in a unit test environment. One approach would be for the test suite to directly invoke methods in the qemu driver dispatch table. This would require refactoring the qemuStateInitialize code, though, to make it possible to set up the basic QEMU driver global state. More desirable would be to refactor the snapshot code so that it is more detached from the public driver APIs. For example, if some of it could be isolated in a qemu_snapshot.c file, this may make it practical to unit test some of it.
Techniques from the qemumonitorjsontest.c file could be used to stub out the monitor APIs that the snapshot code requires in order to operate.

QEMU driver startup
-------------------

An important aspect of the QEMU driver is that it is able to reconnect to existing domains when it starts up. It would be desirable to unit test this functionality to validate the sequence of operations performed when reconnecting to a guest. Again this would require some refactoring of the qemuStateInitialize method to allow its execution in a test suite. It would also likely require the techniques for stubbing the QEMU monitor APIs.

XML parser/formatter reliability
--------------------------------

While we have good testing of valid XML documents, there have been a number of cases where invalid XML documents have caused crashes or other bad behaviour. It is not reasonable to add test cases for every possible invalid XML document. Instead we need to adopt a fuzzing approach, where we take valid XML documents, make incremental changes to them & then ensure that the code does not crash.

Regards,
Daniel

-- 
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list