Document the fuzzers in two ways.

1. Explain the high-level working of the fuzzers under docs/kbase.
2. Add README to explain general setup of the fuzzer and its usage.

Signed-off-by: Rayhan Faizel <rayhan.faizel@xxxxxxxxx>
---
 docs/kbase/index.rst                 |   3 +
 docs/kbase/internals/meson.build     |   1 +
 docs/kbase/internals/xml-fuzzing.rst | 120 ++++++++++++++++++++++++
 tests/fuzz/README.rst                | 131 +++++++++++++++++++++++++++
 4 files changed, 255 insertions(+)
 create mode 100644 docs/kbase/internals/xml-fuzzing.rst
 create mode 100644 tests/fuzz/README.rst

diff --git a/docs/kbase/index.rst b/docs/kbase/index.rst
index e51b35cbfc..9cf6268800 100644
--- a/docs/kbase/index.rst
+++ b/docs/kbase/index.rst
@@ -116,3 +116,6 @@ Internals
 
 `QEMU monitor event handling <internals/qemu-event-handlers.html>`__
    Brief outline how events emitted by qemu on the monitor are handlded.
+
+`XML Fuzzing <internals/xml-fuzzing.html>`__
+   How the structure-aware XML fuzzers work.
diff --git a/docs/kbase/internals/meson.build b/docs/kbase/internals/meson.build
index f1e9122f8f..86b6639419 100644
--- a/docs/kbase/internals/meson.build
+++ b/docs/kbase/internals/meson.build
@@ -9,6 +9,7 @@ docs_kbase_internals_files = [
   'qemu-migration',
   'qemu-threads',
   'rpc',
+  'xml-fuzzing',
 ]
 
 
diff --git a/docs/kbase/internals/xml-fuzzing.rst b/docs/kbase/internals/xml-fuzzing.rst
new file mode 100644
index 0000000000..85f565fda5
--- /dev/null
+++ b/docs/kbase/internals/xml-fuzzing.rst
@@ -0,0 +1,120 @@
+===================
+Libvirt XML fuzzing
+===================
+
+XML fuzzing is done using libFuzzer and libprotobuf-mutator. Conventional
+unstructured fuzzing is ineffective here, as XML is a highly structured
+format. Structure-aware fuzzing is implemented using libprotobuf-mutator, which
+mutates protobuf inputs. Protobufs are used as an intermediate format and are
+then serialized to XML.
+
+Protobuf to XML representation
+==============================
+
+A protobuf definition written to fuzz libvirt XML formats may resemble the
+following.
+
+::
+
+  message MainObj {
+    message SomeTagMessage {
+      optional uint32 A_number = 1;
+      optional DummyString A_name = 2;
+
+      enum typeEnum {
+        typeA = 0;
+        typeB = 1;
+        typeC = 2;
+      }
+
+      optional typeEnum A_type = 3;
+
+      message InnerTagMessage {
+        optional uint32 A_number = 1;
+      }
+
+      repeated InnerTagMessage T_innertag = 4;
+
+      message SecondInnerTagMessage {
+        optional uint32 V_value = 1;
+      }
+      optional SecondInnerTagMessage T_secondinner = 5;
+    }
+
+    optional SomeTagMessage T_sometag = 1;
+  }
+
+* Fields starting with ``T_`` represent XML tags. Their types are protobuf messages
+  which may further contain other protobuf-defined XML tags or attributes.
+
+* Fields starting with ``A_`` represent XML attributes. Most of the time,
+  they use one of the primitive datatypes available in protobuf (e.g. ``uint32``, ``bool``, ``enum``).
+
+  * If the attribute can take multiple data types, it is encapsulated in a ``oneof`` statement.
+    The field name then has a prefix of ``A_OPTXX_`` where ``XX`` is a number from 0 to 99.
+  * If the attribute name contains special characters, the real name is stored in
+    the ``libvirt::real_name`` option, which extends ``FieldOptions``.
+  * If an enum value contains special characters, the real value is stored in
+    the ``libvirt::real_value`` option, which extends ``EnumValueOptions``.
+
+* Fields starting with ``V_`` represent raw text in XML.
+
+  * If ``T_`` and ``V_`` fields are defined in the same message, the ``V_`` field
+    is used only if it is present; otherwise the ``T_`` fields are processed
+    as usual.
+  * ``V_`` fields can take on the same datatypes as ``A_`` fields.
+
+* ``repeated`` is used to allow multiple XML tags of the same name.
+
+``A_`` fields must always precede ``V_`` and ``T_`` fields. Likewise, ``V_``
+fields must precede ``T_`` fields, if any.
+
+On fuzzing the above protobuf definition, one possible protobuf-to-XML
+serialization is
+
+::
+
+  <sometag number='1' name='dummy' type='typeB'>
+    <innertag number='2'/>
+    <innertag number='3'/>
+    <secondinner>1241232</secondinner>
+  </sometag>
+
+Custom Protobuf Datatypes
+-------------------------
+
+Sometimes, primitive data types or enums are not enough to encode the
+desired attribute values, especially if they themselves are structured. In this
+case, such fields are represented by a handwritten protobuf message defined in
+``xml_domain_datatypes.proto``. To serialize these messages to XML attribute
+values, custom handlers are defined in ``proto_custom_datatypes.cc``.
+
+This is useful for data types such as IP addresses, MAC addresses, target
+device names, etc.
+
+Protobuf generation
+===================
+
+``proto`` files are automatically generated at compile time using the script
+``relaxng_to_proto.py``. The script parses RELAX NG schemas to generate a protobuf
+file containing fields and messages representing all the defined XML tags and
+attributes.
+
+The script tries to figure out the correct datatype of each XML attribute.
+However, on its own it can only determine the general datatype or enum values
+of the attribute, not its constraints or regex patterns. Some override tables
+are present to improve upon that.
+
+Fuzzer Harnesses
+================
+
+Driver-specific harnesses in general reuse the existing test driver setup
+as well as other existing test utilities under ``tests/``. Harnesses are
+available for the following drivers:
+
+* QEMU XML Domain
+* QEMU XML Hotplug
+* CH XML Domain
+* VMX XML Domain
+* libXL XML Domain
+* NWFilter XML
diff --git a/tests/fuzz/README.rst b/tests/fuzz/README.rst
new file mode 100644
index 0000000000..d92cdc94d7
--- /dev/null
+++ b/tests/fuzz/README.rst
@@ -0,0 +1,131 @@
+=======
+Fuzzing
+=======
+
+The XML fuzzing project was built as part of Google Summer of Code 2024.
+It aims to find edge-case XML configurations that may crash libvirt during
+parsing. The libvirt domain XML format is a highly structured grammar, so
+unstructured fuzzing is ineffective. We use a combination
+of libFuzzer and libprotobuf-mutator to perform structure-aware fuzzing of
+various libvirt XML formats. The XML is represented through an intermediate
+protobuf that is mutated by libprotobuf-mutator. This protobuf is automatically
+generated by a Python script, ``relaxng_to_proto.py``, which parses RELAX NG
+schemas.
+
+Currently, we fuzz the following:
+
+* QEMU XML Domain (qemu_xml_domain_fuzz, qemu_xml_domain_fuzz_disk, qemu_xml_domain_fuzz_interface)
+* QEMU XML Hotplug (qemu_xml_hotplug_fuzz)
+* CH XML Domain (ch_xml_domain_fuzz)
+* VMX XML Domain (vmx_xml_domain_fuzz)
+* libXL XML Domain (libxl_xml_domain_fuzz)
+* NWFilter XML (xml_nwfilter_fuzz)
+
+libprotobuf-mutator
+===================
+
+libprotobuf-mutator is the core of our fuzzing methodology: it is what
+allows us to perform grammar-aware fuzzing of the XML format in the first
+place. However, its setup is a bit involved. Follow the general build and
+install instructions at
+https://github.com/google/libprotobuf-mutator/blob/master/README.md
+and adjust them for your distro as described below. One of the biggest
+problems is that most distros ship very outdated versions of protobuf,
+which cause various build and linkage issues with the mutator.
+
+- If you are on a rolling-release distro, the system package can likely be
+  used as-is. However, you may need to pass ``-std=c++17`` in ``CXXFLAGS``
+  and ``-Wl,--copy-dt-needed-entries`` in ``LDFLAGS``.
+- For every other distro with old protobuf installations, you can supply
+  ``-DLIB_PROTO_MUTATOR_DOWNLOAD_PROTOBUF=ON`` during libprotobuf-mutator
+  setup. After this, provide ``-Dexternal_protobuf_dir=<dir>`` to libvirt
+  meson setup pointing to the ``external.protobuf`` directory generated
+  during libprotobuf-mutator compilation.
+- On some distros like Fedora which predominantly use PIC-compiled
+  libraries, you may need to pass ``-fPIC`` in ``CFLAGS/CXXFLAGS`` or you
+  will encounter relocation errors during libvirt compilation.
+
+Setup
+=====
+
+::
+
+  env CC=clang CXX=clang++ \
+    meson setup build -Dsystem=true -Ddriver_qemu=enabled -Db_lundef=false \
+    -Db_sanitize=address,undefined -Dfuzz=enabled -Dexternal_protobuf_dir=<dir>
+
+- This command line will introduce LLVM SanitizerCoverage across all
+  object files.
+- libFuzzer is supported only on clang/clang++.
+- To use an external protobuf dependency, use
+  ``-Dexternal_protobuf_dir=<dir>``. If your system has a new enough protobuf
+  dependency, you can ignore this.
+- ``b_sanitize`` is not compulsory but it does improve the odds of the fuzzer
+  finding interesting test cases. It is recommended to pass
+  ``address,undefined`` to enable both ASAN and UBSan. Note that ASAN will
+  roughly halve performance on average.
+- You can set ``b_sanitize`` to ``thread`` to enable TSAN, which is especially
+  useful for fuzzing race conditions in the ``qemu_xml_hotplug_fuzz`` fuzzer.
+
+NOTE: This has only been tested on x86_64 and aarch64 Linux, but should work
+identically on other architectures and possibly even other UNIX-based OSes
+(BSD, macOS, etc.).
+
+Usage
+=====
+
+Run ``./tests/fuzz/run_fuzz <fuzzer>``.
+
+If the fuzzer finds a crashing test case, it writes it to a separate file in
+your working directory. Run
+``./tests/fuzz/run_fuzz <fuzzer> --testcase <file_name>`` to reproduce the crash.
+More options to configure the fuzzer are listed by the ``-h`` flag. To save or
+load a corpus, add ``--corpus <corpus_dir>``.
+
+To merge or minimize corpora, run::
+
+  ./tests/fuzz/run_fuzz <fuzzer> --libfuzzer-options="-merge=1 <dest_corpus> <src_corpus>"
+
+Notable options are listed below.
+
+- ``--arch``: Set architecture of the domain XML to fuzz.
+- ``-j, --jobs``: Run parallel fuzzing workers using either ``jobs`` or
+  ``fork`` based on ``--parallel-mode``. For example:
+  ``./tests/fuzz/run_fuzz qemu_xml_domain_fuzz -j8 --parallel-mode fork``.
+- ``--dump-xml``: Print all fuzzed XMLs (useful for debugging reproducers).
+- ``--format-xml``: Exercise format function on XML domain fuzzers.
+- ``--corpus``: Save the corpus to, or load it from, a directory on disk.
+- ``--libfuzzer-options``: Pass additional libFuzzer flags as documented in
+  https://llvm.org/docs/LibFuzzer.html#options.
+
+Coverage Report
+===============
+
+- libvirt supports instrumenting builds with gcov for coverage data collection
+  using ``-Dtest_coverage=true``.
+  ::
+
+    ./tests/fuzz/run_fuzz <fuzzer> --total_time=<duration> --corpus=<corpus_dir>
+    ./tests/fuzz/run_fuzz <fuzzer> --corpus=<corpus_dir> --libfuzzer-options="-runs=0"
+    find -name '*.gcda' -exec llvm-cov gcov {} \; # Run in build directory
+    gcovr --gcov-executable "llvm-cov gcov" --html-details coverage.html -r <source_directory>
+
+- Alternatively, we can use clang profile coverage instrumentation
+  enabled with ``-Dtest_coverage_clang=true``.
+  ::
+
+    ./tests/fuzz/run_fuzz <fuzzer> --total_time=<duration> --corpus=<corpus_dir>
+    ./tests/fuzz/run_fuzz <fuzzer> --corpus=<corpus_dir> --llvm-profile-file=coverage.profraw
+    llvm-profdata merge coverage.profraw -output coverage.profdata
+    llvm-cov show --instr-profile coverage.profdata <objects> --sources <sources> --format html > coverage.html
+
+Tips
+====
+
+- libFuzzer will try to pass comparison checks using its internal TORC
+  (Table of Recent Comparisons), but this is easily overwhelmed in the
+  case of libvirt because its code is quite complex. You can alleviate
+  this to some extent by passing ``--use-value-profile`` to the fuzzer.
+- If you want the fuzzer to proceed even after encountering a crash,
+  add ``-j<N> --parallel-mode=fork``. Note that memory usage grows
+  with each additional parallel fuzzing worker.
-- 
2.34.1
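
As an illustration of the ``T_`` / ``A_`` / ``V_`` field-prefix convention documented in xml-fuzzing.rst, the following Python sketch serializes a plain dict, used here as a stand-in for a protobuf message, into XML. It is not the serializer used by the fuzzers: the function name ``to_xml`` and the dict layout are hypothetical, and it assumes attribute and tag names need no ``real_name`` translation.

```python
from xml.sax.saxutils import escape, quoteattr


def to_xml(tag, msg, indent=0):
    """Serialize one message (a plain dict standing in for a protobuf message)."""
    pad = "  " * indent
    # A_ fields become XML attributes; the prefix is stripped from the name.
    attrs = "".join(
        f" {name[2:]}={quoteattr(str(value))}"
        for name, value in msg.items()
        if name.startswith("A_")
    )
    # A present V_ field wins over any T_ children in the same message.
    text = next((v for k, v in msg.items() if k.startswith("V_")), None)
    if text is not None:
        return f"{pad}<{tag}{attrs}>{escape(str(text))}</{tag}>"
    children = []
    for name, value in msg.items():
        if name.startswith("T_"):
            # A list models a `repeated` field: one element per entry.
            for child in value if isinstance(value, list) else [value]:
                children.append(to_xml(name[2:], child, indent + 1))
    if not children:
        return f"{pad}<{tag}{attrs}/>"
    return "\n".join([f"{pad}<{tag}{attrs}>", *children, f"{pad}</{tag}>"])


# Hypothetical input mirroring the SomeTagMessage example in the document.
sometag = {
    "A_number": 1,
    "A_name": "dummy",
    "A_type": "typeB",
    "T_innertag": [{"A_number": 2}, {"A_number": 3}],
    "T_secondinner": {"V_value": 1241232},
}
print(to_xml("sometag", sometag))
```

Because mutation happens on the structured input and the serializer escapes every value, each mutated message still yields well-formed XML, which is the point of using an intermediate format.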