[PATCH 14/14] docs: Document the fuzzers

Rayhan Faizel <rayhan.faizel@xxxxxxxxx> · Mon, 19 Aug 2024 21:39:52 +0530

Document the fuzzers in two ways.

1. Explain the high-level workings of the fuzzers under docs/kbase.
2. Add a README explaining the general setup and usage of the fuzzers.

Signed-off-by: Rayhan Faizel <rayhan.faizel@xxxxxxxxx>
---
 docs/kbase/index.rst                 |   3 +
 docs/kbase/internals/meson.build     |   1 +
 docs/kbase/internals/xml-fuzzing.rst | 120 ++++++++++++++++++++++++
 tests/fuzz/README.rst                | 131 +++++++++++++++++++++++++++
 4 files changed, 255 insertions(+)
 create mode 100644 docs/kbase/internals/xml-fuzzing.rst
 create mode 100644 tests/fuzz/README.rst

diff --git a/docs/kbase/index.rst b/docs/kbase/index.rst
index e51b35cbfc..9cf6268800 100644
--- a/docs/kbase/index.rst
+++ b/docs/kbase/index.rst
@@ -116,3 +116,6 @@ Internals
 
 `QEMU monitor event handling <internals/qemu-event-handlers.html>`__
    Brief outline how events emitted by qemu on the monitor are handlded.
+
+`XML Fuzzing <internals/xml-fuzzing.html>`__
+   How the structure-aware XML fuzzers work.
diff --git a/docs/kbase/internals/meson.build b/docs/kbase/internals/meson.build
index f1e9122f8f..86b6639419 100644
--- a/docs/kbase/internals/meson.build
+++ b/docs/kbase/internals/meson.build
@@ -9,6 +9,7 @@ docs_kbase_internals_files = [
   'qemu-migration',
   'qemu-threads',
   'rpc',
+  'xml-fuzzing',
 ]
 
 
diff --git a/docs/kbase/internals/xml-fuzzing.rst b/docs/kbase/internals/xml-fuzzing.rst
new file mode 100644
index 0000000000..85f565fda5
--- /dev/null
+++ b/docs/kbase/internals/xml-fuzzing.rst
@@ -0,0 +1,120 @@
+===================
+Libvirt XML fuzzing
+===================
+
+XML fuzzing is done using libFuzzer and libprotobuf-mutator. Plain byte-level
+mutation works poorly here, as XML is a highly structured format.
+Structure-aware fuzzing is implemented using libprotobuf-mutator, which
+mutates protobuf inputs. Protobufs are used as an intermediate
+format and serialized to XML.
+
+Protobuf to XML representation
+==============================
+
+A protobuf definition written to fuzz libvirt XML formats may resemble the
+following.
+
+::
+
+    message MainObj {
+        message SomeTagMessage {
+            optional uint32 A_number = 1;
+            optional DummyString A_name = 2;
+
+            enum typeEnum {
+                typeA = 0;
+                typeB = 1;
+                typeC = 2;
+            }
+
+            optional typeEnum A_type = 3;
+
+            message InnerTagMessage {
+                optional uint32 A_number = 1;
+            }
+
+            repeated InnerTagMessage T_innertag = 4;
+
+            message SecondInnerTagMessage {
+                optional uint32 V_value = 1;
+            }
+            optional SecondInnerTagMessage T_secondinner = 5;
+        }
+
+        optional SomeTagMessage T_sometag = 1;
+    }
+
+* Fields starting with ``T_`` represent XML tags. Their types are protobuf messages
+  which may further contain other protobuf-defined XML tags or attributes.
+
+* Fields starting with ``A_`` represent XML attributes. Most of the time,
+  they use a primitive protobuf data type (e.g. ``uint32``, ``bool``, ``enum``).
+
+  * If the attribute can take multiple data types, it is encapsulated in a ``oneof`` statement.
+    The field name then also has a prefix of ``A_OPTXX_``, where ``XX`` is a number from 0 to 99.
+  * If the attribute name contains special characters, the real name is stored in
+    ``libvirt::real_name`` which is extended by ``FieldOptions``.
+  * If an enum value contains special characters, the real value is stored in
+    ``libvirt::real_value`` which is extended by ``EnumValueOptions``.
+
+* Fields starting with ``V_`` represent raw text in XML.
+
+  * If ``T_`` and ``V_`` fields are defined in the same message, the ``V_``
+    field is used only if it is set (has presence); otherwise the ``T_``
+    fields are processed as usual.
+  * ``V_`` fields can take on the same datatypes as ``A_`` fields.
+
+* ``repeated`` is used to allow multiple XML tags of the same name.
+
+``A_`` fields must always precede ``V_`` and ``T_`` fields. Likewise, ``V_``
+fields must precede ``T_`` fields if any.
+
+On fuzzing the above protobuf definition, one possible protobuf to XML
+serialization is:
+
+::
+
+    <sometag number='1' name='dummy' type='typeB'>
+        <innertag number='2'/>
+        <innertag number='3'/>
+        <secondinner>1241232</secondinner>
+    </sometag>
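+
+The ``oneof`` and ``real_name``/``real_value`` conventions listed earlier are
+not exercised by the ``MainObj`` example. The following is only a hypothetical
+sketch of how they might look: the message, enum and field names are invented
+for illustration, and the ``(libvirt.real_name)`` / ``(libvirt.real_value)``
+option spelling is an assumption about how the extensions are written in the
+generated ``proto`` files.
+
+::
+
+    // Hypothetical sketch; names and option syntax are assumptions.
+    message ExampleTagMessage {
+        enum modeEnum {
+            native = 0;
+            // Enum value containing '-', real value kept in an option.
+            read_only = 1 [(libvirt.real_value) = "read-only"];
+        }
+
+        // Attribute that can be either a number or an enum keyword.
+        oneof A_OPT_size {
+            uint32 A_OPT00_size = 1;
+            modeEnum A_OPT01_size = 2;
+        }
+
+        // Attribute name containing '-', real name kept in an option.
+        optional uint32 A_io_mode = 3 [(libvirt.real_name) = "io-mode"];
+
+        // Raw text content, declared after all A_ fields.
+        optional uint32 V_value = 4;
+    }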
+
+Custom Protobuf Datatypes
+-------------------------
+
+Sometimes, primitive data types or enums are not enough to encode the
+desired attribute values, especially if they themselves are structured. In this
+case, such fields are represented by a handwritten protobuf message defined in
+``xml_domain_datatypes.proto``. To serialize these messages to XML attribute
+values, custom handlers are defined in ``proto_custom_datatypes.cc``.
+
+This is useful for data types such as IP addresses, MAC addresses, target
+device names, etc.
+
+Protobuf generation
+===================
+
+``proto`` files are automatically generated at build time by the script
+``relaxng_to_proto.py``. The script parses RELAX NG schemas to generate a protobuf
+file containing fields and messages representing all the defined XML tags and
+attributes.
+
+The script tries to figure out the correct data type of each XML attribute.
+On its own, however, it can only infer the general data type or enum values
+of an attribute, not its constraints or regex patterns. Override tables
+are provided to fill in that information.
+
+Fuzzer Harnesses
+================
+
+Driver-specific harnesses in general re-use the existing test driver setup
+as well as other existing test utilities under ``tests/``. Harnesses are
+available for the following drivers:
+
+* QEMU XML Domain
+* QEMU XML Hotplug
+* CH XML Domain
+* VMX XML Domain
+* libXL XML Domain
+* NWFilter XML
diff --git a/tests/fuzz/README.rst b/tests/fuzz/README.rst
new file mode 100644
index 0000000000..d92cdc94d7
--- /dev/null
+++ b/tests/fuzz/README.rst
@@ -0,0 +1,131 @@
+=======
+Fuzzing
+=======
+
+The XML fuzzing project was built as part of Google Summer of Code 2024.
+It aims to find edge-case XML configurations that may crash
+libvirt during parsing. The libvirt domain XML format is highly structured,
+so plain byte-level mutation is ineffective. We use a combination
+of libFuzzer and libprotobuf-mutator to perform structure-aware fuzzing of
+various libvirt XML formats. The XML is represented through an intermediate
+protobuf that is mutated by libprotobuf-mutator. This protobuf is automatically
+generated by a Python script, ``relaxng_to_proto.py``, which parses RELAX NG
+schemas.
+
+Currently, we fuzz the following:
+
+* QEMU XML Domain (qemu_xml_domain_fuzz, qemu_xml_domain_fuzz_disk, qemu_xml_domain_fuzz_interface)
+* QEMU XML Hotplug (qemu_xml_hotplug_fuzz)
+* CH XML Domain (ch_xml_domain_fuzz)
+* VMX XML Domain (vmx_xml_domain_fuzz)
+* libXL XML Domain (libxl_xml_domain_fuzz)
+* NWFilter XML (xml_nwfilter_fuzz)
+
+libprotobuf-mutator
+===================
+
+libprotobuf-mutator is the crux of our fuzzing methodology: it is what
+allows us to perform grammar-aware fuzzing of the XML format in the first
+place. However, its setup is a bit involved. The general build and install
+instructions are available at
+https://github.com/google/libprotobuf-mutator/blob/master/README.md
+but they may need tweaking depending on the distro. One of the biggest
+problems is that most distros ship very outdated versions of protobuf,
+which cause various build and linkage issues with the mutator.
+
+-  If you are on a rolling release distro, the system package can likely be
+   used as-is. However, you may need to pass ``-std=c++17`` in ``CXXFLAGS``
+   and ``-Wl,--copy-dt-needed-entries`` in ``LDFLAGS``.
+-  For every other distro with old protobuf installations, you can supply
+   ``-DLIB_PROTO_MUTATOR_DOWNLOAD_PROTOBUF=ON`` during libprotobuf-mutator
+   setup. After this, provide ``-Dexternal_protobuf_dir=<dir>`` to libvirt
+   meson setup pointing to the ``external.protobuf`` directory generated
+   during libprotobuf-mutator compilation.
+-  On some distros like Fedora, which predominantly use PIC-compiled
+   libraries, you may need to pass ``-fPIC`` in ``CFLAGS``/``CXXFLAGS`` or you
+   will encounter relocation errors during libvirt compilation.
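+
+As a rough sketch of the second and third cases above, the mutator could be
+built along the following lines. This assumes a CMake and Ninja toolchain as
+used in the upstream README, that ``-fPIC`` is applied when building the
+mutator itself, and that ``external.protobuf`` ends up inside the ``build``
+directory; verify each of these against your own setup.
+
+::
+
+    git clone https://github.com/google/libprotobuf-mutator
+    cd libprotobuf-mutator
+    # -fPIC is only needed on distros that require PIC-compiled libraries.
+    cmake -S . -B build -GNinja \
+          -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ \
+          -DCMAKE_C_FLAGS=-fPIC -DCMAKE_CXX_FLAGS=-fPIC \
+          -DLIB_PROTO_MUTATOR_DOWNLOAD_PROTOBUF=ON
+    ninja -C build
+    sudo ninja -C build install
+
+The resulting ``build/external.protobuf`` path (an assumed location) is then
+what would be passed to libvirt as ``-Dexternal_protobuf_dir=<dir>``.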
+
+Setup
+=====
+
+::
+
+    env CC=clang CXX=clang++ \
+    meson setup build -Dsystem=true -Ddriver_qemu=enabled -Db_lundef=false \
+                                    -Db_sanitize=address,undefined -Dfuzz=enabled -Dexternal_protobuf_dir=<dir>
+
+- This command line instruments all object files with LLVM
+  SanitizerCoverage.
+- libFuzzer is supported only on clang/clang++.
+- To use an external protobuf dependency, use
+  ``-Dexternal_protobuf_dir=<dir>``. If your system has a new enough protobuf
+  dependency, you can ignore this.
+- ``b_sanitize`` is not compulsory but it does improve the odds of the fuzzer
+  finding interesting test cases. It is recommended to pass
+  ``address,undefined`` to enable both ASAN and UBSan. Note that ASAN will
+  cut your performance by a factor of 2 on average.
+- You can set ``b_sanitize`` to ``thread`` to enable TSAN, which is especially
+  useful for finding race conditions with the ``qemu_xml_hotplug_fuzz`` fuzzer.
+
+NOTE: This has only been tested on x86_64 and aarch64 Linux, but should work
+identically on other architectures and possibly even other UNIX-based OSes
+(BSD, macOS, etc.).
+
+Usage
+=====
+
+Run ``./tests/fuzz/run_fuzz <fuzzer>``.
+
+If the fuzzer finds a crashing test case, it writes the input to a separate file in your
+working directory. Run
+``./tests/fuzz/run_fuzz <fuzzer> --testcase <file_name>`` to reproduce the crash.
+More options to configure the fuzzer can be found with the ``-h`` flag. To save or
+load a corpus, add ``--corpus <corpus_dir>``.
+
+To merge or minimize corpora, run::
+
+    ./tests/fuzz/run_fuzz <fuzzer> --libfuzzer-options="-merge=1 <dest_corpus> <src_corpus>"
+
+Notable options are listed below.
+
+- ``--arch``: Set architecture of the domain XML to fuzz.
+- ``-j, --jobs``: Run parallel fuzzing workers using either ``jobs`` or
+  ``fork`` based on ``--parallel-mode``, e.g.
+  ``./tests/fuzz/run_fuzz qemu_xml_domain_fuzz -j8 --parallel-mode fork``.
+- ``--dump-xml``: Print all fuzzed XMLs (useful for debugging reproducers).
+- ``--format-xml``: Exercise the format function on XML domain fuzzers.
+- ``--corpus``: Save or use an on-disk corpus.
+- ``--libfuzzer-options``: Pass additional libFuzzer flags as documented in
+  https://llvm.org/docs/LibFuzzer.html#options.
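+
+A typical session combining the options above might look like the following
+sketch. The fuzzer name comes from the list at the top of this document; the
+``--arch`` value and the corpus directory name are assumptions chosen for
+illustration, and ``<file_name>`` stands for whatever crash file the fuzzer
+wrote.
+
+::
+
+    # Fuzz with 8 forked workers, keeping the corpus on disk.
+    ./tests/fuzz/run_fuzz qemu_xml_domain_fuzz --arch x86_64 \
+        --corpus corpus/ -j8 --parallel-mode fork
+
+    # Reproduce a crash and print the XML that triggered it.
+    ./tests/fuzz/run_fuzz qemu_xml_domain_fuzz --testcase <file_name> --dump-xml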
+
+Coverage Report
+===============
+
+-  libvirt supports instrumenting builds with gcov for coverage data collection
+   using ``-Dtest_coverage=true``. Fuzz for a while, replay the corpus once,
+   then generate the report::
+
+    ./tests/fuzz/run_fuzz <fuzzer> --total_time=<duration> --corpus=<corpus_dir>
+    ./tests/fuzz/run_fuzz <fuzzer> --corpus=<corpus_dir> --libfuzzer-options="-runs=0"
+    find . -name '*.gcda' -exec llvm-cov gcov {} \;  # Run in build directory
+    gcovr --gcov-executable "llvm-cov gcov" --html-details coverage.html -r <source_directory>
+
+-  Alternatively, use clang profile coverage instrumentation, enabled with
+   ``-Dtest_coverage_clang=true``. Fuzz, replay the corpus while writing a
+   raw profile, then render the report with the LLVM tools::
+
+    ./tests/fuzz/run_fuzz <fuzzer> --total_time=<duration> --corpus=<corpus_dir>
+    ./tests/fuzz/run_fuzz <fuzzer> --corpus=<corpus_dir> --llvm-profile-file=coverage.profraw
+    llvm-profdata merge coverage.profraw -output coverage.profdata
+    llvm-cov show --instr-profile coverage.profdata <objects> --sources <sources> --format html > coverage.html
+
+Tips
+====
+
+-  libFuzzer tries to pass comparison checks using its internal TORC
+   (Table of Recent Compares), but this can easily get overwhelmed in the
+   case of libvirt because its code is quite complex. You can alleviate
+   this to some extent by passing ``--use-value-profile`` to the fuzzer.
+-  If you want the fuzzer to proceed even after encountering a crash,
+   add ``-j<N> --parallel-mode=fork``. Do note that memory usage grows
+   roughly in proportion to the number of parallel fuzzing workers.
-- 
2.34.1