This series fixes a small number of issues found when running git's
test suite with MSAN (MemorySanitizer: a clang sanitizer that tries to
detect reads from uninitialised memory [2]). To summarise: I think
there's one real bug, one theoretical bug where compilers nevertheless
produce working code, and one false positive that we can easily
suppress.

Getting the test suite to run under MSAN is a bit trickier than simply
adding SANITIZE=memory; I've detailed the reasons and the process I'm
using below.

Unfortunately this series is also not sufficient to make the whole test
suite pass when building with MSAN:

 * t0005-sigchain and t7006-pager fail with an infinite loop inside
   MSAN's signal handling interceptors. I think this is a bad
   interaction between git's signal handling and MSAN's interceptors,
   and I suspect it's not indicative of a bug in git itself - but I
   haven't investigated in detail yet.

 * t3206-range-diff, t4013-diff-various and t4018-diff-funcname all
   fail due to a change in diff output. I can reproduce this issue when
   running with TSAN (but not ASAN or UBSAN), which suggests a bug or
   difference in behaviour in code shared between MSAN and TSAN.
   Similarly, I haven't investigated this in much detail yet.

(These issues were seen when running with clang-11 - the next step is
to test with clang built from main.)

As to the tricky part: MSAN tries to detect reads from uninitialised
memory at runtime. However, you need to ensure that all code performing
initialisation is built with the right instrumentation (i.e.
-fsanitize=memory). So you'll immediately run into issues if you link
against libraries provided by your system (with the exception of libc,
as MSAN provides default interceptors for most of libc). In theory you
should rebuild all dependencies with -fsanitize=memory, although I
discovered that it's sufficient to recompile only zlib and link git
against that copy of zlib (which is not a very tricky thing to do).

Doing this will uncover one intentional read from uninitialised memory
inside zlib itself. This can be worked around with an annotation in zlib
(which I'm trying to submit upstream at [1]), but it's also possible to
define an override list at compile time (I've detailed this in my recipe
below).

My recipe for running git tests against MSAN:

1. Grab zlib sources from zlib.net or github.com/madler/zlib. I used
   zlib 1.2.11 (which is also what most systems seem to ship).

2. Create a sanitizer special case list (named e.g. ignorelist.txt)
   containing "fun:slide_hash" (this is only needed as long as zlib
   doesn't contain [1]).

3. Build zlib, installing it into SOME_PREFIX (I happened to use clang,
   but that might not be necessary):

     CC=clang-11 \
     CFLAGS="-fsanitize=memory -fno-sanitize-recover=memory -fsanitize-ignorelist=YOUR_IGNORELIST_FROM_STEP_2" \
     ./configure && make install prefix=$SOME_PREFIX

4. Build git and run the tests (again, I'm using clang, but gcc might be
   OK too):

     make ZLIB_PATH=$SOME_PREFIX CC=clang-11 SANITIZE=memory test

If you're actively trying to understand and fix issues, I also recommend
adding -fsanitize-memory-track-origins (which points you directly to
where the uninitialised memory comes from); see also further docs at
[2].

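For convenience, steps 2 to 4 of the recipe can be wrapped in a few
shell functions. This is only a rough sketch: the variable names
(PREFIX, IGNORELIST, MSAN_CC) and function names are hypothetical
placeholders of my choosing, and the compiler flags and make arguments
are the ones from the recipe, with SANITIZE being the knob that git's
Makefile reads.

```shell
#!/bin/sh
# Sketch only: PREFIX, IGNORELIST and MSAN_CC are hypothetical
# placeholders; adjust them to your own setup.
PREFIX="${PREFIX:-$PWD/msan-prefix}"
IGNORELIST="${IGNORELIST:-$PWD/ignorelist.txt}"
MSAN_CC="${MSAN_CC:-clang-11}"

# Step 2: write the special case list that exempts zlib's slide_hash().
write_ignorelist() {
    printf 'fun:slide_hash\n' >"$IGNORELIST"
}

# Print the CFLAGS used for the zlib build in step 3.
msan_cflags() {
    printf '%s\n' "-fsanitize=memory -fno-sanitize-recover=memory -fsanitize-ignorelist=$IGNORELIST"
}

# Step 3: run this from inside the unpacked zlib source tree.
build_zlib() {
    write_ignorelist &&
    CC="$MSAN_CC" CFLAGS="$(msan_cflags)" ./configure &&
    make install prefix="$PREFIX"
}

# Step 4: run this from inside the git source tree.
test_git() {
    make ZLIB_PATH="$PREFIX" CC="$MSAN_CC" SANITIZE=memory test
}
```

After sourcing the file, the intended usage would be to call build_zlib
from the zlib checkout and then test_git from the git checkout. Nothing
is built when the file is merely sourced.
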
ATB,

  Andrzej

[1] https://github.com/madler/zlib/pull/561
[2] https://clang.llvm.org/docs/MemorySanitizer.html

Andrzej Hunt (3):
  bulk-checkin: make buffer reuse more obvious and safer
  split-index: use oideq instead of memcmp to compare object_id's
  builtin/checkout--worker: memset struct to avoid MSAN complaints

 builtin/checkout--worker.c | 11 +++++++++++
 bulk-checkin.c             |  3 +--
 split-index.c              |  3 ++-
 3 files changed, 14 insertions(+), 3 deletions(-)


base-commit: 62a8d224e6203d9d3d2d1d63a01cf5647ec312c9
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1033%2Fahunt%2Fmsan-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1033/ahunt/msan-v1
Pull-Request: https://github.com/git/git/pull/1033
-- 
gitgitgadget