playing with MSan, was Re: [PATCH 0/3] fixes for running the test suite with --valgrind

Jeff King <peff@xxxxxxxx> · Wed, 4 Oct 2017 06:19:32 -0400

On Tue, Oct 03, 2017 at 07:41:54PM -0400, Jeff King wrote:

> I think using SANITIZE=memory would catch these, but it needs some
> suppressions tuning. The weird "zlib reads uninitialized memory" error
> is a problem (valgrind sees this, too, but we have suppressions).

I dug into this a little more. You can blacklist certain functions from
getting MSan treatment, but that's not quite what we want. We want to
mark bytes from certain _sources_ as being initialized, even if MSan
doesn't agree.

And indeed, you can do that. As far as I can tell, MSan works by keeping
a shadow map of memory and setting flags when it believes it has been
initialized, and then checking that map when we make decisions based on
the memory. But it can only do that if it instruments all writes. So the
MSan documentation recommends that you build _everything_, including
libraries, with it. Which obviously we don't do if we're using a system
zlib. Or a system libc for that matter (though they intercept many
common libc functions to handle this).

So one strategy is to "cheat" a bit at the library interfaces, and claim
whatever they send us is properly initialized. The patch below tries
that with zlib, and it does seem to work. It would fail to notice a real
problem with any input we send _to_ the library (since the library isn't
instrumented, and we claim that whatever comes out of it is legitimate).
I could probably live with that.

But there are quite a few test failures that would still need
investigating and annotating:

  - Certainly it's confused by looking at regmatch_t results from
    regexec(). We can fix that by building with NO_REGEX. But pcre has
    a similar problem.

  - Ditto curl and openssl, whose exit points would need annotations.

  - For some reason test-sigchain segfaults when it raise()s in the
    signal handler and recurses. Not sure if this is an MSan bug or
    what.

So I dunno. This approach is a _lot_ more convenient than trying to
rebuild all the dependencies from scratch, and it runs way faster than
valgrind. It did find the cases that led to the patches in this
series, and at least one more: if the lstat() at the end of
entry.c:write_entry() fails, we write nonsense into the cache_entry.

I think we could probably get it to zero false positives without _too_
much effort. I'll stop here for tonight, but I may pick it up again
later (of course anybody else is welcome to fool around with it, too).

Below is the patch that let me run:

  make SANITIZE=memory CC=clang-6.0 NO_REGEX=1

and get a tractable number of errors.

-- >8 --

diff --git a/Makefile b/Makefile
index b143e4eea3..1da5c01211 100644
--- a/Makefile
+++ b/Makefile
@@ -1047,6 +1047,9 @@ endif
 ifneq ($(filter leak,$(SANITIZERS)),)
 BASIC_CFLAGS += -DSUPPRESS_ANNOTATED_LEAKS
 endif
+ifneq ($(filter memory,$(SANITIZERS)),)
+BASIC_CFLAGS += -DENABLE_MSAN_UNPOISON
+endif
 endif
 
 ifndef sysconfdir
diff --git a/git-compat-util.h b/git-compat-util.h
index cedad4d581..836a4c0b54 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -1191,4 +1191,11 @@ extern void unleak_memory(const void *ptr, size_t len);
 #define UNLEAK(var) do {} while (0)
 #endif
 
+#ifdef ENABLE_MSAN_UNPOISON
+#include <sanitizer/msan_interface.h>
+#define msan_unpoison(ptr, len) __msan_unpoison(ptr, len)
+#else
+#define msan_unpoison(ptr, len) do {} while (0)
+#endif
+
 #endif
diff --git a/zlib.c b/zlib.c
index 4223f1a8c5..5fa8f12507 100644
--- a/zlib.c
+++ b/zlib.c
@@ -56,6 +56,8 @@ static void zlib_post_call(git_zstream *s)
 	if (s->z.total_in != s->total_in + bytes_consumed)
 		die("BUG: total_in mismatch");
 
+	msan_unpoison(s->next_out, bytes_produced);
+
 	s->total_out = s->z.total_out;
 	s->total_in = s->z.total_in;
 	s->next_in = s->z.next_in;