Re: UNLEAK(), leak checking in the default tests etc.

Jeff King <peff@xxxxxxxx> · Thu, 10 Jun 2021 09:38:21 -0400

On Thu, Jun 10, 2021 at 12:56:55PM +0200, Ævar Arnfjörð Bjarmason wrote:

> > More significantly: I get the impression it's easier to do leak
> > checking using LSAN, which requires recompiling git anyway - at which
> > point you get the flag for free - so how often will people actually
> > perform leak checking with Valgrind in the first place?
> 
> *Nod*, I didn't investigate the runtime penalty you and Jeff point
> out. In any case, it seems that can also be done with valgrind exclusion
> rules and/or manually ignoring these cases in the test wrapper.

I had trouble using valgrind's exclusions; there's more discussion in
0e5bba53af (add UNLEAK annotation for reducing leak false positives,
2017-09-08), but the gist of it is that it's awkward to annotate the
point of leak, rather than the point of allocation (so you have to
provide the complete callstack to the allocation, which is a maintenance
headache).

Of course, if you find ways to make useful annotations with valgrind,
I'm all for it. We have a few in t/valgrind already.

> But maybe it's not even worth pursuing. Have you (or anyone else) tried
> e.g. benchmarking git's tests or t/perf tests where free() is defined to
> be some noop stub? I'd expect it not to matter, but maybe I'm wrong...

I haven't. Even though I originated UNLEAK(), I'm not really all that
concerned about the cost of free() in general. My motivation for
introducing it (versus adding free() calls) was mostly about convenience
(complex data structures that don't have an easy free/release function,
but also the fact that you can still access data after marking it with
unleak).

The fact that it also preempts any arguments about the performance of
calling free() was just a bonus. ;)

To be clear, I could easily be convinced by real numbers that the cost
of free() at program end matters. I am just saying I am not one of the
people who is going to argue that position in the meantime.

> I didn't know how to set that up, that seems easy enough.
> 
> This works for me:
> 
>     make CC=clang SANITIZE=address,leak CFLAGS="-00 -g"
>     (cd t && make ASAN_OPTIONS="<what you said>" [...])
> 
> I.e. it's just SANITIZE & flags that's important at compile-time. You
> doubtless knew that, mainly for my own notes & others following along.

It should Just Work with:

  make SANITIZE=leak test

for both gcc and clang. You do need ASAN_OPTIONS if you're asking ASan
to do leak-checking (since we usually suppress that for the obvious
reason that almost every test fails). I'm not sure if using both ASan
and LSan together confuses LSan there (if so, it may be reasonable for
test-lib.sh to modify its ASAN_OPTIONS setting if LSan is enabled).

> I ran it, noted the failing tests, produced a giant GIT_SKIP_TESTS list
> and hacked ci/ to run that as a new linux-clang-SANITIZE job. That messy
> WIP code is currently running at:
> https://github.com/avar/git/runs/2793150092
> 
> Wouldn't it be a good idea to have such a job and slowly work on the
> exclusion list?
> 
> E.g. I saw that t0004 failed, which was trivially fixed with a single
> strbuf_release(), and we could guard against regressions.

I don't mind that. My intent was to get the whole suite clean
eventually, and then start worrying about regressions. But that may take
a while.

I do think it would be worth splitting out ASan from leak-checking. The
whole suite should run clean with regular ASan already, and we'd want to
find regressions there even in the tests that aren't leak-clean. I do
periodic ASan runs already; the main argument against doing it for every
CI run is just that's a lot more CPU. But maybe not enough to be
prohibitive? It's probably still way cheaper than running the test suite
on Windows.

-Peff