Re: [PATCH v2 1/4] tests: add a test mode for SANITIZE=leak, run it in CI

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Jul 14 2021, Andrzej Hunt wrote:

> On 14/07/2021 19:23, Ævar Arnfjörð Bjarmason wrote:
>> While git can be compiled with SANITIZE=leak there has been no
>> corresponding GIT_TEST_* mode for it, i.e. memory leaks have been
>> fixed as one-offs without structured regression testing.
>> This change add such a mode, we now have new
>> linux-{clang,gcc}-sanitize-leak CI targets, these targets run the same
>> tests as linux-{clang,gcc}, except that almost all of them are
>> skipped.
>> There is a whitelist of some tests that are OK in test-lib.sh, and
>> individual tests can be opted-in by setting
>> GIT_TEST_SANITIZE_LEAK=true before sourcing test-lib.sh. Within those
>> individual test can be skipped with the "!SANITIZE_LEAK"
>> prerequisite. See the updated t/README for more details.
>> I'm using the GIT_TEST_SANITIZE_LEAK=true and !SANITIZE_LEAK pattern
>> in a couple of tests whose memory leaks I'll fix in subsequent
>> commits.
>> I'm not being aggressive about opting in tests, it's not all tests
>> that currently pass under SANITIZE=leak, just a small number of
>> known-good tests. We can add more later as we fix leaks and grow more
>> confident in this test mode.
>> See the recent discussion at [1] about the lack of this sort of test
>> mode, and 0e5bba53af (add UNLEAK annotation for reducing leak false
>> positives, 2017-09-08) for the initial addition of SANITIZE=leak.
>> See also 09595ab381 (Merge branch 'jk/leak-checkers', 2017-09-19),
>> 7782066f67 (Merge branch 'jk/apache-lsan', 2019-05-19) and the recent
>> 936e58851a (Merge branch 'ah/plugleaks', 2021-05-07) for some of the
>> past history of "one-off" SANITIZE=leak (and more) fixes.
>> When calling maybe_skip_all_sanitize_leak matching against
>> "$TEST_NAME" instead of "$this_test" as other "match_pattern_list()"
>> users do is intentional. I'd like to match things like "t13*config*"
>> in subsequent commits. This part of the API isn't public, so we can
>> freely change it in the future.
>> 1. https://lore.kernel.org/git/87czsv2idy.fsf@xxxxxxxxxxxxxxxxxxx/
>> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx>
>> ---
>>   .github/workflows/main.yml |  6 ++++
>>   Makefile                   |  5 ++++
>>   ci/install-dependencies.sh |  4 +--
>>   ci/lib.sh                  | 18 ++++++++----
>>   ci/run-build-and-tests.sh  |  4 +--
>>   t/README                   | 16 ++++++++++
>>   t/t5701-git-serve.sh       |  2 +-
>>   t/test-lib.sh              | 60 ++++++++++++++++++++++++++++++++++++++
>>   8 files changed, 105 insertions(+), 10 deletions(-)
>> diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml
>> index 73856bafc9..752fe187f9 100644
>> --- a/.github/workflows/main.yml
>> +++ b/.github/workflows/main.yml
>> @@ -297,6 +297,12 @@ jobs:
>>             - jobname: linux-gcc-default
>>               cc: gcc
>>               pool: ubuntu-latest
>> +          - jobname: linux-clang-sanitize-leak
>> +            cc: clang
>> +            pool: ubuntu-latest
>> +          - jobname: linux-gcc-sanitize-leak
>> +            cc: gcc
>> +            pool: ubuntu-latest
>
> Is there any advantage to running leak checking with both gcc and
> clang? My understanding is that you end up using the same sanitiser 
> implementation under the hood - I can't remember if using a different
> compiler actually helps find different leaks though.

I didn't know that, makes sense. I'll make it one job and have it use
whatever CC is.

> My other question is: if we are adding a new job - should it really be
> just a leak checking job? Leak checking is just a subset of ASAN 
> (Address Sanitizer). And as discussed at [1] it's possible to run ASAN
> and UBSAN (Undefined Behaviour Sanitizer) in the same build. I feel
> like it's much more useful to first add a combined ASAN+UBSAN job,
> followed by enabling leak-checking as part of ASAN in those jobs for
> known leak-free tests - as opposed to only adding leak checking. We
> currently disable Leak checking for ASAN here [2], but that could be
> made conditional on the test ID (i.e. check an allowlist to enable
> leak checking for some tests)?

It sounds good to support that, but at least right now I've got the itch
of finding leaks during development, and I think in any case being able
to do a full run with just sanitizing, leak checking (or combined) makes
sense, i.e. to make GIT_TEST_SANITIZE_LEAK=* the top-level interface.

I haven't checked how noisy ASAN is, is it like the leak checking where
we fail almost all tests now?

Anyway, once we have some test mode like this it'll be trivial to extend
it. I mainly want us to get this into CI so we can have an expanding
line in the sand with regressions.

> I think it's worth focusing on ASAN+UBSAN first because they tend to
> find more impactful issues (e.g. buffer overflows, and other real
> bugs) - whereas leaks... are ugly, but leaks in git don't actually
> have much user impact?

We have one-off commands, but also long-lived things like "git cat-file
--batch", it's useful if we don't leak in those.

The entry point to those tends to be one-off commands in tests, so
checking leaks for all commands in a test (if you can get there) is a
useful indicator for how the underlying API performs.

I think in the git.git codebase we don't have much of an issue with
buffer overflows etc, because we tend to consistently use APIs like
strbuf that avoid those issues, but then again I haven't run the tests
with that, maybe I'll be unpleasantly surprised.

I also find leak checking to be useful during development to spot faulty
assumptions, i.e. the leak itself may not be a big deal, but it's
usually an early sign that I'm structuring something incorrectly.

> [1]
> https://lore.kernel.org/git/YMI%2Fg1sHxJgb8%2FYD@xxxxxxxxxxxxxxxxxxxxxxx/
>
> [2] https://git.kernel.org/pub/scm/git/git.git/tree/t/test-lib.sh#n44
>
>>       env:
>>         CC: ${{matrix.vector.cc}}
>>         jobname: ${{matrix.vector.jobname}}
>> diff --git a/Makefile b/Makefile
>> index 502e0c9a81..d4cad5136f 100644
>> --- a/Makefile
>> +++ b/Makefile
>> @@ -1216,6 +1216,9 @@ PTHREAD_CFLAGS =
>>   SPARSE_FLAGS ?=
>>   SP_EXTRA_FLAGS = -Wno-universal-initializer
>>   +# For informing GIT-BUILD-OPTIONS of the SANITIZE=leak target
>> +SANITIZE_LEAK =
>> +
>>   # For the 'coccicheck' target; setting SPATCH_BATCH_SIZE higher will
>>   # usually result in less CPU usage at the cost of higher peak memory.
>>   # Setting it to 0 will feed all files in a single spatch invocation.
>> @@ -1260,6 +1263,7 @@ BASIC_CFLAGS += -DSHA1DC_FORCE_ALIGNED_ACCESS
>>   endif
>>   ifneq ($(filter leak,$(SANITIZERS)),)
>>   BASIC_CFLAGS += -DSUPPRESS_ANNOTATED_LEAKS
>> +SANITIZE_LEAK = YesCompiledWithIt >   endif
>>   ifneq ($(filter address,$(SANITIZERS)),)
>>   NO_REGEX = NeededForASAN
>> @@ -2793,6 +2797,7 @@ GIT-BUILD-OPTIONS: FORCE
>>   	@echo NO_UNIX_SOCKETS=\''$(subst ','\'',$(subst ','\'',$(NO_UNIX_SOCKETS)))'\' >>$@+
>>   	@echo PAGER_ENV=\''$(subst ','\'',$(subst ','\'',$(PAGER_ENV)))'\' >>$@+
>>   	@echo DC_SHA1=\''$(subst ','\'',$(subst ','\'',$(DC_SHA1)))'\' >>$@+
>> +	@echo SANITIZE_LEAK=\''$(subst ','\'',$(subst ','\'',$(SANITIZE_LEAK)))'\' >>$@+
>>   	@echo X=\'$(X)\' >>$@+
>>   ifdef TEST_OUTPUT_DIRECTORY
>>   	@echo TEST_OUTPUT_DIRECTORY=\''$(subst ','\'',$(subst ','\'',$(TEST_OUTPUT_DIRECTORY)))'\' >>$@+
>> diff --git a/ci/install-dependencies.sh b/ci/install-dependencies.sh
>> index 67852d0d37..8ac72d7246 100755
>> --- a/ci/install-dependencies.sh
>> +++ b/ci/install-dependencies.sh
>> @@ -12,13 +12,13 @@ UBUNTU_COMMON_PKGS="make libssl-dev libcurl4-openssl-dev libexpat-dev
>>    libemail-valid-perl libio-socket-ssl-perl libnet-smtp-ssl-perl"
>>     case "$jobname" in
>> -linux-clang|linux-gcc)
>> +linux-clang|linux-gcc|linux-clang-sanitize-leak|linux-gcc-sanitize-leak)
>
> How about `linux-clang*|linux-gcc*)` here and below?

I did that in v1, as Đoàn Trần Công Danh pointed out we have other jobs
that would match that.

>>   	sudo apt-add-repository -y "ppa:ubuntu-toolchain-r/test"
>>   	sudo apt-get -q update
>>   	sudo apt-get -q -y install language-pack-is libsvn-perl apache2 \
>>   		$UBUNTU_COMMON_PKGS
>>   	case "$jobname" in
>> -	linux-gcc)
>> +	linux-gcc|linux-gcc-sanitize-leak)
>>   		sudo apt-get -q -y install gcc-8
>>   		;;
>>   	esac
>> diff --git a/ci/lib.sh b/ci/lib.sh
>> index 476c3f369f..bb02b5abf4 100755
>> --- a/ci/lib.sh
>> +++ b/ci/lib.sh
>> @@ -183,14 +183,16 @@ export GIT_TEST_CLONE_2GB=true
>>   export SKIP_DASHED_BUILT_INS=YesPlease
>>     case "$jobname" in
>> -linux-clang|linux-gcc)
>> -	if [ "$jobname" = linux-gcc ]
>> -	then
>> +linux-clang|linux-gcc|linux-clang-sanitize-leak|linux-gcc-sanitize-leak)
>> +	case "$jobname" in
>> +	linux-gcc|linux-gcc-sanitize-leak)
>>   		export CC=gcc-8
>>   		MAKEFLAGS="$MAKEFLAGS PYTHON_PATH=/usr/bin/python3"
>> -	else
>> +		;;
>> +	*)
>>   		MAKEFLAGS="$MAKEFLAGS PYTHON_PATH=/usr/bin/python2"
>> -	fi
>> +		;;
>> +	esac
>>     	export GIT_TEST_HTTPD=true
>>   @@ -233,4 +235,10 @@ linux-musl)
>>   	;;
>>   esac
>>   +case "$jobname" in
>> +linux-clang-sanitize-leak|linux-gcc-sanitize-leak)
>> +	export SANITIZE=leak
>> +	;;
>> +esac
>> +
>
> Have you considered doing this in the yaml job configuration instead?
> It's possible to set env-vars in yaml, although it will require some 
> careful tweaking - here's an example where I'm setting different
> values for SANITIZE depending on job (you'd probably just have to set
> it to empty for the non leak-checking jobs):
>
> https://github.com/ahunt/git/blob/master/.github/workflows/ahunt-sync-next2.yml#L51-L69
>
> That does make the yaml more complex, but I think it's worth it to
> reduce the amount of special-casing elsewhere (and is also worth it if 
> we ever add other sanitisers)?

I'm not too familiar with git.git's ci/* dir, but I think it's like that
in general because we don't just run in GitHub CI, but want to support
Azure, Travis etc.

So almost all of the logic is in those shellscripts, right now this
target just runs in the GitHub CI, but it would be useful to make it
drop-in enabled elsewhere.

I think given that that doing anything overly clever in the YAML would
probably be counter-productive.




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux