Re: [PATCH] fuzz: add basic fuzz testing for git command

Junio C Hamano <gitster@xxxxxxxxx> · Tue, 13 Sep 2022 09:13:32 -0700

"Arthur Chan via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes:

>  .gitignore        |   2 +
>  Makefile          |   2 +
>  fuzz-cmd-base.c   | 117 ++++++++++++++++++++++++++++++++++++++++++++++
>  fuzz-cmd-base.h   |  13 ++++++
>  fuzz-cmd-status.c |  68 +++++++++++++++++++++++++++
>  5 files changed, 202 insertions(+)
>  create mode 100644 fuzz-cmd-base.c
>  create mode 100644 fuzz-cmd-base.h
>  create mode 100644 fuzz-cmd-status.c

Just like we have the t/ hierarchy for testing, if we plan to add
more fuzz-* related things on top of what we already have (like those
that can be seen in the context of this patch), I would prefer to
see a fuzz/ hierarchy created and the existing stuff moved there as
the first step before adding more.

And more fuzzing is good, if we can afford it ;-)

Thanks.

Even though I am not taking this patch as-is, let's give a cursory
look to make sure the future iteration can be more reviewable by
pointing out various CodingGuidelines issues.

> diff --git a/fuzz-cmd-base.c b/fuzz-cmd-base.c
> new file mode 100644
> index 00000000000..98f05c78372
> --- /dev/null
> +++ b/fuzz-cmd-base.c
> @@ -0,0 +1,117 @@
> +#include "cache.h"

Good to have this as the first thing.

> +#include "fuzz-cmd-base.h"
> +
> +
> +/*
> + * This function is used to randomize the content of a file with the
> + * random data. The random data normally come from the fuzzing engine
> + * LibFuzzer in order to create randomization of the git file worktree
> + * and possibly messing up of certain git config file to fuzz different
> + * git command execution logic.
> + */
> +void randomize_git_file(char *dir, char *name, char *data_chunk, int data_size) {

Unlike control structures with multiple statements in a block, the
braces {} surrounding a function body sit on their own lines.  I.e.

    void randomize_git_file(char *dir, char *name, char *data_chunk, int data_size)
    {

> +   char fname[256];

In our codebase, tab-width is 8 and we indent with tabs.

Use <strbuf.h> and avoid snprintf(), e.g.

	struct strbuf fname = STRBUF_INIT;
	strbuf_addf(&fname, "%s/%s", dir, name);
	... use fname.buf ...
	strbuf_release(&fname);

> +   FILE *fp;
> +

Good that you leave a blank line between the end of decl and the
beginning of the statements.

> +   snprintf(fname, 255, "%s/%s", dir, name);
> +
> +   fp = fopen(fname, "wb");
> +   if (fp) {
> +      fwrite(data_chunk, 1, data_size, fp);
> +      fclose(fp);
> +   }
> +}

Why doesn't this care about errors at all?  Not even fopen errors?
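
A rough sketch of the kind of checking meant here, using only the
standard C library.  The name write_file_checked() is made up for
illustration and is not part of the patch; in-tree code would
presumably report errors through our own helpers rather than bare
fprintf().

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write "size" bytes from "data" to "path" and
 * tell the caller whether it worked.  Returns 0 on success and -1 on
 * any error.
 */
int write_file_checked(const char *path, const void *data, size_t size)
{
	FILE *fp = fopen(path, "wb");
	int ret = 0;

	if (!fp) {
		fprintf(stderr, "cannot open '%s': %s\n", path, strerror(errno));
		return -1;
	}
	if (size && fwrite(data, 1, size, fp) != size) {
		fprintf(stderr, "cannot write '%s'\n", path);
		ret = -1;
	}
	/* fclose() flushes buffered data, so it can fail as well */
	if (fclose(fp)) {
		fprintf(stderr, "cannot close '%s': %s\n", path, strerror(errno));
		ret = -1;
	}
	return ret;
}
```

With something like this, the caller can decide whether a failed
write should abort the fuzz run or merely be skipped, instead of
silently continuing with a worktree that was not randomized.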

> +/*
> + * This function is the variants of the above functions which takes
> + * in a set of target files to be processed. These target file are

"... is a variant of the above function, which takes a set of ..."

> + * passing to the above function one by one for content rewrite.
> + */
> +void randomize_git_files(char *dir, char *name_set[], int files_count, char *data, int size) {
> +   int data_size = size / files_count;
> +
> +   for(int i=0; i<files_count; i++) {

We do not yet officially allow a variable decl in a for() statement
like this.  We'll start allowing it later this year but we are
waiting for oddball platform/compiler folks to scream right now.

IOW, we write the above more like so:

	int data_size = size / files_count;
	int i;

	for (i = 0; i < files_count; i++) {

Also take note of how we use whitespace around non-unary operators.

> +      char *data_chunk = malloc(data_size);
> +      memcpy(data_chunk, data + (i * data_size), data_size);
> +      randomize_git_file(dir, name_set[i], data_chunk, data_size);
> +
> +      free(data_chunk);
> +   }

As data_size does not change in this loop and the contents of
data_chunk from each round are discardable, allocating once outside
the loop may make more sense.  Actually, as the called function makes
only read-only accesses of data_chunk, I do not quite see why you
need to make a copy in the first place.

We do not use malloc() etc. directly out of the system; study wrapper.c
and find xmalloc() and friends.
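
To illustrate the general idea behind such a wrapper (this is a
simplified stand-in, not the code in wrapper.c, and the name
sketch_xmalloc() is made up), the point is that callers never have
to check for NULL themselves:

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Simplified illustration of a die-on-failure allocator.  It never
 * returns NULL: on allocation failure it reports and exits.
 */
void *sketch_xmalloc(size_t size)
{
	/* malloc(0) may legitimately return NULL, so ask for one byte */
	void *ret = malloc(size ? size : 1);

	if (!ret) {
		fprintf(stderr, "fatal: out of memory, malloc failed\n");
		exit(128);
	}
	return ret;
}
```

The patch as posted dereferences the result of malloc() in memcpy()
without any check, which is exactly the class of mistake a wrapper
like this rules out.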

What if size is not a multiple of files_count, by the way?
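
One possible way to deal with that, sketched under the assumption
that no input byte should be dropped: hand the first (size %
files_count) files one extra byte each.  chunk_bounds() is a
hypothetical helper, not something in the patch or in our tree.

```c
#include <stddef.h>

/*
 * Hypothetical helper: compute where the i-th of "count" chunks
 * starts in a buffer of "size" bytes and how long it is.  The first
 * (size % count) chunks get one extra byte, so the chunks cover the
 * whole buffer.  Returns -1 on nonsensical arguments, 0 otherwise.
 */
int chunk_bounds(size_t size, size_t count, size_t i,
		 size_t *offset, size_t *len)
{
	size_t base, extra;

	if (!count || i >= count)
		return -1;
	base = size / count;
	extra = size % count;
	*offset = i * base + (i < extra ? i : extra);
	*len = base + (i < extra ? 1 : 0);
	return 0;
}
```

The caller can then pass "data + offset" and "len" straight to
randomize_git_file() without copying anything, and the explicit
check also covers the division by zero when files_count is 0.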

I'll stop here as we already have plenty above (read: it is not "I
didn't spot any problems in the patch after this point").

Thanks.