Re: [PATCH] fuzz: add basic fuzz testing for git command

Arthur Chan <arthur.chan@xxxxxxxxxxxxx> · Fri, 16 Sep 2022 17:06:10 +0100

On 13/9/2022 5:13 pm, Junio C Hamano wrote:
"Arthur Chan via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes:

  .gitignore        |   2 +
  Makefile          |   2 +
  fuzz-cmd-base.c   | 117 ++++++++++++++++++++++++++++++++++++++++++++++
  fuzz-cmd-base.h   |  13 ++++++
  fuzz-cmd-status.c |  68 +++++++++++++++++++++++++++
  5 files changed, 202 insertions(+)
  create mode 100644 fuzz-cmd-base.c
  create mode 100644 fuzz-cmd-base.h
  create mode 100644 fuzz-cmd-status.c
Just like we have t/ hierarchy for testing, if we plan to add more
fuzz-* related things on top of what we already have (like those
that can be seen in the context of this patch), I would prefer to
see a creation of fuzz/ hierarchy and move existing stuff there as
the first step before adding more.

And more fuzzing is good, if we can afford it ;-)
Fixed. I have moved the fuzzer into a new directory, oss-fuzz.

Thanks.

Even though I am not taking this patch as-is, let's give a cursory
look to make sure the future iteration can be more reviewable by
pointing out various CodingGuidelines issues.

Thanks for the styling suggestions; I have changed most of the code accordingly.

diff --git a/fuzz-cmd-base.c b/fuzz-cmd-base.c
new file mode 100644
index 00000000000..98f05c78372
--- /dev/null
+++ b/fuzz-cmd-base.c
@@ -0,0 +1,117 @@
+#include "cache.h"
Good to have this as the first thing.

+#include "fuzz-cmd-base.h"
+
+
+/*
+ * This function is used to randomize the content of a file with the
+ * random data. The random data normally come from the fuzzing engine
+ * LibFuzzer in order to create randomization of the git file worktree
+ * and possibly messing up of certain git config file to fuzz different
+ * git command execution logic.
+ */
+void randomize_git_file(char *dir, char *name, char *data_chunk, int data_size) {
Unlike other control structure with multiple statements in a block,
the surrounding braces {} around function block sit on their own
lines.  I.e.

     void randomize_git_file(char *dir, char *name, char *data_chunk, int data_size)
     {


+   char fname[256];
In our codebase, tab-width is 8 and we indent with tabs.

Use <strbuf.h> and avoid snprintf(), e.g.

      struct strbuf fname = STRBUF_INIT;
      strbuf_addf(&fname, "%s/%s", dir, name);
      ... use fname.buf ...
      strbuf_release(&fname);
I have changed all the snprintf code to use strbuf instead. Thanks for
the suggestion.
+   FILE *fp;
+
Good that you leave a blank between the end of decl and the
beginning of the statements.

+   snprintf(fname, 255, "%s/%s", dir, name);
+
+   fp = fopen(fname, "wb");
+   if (fp) {
+      fwrite(data_chunk, 1, data_size, fp);
+      fclose(fp);
+   }
+}
Why doesn't this care about errors at all?  Not even fopen errors?

I have changed the code a little, but in general, failures to generate
the contents of a file do occur many times during the fuzzing process,
because some random fuzzing data results in unexpected behaviour. We
currently just skip that round of fuzzing.
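
As an illustration of skipping a round on failure, a minimal sketch
could look like the following. The helper name write_fuzz_file() is
hypothetical and this is not the code from the patch. It uses only the
C standard library, whereas the real code would build the path with
strbuf as suggested above.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper, not the code from the patch: write `size`
 * bytes of `data` to `path`.  Returns 0 on success and -1 on any
 * failure, so the caller can skip the current fuzzing round.
 */
static int write_fuzz_file(const char *path, const char *data, size_t size)
{
	FILE *fp = fopen(path, "wb");
	int ret = 0;

	if (!fp)
		return -1;
	if (fwrite(data, 1, size, fp) != size)
		ret = -1;
	/* fclose() can also report a write error that was deferred */
	if (fclose(fp))
		ret = -1;
	return ret;
}
```

A fuzz target could then return early from the current round whenever
write_fuzz_file() returns a negative value, instead of running the
command against a half-written worktree.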
+/*
+ * This function is the variants of the above functions which takes
+ * in a set of target files to be processed. These target file are
"... is a variant of the above function, which takes a set of ..."

+ * passing to the above function one by one for content rewrite.
+ */
+void randomize_git_files(char *dir, char *name_set[], int files_count, char *data, int size) {
+   int data_size = size / files_count;
+
+   for(int i=0; i<files_count; i++) {
We do not yet officially allow variable decl for for() statement
like this.  We'll start allowing it later this year but we are
waiting for oddball platform/compiler folks to scream right now.

IOW, we write the above more like so:

      int data_size = size / files_count;
      int i;

      for (i = 0; i < files_count; i++) {

Take also notice how we use whitespaces around non-unary operators.
Thanks, changed the code style accordingly.
+      char *data_chunk = malloc(data_size);
+      memcpy(data_chunk, data + (i * data_size), data_size);
+      randomize_git_file(dir, name_set[i], data_chunk, data_size);
+
+      free(data_chunk);
+   }
As data_size does not change in this loop and the contents of
data_chunk from each round is discardable, allocating once outside
may make more sense.  Actually, as the called function makes only
read-only accesses of data_chunk, I do not quite see why you need to
make a copy in the first place.

We do not use malloc() etc. directly out of the system; study wrapper.c
and find xmalloc() and friends.
Changed to use xmallocz_gentle() instead of malloc(). Thanks for the suggestion.

What if size is not a multiple of files_count, by the way?
It does not matter; the unused bytes are simply ignored. We just
ensure that the oss-fuzz engine provides enough random bytes to
generate the random file contents.
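
To make the arithmetic concrete, here is a small sketch of the
equal-chunk split under integer division. The names struct chunk and
get_chunk() are hypothetical and not taken from the patch. Each chunk
is a pointer into the original buffer, so no copy is needed, and the
trailing size % count bytes are never read.

```c
#include <stddef.h>

/* Hypothetical view into the fuzzer input; it owns no memory. */
struct chunk {
	const char *ptr;
	size_t len;
};

/*
 * Describe the i-th of `count` equal chunks of `data` in `out`.
 * Returns 0 on success, or -1 if `i` is out of range or there is
 * not at least one input byte per file.
 */
static int get_chunk(const char *data, size_t size, size_t count,
		     size_t i, struct chunk *out)
{
	size_t chunk_size;

	if (!count || i >= count || size < count)
		return -1;
	chunk_size = size / count;	/* remainder bytes are ignored */
	out->ptr = data + i * chunk_size;
	out->len = chunk_size;
	return 0;
}
```

With 10 bytes of input and 3 files, each file receives 3 bytes and
the last byte is left unused.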
I'll stop here as we already have plenty above (read: it is not "I
didn't spot any problems in the patch after this point").
Thanks, and sorry for the trouble. This is my first time contributing
patches to git, and I do not know most of the conventions and style.
I have changed most of them accordingly, to the best of my ability,
and will prepare a v2 soon.

Thanks.
ADA Logics Ltd is registered in England. No: 11624074.
Registered office: 266 Banbury Road, Post Box 292,
OX2 7DL, Oxford, Oxfordshire, United Kingdom