[PATCH 1/2] introduce "banned function" list

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



There are a few standard C functions (like strcpy) which are
easy to misuse. We generally discourage these in reviews,
but we haven't put advice in CodingGuidelines, nor provided
any automated enforcement. The latter is especially
important because it's more consistent, and it can often
save a round-trip of review.

Let's start by banning strcpy() and sprintf(). It's not
impossible to use these correctly, but it's easy to do so
incorrectly, and there's always a better option.

For enforcement, we can rely on the compiler and just
introduce code which breaks compilation when it sees these
functions. This has a few advantages:

  1. We know it's run as part of a build cycle, so it's
     hard to ignore. Whereas an external linter is an extra
     step the developer needs to remember to do.

  2. Likewise, it's basically free since the compiler is
     parsing the code anyway.

  3. We know it's robust against false positives (unlike a
     grep-based linter).

The one big disadvantage is that it will only check code
that is actually compiled, so it may miss code that isn't
triggered on your particular system. But since presumably
people don't add new code without compiling it (and if they
do, the banned function list is the least of their worries),
we really only care about failing to clean up old code when
adding new functions to the list. And that's easy enough to
address with a manual audit when adding a new function
(which is what I did for the functions here).

That leaves only the question of how to trigger the
compilation error. The goals are:

 - it should be portable; ideally this would trigger
   everywhere, and does not need to be part of a DEVELOPER=1
   setup (because unlike warnings which may depend on the
   compiler or system, this is a clear indicator of
   something wrong in the code).

 - it should generate a readable error that gives the
   developer a clue what happened

 - it should avoid generating too much other cruft that
   makes it hard to see the actual error

 - it should mention the original callsite in the error

The technique used here is to convert the banned function
into a call to a descriptive but non-existent function. The
output from gcc 7 looks like this (on a checkout without
022d2ac1f3, which removes the final strcpy from blame.c):

      CC builtin/blame.o
  In file included from ./git-compat-util.h:1242:0,
                   from ./cache.h:4,
                   from builtin/blame.c:8:
  builtin/blame.c: In function ‘cmd_blame’:
  ./banned.h:11:22: error: implicit declaration of function ‘sorry_strcpy_is_a_banned_function’ [-Werror=implicit-function-declaration]
   #define BANNED(func) sorry_##func##_is_a_banned_function()
                        ^
  ./banned.h:13:21: note: in expansion of macro ‘BANNED’
   #define strcpy(x,y) BANNED(strcpy)
                       ^~~~~~
  builtin/blame.c:1074:4: note: in expansion of macro ‘strcpy’
      strcpy(repeated_meta_color, GIT_COLOR_CYAN);
      ^~~~~~

This pretty clearly shows the original callsite along with
the descriptive function name, and it mentions banned.h,
which contains more information.

What's not perfectly ideal is:

  1. We'll only trigger with -Wimplicit-function-declaration
     (and only stop compilation with -Werror). These are
     generally enabled by DEVELOPER=1. If you _don't_ have
     that set, we'll still catch the problem, but only at
     link-time, with a slightly less useful message:

       /usr/bin/ld: builtin/blame.o: in function `cmd_blame':
       /home/peff/tmp/builtin/blame.c:1074: undefined reference to `sorry_strcpy_is_a_banned_function'
       collect2: error: ld returned 1 exit status

     If instead we convert this to a reference to an
     undefined variable, that always dies immediately. But
     gcc seems to print the set of errors twice, which
     clutters things up.

  2. The expansion through BANNED() adds an extra layer of
     error. Curiously, though, removing it (and just
     expanding strcpy directly to the bogus function name)
     causes gcc _not_ to report the original line of code.

So experimentally, this seems to behave pretty well on gcc.
Clang looks OK, too, though:

 - it actually produces a better message for the undeclared
   identifier than for the implicit function (it doesn't
   double the errors, and for the implicit function it
   actually produces an extra complaint about
   strict-prototypes).

 - the BANNED indirection has no effect (it shows the
   original line of code either way)

So if we want to optimize for clang's output, we could
switch both of those decisions.

Signed-off-by: Jeff King <peff@xxxxxxxx>
---
So you might guess that I experimented mostly with gcc
(which is what I normally use), and then just double-checked
clang at the end. After seeing those results, I actually
think clang does a better job of producing good error
messages overall, and perhaps we should not cater to gcc. I
dunno. It probably doesn't matter that much either way.

I'd also be happy if somebody showed an alternative that
is even cleaner. Sadly I don't think you can trigger an
#error from the _expansion_ of a macro.

 Documentation/CodingGuidelines |  3 +++
 banned.h                       | 19 +++++++++++++++++++
 git-compat-util.h              |  2 ++
 3 files changed, 24 insertions(+)
 create mode 100644 banned.h

diff --git a/Documentation/CodingGuidelines b/Documentation/CodingGuidelines
index 48aa4edfbd..232f17becd 100644
--- a/Documentation/CodingGuidelines
+++ b/Documentation/CodingGuidelines
@@ -391,6 +391,9 @@ For C programs:
    must be declared with "extern" in header files. However, function
    declarations should not use "extern", as that is already the default.
 
+ - A few easy-to-misuse functions like `strcpy` are banned; see
+   `banned.h` for the complete list.
+
 For Perl programs:
 
  - Most of the C guidelines above apply.
diff --git a/banned.h b/banned.h
new file mode 100644
index 0000000000..fe81020e0f
--- /dev/null
+++ b/banned.h
@@ -0,0 +1,19 @@
+#ifndef BANNED_H
+#define BANNED_H
+
+/*
+ * This header lists functions that have been banned from our code base,
+ * because they're too easy to misuse (and even if used correctly,
+ * complicate audits). Including this header turns them into compile-time
+ * errors.
+ */
+
+#define BANNED(func) sorry_##func##_is_a_banned_function()
+
+#define strcpy(x,y) BANNED(strcpy)
+
+#ifdef HAVE_VARIADIC_MACROS
+#define sprintf(...) BANNED(sprintf)
+#endif
+
+#endif /* BANNED_H */
diff --git a/git-compat-util.h b/git-compat-util.h
index 9a64998b24..51bece30ac 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -1239,4 +1239,6 @@ extern void unleak_memory(const void *ptr, size_t len);
 #define UNLEAK(var) do {} while (0)
 #endif
 
+#include "banned.h"
+
 #endif
-- 
2.18.0.540.g6c38643a7b




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux