Re: Buffer overflows

"Reece Dunn" <msclrhd@xxxxxxxxxxxxxx> · Thu, 30 Aug 2007 23:34:26 +0100

On 30/08/2007, Timo Sirainen <tss@xxxxxx> wrote:
> On 31.8.2007, at 0.35, Reece Dunn wrote:
>
> >> The problem is that the git code is full of these random cases. It's
> >> simply a huge job to even try to verify the correctness of it. Even
> >> if someone did that and fixed all the problems, tomorrow there would
> >> be new ones because noone bothers to even try to avoid them. So there
> >> really isn't any point in trying to make git secure until the coding
> >> style changes.
> >
> > You don't want a manual check to do these kinds of checks. Not only is
> > it a huge job, you have the human factor: people make mistakes. This
> > is (in part) what the review process is for, but understanding how to
> > identify code that is safe from buffer overruns, integer overflows and
> > the like is a complex task. Also, it may work on 32-bit which has been
> > verified, but not on 64-bit.
> >
> > It would be far better to specify the rules on how to detect these
> > issues into a static analysis tool and have that do the checking for
> > you. Therefore, it is possible to detect when new problems have been
> > added into the codebase. Does sparse support identifying these issues?
>
> Yes, it is a complex task. But if there did exist such a static
> analyzer tool already, it would probably show that half of the strcpy
> () calls (and others) in git are currently unsafe. Wouldn't help all
> that much I think.

Let's see:

symlinks.c(39): In has_symlink_leading_path, the last_symlink input
argument needs to be large enough to hold MAX_PATH characters, as this
is sizeof(path). This is a potential risk, depending on how this
method is used.

sideband.c(17): buf is large enough to copy the "remote:" string
literal into. Also, the buffer is large enough for the input being
read, as packet_read_line specifies the size of the remaining buffer
with an extra character reserved for the null. Therefore, this usage
is safe.

sha1_file.c(550): open_pack_index does a runtime calculation that
essentially results in strcpy( ".pack", ".idx" ). As these are
literals, this usage is safe, provided that strlen(idx_name) >
strlen(".pack") which should be ensured by the caller. Hence, this
usage is safe.

So far, these are looking safe to me. I'd need to check the remaining
uses to be sure and to find the strcpy usage that you have identified
as an issue.

It would also be worth creating a test case that triggers this bug, as
that can then be used to ensure that it does not reappear once fixed.
That is the best way to deal with these, as not all uses of "unsafe"
API are actually unsafe.

> >> The code should be easy to verify to be secure, and with some kind of
> >> a safe string API it's a lot easier than trying to figure out corner
> >> cases where strcpy() calls break.
> >
> > Why is it easier? If you have a fixed-size buffer, why not use
> > strncpy, which is what a safe string API is essentially doing anyway?
>
> Well, strncpy() is a pretty good example actually. A lot of people
> use it wrong, because they don't realize that it doesn't necessarily
> NUL-terminate the strings. So it's another example of a bad API that
> can be easily used wrong. And besides that, it also fills the rest of
> the buffer with NULs, which is almost always pointless waste of CPU.

Ok, I take your point. However, there is nothing preventing these API
from being used correctly. Granted, this is more work and like the
strncpy example, have subtle behaviour, but that does not make it
impossible.

As an example, do your safe API do null pointer checks. This is
because strcpy, strlen and the like don't, which is one of the reasons
why they are considered unsafe. But then, if you guarantee that you
are not passing a null pointer to one of these API, why take the hit
of the additional checks when you know that these are safe. It is also
easier to debug the problem if it happens at the point of call,
instead of silently working.

One of the problems with a safe string API is that there are different
possible things to do when an overrun is detected. The best thing to
do would be to cap the input to the buffer size and return some error
status. For example, you could have:

    STR_STATIC(str, 1);
    sstr_append(str, "/foo/bar");
    // run "rm -rf ${str}"

which would hose your system. This is worse than a buffer overrun.

> And why is safe string API easier to verify? Here's an example:
>
> // see how easily you can use strncpy() to cause a buffer overflow:
> char buf[1024];
> strncpy(buf, input, 2048);

Yes it can, as above. A subtler case would be when the space for a
null terminator is missed. But then I'd write something like:

    char buf[1024];
    strncpy(buf, input, sizeof(buf));

Here, the compiler does the work for me, so if I change 1024 to 512,
it will still be safe.

> // see how impossible it is to cause a buffer overflow with my static
> string API:
> STR_STATIC(str, 1024);
> sstr_append(str, input);

How is str defined? Looking at the code, I can't _see_ that this is
safe. All I see as inputs are `str` and `input`, so how can I verify
that this is correct without looking at how these are defined.

> Of course the above example is a simple one, but often when using
> libc string handling functions for building strings the code gets
> complex and there are all kinds of "is the buffer full already? what
> about now? and now? and now?" and with all of those checks it's easy
> to make mistakes.

But also, those API are well known, along with their associated
problems. By creating a new API, there are new (unknown) ways to
abuse/misuse it.

> The point is that if the APIs are (nearly) impossible to use
> insecurely, it's very easy to verify that the code is safe. The code
> doesn't get safe by lots of checks everywhere, it gets safe by
> placing a minimal amount of checks to small area of the code. The
> correctness of a few checks is a lot easier to verify.

But what about that nearly impossible case?

Also, how do you use the safe string API with other API that don't
have a "safe" equivalent?

- Reece
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html