On 30/08/2007, Timo Sirainen <tss@xxxxxx> wrote: > On 31.8.2007, at 0.35, Reece Dunn wrote: > > >> The problem is that the git code is full of these random cases. It's > >> simply a huge job to even try to verify the correctness of it. Even > >> if someone did that and fixed all the problems, tomorrow there would > >> be new ones because noone bothers to even try to avoid them. So there > >> really isn't any point in trying to make git secure until the coding > >> style changes. > > > > You don't want a manual check to do these kinds of checks. Not only is > > it a huge job, you have the human factor: people make mistakes. This > > is (in part) what the review process is for, but understanding how to > > identify code that is safe from buffer overruns, integer overflows and > > the like is a complex task. Also, it may work on 32-bit which has been > > verified, but not on 64-bit. > > > > It would be far better to specify the rules on how to detect these > > issues into a static analysis tool and have that do the checking for > > you. Therefore, it is possible to detect when new problems have been > > added into the codebase. Does sparse support identifying these issues? > > Yes, it is a complex task. But if there did exist such a static > analyzer tool already, it would probably show that half of the strcpy > () calls (and others) in git are currently unsafe. Wouldn't help all > that much I think. Let's see: symlinks.c(39): In has_symlink_leading_path, the last_symlink input argument needs to be large enough to hold MAX_PATH characters, as this is sizeof(path). This is a potential risk, depending on how this method is used. sideband.c(17): buf is large enough to copy the "remote:" string literal into. Also, the buffer is large enough for the input being read, as packet_read_line specifies the size of the remaining buffer with an extra character reserved for the null. Therefore, this usage is safe. sha1_file.c(550): open_pack_index does a runtime calculation that essentially results in strcpy( ".pack", ".idx" ). As these are literals, this usage is safe, provided that strlen(idx_name) > strlen(".pack") which should be ensured by the caller. Hence, this usage is safe. So far, these are looking safe to me. I'd need to check the remaining uses to be sure and to find the strcpy usage that you have identified as an issue. It would also be worth creating a test case that triggers this bug, as that can then be used to ensure that it does not reappear once fixed. That is the best way to deal with these, as not all uses of "unsafe" API are actually unsafe. > >> The code should be easy to verify to be secure, and with some kind of > >> a safe string API it's a lot easier than trying to figure out corner > >> cases where strcpy() calls break. > > > > Why is it easier? If you have a fixed-size buffer, why not use > > strncpy, which is what a safe string API is essentially doing anyway? > > Well, strncpy() is a pretty good example actually. A lot of people > use it wrong, because they don't realize that it doesn't necessarily > NUL-terminate the strings. So it's another example of a bad API that > can be easily used wrong. And besides that, it also fills the rest of > the buffer with NULs, which is almost always pointless waste of CPU. Ok, I take your point. However, there is nothing preventing these API from being used correctly. Granted, this is more work and like the strncpy example, have subtle behaviour, but that does not make it impossible. As an example, do your safe API do null pointer checks. This is because strcpy, strlen and the like don't, which is one of the reasons why they are considered unsafe. But then, if you guarantee that you are not passing a null pointer to one of these API, why take the hit of the additional checks when you know that these are safe. It is also easier to debug the problem if it happens at the point of call, instead of silently working. One of the problems with a safe string API is that there are different possible things to do when an overrun is detected. The best thing to do would be to cap the input to the buffer size and return some error status. For example, you could have: STR_STATIC(str, 1); sstr_append(str, "/foo/bar"); // run "rm -rf ${str}" which would hose your system. This is worse than a buffer overrun. > And why is safe string API easier to verify? Here's an example: > > // see how easily you can use strncpy() to cause a buffer overflow: > char buf[1024]; > strncpy(buf, input, 2048); Yes it can, as above. A subtler case would be when the space for a null terminator is missed. But then I'd write something like: char buf[1024]; strncpy(buf, input, sizeof(buf)); Here, the compiler does the work for me, so if I change 1024 to 512, it will still be safe. > // see how impossible it is to cause a buffer overflow with my static > string API: > STR_STATIC(str, 1024); > sstr_append(str, input); How is str defined? Looking at the code, I can't _see_ that this is safe. All I see as inputs are `str` and `input`, so how can I verify that this is correct without looking at how these are defined. > Of course the above example is a simple one, but often when using > libc string handling functions for building strings the code gets > complex and there are all kinds of "is the buffer full already? what > about now? and now? and now?" and with all of those checks it's easy > to make mistakes. But also, those API are well known, along with their associated problems. By creating a new API, there are new (unknown) ways to abuse/misuse it. > The point is that if the APIs are (nearly) impossible to use > insecurely, it's very easy to verify that the code is safe. The code > doesn't get safe by lots of checks everywhere, it gets safe by > placing a minimal amount of checks to small area of the code. The > correctness of a few checks is a lot easier to verify. But what about that nearly impossible case? Also, how do you use the safe string API with other API that don't have a "safe" equivalent? - Reece - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html