Hi, On Mon, 8 Dec 2014, Jeff King wrote: > On Mon, Dec 08, 2014 at 12:48:12AM -0500, Jeff King wrote: > > > Note that when fscking tags with "index-pack --strict", this is even > > worse. index-pack does not add a trailing NUL-terminator after the > > object, so we may actually read past the buffer and print > > uninitialized memory. Running t5302 with valgrind does notice the bug > > for that reason. > > This merits an additional note (but fortunately not a patch :) ). Wehell... your note about index-pack is definitely worth an additional note, and - as you suggest later - probably also a patch. When I started working on the fsck code handling tags, I did note that the tag buffers were not NUL terminated, mainly due to crashes that were really difficult to debug (I had to resort to the sleep loop trick to attach to a process spawned by git-push) after I added code that assumed the buffers to be NUL terminated. > After writing the above, I thought for a moment that we might actually > read past the end of the buffer in some cases, but I convinced myself > otherwise. And I think Dscho and I might have even had this conversation > off-list a while ago, but I think it is worth pointing out so that > nobody else has to dig into it. Yep, we discussed this quite a bit. I argued that the safest thing is *not* to assume that the buffers are NUL terminated because it was not obvious to me how to guarantee NUL-terminated object buffers (because the commit objects are reused by the fsck machinery, not re-read). > For the most part, we are fine because we parse the object > left-to-right, and barf as soon as we see something unusual (and for > this reason, fsck_commit_buffer is also fine). The two suspicious places > are: > > 1. We call strchr(buffer, '\n'), which looks like it could read > unbounded when "buffer" is not NUL-terminated. However, early in > the function we confirm that it contains "\n\n", and we will not > have parsed past that here. Therefore we know that we will always > hit a newline. For reference, this is the code: https://github.com/git/git/blob/c18b86734113ee2aeb0e140c922c8fbd4accc860/fsck.c#L241-L259 being called by: https://github.com/git/git/blob/c18b86734113ee2aeb0e140c922c8fbd4accc860/fsck.c#L308 and https://github.com/git/git/blob/c18b86734113ee2aeb0e140c922c8fbd4accc860/fsck.c#L387 > 2. After finding and parsing a line whose trailing newline is marked > by "eol", we then set "buffer = eol + 1". This would be wrong if > eol is at the very end of the buffer (the next step would then > start reading uninitialized memory). > > But again we are saved by the "\n\n" check. The strchr will always > find the first, so we know that we have at least one character > after it (and that character is a newline, which cannot be the > start of a new header, which will cause us to stop parsing). Exactly. It is unfortunately a little too brittle for my taste, because it would be relatively easy to break the assumption without noticing. For example, in my upcoming patch series allowing to turn specific fsck errors into mere warnings, it would have been potentially very dangerous to allow demoting that error (no end of header found, NUL inside header) to a warning - because that would have allowed the code to go beyond the buffer. However... > I do admit that I am tempted to teach index-pack to always NUL-terminate > objects in memory that we feed to fsck, just to be on the safe side. It > doesn't cost much, and could prevent a silly mistake (either in the > future, or one that I missed in my analysis). The fsck code otherwise > generally expects to get the output of read_sha1_file, which has the > safety-NUL appended. If we do that, we have to NUL-terminate all of the objects, correct? I mean, even the blobs and the trees and stuff, because we cannot know beforehand what type of object we're gonna read, right? Ciao, Dscho -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html