On Thu, Apr 14, 2016 at 11:07:09AM -0700, Junio C Hamano wrote: > Even though a Git commit object is designed to be capable of storing > any binary data as its payload, in practice people use it to describe > the changes in textual form, and tools like "git log" are designed to > treat the payload as text. > > Detect and warn when we see any commit object with a NUL byte in > it. > > Note that a NUL byte in the header part is already detected as a > grave error. This change is purely about the message part. > > Signed-off-by: Junio C Hamano <gitster@xxxxxxxxx> Thanks, I was just reading over some of the old threads, and wondering if it was time to resurrect this idea. > @@ -610,6 +611,7 @@ static int fsck_commit_buffer(struct commit *commit, const char *buffer, > struct commit_graft *graft; > unsigned parent_count, parent_line_count = 0, author_count; > int err; > + const char *buffer_begin = buffer; > > if (verify_headers(buffer, size, &commit->object, options)) > return -1; You need this "buffer_begin" because we move the "buffer" pointer forward as we parse. But perhaps whole-buffer checks should simply go at the top (next to verify_headers) before we start advancing the pointer. To me, that makes the function's flow more natural. But alternatively... > @@ -671,6 +673,12 @@ static int fsck_commit_buffer(struct commit *commit, const char *buffer, > if (err) > return err; > } > + if (memchr(buffer_begin, '\0', size)) { > + err = report(options, &commit->object, FSCK_MSG_NUL_IN_COMMIT, > + "NUL byte in the commit object body"); > + if (err) > + return err; > + } Here we've parsed to the end of the headers we know about. We know there's no NUL there, because verify_headers() would have complained. And because the individual header parsers would have complained. So I actually think we could check from "buffer" (of course we do still need to record the beginning of the buffer to adjust "size" appropriately). It's a little more efficient (we don't have to memchr over the same bytes again). But I'd worry a little that doing it that way would introduce coupling between this check and verify_headers(), though (so that if the latter ever changes, our check may start missing cases). So yet another alternative would be to include this check in verify_headers(). It would parse to the end of the headers as now, and then from there additionally look for a NUL in the body. Of the three approaches, I think I like that third one. It's the most efficient, and I think the flow is pretty clear. We'd probably want to rename verify_headers(), though. :) -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html