On Wed, Jan 18, 2023 at 03:35:06PM -0500, Jeff King wrote: > The other option is having the fsck code avoid looking past the size it > was given. I think the intent is that this should work, from commits > like 4d0d89755e (Make sure fsck_commit_buffer() does not run out of the > buffer, 2014-09-11). We do use skip_prefix() and parse_oid_hex(), which > won't respect the size, but I think[1] that's OK because we'll have > parsed up to the end-of-header beforehand (and those functions would > never match past there). > > Which would mean that 9a1a3a4d4c (mktag: allow omitting the header/body > \n separator, 2021-01-05) and acf9de4c94 (mktag: use fsck instead of > custom verify_tag(), 2021-01-05) were buggy, and we can just fix them. > > [1] But I said "I think" above because it can get pretty subtle. There's > some more discussion in this thread: > > https://lore.kernel.org/git/20150625155128.C3E9738005C@xxxxxxxxxxxxxx/ > > but I haven't yet convinced myself it's safe. This is exactly the > kind of analysis I wish I had the power to nerd-snipe René into. I poked at this a bit more, and it definitely isn't safe. I think the use of skip_prefix(), etc, is OK, because they'd always stop at an unexpected newline. But verify_headers() is only confirming that we have a series of complete lines, and we might end with no "\n\n" (and hence no commit/tag message). And so the obvious case that fools us is one where the data simply ends at a newline, but we are missing one or more headers. So a truncated commit like: tree 1234567890123456789012345678901234567890 (with the newline at the end of the "tree" line, but nothing else) will cause fsck_commit() to look past the "size" we pass it. With all of the current callers, that means it sees a NUL and bails. So it's not currently a bug, but it becomes one if we can feed it arbitrary buffers. Fixing it isn't _too_ bad, and could look something like this: diff --git a/fsck.c b/fsck.c index c2c8facd2d..3f0bb3e350 100644 --- a/fsck.c +++ b/fsck.c @@ -834,6 +834,7 @@ static int fsck_commit(const struct object_id *oid, unsigned author_count; int err; const char *buffer_begin = buffer; + const char *buffer_end = buffer + size; const char *p; if (verify_headers(buffer, size, oid, OBJ_COMMIT, options)) @@ -847,7 +848,7 @@ static int fsck_commit(const struct object_id *oid, return err; } buffer = p + 1; - while (skip_prefix(buffer, "parent ", &buffer)) { + while (buffer < buffer_end && skip_prefix(buffer, "parent ", &buffer)) { if (parse_oid_hex(buffer, &parent_oid, &p) || *p != '\n') { err = report(options, oid, OBJ_COMMIT, FSCK_MSG_BAD_PARENT_SHA1, "invalid 'parent' line format - bad sha1"); if (err) @@ -856,7 +857,7 @@ static int fsck_commit(const struct object_id *oid, buffer = p + 1; } author_count = 0; - while (skip_prefix(buffer, "author ", &buffer)) { + while (buffer < buffer_end && skip_prefix(buffer, "author ", &buffer)) { author_count++; err = fsck_ident(&buffer, oid, OBJ_COMMIT, options); if (err) @@ -868,7 +869,7 @@ static int fsck_commit(const struct object_id *oid, err = report(options, oid, OBJ_COMMIT, FSCK_MSG_MULTIPLE_AUTHORS, "invalid format - multiple 'author' lines"); if (err) return err; - if (!skip_prefix(buffer, "committer ", &buffer)) + if (buffer >= buffer_end || !skip_prefix(buffer, "committer ", &buffer)) return report(options, oid, OBJ_COMMIT, FSCK_MSG_MISSING_COMMITTER, "invalid format - expected 'committer' line"); err = fsck_ident(&buffer, oid, OBJ_COMMIT, options); if (err) And then the tag side would need something similar. I'd probably also sprinkle some comments in verify_headers() and its callers documenting our assumptions and what's OK to do (string-like parsing functions work as long as they stop when they hit a newline). That, plus a few tests covering the problematic cases to avoid regressions, would probably be OK. I think fsck_tree() is mostly fine, as the tree-iterating code detects truncation. Though I do find the use of strlen() in decode_tree_entry() a little suspicious (and that would be true of the current code, as well, since it powers hash-object's existing parsing check). -Peff